BERT-based models for phishing detection
Author | Affiliation | |
---|---|---|
Centre for Applied Research and Development (CARD) | ||
Centre for Applied Research and Development (CARD) | ||
Centre for Applied Research and Development (CARD) | ||
Centre for Applied Research and Development (CARD) |
Date | Volume | Start Page | End Page |
---|---|---|---|
2023 | 3575 | 34 | 44 |
In this paper, we report the application of BERT-based models to phishing detection in emails. We fine-tuned three BERT-based models (DistilBERT, TinyBERT, and RoBERTa) for the task. All the fine-tuned models attained scores above 0.985 on every metric (accuracy, precision, recall, and F1-score). Nevertheless, the RoBERTa model achieved the highest scores across all metrics, indicating that it classifies the selected phishing data most accurately. The models from each BERT architecture were then assessed more thoroughly by using them in a pseudo-real-life scenario. For this purpose, we created an entirely new dataset from actual phishing emails and used text augmentation techniques to increase its size. The DistilBERT and RoBERTa models produced very similar outcomes, i.e., most of the emails were classified correctly. However, as DistilBERT uses fewer resources and performs better than the RoBERTa model, it has been regarded as the best model for detecting phishing emails in our case. The TinyBERT variant had the worst results, as its size was insufficient for learning to categorize emails and detect phishing.
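
The four metrics named in the abstract can be illustrated with a minimal sketch. It assumes binary labels in which 1 marks a phishing email and 0 a legitimate one; the function name `binary_metrics`, the `Metrics` container, and the toy labels are hypothetical and are not taken from the paper, whose evaluation code is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary classifier.

    Assumption: `positive` is the label used for phishing emails.
    Ratios with a zero denominator are reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return Metrics(accuracy, precision, recall, f1)


# Toy labels for illustration only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
result = binary_metrics(y_true, y_pred)
print(result)
```

With these toy labels there are three true positives, three true negatives, one false positive, and one false negative, so all four metrics equal 0.75. In practice, a library routine such as scikit-learn's `precision_recall_fscore_support` would typically replace the hand-written counts.
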
Journal | CiteScore | SNIP | SJR | Year | Quartile |
---|---|---|---|---|---|
CEUR Workshop Proceedings | 1.1 | 0.235 | 0.191 | 2023 | Q4 |