Comparative Investigation of Traditional Machine Learning Models and Transformer Models for Phishing Email Detection
Phishing emails pose a significant threat to cybersecurity worldwide. Existing tools mitigate the impact of these emails by filtering them, but such tools are only as reliable as their ability to detect new formats and techniques for crafting phishing emails. In this paper, we investigated how traditional machine learning models and transformer models perform on the binary classification task of deciding whether an email is phishing. We found that transformer models, in particular DistilBERT, BERT, and RoBERTa, performed significantly better than traditional models such as Logistic Regression, Random Forest, Support Vector Machine (SVM), and Naive Bayes.
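To make the "traditional model" side of this comparison concrete, here is a minimal sketch of one of the baselines named above, a multinomial Naive Bayes classifier, written with only the Python standard library. It is an illustration under stated assumptions, not the paper's implementation: the tokenizer, the `urltoken` placeholder, the `NaiveBayes` class, and the four training emails are all hypothetical, since the abstract does not describe the study's preprocessing, dataset, or code.

```python
import math
import re
from collections import Counter

# Hypothetical preprocessing (not the paper's): lowercase the text, replace
# every URL with a placeholder token, then keep runs of letters and digits.
URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    text = URL_RE.sub(" urltoken ", text.lower())
    return TOKEN_RE.findall(text)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, emails, labels):
        self.classes = sorted(set(labels))
        doc_counts = Counter(labels)
        # Per-class word frequencies over all training emails of that class.
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(emails, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        # Log prior P(class), estimated from how often each class occurs.
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        return self

    def log_score(self, tokens, c):
        # log P(c) + sum of log P(token | c); add-one smoothing keeps every
        # probability nonzero for words seen only in the other class.
        denom = self.total_words[c] + len(self.vocab)
        return self.log_prior[c] + sum(
            math.log((self.word_counts[c][t] + 1) / denom) for t in tokens
        )

    def predict(self, text):
        # Words never seen in training carry no evidence, so they are dropped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        return max(self.classes, key=lambda c: self.log_score(tokens, c))


# Hypothetical toy emails for illustration only; not from the paper's dataset.
TRAIN = [
    ("Your account is suspended, verify your password at http://example.com/login", "phishing"),
    ("Urgent: confirm your bank details now to avoid closure", "phishing"),
    ("Meeting notes from Tuesday attached, see agenda", "legitimate"),
    ("Lunch on Friday? Let me know what time works", "legitimate"),
]

if __name__ == "__main__":
    model = NaiveBayes().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
    print(model.predict("Please verify your password immediately at http://example.org"))
```

Swapping in Logistic Regression, Random Forest, or an SVM would change only the classifier, not the tokenize-then-classify structure; in practice a library such as scikit-learn would normally be used instead of a hand-written model. The transformer models work differently: they are fine-tuned end to end on the raw text rather than on bag-of-words counts.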
Our approach used a large, robust dataset of emails and applied preprocessing and optimization techniques to obtain the best possible results. RoBERTa achieved the highest accuracy, 0.9943, demonstrating an outstanding capacity to identify phishing emails. Although the traditional models were still successful, they performed marginally worse; SVM was the best of them, with an accuracy of 0.9854. The results emphasize the value of sophisticated text-processing methods and the potential of transformer models to improve email security by thwarting phishing attempts.
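Because both accuracies are close to 1, comparing error rates (1 minus accuracy) can be more informative than comparing the accuracies directly. The short Python sketch below does that arithmetic using only the two figures reported above. The helper names are hypothetical, and the comparison assumes both accuracies were measured on the same test set, which the abstract implies but does not state.

```python
# Accuracies reported in the abstract.
REPORTED_ACCURACY = {"RoBERTa": 0.9943, "SVM": 0.9854}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def error_rate(acc):
    """Share of emails misclassified, given an accuracy in [0, 1]."""
    return 1.0 - acc


roberta_err = error_rate(REPORTED_ACCURACY["RoBERTa"])  # about 0.0057
svm_err = error_rate(REPORTED_ACCURACY["SVM"])          # about 0.0146

# Absolute gap in accuracy, in percentage points (about 0.89).
gap_points = (REPORTED_ACCURACY["RoBERTa"] - REPORTED_ACCURACY["SVM"]) * 100

# Relative comparison: SVM's error rate is about 2.6 times RoBERTa's.
error_ratio = svm_err / roberta_err
```

Read this way, an accuracy gap of under one percentage point corresponds to the best traditional model misclassifying roughly two and a half times as large a share of emails as RoBERTa.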

