
Comparative Investigation of Traditional Machine Learning Models and Transformer Models for Phishing Email Detection

Phishing emails pose a significant threat to cybersecurity worldwide. Existing tools mitigate the impact of these emails by filtering them, but they are only as reliable as their ability to detect new formats and techniques for creating phishing emails. In this paper, we investigated how traditional machine learning models and transformer models perform on the task of classifying an email as phishing or not. We found that transformer models, in particular DistilBERT, BERT, and RoBERTa, performed significantly better than traditional models such as Logistic Regression, Random Forest, Support Vector Machine (SVM), and Naive Bayes. The process consisted of using a large and robust dataset of emails and applying preprocessing and optimization techniques to obtain the best possible results. RoBERTa showed an outstanding capacity to identify phishing emails, achieving the highest accuracy of 0.9943. Although they were also successful, traditional models performed marginally worse; SVM performed best among them, with an accuracy of 0.9854. The results emphasize the value of sophisticated text processing methods and the potential of transformer models to improve email security by thwarting phishing attempts.
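As a rough illustration of the kind of traditional baseline named in the abstract, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing, written with the Python standard library only. It is not the authors' implementation: the class name `NaiveBayes`, the helper functions, and the toy emails are all hypothetical and invented for this example, and the paper's dataset, preprocessing, and tuning are not reproduced here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Toy data, invented for illustration only (not from the paper's dataset).
TRAIN = [
    ("Verify your account now to avoid suspension", "phishing"),
    ("Urgent: confirm your password at this link", "phishing"),
    ("You won a prize, click here to claim", "phishing"),
    ("Meeting moved to Thursday at ten", "legitimate"),
    ("Please review the attached project report", "legitimate"),
    ("Lunch tomorrow with the team", "legitimate"),
]
TEST = [
    ("Click this link to verify your password", "phishing"),
    ("Project meeting report for the team", "legitimate"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
test_accuracy = accuracy(model, [t for t, _ in TEST], [y for _, y in TEST])
print(test_accuracy)
```

Accuracy here is simply the share of correctly labeled emails, the same metric the abstract reports. The value printed for this toy data says nothing about the figures in the paper.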

MDPI AG

René Meléndez, Michal Ptaszynski, Masui Fumito

2024



Related Results

Phishing Cyber Security Threats

Phishing is a growing threat in the realm of cybersecurity, where cybercriminals use various phishing techniques to steal sensitive information from individuals and organizations. ...

Primerjalna književnost na prelomu tisočletja

In a comprehensive and at times critical manner, this volume seeks to shed light on the development of events in Western (i.e., European and North American) comparative literature ...

AI-Based Phishing Attack Detection And Prevention Using Natural Language Processing (NLP)

Phishing attacks remain one of the most prevalent and damaging cybersecurity threats, targeting users across various communication channels such as email, social media, and SMS. Tr...

Automatic Load Sharing of Transformer

Transformer plays a major role in the power system. It works 24 hours a day and provides power to the load. The transformer is excessive full, its windings are overheated which lea...

Deep Learning Based Phishing Websites Detection

Phishing is a crime that involves the theft of confidential user information. Those targeted by phishing websites include individuals, small businesses, cloud storage providers, an...

Identification of Phishing Urls Using Machine Learning

Phishing is a typical assault on unsuspecting individuals by making them to reveal their one-of-a-kind data utilizing fake sites. The target of phishing sit...

High frequency modeling of power transformers under transients

This thesis presents the results related to high frequency modeling of power transformers. First, a 25kVA distribution transformer under lightning surges is tested in the laborator...

The need for education on phishing: a survey comparison of the UK and Qatar

Purpose: This paper seeks to focus on identifying the need for education to enhance awareness of the e‐mail phishing threat as the most effective way to reduce the risk of e‐mail phi...
