ABSTRACT :
Phishing email is one of the most significant threats in the world today and has caused tremendous financial losses. Although countermeasures are continually being updated, their results are not yet satisfactory, and the number of phishing emails has been growing at an alarming rate in recent years. More effective phishing detection technology is therefore needed to curb this threat. In this paper, we first analyse the email structure. Then, based on an improved recurrent convolutional neural network (RCNN) model with multilevel vectors and an attention mechanism, we propose a new phishing email detection model named THEMIS, which models emails at the email header, the email body, the character level, and the word level simultaneously. To evaluate the effectiveness of THEMIS, we use an unbalanced dataset that has realistic ratios of phishing and legitimate emails. The experimental results show that the overall accuracy of THEMIS reaches 99.848%, while the false positive rate (FPR) is 0.043%. High accuracy and low FPR ensure that the filter identifies phishing emails with high probability while filtering out as few legitimate emails as possible. This promising result is superior to existing detection methods and verifies the effectiveness of THEMIS in detecting phishing emails.
EXISTING SYSTEM :
With the emergence of email, the convenience of communication has led to the problem of massive spam, especially phishing attacks through email. Various anti-phishing technologies have been proposed to solve the problem of phishing attacks. Sheng et al. [10] studied the effectiveness of phishing blacklists. Blacklists mainly include sender blacklists and link blacklists. This detection method extracts the sender's address and the link addresses in the message and checks whether they appear in the blacklist to decide whether the email is a phishing email. Blacklist updates are usually reported by users, and whether a reported site is a phishing website is determined manually. At present, two well-known phishing blacklist sites are PhishTank and OpenPhish. To a large extent, the completeness of the blacklist determines how effective blacklist-based phishing email detection is.
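The blacklist check described above can be sketched in Python as follows. This is a minimal illustration, not the method of Sheng et al.: the blacklist entries, the URL pattern, and the function name `is_blacklisted` are all assumptions made for the example, and only single-part messages are handled.

```python
import re
from email import message_from_string
from email.utils import parseaddr
from urllib.parse import urlparse

# Hypothetical blacklist entries for illustration only; a real deployment
# would load them from a maintained blacklist feed.
SENDER_BLACKLIST = {"alerts@phish.example"}
DOMAIN_BLACKLIST = {"login.phish.example"}

# Simple (assumed) pattern for http/https links in the message body.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def is_blacklisted(raw_email: str) -> bool:
    """Return True if the sender or any linked host is on a blacklist."""
    msg = message_from_string(raw_email)

    # Sender blacklist: compare the bare address from the From header.
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender in SENDER_BLACKLIST:
        return True

    # Link blacklist: compare the host of every URL found in the body.
    payload = msg.get_payload()
    body = payload if isinstance(payload, str) else ""  # skip multipart
    for url in URL_PATTERN.findall(body):
        host = (urlparse(url).hostname or "").lower()
        if host in DOMAIN_BLACKLIST:
            return True
    return False
```

An email whose sender and links are absent from both sets passes the check, which illustrates why the method is only as good as the blacklist is complete.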
DISADVANTAGES OF EXISTING SYSTEM :
1) Low accuracy
2) Low efficiency
PROPOSED SYSTEM :
Emails are divided into two categories: legitimate emails and phishing emails. Phishing email detection is therefore a binary classification problem. We formalize the problem by splitting an email into two parts, the header and the body, and defining a binary variable y as the label of the email: y = 1 means that the email is a phishing email, and y = 0 means that the email is legitimate. To decide whether an email is a phishing email, we first calculate the probability that it is one, that is, P(y = 1). This probability is then compared with the classification threshold; if it is greater than the threshold, the email is judged to be a phishing email. Our goal is to determine quickly and accurately whether the target email is legitimate or phishing. In this section, we present the details of our proposed model.
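The decision procedure above can be illustrated with a short Python sketch. It shows only the header/body split and the threshold comparison, not the THEMIS model that produces P(y = 1). The threshold value 0.5 and the names `split_email` and `classify` are assumptions for the example; the source does not state them.

```python
from email import message_from_string
from typing import Tuple

# Assumed default; the source does not give the classification threshold.
THRESHOLD = 0.5


def split_email(raw_email: str) -> Tuple[str, str]:
    """Return (header_text, body_text) for a single-part message."""
    msg = message_from_string(raw_email)
    header = "\n".join(f"{name}: {value}" for name, value in msg.items())
    payload = msg.get_payload()
    body = payload if isinstance(payload, str) else ""  # skip multipart
    return header, body


def classify(p_phishing: float, threshold: float = THRESHOLD) -> int:
    """Return y = 1 (phishing) if P(y = 1) exceeds the threshold, else y = 0."""
    if not 0.0 <= p_phishing <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return 1 if p_phishing > threshold else 0
```

In a full pipeline, the header and body returned by `split_email` would be fed to the model, and the model's output probability would be passed to `classify`. Raising the threshold trades missed phishing emails for fewer false positives.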
ADVANTAGES OF PROPOSED SYSTEM :
1) High accuracy
2) High efficiency
SYSTEM REQUIREMENTS
SOFTWARE REQUIREMENTS:
• Programming Language : Python
• Front End Technologies : TKInter/Web(HTML,CSS,JS)
• IDE : Jupyter/Spyder/VS Code
• Operating System : Windows 8/10
HARDWARE REQUIREMENTS:
• Processor : Core i3
• RAM Capacity : 2 GB
• Hard Disk : 250 GB
• Monitor : 15″ Color
• Mouse : 2 or 3 Button Mouse
• Keyboard : Standard Windows Keyboard