ABSTRACT :
Email spam has become a major problem: as the number of internet users grows rapidly, the volume of spam email grows with it. Spam is used for illegal and unethical purposes such as phishing and fraud. Spammers send malicious links through spam emails, which can harm the recipient's system and give the attacker access to it. Creating a fake profile and email account is easy for spammers; they pose as genuine people in their emails and target those who are unaware of such frauds. It is therefore necessary to identify fraudulent spam emails. This project identifies spam using machine learning techniques: this paper discusses several machine learning algorithms, applies them to our datasets, and selects the algorithm with the best precision and accuracy for email spam detection.
EXISTING SYSTEM :
When data is considered, it is usually thought of as a very large dataset with a large number of rows and columns. This is not always the case: data can take many forms, such as images, audio and video files, and structured tables.
DISADVANTAGES OF EXISTING SYSTEM :
1) Low accuracy
2) Low efficiency
PROPOSED SYSTEM :
The model is implemented in Visual Studio Code, using a dataset from the “Kaggle” website as the training data. The loaded dataset is first checked for duplicates and null values, which would otherwise degrade the model's performance. The dataset is then split into two subsets, a “train dataset” and a “test dataset”, in a 70:30 ratio. Both subsets are passed to text processing, where punctuation symbols and words in the stop-word list are removed and the remaining clean words are returned. These clean words are passed to the “feature transform” step, where ‘fit’ builds a vocabulary from them and ‘transform’ encodes the text against that vocabulary. The dataset is also passed to “hyperparameter tuning” to find the optimal parameter values for the classifier on this dataset.
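The preprocessing steps described above can be illustrated with a minimal sketch. It uses only the Python standard library so that each step is visible; the project itself may well rely on a library for this. The stop-word list, the function and class names (drop_nulls_and_duplicates, split_70_30, clean_words, CountTransformer) and the example rows are all hypothetical, and fitting the vocabulary on the train subset only is an assumption. The classifier and hyperparameter tuning are left out of the sketch.

```python
import random
import string

# Hypothetical, deliberately tiny stop-word list; the source does not
# say which stop-word list the project uses.
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "you", "your", "for", "of", "in", "at"}


def drop_nulls_and_duplicates(rows):
    """Remove rows with a missing text or label, and exact duplicate rows."""
    seen, cleaned = set(), []
    for text, label in rows:
        if not text or label is None or (text, label) in seen:
            continue
        seen.add((text, label))
        cleaned.append((text, label))
    return cleaned


def split_70_30(rows, seed=0):
    """Shuffle the rows and split them 70:30 into train and test subsets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def clean_words(text):
    """Strip punctuation, lowercase the text, and drop stop words."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    return [w for w in no_punct.lower().split() if w not in STOP_WORDS]


class CountTransformer:
    """Minimal bag-of-words transformer with 'fit' and 'transform' steps."""

    def fit(self, texts):
        # Build the vocabulary: one column index per distinct clean word.
        words = sorted({w for t in texts for w in clean_words(t)})
        self.vocabulary_ = {w: i for i, w in enumerate(words)}
        return self

    def transform(self, texts):
        # Encode each text as a vector of word counts over the vocabulary.
        vectors = []
        for t in texts:
            row = [0] * len(self.vocabulary_)
            for w in clean_words(t):
                if w in self.vocabulary_:  # words unseen during fit are ignored
                    row[self.vocabulary_[w]] += 1
            vectors.append(row)
        return vectors


# Made-up example rows (text, label); not taken from the Kaggle dataset.
rows = [
    ("Win a FREE prize now!!!", "spam"),
    ("Win a FREE prize now!!!", "spam"),  # exact duplicate
    ("Meeting at 10am tomorrow.", "ham"),
    (None, "ham"),                        # null text
    ("Claim your prize, click the link", "spam"),
]

rows = drop_nulls_and_duplicates(rows)
train, test = split_70_30(rows)
vectorizer = CountTransformer().fit([text for text, _ in train])
X_train = vectorizer.transform([text for text, _ in train])
X_test = vectorizer.transform([text for text, _ in test])
```

Fitting on the train subset and only transforming the test subset keeps test-only words out of the vocabulary, so the test set remains a fair stand-in for unseen email.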
ADVANTAGES OF PROPOSED SYSTEM :
1) High accuracy
2) High efficiency
SYSTEM REQUIREMENTS
SOFTWARE REQUIREMENTS:
• Programming Language : Python
• Front End Technologies : Tkinter/Web (HTML, CSS, JS)
• IDE : Jupyter/Spyder/VS Code
• Operating System : Windows 8/10
HARDWARE REQUIREMENTS:
Processor : Core i3
RAM Capacity : 2 GB
Hard Disk : 250 GB
Monitor : 15″ Color
Mouse : 2 or 3 Button Mouse
Keyboard : Standard Windows keyboard