Optimizing Liver Disease Prediction with Random Forest by Various Data Balancing Techniques

ABSTRACT :

Besides heart attack, liver disease is a prominent disease that claims many lives, mostly because it is detected at a late stage. The number of liver patients is increasing for several reasons, such as overconsumption of alcohol, inhaling harmful gases, and drinking polluted water, all of which affect health parameters. Machine learning models can use those health parameters to predict liver disease at an early stage. This work builds such a model on the Indian Liver Patient Dataset (ILPD) hosted at UCI.edu [1], which is based on Indian patients, and uses the Random Forest (RF) algorithm with different pre-processing techniques to predict the disease. The data set is checked for skewness, outliers, and imbalance using univariate and bivariate analysis; suitable algorithms are then used to remove outliers, and various oversampling and undersampling techniques are used to balance the data. The model is further refined through feature selection and hyperparameter tuning with grid search. The final model is reported to achieve 100% accuracy and good scores across the other metrics.
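The pre-processing checks named in the abstract (skewness, outliers, imbalance) can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' pipeline: the helper names `skewness`, `iqr_bounds`, and `random_oversample`, the 1.5 × IQR fence, and the toy values are all assumptions made for the example, and plain random oversampling stands in for whichever balancing techniques the work actually compares.

```python
import random
import statistics

def skewness(values):
    """Population skewness: mean of cubed z-scores (0 for symmetric data)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return 0.0
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)

def iqr_bounds(values, k=1.5):
    """Return (low, high) fences; values outside them are treated as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels

# Toy stand-in for a single numeric column and its class labels
# (values and labels are made up for illustration, not taken from ILPD).
feature = [0.7, 0.9, 1.0, 1.1, 0.8, 1.3, 0.6, 1.2, 0.9, 18.4]
labels = [1, 1, 1, 1, 1, 1, 1, 2, 2, 2]

print("skewness:", round(skewness(feature), 2))

# Drop rows whose feature value falls outside the IQR fences.
low, high = iqr_bounds(feature)
kept = [(v, y) for v, y in zip(feature, labels) if low <= v <= high]

# Balance the remaining rows by oversampling the smaller class.
rows, ys = random_oversample([[v] for v, _ in kept], [y for _, y in kept])
print("class counts after balancing:", {c: ys.count(c) for c in set(ys)})
```

In a real workflow the balancing step would normally be applied to the training split only, so that duplicated rows do not leak into the evaluation data.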

EXISTING SYSTEM :

The liver is an important organ of the human body, located beneath the rib cage in the right upper abdomen. It removes toxins from the body and helps maintain a healthy blood sugar level. Although body organs have a self-healing capacity, overconsumption of alcohol and exposure to impure air and water damage the liver, which leads to a higher rate of liver failure. Liver transplantation is a solution, but it comes with a high cost and a low rate of success. Identifying liver damage as early as possible can reduce the chance of liver failure. A machine learning model can predict disease from a data set that combines key health parameters of people with and without the disease. Building such models requires an effective data set with proper representation of the disease classes.

PROPOSED SYSTEM :

The paper explains how to improve the classification accuracy of a model when the data set is imbalanced. The proposed method combines a Complementary Neural Network (CMTNN) and the Synthetic Minority Over-sampling Technique (SMOTE) to handle the problem of classifying imbalanced data. In paper [9], RUSBoost and SMOTEBoost are used for sampling and boosting the data, and both outperform the other procedures.
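The core idea behind SMOTE is to create new minority-class points by interpolating between an existing minority sample and one of its nearest minority neighbours. The sketch below illustrates only that interpolation step, using the standard library; the function name `smote`, its parameters, and the toy points are assumptions for this example, base samples are drawn at random rather than visited in turn, and the CMTNN and boosting parts of the methods above are not shown.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority-class feature vectors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a base sample (simplification: chosen at random).
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours by Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy minority-class points (made up for illustration).
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote(minority, n_new=6)
for point in new_points:
    print([round(x, 3) for x in point])
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies instead of being exact duplicates.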

SYSTEM REQUIREMENTS
SOFTWARE REQUIREMENTS:
• Programming Language : Python
• Front End Technologies : Tkinter/Web (HTML, CSS, JS)
• IDE : Jupyter/Spyder/VS Code
• Operating System : Windows 8/10

HARDWARE REQUIREMENTS:

• Processor : Core i3
• RAM Capacity : 2 GB
• Hard Disk : 250 GB
• Monitor : 15″ Color
• Mouse : 2 or 3 Button Mouse
• Keyboard : Standard Windows keyboard
