Liver disease covers a wide range of conditions that can harm the liver, from mild inflammation to severe conditions such as cirrhosis and liver cancer. The liver is a vital organ that removes harmful substances, produces bile, stores essential nutrients, and regulates metabolism. According to the World Health Organization (2021), early detection of liver disease is crucial because it can halt progression, improve treatment effectiveness, reduce complications, and improve patients’ overall well-being. This article describes the development of a machine-learning model that predicts liver disease, covering its implementation, its performance, and its potential impact on liver disease diagnosis.
Table of Contents
- Importance of the Liver and Early Detection
- Developing the Machine Learning Model
- Data Preprocessing
- Model Selection
- Model Training and Evaluation
- Feature Importance
- Conclusion
- Full Notebook
- Interactive Application
- References
Importance of the Liver and Early Detection
The liver performs many functions, including cleaning the blood, regulating metabolism, storing nutrients, and producing proteins and bile. According to the American Liver Foundation, problems with any of these functions can harm overall health. Detecting liver disease early is important because it allows prompt intervention, which can slow the disease’s progression, make treatment more effective, and prevent severe outcomes such as liver failure and liver cancer.
Developing the Machine Learning Model
To predict liver disease, we used a dataset covering a range of attributes, including alcohol consumption, liver function test results, BMI, physical activity, genetic predisposition, and additional factors. The objective was to use these attributes to build a model that classifies individuals as having or not having liver disease.
Data Preprocessing
- Data Collection: The dataset was sourced from Kaggle.
- Data Cleaning: Missing values had already been handled in the source dataset, so cleaning consisted of removing irrelevant features.
- Feature Scaling: Continuous variables were scaled so that features measured on different ranges contribute comparably to scale-sensitive models.
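As a rough illustration of the scaling step, the sketch below standardizes continuous columns while leaving a binary column untouched. It is not the notebook's code: the synthetic data, the column layout, and the choice of `StandardScaler` are all assumptions, since the article does not say which scaler was used.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (all values are made up for illustration).
rng = np.random.default_rng(42)
n_samples = 200
X = np.column_stack([
    rng.uniform(15, 40, n_samples),   # column 0: BMI (continuous)
    rng.uniform(0, 20, n_samples),    # column 1: alcohol consumption (continuous)
    rng.uniform(20, 100, n_samples),  # column 2: liver function test (continuous)
    rng.integers(0, 2, n_samples),    # column 3: genetic predisposition (binary flag)
])
y = rng.integers(0, 2, n_samples)     # hypothetical diagnosis labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale only the continuous columns; pass the binary column through unchanged.
continuous_columns = [0, 1, 2]
preprocessor = ColumnTransformer(
    transformers=[("scale", StandardScaler(), continuous_columns)],
    remainder="passthrough",
)

# Fit on the training split only, then reuse those statistics for the test split.
X_train_scaled = preprocessor.fit_transform(X_train)
X_test_scaled = preprocessor.transform(X_test)
```

Fitting the scaler on the training split alone keeps test-set statistics from leaking into preprocessing.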
Model Selection
We experimented with several machine learning algorithms: Logistic Regression, K-Nearest Neighbors, Gaussian Naive Bayes, Support Vector Classifier, Random Forest Classifier, Decision Tree Classifier, and XGBoost Classifier. Each model was assessed with cross-validation to obtain robust performance estimates.
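A comparison of this kind could be sketched as follows. This is an illustrative setup, not the notebook's code: the data comes from `make_classification`, the fold count and hyperparameters are assumptions, and XGBoost is left out so the example depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the liver dataset.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=42
)

# Scale-sensitive models are wrapped in a pipeline so scaling is refit per fold.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Gaussian Naive Bayes": GaussianNB(),
    "Support Vector Classifier": make_pipeline(StandardScaler(), SVC()),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

# Stratified folds keep the class balance similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.2%} (± {scores.std():.2%})")
```

Using the same `cv` object for every model means each one is scored on identical splits, which makes the comparison fairer.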
Model Training and Evaluation
The Random Forest Classifier stood out as the best model, with high accuracy, precision, recall, F1-score, and AUC-ROC. Its performance is summarized below:
- Accuracy: 90.71% (± 2.24%)
- Precision: 91.94% (± 2.46%)
- Recall: 90.28% (± 2.48%)
- F1-Score: 91.00% (± 1.86%)
- AUC-ROC: 94.68% (± 1.44%)
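A summary in this "mean (± spread)" form can be produced with scikit-learn's `cross_validate`. The sketch below assumes the ± values are standard deviations across folds, which the article does not state, and it runs on synthetic data, so its numbers will not match those above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic binary classification data standing in for the liver dataset.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)  # fold count is an assumption
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]

# cross_validate returns one array of per-fold scores per metric, keyed "test_<metric>".
cv_results = cross_validate(model, X, y, cv=cv, scoring=scoring)

summary = {}
for metric in scoring:
    fold_scores = cv_results[f"test_{metric}"]
    summary[metric] = (fold_scores.mean(), fold_scores.std())
    print(f"{metric}: {fold_scores.mean():.2%} (± {fold_scores.std():.2%})")
```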
Feature Importance
Understanding which features most influence the model’s predictions is essential for interpretability and for extracting medically useful insight. The RandomForestClassifier exposes feature importances, which indicate the relative contribution of each feature. The most influential features for predicting liver disease were:
- Alcohol Consumption: 25.76%
- Liver Function Test: 24.46%
- BMI: 11.90%
- Physical Activity: 11.34%
- Age Range: 6.78%
These results suggest that lifestyle factors such as alcohol consumption and physical activity, together with clinical measures such as liver function tests, are key indicators for identifying people at high risk of liver disease.
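Importances like these can be read from a fitted forest's `feature_importances_` attribute. The following is a minimal sketch on synthetic data. The feature names are hypothetical labels borrowed from the list above, so the printed ranking is illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names; the synthetic columns carry no real clinical meaning.
feature_names = [
    "alcohol_consumption",
    "liver_function_test",
    "bmi",
    "physical_activity",
    "age",
]

X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=1, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# Impurity-based importances: one non-negative value per feature, summing to 1.
importances = model.feature_importances_

# Rank features from most to least important.
ranking = sorted(zip(feature_names, importances), key=lambda pair: pair[1], reverse=True)
for name, importance in ranking:
    print(f"{name}: {importance:.2%}")
```

These values measure how much each feature reduces impurity across the trees, so they describe the model's reliance on a feature rather than a causal effect.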
Conclusion
This work illustrates the promise of machine learning in healthcare by building a model that predicts liver disease. By combining large datasets with sophisticated algorithms, such models can achieve high accuracy and support early diagnosis and treatment. Future work could include validating the model’s performance in real-world clinical settings, refining it, and incorporating more diverse datasets.
Full Notebook
The notebook implementing the complete workflow is available on GitHub.
Interactive Application
The interactive application is hosted on Streamlit Cloud.
References
American Liver Foundation. (n.d.). Your Liver. https://liverfoundation.org/for-patients/about-the-liver/
World Health Organization. (2024, April 9). https://www.who.int/news-room/fact-sheets/detail/hepatitis