Predicting Liver Disease with Machine Learning

Liver disease covers a wide range of conditions that can harm the liver, from mild swelling to severe conditions such as cirrhosis and liver cancer. The liver is a vital organ that removes harmful substances, makes bile, stores essential nutrients, and keeps the metabolism in check. According to the World Health Organization (2021), early detection of liver disease is crucial because it can halt progression, enhance treatment effectiveness, reduce complications, and improve patients’ overall well-being. This article describes the development of a machine-learning model that predicts liver disease, covering its implementation, its performance, and its potential impact on liver disease diagnosis.

Table of Contents

Importance of the Liver and Early Detection
Developing the Machine Learning Model
Data Preprocessing
Model Selection
Model Training and Evaluation
Feature Importance
Conclusion
Full Notebook
Interactive Application
References

Importance of the Liver and Early Detection

The liver is a vital organ with many functions, such as cleaning the blood, keeping the metabolism in check, storing nutrients, making proteins, and making bile. The American Liver Foundation says that any problems with these functions can negatively affect health. Finding liver disease early matters because it lets people act quickly, which can slow the disease’s progression, make treatment more effective, and prevent severe problems like liver failure and liver cancer.

Developing the Machine Learning Model

To predict liver disease, we employed a dataset encompassing a range of attributes, including alcohol consumption, liver function test outcomes, BMI, physical activity, genetic predisposition, and additional factors. Using these characteristics, the objective was to create a model that categorizes individuals as having or not having liver disease.

Data Preprocessing

Data Collection: The dataset was sourced from a Kaggle repository, located here.
Data Cleaning: Missing values were already handled by the source dataset, and irrelevant features were removed to ensure a clean dataset.
Feature Scaling: Continuous variables were scaled to ensure that the model could process them effectively.
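The scaling step can be sketched as follows. This is a minimal illustration, assuming standardization with scikit-learn's StandardScaler; the article does not say which scaler was used, and the column names and values below are hypothetical toy data rather than the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: column names and values are illustrative only,
# not the schema of the Kaggle dataset.
df = pd.DataFrame({
    "Age": [34.0, 51.0, 62.0, 45.0],
    "BMI": [22.5, 31.0, 27.8, 24.1],
    "AlcoholConsumption": [2.0, 14.5, 8.0, 0.5],
    "Smoking": [0, 1, 0, 0],  # binary flag, left unscaled
})

# Only the continuous columns are standardized (zero mean, unit variance).
continuous_cols = ["Age", "BMI", "AlcoholConsumption"]

scaler = StandardScaler()
scaled = df.copy()
scaled[continuous_cols] = scaler.fit_transform(df[continuous_cols])

print(scaled.round(3))
```

In a real workflow the scaler would normally be fitted on the training split only and then applied to the validation or test data, so that no information from held-out rows leaks into training.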

Model Selection

We experimented with several machine learning algorithms: Logistic Regression, K-Nearest Neighbors, Gaussian Naive Bayes, Support Vector Classifier, Random Forest Classifier, Decision Tree Classifier, and XGBoost Classifier. Each model was assessed using cross-validation to check that its performance was robust across different splits of the data.
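A comparison of this kind might look like the sketch below. It is not the article's actual code: the data is synthetic (generated with make_classification), the 5-fold stratified split and the hyperparameters are assumptions, and XGBoost is left out so that the example depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the liver disease dataset (assumption).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=42
)

# Scale-sensitive models are wrapped in a pipeline so that scaling is
# re-fitted inside each cross-validation fold.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Gaussian Naive Bayes": GaussianNB(),
    "Support Vector Classifier": make_pipeline(StandardScaler(), SVC()),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.3f} (std {scores.std():.3f})")
```

Using the same cv object for every model means each one is scored on identical folds, which makes the comparison fairer than drawing a new random split per model.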

Model Training and Evaluation

The RandomForestClassifier model stood out as the best due to its high accuracy, precision, recall, F1-score, and AUC-ROC values. Here is a summary of its performance:

Accuracy: 90.71% (± 2.24%)
Precision: 91.94% (± 2.46%)
Recall: 90.28% (± 2.48%)
F1-Score: 91.00% (± 1.86%)
AUC-ROC: 94.68% (± 1.44%)
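A summary in this "mean (± spread)" form can be produced with scikit-learn's cross_validate, as in the sketch below. The data here is synthetic, and treating the ± value as the standard deviation across five folds is an assumption, so the printed numbers will not match the figures above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the liver disease dataset (assumption).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# The five metrics reported in the article, by their scikit-learn scorer names.
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_results = cross_validate(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X,
    y,
    cv=cv,
    scoring=scoring,
)

summary = {}
for metric in scoring:
    fold_scores = cv_results[f"test_{metric}"]  # one score per fold
    summary[metric] = (fold_scores.mean(), fold_scores.std())
    print(f"{metric}: {fold_scores.mean():.2%} (± {fold_scores.std():.2%})")
```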

Feature Importance

Understanding which features most influence the model’s predictions is essential for interpretability and for drawing out potential medical insight. The RandomForestClassifier exposes feature importances, which indicate the relative contribution of each feature. The primary factors influencing the prediction of liver disease were:

Alcohol Consumption: 25.76%
Liver Function Test: 24.46%
BMI: 11.90%
Physical Activity: 11.34%
Age Range: 6.78%

These results show that lifestyle factors such as alcohol consumption and physical activity, along with medical measurements such as liver function tests, are important indicators of whether someone is at high risk for liver disease.
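A ranking like the one above can be read from a fitted model's feature_importances_ attribute. The sketch below fits a random forest on synthetic data; the feature names are borrowed from this article purely as labels and are assigned to the synthetic columns arbitrarily, so the printed percentages carry no medical meaning.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Labels only: the synthetic columns have no real connection to these names.
feature_names = [
    "AlcoholConsumption",
    "LiverFunctionTest",
    "BMI",
    "PhysicalActivity",
    "Age",
    "GeneticRisk",
]

# Synthetic stand-in for the liver disease dataset (assumption).
X, y = make_classification(
    n_samples=300,
    n_features=len(feature_names),
    n_informative=4,
    n_redundant=0,
    random_state=0,
)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances are normalized to sum to 1 across features.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, importance in ranked:
    print(f"{name}: {importance:.2%}")
```

These impurity-based importances describe how much the model relies on each feature, not a causal effect; permutation importance on held-out data is a common cross-check.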

Conclusion

This model for predicting liver disease illustrates the promise of machine learning in healthcare. By combining large datasets with sophisticated algorithms, we can achieve high accuracy in disease prediction and so support early diagnosis and treatment. Further work could include validating the model’s performance in real-world clinical settings, improving it, and integrating more diverse datasets.

Full Notebook

The notebook implementation showing the complete workflow can be found on GitHub.

Interactive Application

The interactive application is hosted on Streamlit Cloud and can be found by clicking here.

References

American Liver Foundation. (n.d.). Your Liver. https://liverfoundation.org/for-patients/about-the-liver/

World Health Organization. (2024, April 9). https://www.who.int/news-room/fact-sheets/detail/hepatitis
