Early Sepsis prediction with machine learning model

Introduction

The Early Sepsis Predictor project is one of the group projects from my Master’s degree thesis. Its objective is to build a model that predicts sepsis 6 hours before onset, so that sepsis-related deaths can be significantly reduced. The model's output could help physicians prepare proper treatment for ICU patients in time and reduce the mortality rate.

This project consists of 2 main parts. The first part is model training, which includes data wrangling, data exploration, data cleaning, missing value imputation, feature engineering, model training with 4 algorithms (Logistic Regression, Decision Tree, Random Forest, XGBoost), and model evaluation. The second part is a web dashboard application that puts the finalized model to use.

Model training

Dataset

The dataset we used is from the PhysioNet Challenge 2019. It has over 100 features, but we focused on the time-sensitive ones, so we used the vital sign, laboratory, and demographic values of 40 time-dependent variables for over 40,000 patients. After labeling sepsis for each patient in the data, the first problem we found was that the class labels are imbalanced: only about 2% of the patterns result in sepsis.
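A quick way to see this kind of imbalance is to measure the share of positive labels, both per record and per patient. The sketch below uses a tiny made-up table; the column names `patient_id` and `SepsisLabel` and the one-row-per-patient-hour layout are assumptions for illustration, not the project's actual schema.

```python
import pandas as pd

# Toy stand-in for the hourly records (values are made up for illustration).
# Column names are assumptions, not taken from the project code.
labels = pd.DataFrame(
    {
        "patient_id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
        "SepsisLabel": [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    }
)

# Share of each class over all hourly records.
row_share = labels["SepsisLabel"].value_counts(normalize=True).sort_index()

# Share of patients with at least one positive record.
patient_share = labels.groupby("patient_id")["SepsisLabel"].max().mean()

print(row_share)
print(f"patients with sepsis: {patient_share:.2%}")
```

The two numbers can differ a lot, because a septic patient contributes many negative hours before the first positive one.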

Missing value imputation

For this project, missing values are filled in using last observation carried forward (LOCF) and next observation carried backward (NOCB). LOCF and NOCB are two of the most popular methods in clinical trials because they introduce less bias into datasets than mean or median imputation.
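As a minimal sketch of LOCF followed by NOCB, the example below fills gaps within each patient's own time series using pandas. The helper name `impute_locf_nocb` and the column names are hypothetical, and applying the fill per patient (rather than across the whole table) is an assumption about how the imputation would be done.

```python
import numpy as np
import pandas as pd

# Toy hourly records for two patients (values are made up for illustration).
records = pd.DataFrame(
    {
        "patient_id": [1, 1, 1, 1, 2, 2, 2],
        "hour": [0, 1, 2, 3, 0, 1, 2],
        "HR": [np.nan, 80.0, np.nan, 90.0, np.nan, np.nan, 70.0],
        "Temp": [36.5, np.nan, np.nan, 37.0, np.nan, np.nan, np.nan],
    }
)


def impute_locf_nocb(df, id_col, time_col, value_cols):
    """Fill gaps per patient: LOCF first, then NOCB for leading gaps."""
    out = df.sort_values([id_col, time_col]).copy()
    # LOCF: carry the last observed value forward within each patient.
    out[value_cols] = out.groupby(id_col)[value_cols].ffill()
    # NOCB: fill what is still missing at the start of each series.
    out[value_cols] = out.groupby(id_col)[value_cols].bfill()
    return out


imputed = impute_locf_nocb(records, "patient_id", "hour", ["HR", "Temp"])
print(imputed)
```

A variable that was never measured for a patient (here `Temp` for patient 2) stays missing after both passes, so it still needs a separate fallback.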

Models training

Four algorithms are used to train different models: Logistic Regression, Decision Tree, Random Forest, and Extreme Gradient Boosting (XGBoost). Each model is trained twice, once with all features and once with feature selection. The model with the best performance on the test dataset is the one implemented in the web application.
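The comparison described above could be sketched roughly as follows with scikit-learn. Everything specific here is an assumption for illustration: the data is synthetic, `SelectKBest` with `k=10` stands in for whatever feature selection the project used, the hyperparameters are placeholder values, and XGBoost is only included if the `xgboost` package is installed.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (not the PhysioNet dataset).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=6,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
try:
    # Optional dependency; skipped when xgboost is not installed.
    from xgboost import XGBClassifier
    models["xgboost"] = XGBClassifier(n_estimators=100, random_state=0)
except ImportError:
    pass

scores = {}
for name, model in models.items():
    variants = {
        "all_features": clone(model),
        # Hypothetical feature selection step: keep the 10 best features.
        "selected_features": make_pipeline(SelectKBest(f_classif, k=10), clone(model)),
    }
    for variant, estimator in variants.items():
        estimator.fit(X_train, y_train)
        scores[(name, variant)] = f1_score(y_test, estimator.predict(X_test))

# Pick the combination with the highest F1 score on the held-out test set.
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Wrapping the selector and the classifier in one pipeline keeps the feature selection fitted on the training split only.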

Models evaluation

All models were trained with 10-fold cross-validation to validate their results. XGBoost with feature selection, trained at a 98:2 class distribution, outperformed the other models with an F1 score of 0.81, a Kappa of 0.69, an AUROC of 0.94, and no false positives on the test dataset.

The detailed explanation and source code are in this GitHub repository.
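To show why F1 and Kappa are more informative than accuracy on data this imbalanced, here is a small worked sketch that derives them from confusion-matrix counts. The function name `imbalance_metrics` and the counts are made up for illustration; they are not the project's results.

```python
def imbalance_metrics(tp, fp, fn, tn):
    """Accuracy, F1, and Cohen's kappa from binary confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    # Expected agreement by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0
    return {"accuracy": accuracy, "f1": f1, "kappa": kappa}


# Illustrative counts for 1,000 records with 2% positives.
model = imbalance_metrics(tp=15, fp=5, fn=5, tn=975)
always_negative = imbalance_metrics(tp=0, fp=0, fn=20, tn=980)

print(model)
print(always_negative)
```

A classifier that never predicts sepsis still reaches 98% accuracy on these counts, while its F1 and Kappa are both 0, which is why those two metrics are the ones worth comparing.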

Web dashboard application

The last part is building a web dashboard application that puts the finalized model into a concrete application and deploys it to a production server.

At first I used Python Flask to develop the web application, but I later found Dash by Plotly, which is similar to Flask because it is 100% Python. Dash is a web framework built on top of Flask, so its environment feels much the same, but it is designed specifically for web dashboard development. It has built-in HTML and chart components ready to use, which helped me develop this project in a short period of time.

All of the web app source code is in this GitHub repository.

App Demo: Early sepsis prediction with machine learning model

GitHub: Model training

GitHub: Web dashboard application