This work is part of the paper titled “Minimal Vital Sensor Architectures for Early Warning of Sepsis in Telehealth Patients” (under review).

Vital-SEP

A sepsis prediction engine that applies gradient-boosted decision trees (XGBoost) to features extracted from vital signs obtained from wearable sensors.

Programming Platforms Used

Python 3
Matplotlib
NumPy
Pandas

Input Dataset

Sequence of hourly measurements of the following vital signs:

heart rate
respiratory rate
SpO2 (blood oxygen)
temperature

These measurements, obtained from patients at two different hospitals, are contained in the following zip files. Extracting each zip file yields the individual patient data files.

The raw files come from the PhysioNet/CinC Challenge 2019 database. They were then preprocessed (applying the inclusion and exclusion criteria, among other steps) to generate the curated datasets used for this study.

The input should be formatted so that the measurements span a minimum of 3 hours and a maximum of 6 hours.
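The window constraint above can be checked before a record is passed to a model. The following is a minimal sketch, not the repository's preprocessing code: the function name `clip_to_window`, the dictionary keys, and the choice to keep the most recent hours when more than 6 are available are all assumptions made for illustration.

```python
# Hypothetical helper: enforce the 3-to-6-hour window on a list of hourly rows.
def clip_to_window(hourly_rows, min_hours=3, max_hours=6):
    """Return the most recent `max_hours` rows; raise if fewer than `min_hours`."""
    if len(hourly_rows) < min_hours:
        raise ValueError(
            f"need at least {min_hours} hourly measurements, got {len(hourly_rows)}"
        )
    # Assumption: when the record is longer than the window, keep the latest hours.
    return hourly_rows[-max_hours:]


# Dummy hourly rows (placeholder values, not patient data).
rows = [
    {"heart_rate": 80 + h, "respiratory_rate": 16, "spo2": 97, "temperature": 37.0}
    for h in range(8)
]
window = clip_to_window(rows)
print(len(window))  # 6
```

Raising on short records, rather than padding them, keeps the decision about imputation out of this helper.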

Input data files are zipped and can be accessed from the repository: Raw Dataset

Curated dataset for this study

Usage

The algorithm is implemented as the following three Python modules:

1: Building and training the XGBoost models using Hospital A datasets.

Module: Tele-SEP-train-model.py

Parameters: each of the 15 sensor configurations (S_i) and each of the 16 timing tuples (W, L)

Output: AUROC for each (S_i,W,L)

For each sensor configuration, the model yielding the highest AUROC is chosen for validation in the next module.
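The selection step in module 1 can be sketched as follows. This is an illustration rather than the code in Tele-SEP-train-model.py. It assumes that the 15 configurations are the non-empty subsets of the four vitals (the count matches, but the README does not state this), and the labels, function names, and AUROC values are hypothetical placeholders, not results from the study.

```python
from itertools import combinations

VITALS = ("HR", "RR", "SpO2", "Temp")  # hypothetical short labels for the four vitals


def sensor_configurations(vitals=VITALS):
    """Enumerate every non-empty subset of the vitals (assumed to be the S_i)."""
    return [
        subset
        for size in range(1, len(vitals) + 1)
        for subset in combinations(vitals, size)
    ]


def best_timing_per_configuration(auroc_by_run):
    """Map each configuration to its best ((W, L), AUROC) pair.

    `auroc_by_run` maps (configuration, W, L) to the AUROC of that trained model.
    """
    best = {}
    for (config, w, l), auroc in auroc_by_run.items():
        if config not in best or auroc > best[config][1]:
            best[config] = ((w, l), auroc)
    return best


print(len(sensor_configurations()))  # 15

# Placeholder AUROC values for illustration only.
runs = {
    (("HR",), 3, 4): 0.71,
    (("HR",), 6, 4): 0.78,
    (("HR", "RR"), 3, 3): 0.80,
}
print(best_timing_per_configuration(runs)[("HR",)])  # ((6, 4), 0.78)
```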

2: Validating the model using Hospital B datasets.

Module: Tele-SEP-ModelLoadRunOnly.py

Parameters: each of the 15 sensor configurations (S_i), with the best-performing timing tuple (W_AUC, L_AUC) corresponding to S_i

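import pickle

from sklearn.metrics import classification_report, confusion_matrix

model_filename = 'trained-models/XGBoost/XGB-Model-PPG-RR-Temp-L6-M4-verified.sav'

# load the trained model from disk (the file handle is closed automatically)
with open(model_filename, 'rb') as model_file:
    loaded_model = pickle.load(model_file)

# make predictions for the test data
# (X_test and y_test are assumed to hold the Hospital B features and labels,
# prepared beforehand)
y_pred = loaded_model.predict(X_test)

# print the classification report
print(classification_report(y_test, y_pred))

# print the confusion matrix
cnf_matrix = confusion_matrix(y_test, y_pred)
print(cnf_matrix)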

Output: AUROC and its difference from the value obtained in module 1 (for each sensor configuration)
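The validation output can be illustrated with a small, dependency-free sketch. It computes AUROC as the fraction of (positive, negative) pairs that the scores rank correctly, counting ties as half, which is the same quantity that `sklearn.metrics.roc_auc_score` returns for binary labels. The function names and the numbers below are hypothetical placeholders, not results from the paper.

```python
def auroc(y_true, y_score):
    """AUROC as the probability that a positive outranks a negative (ties = 0.5)."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def auroc_gap(train_auroc, y_true, y_score):
    """Return the validation AUROC and its difference from the training AUROC."""
    validation_auroc = auroc(y_true, y_score)
    return validation_auroc, validation_auroc - train_auroc


# Placeholder labels, scores, and training AUROC for illustration only.
val, gap = auroc_gap(0.80, [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(f"validation AUROC = {val:.2f}, difference = {gap:+.2f}")
```

The pairwise form is quadratic in the number of samples, so it suits small checks; for full datasets a library routine is the practical choice.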

3: Choosing the best performing minimal sensor configuration

Module: automation is being implemented

Parameters: AUROC threshold value AUC_min; lead time threshold value L_min

Output: From the list of sensor configurations, arranged in ascending order of the number and complexity of vitals, the first configuration S_min whose AUROC from module 1 and corresponding lead time are both greater than or equal to their respective thresholds AUC_min and L_min.
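Since the module for this step is not yet automated, the rule can be sketched as a first-match search over the ordered list. The function name, the tuple layout, and the example values are assumptions for illustration, and the ordering by number and complexity of vitals is taken as given rather than computed here.

```python
def choose_minimal_configuration(ranked_results, auc_min, l_min):
    """Return the first configuration meeting both thresholds, or None.

    `ranked_results` is a list of (configuration, auroc, lead_time_hours)
    tuples, already ordered from the simplest configuration to the most complex.
    """
    for config, auroc, lead_time in ranked_results:
        if auroc >= auc_min and lead_time >= l_min:
            return config
    return None


# Placeholder values for illustration only.
ranked = [
    (("HR",), 0.74, 4),
    (("HR", "RR"), 0.81, 3),
    (("HR", "RR", "Temp"), 0.85, 5),
]
print(choose_minimal_configuration(ranked, auc_min=0.80, l_min=4))
# ('HR', 'RR', 'Temp')
```

Returning `None` when nothing qualifies leaves it to the caller to relax the thresholds or report that no configuration is adequate.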

Setup and runtime

Modules 1, 2, and 3 are run once at setup time. A subset of the best-performing pre-trained and validated models for various sensor configurations is also provided in the repository. At runtime, the following algorithm is used to predict sepsis for a new patient.

Parameters: the patient’s wearable sensor configuration S_p; Patient_vitals = new patient data; lead time = 3, 4, 5, or 6 hours

Subroutines: Choose the Tele-SEP models that match the patient’s wearable sensor configuration S_p. For S_p, retrieve the four sets of models M_p3, M_p4, M_p5, M_p6 corresponding to the four lead times. From each set, choose the best-performing model M_p3^*, M_p4^*, M_p5^*, M_p6^*. Run these on Patient_vitals to compute the sepsis probabilities.

Output: The maximum of the four sepsis probabilities and the corresponding lead time resulting from the above computation
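The runtime procedure above can be sketched as below. To keep the example self-contained, each model is represented by a plain callable that returns a sepsis probability. With a loaded XGBoost classifier, that probability would typically come from `predict_proba`, but how the repository's models are invoked is an assumption here, as are the function name and the stub values.

```python
def predict_sepsis(best_model_by_lead_time, patient_vitals):
    """Run the best model for each lead time and return (max probability, lead time).

    `best_model_by_lead_time` maps a lead time in hours to a callable that
    takes the patient's vitals and returns a sepsis probability.
    """
    probabilities = {
        lead_time: model(patient_vitals)
        for lead_time, model in best_model_by_lead_time.items()
    }
    best_lead_time = max(probabilities, key=probabilities.get)
    return probabilities[best_lead_time], best_lead_time


# Stub models with fixed outputs (placeholders, not trained models).
stub_models = {
    3: lambda vitals: 0.42,
    4: lambda vitals: 0.67,
    5: lambda vitals: 0.55,
    6: lambda vitals: 0.31,
}
probability, lead_time = predict_sepsis(stub_models, patient_vitals=[])
print(probability, lead_time)  # 0.67 4
```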