Abstract
Objectives We validate a machine learning-based sepsis prediction algorithm (InSight) for detection and prediction of three sepsis-related gold standards, using only six vital signs. We evaluate robustness to missing data, customization to site-specific data using transfer learning, and generalizability to new settings.
Design Retrospective evaluation of a machine learning algorithm based on gradient tree boosting. Features for prediction were created from combinations of only six vital sign measurements and their changes over time.
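The abstract does not list the six vital signs or the exact feature combinations. The sketch below is one plausible reading of "vital sign measurements and their changes over time". It assumes hourly readings of heart rate, respiratory rate, temperature, systolic and diastolic blood pressure, and SpO2 (the names `hr`, `resp`, `temp`, `sbp`, `dbp`, `spo2` are hypothetical), and it emits each current value plus its change over each preceding hour. Missing readings are left as `None` so that a tree-based learner with native missing-value handling can route them.

```python
from typing import Dict, List, Optional

# Hypothetical vital-sign names; the abstract states only that six vital signs are used.
VITALS = ["hr", "resp", "temp", "sbp", "dbp", "spo2"]

Hour = Dict[str, Optional[float]]


def build_features(hours: List[Hour]) -> Dict[str, Optional[float]]:
    """Build a flat feature dict from hourly readings (oldest first).

    The last entry is the hour at which a prediction is made. For each vital
    sign, emit its current value and its change relative to each earlier hour
    in the window. A change is None if either reading is missing.
    """
    feats: Dict[str, Optional[float]] = {}
    current = hours[-1]
    for v in VITALS:
        now = current.get(v)
        feats[v] = now
        for lag in range(1, len(hours)):
            before = hours[-1 - lag].get(v)
            key = f"{v}_delta_{lag}h"
            feats[key] = None if now is None or before is None else now - before
    return feats
```

With a three-hour window this yields 18 features (six current values plus two changes per vital sign). Whether InSight uses this window length, or additional combinations of vital signs, is not stated in the abstract.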
Setting A mixed-ward retrospective data set from the University of California, San Francisco (UCSF) Medical Center (San Francisco, CA) as the primary source, an intensive care unit data set from the Beth Israel Deaconess Medical Center (Boston, MA) as a transfer learning source, and data sets from four additional institutions to evaluate generalizability.
Participants 684,443 encounters in total, including 90,353 encounters at UCSF from June 2011 to March 2016.
Interventions None.
Primary and secondary outcome measures Area under the receiver operating characteristic curve (AUROC) for detection and prediction of sepsis, severe sepsis, and septic shock.
Results For detection of sepsis and severe sepsis, InSight achieves AUROCs of 0.92 (95% CI 0.90 to 0.93) and 0.87 (95% CI 0.86 to 0.88), respectively. Four hours before onset, InSight predicts septic shock with an AUROC of 0.96 (95% CI 0.94 to 0.98) and severe sepsis with an AUROC of 0.85 (95% CI 0.79 to 0.91).
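The abstract does not say how the 95% confidence intervals were obtained. As an illustration only, the sketch below computes AUROC as the probability that a randomly chosen positive encounter receives a higher score than a randomly chosen negative one (ties count as one half), and attaches a percentile bootstrap interval. The function names `auroc` and `bootstrap_ci` and the bootstrap approach are assumptions, not the authors' stated method.

```python
import random
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison: P(score_pos > score_neg), ties count 1/2.

    O(n_pos * n_neg), which is fine for a sketch but slow for large cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels: Sequence[int], scores: Sequence[float],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for AUROC (resampling encounters)."""
    n = len(labels)
    if sum(labels) in (0, n):
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

For example, labels `[0, 0, 1, 1]` with scores `[0.1, 0.4, 0.35, 0.8]` give an AUROC of 0.75: three of the four positive-negative pairs are ordered correctly.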
Conclusions InSight outperforms existing sepsis scoring systems in identifying and predicting sepsis, severe sepsis, and septic shock. This is the first sepsis screening system to exceed an AUROC of 0.90 using only vital sign inputs. InSight is robust to missing data, can be customized to data from a new hospital using a small fraction of that site's data, and retains strong discrimination across all institutions.
Machine learning is applied to the detection and prediction of three separate sepsis gold standards in emergency department, general ward and intensive care unit settings.
Only six commonly measured vital signs are used as input for the algorithm.
The algorithm is robust to randomly missing data.
Transfer learning successfully leverages information from a large source data set to improve performance on a target data set.
The retrospective design means the study cannot show how clinicians would respond to the algorithm's output.
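The claim that the algorithm is robust to randomly missing data suggests an evaluation in which test-set values are deleted at random and performance is re-measured. The abstract gives no details of that procedure, so the sketch below is a hypothetical harness: `mask_randomly` deletes each present value independently with probability `rate` (missing completely at random), and `robustness_curve` re-scores the masked records with a caller-supplied model (`score_fn`) and metric (for example, an AUROC function). Both names and the deletion scheme are assumptions.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence

Record = Dict[str, Optional[float]]


def mask_randomly(records: List[Record], rate: float, seed: int = 0) -> List[Record]:
    """Return copies of `records` with each present value set to None with
    probability `rate`. Values that are already None stay None."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)
    return [
        {k: (None if v is not None and rng.random() < rate else v) for k, v in rec.items()}
        for rec in records
    ]


def robustness_curve(score_fn: Callable[[Record], float],
                     metric: Callable[[List[int], List[float]], float],
                     records: List[Record], labels: List[int],
                     rates: Sequence[float], seed: int = 0) -> Dict[float, float]:
    """Evaluate `metric` on model scores after masking at each deletion rate."""
    results: Dict[float, float] = {}
    for rate in rates:
        masked = mask_randomly(records, rate, seed)
        results[rate] = metric(labels, [score_fn(r) for r in masked])
    return results
```

Plotting the returned metric against the deletion rate shows how gracefully performance degrades as data become sparser. The original records are never modified, so the same test set can be reused for every rate.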