Feature Selection and Machine Learning for Predicting Sepsis Mortality Using MIMIC and eICU Data

Sunday, March 22, 2026

2:30 PM - 3:30 PM Central Time

Location: Connections Central - RST 14

Eunie Yook, D.O.

Cleveland Clinic

Disclosure(s): No relevant financial relationship(s) to disclose.

Introduction: Early detection and intervention are critical for improving outcomes of septic patients in the intensive care unit (ICU); however, established severity scoring systems are often poorly suited to real-time use. Rapid advances in machine learning (ML), applied to large, de-identified public datasets, have accelerated the development of predictive models with improved accuracy and interpretability. This focused literature review evaluates feature selection methods, ML model performance, and interpretability techniques for predicting sepsis-related mortality using three large public ICU databases: MIMIC-III, MIMIC-IV, and eICU-CRD.

Methods: A literature search of the PubMed and Embase databases identified fifteen cohort studies that met the inclusion criteria: use of ML models with feature selection and interpretability techniques to predict sepsis-related mortality in any of the three ICU datasets (MIMIC-III, MIMIC-IV, or eICU-CRD). Artificial intelligence tools (ChatGPT, Grammarly) were used to improve grammar and clarity; all content was reviewed and verified by the author.

Results: Ensemble learning models, such as eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM), generally outperformed classic models, including Support Vector Machines (SVM) and Logistic Regression (LR), with reported area under the curve (AUC) values ranging from 0.79 to 0.99. The feature selection methods used, namely Recursive Feature Elimination (RFE), the Least Absolute Shrinkage and Selection Operator (LASSO), and entropy filtering, identified key predictors such as age, Glasgow Coma Scale (GCS) score, urine output, blood urea nitrogen (BUN), lactate, and ventilation status. Thirteen of fifteen studies incorporated Shapley Additive Explanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME), improving clinical relevance and enhancing clinician trust.
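
For readers less familiar with the AUC metric reported above, the following is a minimal illustrative sketch, not code from any of the reviewed studies. It computes AUC as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def roc_auc(labels, scores):
    """Pairwise (rank-based) AUC: fraction of positive/negative pairs in
    which the positive case has the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (assumption): 1 = died, 0 = survived,
# scores = predicted mortality risk from some model.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of non-survivors from survivors, which gives context for the 0.79 to 0.99 range reported across the reviewed studies.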

Conclusions: ML models built on large ICU datasets and paired with interpretability techniques have the potential to improve prediction of sepsis mortality. The retrospective nature of the studies and their reliance on single-center datasets limit generalizability. Future studies should focus on external validation, prospective cohort evaluation, and integration into real-time clinical operations to optimize early intervention strategies for treating septic patients.