Machine Learning Prediction of Post-Extubation Failure in Critically Ill Patients Using MIMIC-IV

Monday, March 23, 2026

9:00 AM - 10:00 AM Central Time

Location: Connections Central - RST 03

Giorgi Chilingarashvili, MD, MSc

Disclosure(s): No relevant financial relationship(s) to disclose.

Krunal Patel, MD

PCCM Fellow
Temple University Hospital

Disclosure information not submitted.

Cauvery Balhara Taank, MD

IM Resident
Nazareth Hospital

Disclosure information not submitted.

Adam Green, MD, MBA (he/him/his)

Pennsylvania

Disclosure(s): No relevant financial relationship(s) to disclose.

Joshua Pregnar, DO

Critical Care Intensivist
Sound Physicians

Disclosure information not submitted.

Sharad Patel, MD

Cooper University Health Care

Disclosure information not submitted.

Introduction: Post-extubation failure (PEF), defined as reintubation within 48 h of extubation, affects 5%–10% of ICU patients and portends higher mortality, longer stays, and greater resource use. Accurate risk stratification remains elusive. We sought to develop and validate machine-learning (ML) models that use high-resolution temporal data from the MIMIC-IV Respiratory Support benchmark to predict PEF and identify its key predictors.

Methods: In this retrospective cohort study, we identified 17,476 adult ICU patients in MIMIC-IV who underwent invasive mechanical ventilation followed by an extubation event. For each patient, we extracted hourly ventilator settings, vital signs, and laboratory values over the 12 h preceding the first extubation. Missing data were addressed via simple imputation, k-nearest neighbors (KNN), and multiple imputation by chained equations (MICE); the KNN and MICE results were compared using Kolmogorov–Smirnov tests and kernel density estimation. To mitigate class imbalance (5% PEF), we applied SMOTE oversampling. Four classifiers (elastic-net logistic regression, Random Forest, XGBoost, and LightGBM) were trained on 80% of the data with 5-fold cross-validation for hyperparameter tuning and evaluated on a held-out 20% test set. Model discrimination was assessed by area under the receiver-operating characteristic curve (AUC); calibration and learning curves were also examined. SHAP (SHapley Additive exPlanations) values were used to interpret model outputs.
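To make the oversampling step concrete, the following is a minimal pure-Python sketch of the core SMOTE idea: each synthetic minority row is a random interpolation between a minority row and one of its nearest minority neighbors. It is an illustration only, not the study's pipeline. The function names (`smote`, `balance_training_set`), the toy data, and the choice of `k` are hypothetical, and the sketch assumes oversampling is applied to the training split alone so that the held-out test set keeps its natural class balance.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a randomly chosen
    minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base row (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + t * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance_training_set(X_train, y_train, k=5, seed=0):
    """Oversample label 1 until both classes have the same number of rows."""
    minority = [x for x, y in zip(X_train, y_train) if y == 1]
    n_major = sum(1 for y in y_train if y == 0)
    new_rows = smote(minority, n_major - len(minority), k=k, seed=seed)
    return X_train + new_rows, y_train + [1] * len(new_rows)


# Toy training split (hypothetical data): 95 non-PEF rows and 5 PEF rows.
rng = random.Random(42)
X_train = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(95)]
X_train += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(5)]
y_train = [0] * 95 + [1] * 5

X_bal, y_bal = balance_training_set(X_train, y_train, k=3)
print("PEF rows:", sum(y_bal), "non-PEF rows:", len(y_bal) - sum(y_bal))
```

Because every synthetic row is a convex combination of two real minority rows, oversampling adds no information beyond the observed PEF cases. Keeping synthetic rows out of the test set is what allows the held-out AUC to reflect performance at the true event rate.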

Results: PEF occurred in 4.97% of patients. After SMOTE, the training set comprised 313,100 samples (50% PEF). On the test set, logistic regression achieved AUC = 0.95; Random Forest, AUC = 0.98; LightGBM, AUC = 0.997; and XGBoost, AUC = 0.998. Learning curves plateaued at ~15,000 samples, indicating model stability. SHAP analysis identified length of stay, heart rate variability, and peripheral oxygen saturation as the strongest predictors of PEF.
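As an aid to reading the AUC values, the sketch below computes AUC directly from its rank interpretation: the fraction of (PEF, non-PEF) pairs in which the PEF case receives the higher predicted score, with ties counted as half. The function name `roc_auc` and the four-patient example are hypothetical; this is a quadratic-time illustration of the definition, not the evaluation code used in the study.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Counts every (positive, negative) pair; a tie contributes 0.5.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four patients; label 1 marks PEF.
labels = [0, 0, 1, 1]
predicted = [0.10, 0.40, 0.35, 0.80]
auc = roc_auc(labels, predicted)
print(f"AUC = {auc:.2f}")
```

In this toy case three of the four (PEF, non-PEF) pairs are ordered correctly, so the AUC is 0.75. Read the same way, an AUC of 0.998 means nearly every such pair in the test set was ranked correctly.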

Conclusions: ML models, particularly gradient-boosted tree models, can accurately predict post-extubation failure using temporal ICU data. SHAP-derived insights highlight modifiable clinical parameters that may guide extubation readiness and resource allocation. External validation is warranted to confirm generalizability.