A Machine Learning Algorithm using Clinical and Demographic Data for All-Cause Preterm Birth Prediction.
Publication/Presentation Date
5-1-2024
Abstract
OBJECTIVE: Preterm birth remains the predominant cause of perinatal mortality throughout the United States and the world, with well-documented racial and socioeconomic disparities. We aimed to develop and validate a machine learning algorithm that predicts all-cause preterm birth from clinical, demographic, and laboratory data.
STUDY DESIGN: We performed a cohort study of pregnant individuals delivering at a single institution, using prospectively collected information on clinical conditions, patient demographics, laboratory data, and health care utilization. The primary outcome was all-cause preterm birth before 37 weeks. The dataset was randomly divided into a derivation cohort (70%) and a separate validation cohort (30%). Predictor variables were selected from among 33 that had been previously identified in the literature (directed machine learning). In the derivation cohort, both a statistical model (logistic regression) and a machine learning model (XG-Boost) were fitted and compared by C-statistic to identify the best fit, which was then validated in the validation cohort. We measured model discrimination with the C-statistic and assessed model performance and calibration to determine whether the model provided clinical decision-making benefit.
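As an illustration of the random 70/30 split and the C-statistic described above, the following is a minimal sketch in plain Python. It is not the authors' code: the function names `split_cohort` and `c_statistic`, the fixed seed, and the toy labels and scores are all assumptions made for this example. The C-statistic is computed here from its definition, as the fraction of (preterm, term) pairs in which the preterm delivery receives the higher predicted risk, with ties counted as one half.

```python
import random


def split_cohort(records, derivation_fraction=0.7, seed=0):
    """Randomly divide records into (derivation, validation) lists.

    The seed is a hypothetical choice made only for reproducibility.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * derivation_fraction)
    return shuffled[:cut], shuffled[cut:]


def c_statistic(labels, scores):
    """Concordance (C) statistic for a binary outcome.

    labels: 1 for preterm birth, 0 for term birth.
    scores: predicted risk for each delivery, in the same order.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0   # case ranked above control
            elif case_score == control_score:
                concordant += 0.5   # tie counts as half
    return concordant / (len(cases) * len(controls))


# Toy usage with made-up values (not study data).
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.4, 0.4, 0.1]
print(c_statistic(toy_labels, toy_scores))

derivation, validation = split_cohort(range(12440))
print(len(derivation), len(validation))
```

The pairwise loop is quadratic in cohort size, which is acceptable for a sketch; a rank-based formula or a library routine would normally be used on a cohort of this size. A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking.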
RESULTS: The cohort included a total of 12,440 deliveries among 12,071 individuals. Preterm birth occurred in 2,037 births (16.4%). The derivation cohort consisted of 8,708 deliveries (70%) and the validation cohort of 3,732 deliveries (30%). XG-Boost was chosen for the robustness of the model and its ability to handle missing data and collinearity between predictor variables. The top five predictor variables identified as drivers of preterm birth, ranked by the feature importance metric, were multiple gestation, number of emergency department visits in the year prior to the index pregnancy, initial unknown body mass index, gravidity, and prior preterm delivery. Test performance characteristics were similar between the two populations (derivation cohort area under the curve [AUC] = 0.70 vs. validation cohort AUC = 0.63).
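The counts reported above can be checked against the stated percentages with simple arithmetic. The short sketch below uses only the figures given in the abstract; the variable names and the `percent` helper are introduced here for illustration.

```python
# Counts as reported in the abstract.
deliveries = 12_440
preterm_births = 2_037
derivation_n = 8_708
validation_n = 3_732


def percent(part, whole, digits=1):
    """Return part/whole as a percentage rounded to `digits` places."""
    return round(100 * part / whole, digits)


# The two cohorts should partition the full set of deliveries.
assert derivation_n + validation_n == deliveries

print(percent(preterm_births, deliveries))  # 16.4
print(percent(derivation_n, deliveries))    # 70.0
print(percent(validation_n, deliveries))    # 30.0
```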
CONCLUSION: Clinical, demographic, and laboratory information can be useful to predict all-cause preterm birth with moderate precision.
KEY POINTS: · Machine learning can be used to create models to predict preterm birth. · In our model, all-cause preterm birth can be predicted with moderate precision. · Clinical, demographic, and laboratory information can be useful to predict all-cause preterm birth.
Volume
41
Issue
S 01
First Page
3115
Last Page
3123
ISSN
1098-8785
Published In/Presented At
Bitar, G., Liu, W., Tunguhan, J., Kumar, K. V., & Hoffman, M. K. (2024). A Machine Learning Algorithm using Clinical and Demographic Data for All-Cause Preterm Birth Prediction. American Journal of Perinatology, 41(S 01), e3115–e3123. https://doi.org/10.1055/s-0043-1776917
Disciplines
Medicine and Health Sciences
PubMedID
38049100
Department(s)
Department of Obstetrics and Gynecology
Document Type
Article