Risk prediction modelling for gestational diabetes

Study type
Protocol
Date of Approval
Study reference ID
22_002383
Lay Summary

Gestational diabetes mellitus (GDM) is high blood sugar that can happen during pregnancy. It's the most common pregnancy complication, affecting about 1 in 7 pregnancies. GDM can increase the risk of pregnancy or delivery complications, and lead to health problems later in life for both the mother and the baby, like being overweight or getting heart disease.

While it's important to detect and treat GDM to avoid complications for both the mother and baby, it can be difficult for healthcare systems to keep up with the demand, especially since it's not always clear who is at risk of developing GDM. If we could accurately predict which women are most likely to develop GDM and experience related complications, we could focus our resources on these groups. This would ensure that everyone gets the attention they need and avoid overwhelming the healthcare system.

The overall aim of the project is to develop tools to identify women who are at risk of developing GDM and facing negative outcomes from it. Data on women with GDM will be extracted from the Clinical Practice Research Datalink electronic health records. These data will be used to identify the factors that contribute to a higher risk of GDM and its negative outcomes, and to build and validate predictive models using different statistical and machine learning methods. The project will help us understand how well we can predict who is at the highest risk using only data available in clinical records.

Technical Summary

This project aims to develop and validate risk prediction models for gestational diabetes mellitus (GDM) and its adverse outcomes using electronic healthcare records. To develop the prognostic models, a retrospective cohort study design will be used, in which information about predictors and outcomes is collected from existing electronic health records. The primary model will determine GDM risk in women with at least one singleton delivery recorded in the healthcare records. The secondary models will focus on the risk of adverse outcomes in women with a GDM diagnosis.

The project will use generalised linear models (GLM), support vector machines, random forests, extreme gradient boosting ensemble learning, and deep neural network learning approaches for binary outcomes such as GDM occurrence. Cox regression, Cox models with gradient boosting, survival support vector machines, and random survival forests will be used to predict time-to-event outcomes. A data cleaning pipeline will be developed to filter outliers and handle missing values. In model development, we will use feature ranking methods such as LASSO regularisation for the GLM, Shapley values for tree-based methods, and wrapper methods (backward feature elimination, recursive feature elimination) to increase the interpretability of the models. We will split the dataset 80:10:10 into training, testing and validation sets. The performance of the models will be evaluated using performance metrics for classification and regression models. Bootstrapping and fold-based methods will be used to handle unbalanced data during model testing and hyperparameter tuning. Model calibration, discrimination, and clinical utility will be taken into consideration during the final model selection.
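As an illustration of the 80:10:10 split and the bootstrap evaluation described above, the following is a minimal sketch using only the Python standard library. It is not the study's pipeline: the function names (stratified_split, auc, bootstrap_auc_interval), the choice to stratify the split by outcome, the use of the area under the ROC curve as the discrimination measure, and the synthetic labels and scores are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Return three lists of row indices, in the order of `fractions`.

    Rows are shuffled within each outcome class and then divided, so each
    part keeps roughly the same outcome prevalence as the full dataset.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    parts = ([], [], [])
    for y in sorted(by_class):
        indices = by_class[y]
        rng.shuffle(indices)
        cut1 = round(len(indices) * fractions[0])
        cut2 = cut1 + round(len(indices) * fractions[1])
        parts[0].extend(indices[:cut1])
        parts[1].extend(indices[cut1:cut2])
        parts[2].extend(indices[cut2:])  # remainder goes to the last part
    return parts


def auc(labels, scores):
    """Chance that a random case outscores a random non-case (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC, resampling rows with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one outcome class
        stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Synthetic demonstration only: 1,000 pregnancies, 14% with the outcome.
labels = [1] * 140 + [0] * 860
train, test, val = stratified_split(labels)

demo_rng = random.Random(1)
scores = [0.3 * y + 0.7 * demo_rng.random() for y in labels]  # made-up risk scores
low, high = bootstrap_auc_interval(
    [labels[i] for i in test], [scores[i] for i in test]
)
```

Stratifying by outcome keeps the proportion of GDM cases similar across the three parts, which matters when the outcome is relatively uncommon. In practice an established library routine, such as scikit-learn's train_test_split with its stratify argument, would likely be used instead of hand-written code.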

Effective risk stratification is crucial for resource allocation and secondary prevention in the face of the growing prevalence of GDM, which poses a burden on overwhelmed health systems. Accurate detection, treatment, and monitoring can mitigate complications, but identifying at-risk populations remains a challenge.

Health Outcomes to be Measured

The primary outcome is a diagnosis of GDM recorded either in the primary care record or in the HES APC data associated with the delivery.
Secondary outcomes include maternal and neonatal adverse outcomes associated with GDM including:
Outcomes that happen during pregnancy and/or delivery:
GDM treatment method (including lifestyle, metformin and insulin)
Gestational weight gain
Hypertensive disorders of pregnancy
Number of antenatal visits/admissions
Stillbirth

Delivery related outcomes:
Mode of delivery (emergency C-section, elective C-section, instrumental delivery)
Perineal trauma/tearing
Induction of labour
Preterm delivery
Length of postnatal stay
Longer-term outcomes:
Development of type 2 diabetes
Return to pre‐pregnancy weight
Postnatal depression
Cardiovascular morbidity
Obesity
Recurrent GDM

Collaborators

Nerys Astbury - Chief Investigator - University of Oxford
Nerys Astbury - Corresponding Applicant - University of Oxford
Cynthia Wright Drakesmith - Collaborator - University of Oxford
Huiqi Yvonne Lu - Collaborator - University of Oxford
Margaret Smith - Collaborator - University of Oxford
Subhashisa Swain - Collaborator - University of Oxford
Yasmina Al Ghadban - Collaborator - University of Oxford

Linkages

HES Admitted Patient Care; ONS Death Registration Data; Patient Level Index of Multiple Deprivation