Enhancing Autism Spectrum Disorder (ASD) Research in the United Kingdom: Prevalence Estimation and Comparative Assessment of Conventional Regression and Machine Learning Models for ASD Risk Prediction

Study type: Protocol
Date of Approval:
Study reference ID: 23_002937
Lay Summary

Autism spectrum disorder (ASD) is a condition that affects the development of the brain and can cause difficulties in social interaction, communication, and behaviour. It is becoming more common globally, affecting approximately one in 160 children worldwide. In the United Kingdom, it is estimated that 0.68% of the population had ASD in 2018. ASD can bring lifelong challenges and is associated with other health conditions.

Since there is currently no effective treatment for the main symptoms of ASD, it is crucial to identify and address the factors that increase the risk of developing ASD and related health conditions. Early detection and intervention are also important for supporting the development and functioning of children with ASD. By using information from electronic health records, we can develop strategies that help identify risk factors, as well as individuals at higher risk of ASD and associated health conditions. These strategies will help in designing interventions to minimise risks and provide early support for individuals with ASD.

Previous studies have examined specific factors that may contribute to the development of ASD, such as maternal risk factors and characteristics of the child. However, ASD and associated health conditions are likely to result from a combination of multiple factors. Our study therefore aims to determine the prevalence of ASD in the UK and to investigate how different factors interact to influence the development of ASD and associated health conditions.

Technical Summary

This is a retrospective cohort study using data from the CPRD GOLD and CPRD Aurum databases in the UK. The study aims first to estimate the prevalence of ASD in the UK, and then to develop machine learning-based prediction models and compare their predictive performance with that of conventional regression models.

We will first estimate the annual prevalence of clinically diagnosed ASD in the UK, both in the total population and within targeted age and sex groups. For each study year, the number of people with ASD during that year will be summed and then divided by the total population of the corresponding age/sex group in the middle (July) of that year. Annual prevalence will be expressed per 100 persons with a 95% confidence interval estimated using the Poisson method. A linear regression model will be used to test for time trends in annual prevalence, expressed as the average annual percentage change over the study period.
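The prevalence and trend calculations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's analysis code: it assumes that "Poisson method" refers to the exact (Garwood) Poisson interval for the case count, and it assumes the trend is fitted as a log-linear model (linear regression of log prevalence on calendar year), a common way of obtaining an average annual percentage change (AAPC). All function names are hypothetical, and only the Python standard library is used.

```python
import math
from typing import Callable, Sequence, Tuple


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam); each term is computed in log space."""
    if k < 0:
        return 0.0
    if lam == 0.0:
        return 1.0
    log_lam = math.log(lam)
    return sum(math.exp(-lam + i * log_lam - math.lgamma(i + 1)) for i in range(k + 1))


def _bisect_decreasing(f: Callable[[float], float], lo: float, hi: float, n_iter: int = 100) -> float:
    # Root of a decreasing function that is positive at lo and negative at hi
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_ci(count: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact (Garwood) confidence interval for a Poisson count (assumed method)."""
    hi = count + 10.0 * math.sqrt(count) + 10.0  # comfortably above the upper limit
    lower = 0.0 if count == 0 else _bisect_decreasing(
        lambda lam: poisson_cdf(count - 1, lam) - (1 - alpha / 2), 0.0, hi)
    upper = _bisect_decreasing(lambda lam: poisson_cdf(count, lam) - alpha / 2, 0.0, hi)
    return lower, upper


def annual_prevalence(cases: int, midyear_population: int, alpha: float = 0.05) -> Tuple[float, float, float]:
    """Prevalence per 100 persons: cases / mid-year (July) population, with its CI."""
    lo, hi = exact_poisson_ci(cases, alpha)
    scale = 100.0 / midyear_population
    return cases * scale, lo * scale, hi * scale


def average_annual_percent_change(years: Sequence[int], prevalences: Sequence[float]) -> float:
    """Least-squares fit of ln(prevalence) = a + b * year; AAPC = 100 * (exp(b) - 1)."""
    logs = [math.log(p) for p in prevalences]
    n = len(years)
    t_bar, y_bar = sum(years) / n, sum(logs) / n
    s_xx = sum((t - t_bar) ** 2 for t in years)
    s_xy = sum((t - t_bar) * (y - y_bar) for t, y in zip(years, logs))
    return 100.0 * (math.exp(s_xy / s_xx) - 1.0)
```

For example, a hypothetical 68 cases in a mid-year population of 10,000 gives a point estimate of 0.68 per 100 persons. A formal test of the time trend would additionally examine whether the fitted slope differs from zero (for example, a t-test on the slope), which is not shown here.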

Conventional multivariable regression models will be used to develop a risk model to predict ASD. Candidate predictors, comprising modifiable maternal risk factors and early childhood characteristics, will be selected on the basis of key existing literature. We will first fit a full model including all variables. Penalised logistic regression and Cox proportional hazards models will then be used to develop risk models for ASD as a binary outcome and as a time-to-event outcome, respectively. Machine learning approaches, such as neural networks, XGBoost, XGBoost-surv, random forests, and random survival forests, will be used to develop the ASD prediction algorithm. The results will be used to construct a prognostic index for ASD.
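As a rough illustration of the penalised logistic regression step, the sketch below fits an L2 (ridge) penalised logistic regression by batch gradient descent and uses its linear predictor (log-odds) as a prognostic index. The protocol does not specify the penalty type, how the penalty strength is tuned, or the software used; the ridge penalty, the fixed `lam`, the helper names (`fit_ridge_logistic`, `prognostic_index`, `simulate`), and the simulated predictors are all assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prognostic_index(b0: float, beta: Sequence[float], x: Sequence[float]) -> float:
    # Linear predictor (log-odds); higher values mean higher predicted risk
    return b0 + sum(bj * xj for bj, xj in zip(beta, x))


def fit_ridge_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    lr: float = 0.1,
    n_iter: int = 1000,
) -> Tuple[float, List[float]]:
    """Minimise mean log-loss + (lam / 2) * ||beta||^2 by batch gradient descent.

    The intercept b0 is not penalised. Returns (b0, beta).
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(prognostic_index(b0, beta, xi)) - yi  # predicted minus observed
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= lr * g0 / n
        # Gradient of the penalty (lam / 2) * ||beta||^2 is lam * beta
        beta = [bj - lr * (gj / n + lam * bj) for bj, gj in zip(beta, g)]
    return b0, beta


def simulate(n: int, seed: int = 42) -> Tuple[List[List[float]], List[int]]:
    # Hypothetical data: x1 continuous, x2 binary; true log-odds = -1 + 1.5*x1 + 0.8*x2
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), float(rng.random() < 0.5)
        X.append([x1, x2])
        y.append(int(rng.random() < sigmoid(-1.0 + 1.5 * x1 + 0.8 * x2)))
    return X, y


if __name__ == "__main__":
    X, y = simulate(300)
    b0, beta = fit_ridge_logistic(X, y)
    print("intercept", round(b0, 3), "coefficients", [round(b, 3) for b in beta])
```

In a real analysis the penalty strength would typically be chosen by cross-validation, and the Cox model and the machine learning models would be fitted with dedicated libraries; neither is shown here.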

Predictions obtained from the machine learning techniques will be compared with those from the conventional regression approaches using measures such as overall accuracy, sensitivity, specificity, precision, and the area under the receiver operating characteristic curve (AUC).
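The comparison measures listed above can be computed from observed outcomes and predicted risks as in the following sketch. It assumes a binary ASD label (1 = ASD, 0 = no ASD) and a classification threshold of 0.5 for the threshold-based measures; the protocol does not state the threshold, and the function names are hypothetical. The AUC is computed with the Mann-Whitney rank formulation, with tied scores given average ranks.

```python
import math
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = (sum of positive ranks - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank for a block of ties
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    r_pos = sum(r for r, yt in zip(ranks, y_true) if yt == 1)
    return (r_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def classification_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, precision at a threshold, plus AUC."""
    tp = fp = tn = fn = 0
    for yt, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and yt == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif yt == 1:
            fn += 1
        else:
            tn += 1

    def ratio(a: int, b: int) -> float:
        # Undefined ratios (zero denominator) are returned as NaN
        return a / b if b else math.nan

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "auc": roc_auc(y_true, scores),
    }
```

For instance, `classification_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])` yields 0.5 for accuracy, sensitivity, specificity, and precision, and an AUC of 0.75, because three of the four positive/negative pairs are ranked correctly.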

Health Outcomes to be Measured

Autism spectrum disorder (ASD) in offspring

Collaborators

Kenneth Man - Chief Investigator - University College London (UCL)
Adrienne Chan - Corresponding Applicant - UCL School of Pharmacy
Ian Wong - Collaborator - University College London (UCL)

Linkages

HES Accident and Emergency; HES Admitted Patient Care; HES Outpatient; ONS Death Registration Data; Patient Level Index of Multiple Deprivation; CPRD Aurum Mother-Baby Link; CPRD Aurum Pregnancy Register; CPRD GOLD Mother-Baby Link; CPRD GOLD Pregnancy Register