Developing a predictive Tool using data linkage and machine learning, to promote Earlier Diagnosis of Type 1 diabetes in childhood for use in primary care – the TED study

Study type
Protocol
Date of Approval
Study reference ID
20_023
Lay Summary

Early diagnosis of Type 1 Diabetes (T1D) is critical to avoid children developing life-threatening diabetic ketoacidosis (DKA). DKA occurs when blood sugars are very high and acidic substances reach dangerous levels. In the UK, one in four children who develop T1D are in DKA when diagnosed. These children have worse outcomes, with increased NHS costs. Delayed diagnosis and misdiagnosis have been reported as risk factors for children developing DKA at diagnosis. Previous studies illustrate the challenges General Practitioners (GPs) face in recognising a child who may have developed T1D. This is because it is a rare condition, and children may not have the traditional symptoms, or symptoms may be mistaken for more common childhood complaints.

The TED study aims to develop and assess the usefulness of a tool to detect children who may have undiagnosed T1D. Using the number and details of GP consultations from routinely collected data in England and Wales, the tool will assess a child's risk of being diagnosed with T1D. Predictive tools have been developed and shown to be successful in primary care at recognising those who have Type 2 diabetes, but this will be the first time that a tool has been developed to detect T1D.

Results will show how successful the tool is at distinguishing between children who went on to develop T1D and those who did not. If successful, the tool could be used by GPs during consultations in primary care. This may mean that children will be diagnosed with T1D earlier.

Technical Summary

The training phase will use SAIL-BRECON data, which includes information on children’s primary care consultations and on those with T1D during the study period (01/01/2000 – 31/12/2016). At each unique date on which a patient interaction with primary care occurs, information on a collection of flags pertaining to that interaction will be extracted, along with gender, age, and the date on which a T1D diagnosis occurred or the patient was censored (whichever is earlier), together with the accompanying event indicator. We will use machine learning algorithms (using SuperLearner in R) to predict time to T1D diagnosis from the information available at the “current” primary care interaction, also taking the number, timing and information from past interactions into account. We will use V-fold cross-validation to find an optimal balance between under- and over-fitting, ultimately selecting the “best” linear combination of all algorithms based on the V-fold cross-validated estimate of the area under the Receiver Operating Characteristic (ROC) curve.
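The selection step described above can be illustrated with a minimal Python sketch. It is not the study's implementation, which uses SuperLearner in R. The sketch makes several simplifying assumptions: the outcome is a binary label rather than time to diagnosis, the candidate algorithms are toy single-flag learners (the hypothetical make_flag_learner), folds are assigned by row index, and a coarse grid search stands in for whatever optimiser is actually used to weight the algorithms. All function names are illustrative.

```python
from itertools import product


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_flag_learner(col):
    """Hypothetical toy learner: predicted risk is the training-set event
    rate among rows sharing the same value of binary flag `col`."""
    def fit(X_train, y_train):
        rate = {}
        for value in (0, 1):
            ys = [y for x, y in zip(X_train, y_train) if x[col] == value]
            rate[value] = sum(ys) / len(ys) if ys else 0.0
        return lambda x: rate[x[col]]
    return fit


def out_of_fold_predictions(learners, X, y, v=5):
    """V-fold cross-validation: every row is predicted by models that were
    fitted without that row's fold (fold = row index modulo v, an assumption)."""
    n = len(y)
    oof = [[0.0] * n for _ in learners]
    for fold in range(v):
        train = [i for i in range(n) if i % v != fold]
        test = [i for i in range(n) if i % v == fold]
        for j, fit in enumerate(learners):
            predict = fit([X[i] for i in train], [y[i] for i in train])
            for i in test:
                oof[j][i] = predict(X[i])
    return oof


def best_convex_weights(oof, labels, step=0.25):
    """Grid-search non-negative weights summing to 1 that maximise the
    cross-validated AUC of the blended out-of-fold predictions."""
    n_steps = round(1 / step)
    best_w, best_auc = None, -1.0
    for combo in product(range(n_steps + 1), repeat=len(oof)):
        if sum(combo) != n_steps:
            continue
        w = [c / n_steps for c in combo]
        blended = [sum(wj * p[i] for wj, p in zip(w, oof))
                   for i in range(len(labels))]
        score = auc(blended, labels)
        if score > best_auc:
            best_w, best_auc = w, score
    return best_w, best_auc
```

Because the weights are chosen on out-of-fold predictions only, an algorithm that merely memorises its training data gains no advantage, which is the sense in which cross-validation guards against over-fitting here.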

Once the algorithm has been developed using SAIL-BRECON data, it will be adapted, tested and validated using CPRD-HES data. Hospital Episode Statistics (HES) data will be used to detect when a child was diagnosed with T1D and whether they presented in DKA. The CPRD-HES linked data will provide all primary care interactions between 01/01/2000 and 31/12/2016 for children aged 0-14, together with the date of T1D diagnosis and DKA status. We will apply our trained diagnosis algorithm with a suitable range of thresholds for prompting blood glucose (BG) testing, to mimic the outcomes had our algorithm been in operation. Using a number of assumptions (the sensitivity to which we will assess), we will estimate the proportion of diagnoses that could be anticipated; the average number of days by which diagnosis could be anticipated; the number of DKAs that would be avoided; and the number of BG tests (including negative ones) performed to achieve this.
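As an illustration of how such a threshold-based replay might be tallied, the following Python sketch counts the quantities listed above for a single threshold. It rests on assumptions that are not taken from the protocol: each consultation carries a pre-computed risk score, every consultation at or above the threshold prompts exactly one BG test, and a test is assumed to be positive only if it falls within a hypothetical window_days before the recorded diagnosis date. The record layout and all names are illustrative.

```python
from datetime import date


def simulate_threshold(children, threshold, window_days=30):
    """Replay consultations as if the tool had prompted BG tests.

    children: list of dicts with keys
      'visits'    - list of (consultation date, risk score) pairs
      'diagnosis' - recorded T1D diagnosis date, or None if never diagnosed
      'dka'       - True if the child presented in DKA at diagnosis
    """
    anticipated = 0
    dka_anticipated = 0
    bg_tests = 0
    days_gained = []
    n_diagnosed = sum(1 for c in children if c["diagnosis"] is not None)

    for child in children:
        dx = child["diagnosis"]
        for visit_date, risk in sorted(child["visits"]):
            if dx is not None and visit_date >= dx:
                break  # only consultations before the recorded diagnosis count
            if risk < threshold:
                continue  # tool would not have prompted a test
            bg_tests += 1
            # Assumption: a test this close to diagnosis would be positive.
            if dx is not None and (dx - visit_date).days <= window_days:
                anticipated += 1
                days_gained.append((dx - visit_date).days)
                dka_anticipated += child["dka"]
                break  # diagnosed early; no further tests for this child

    return {
        "diagnoses_anticipated": anticipated,
        "proportion_anticipated": anticipated / n_diagnosed if n_diagnosed else None,
        "mean_days_anticipated": sum(days_gained) / len(days_gained) if days_gained else None,
        "dka_cases_anticipated": dka_anticipated,
        "bg_tests": bg_tests,
    }
```

Running the function over a range of thresholds, and over several values of window_days, gives one way to examine how sensitive the estimates are to these assumptions.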

Health Outcomes to be Measured

This application relates to testing the utility of a predictive tool, to be used in primary care, to detect children who have developed T1D. The outcome measures listed relate to this part of the study and this application.
Overall, the outcomes to be measured are:
1. The number of earlier diagnoses our algorithm might be able to achieve
2. The average number of days by which the diagnosis might have been anticipated in these cases
3. The number of DKA cases in which an earlier diagnosis might be facilitated
4. The number of finger prick tests that would have been performed (including the false positive ones)

Collaborators

Julia Townson - Chief Investigator - Cardiff University
Julia Townson - Corresponding Applicant - Cardiff University
Ambika Shetty - Collaborator - Cardiff and Vale University Health Board
Hywel M. Jones - Collaborator - Cardiff University
John Gregory - Collaborator - Cardiff University
Nick Francis - Collaborator - University of Southampton
Rhian Daniel - Collaborator - Cardiff University
Shantini Paranjothy - Collaborator - University of Aberdeen

Linkages

HES Admitted Patient Care; Practice Level Index of Multiple Deprivation