Development of a prediction model in Pompe disease, Fabry disease, and Gaucher disease based on UK electronic primary healthcare records and linked secondary care data.

Study type
Protocol
Date of Approval
Study reference ID
20_184
Lay Summary

Pompe disease, Fabry disease, and Gaucher disease are rare genetic disorders that are chronically debilitating and can reduce life expectancy if left untreated. Some forms can be difficult to diagnose because they are uncommon and present with a range of symptoms that can be mistaken for other conditions. However, patients who receive an earlier diagnosis can start treatment sooner, which may reduce the occurrence and severity of disease symptoms and extend their life.
This study plans to report which combinations of symptoms and other characteristics are more common in people with these conditions than in those without them. An algorithm will be developed using a predictive mathematical model, drawing on published literature, the data from this study, and input from clinicians who treat these diseases. This algorithm will then be validated by expert clinicians. Once validated, the algorithm will be made available to hospitals and GPs, who will be able to run it across their own data. This will help patients get earlier testing and referrals to specialist centres so that diagnoses can be confirmed and treatment started.

Technical Summary

This study aims to identify predictors that detect those at highest risk of a diagnosis of late-onset Pompe disease (LOPD), Fabry disease, and Gaucher disease (Type I and III).
The machine learning disease-prediction modelling will identify predictors of a diagnosis of a lysosomal storage disease (LSD). Cases of each disease will be matched to non-cases by age and sex, with a target ratio of 1 case to 20,000 controls. These training cases and controls will be used to build a predictive model for those at risk of the disease. Patient characteristics indicative of the disease will be pre-specified based on published literature and key opinion leader (KOL) input. The training dataset will be used to build and train various implementations of the predictive algorithm using i) logistic regression and ii) machine learning with random forest methodology, including pre-specified clinical variables.
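The age- and sex-matching step described above could be sketched roughly as follows. This is a minimal illustration only, not the study's actual procedure: the record fields (`id`, `age`, `sex`), the function name `match_controls`, exact-age matching, and sampling without replacement are all assumptions, and the data are synthetic.

```python
import random
from collections import defaultdict

def match_controls(cases, pool, ratio, seed=0):
    """Match each case to up to `ratio` non-cases with the same age and sex.

    Hypothetical helper: exact matching on (age, sex), controls used once only.
    """
    rng = random.Random(seed)
    # Group candidate controls by the matching key (age, sex).
    strata = defaultdict(list)
    for person in pool:
        strata[(person["age"], person["sex"])].append(person)
    for members in strata.values():
        rng.shuffle(members)

    matched = {}
    for case in cases:
        candidates = strata[(case["age"], case["sex"])]
        # Pop controls from the stratum so that none is reused for another case.
        take = min(ratio, len(candidates))
        matched[case["id"]] = [candidates.pop() for _ in range(take)]
    return matched

# Synthetic records, for illustration only.
cases = [{"id": "c1", "age": 45, "sex": "F"}, {"id": "c2", "age": 60, "sex": "M"}]
pool = [
    {"id": f"p{i}", "age": 45 if i % 2 else 60, "sex": "F" if i % 2 else "M"}
    for i in range(20)
]
matched = match_controls(cases, pool, ratio=3)
```

With a target ratio as large as the one stated, some strata may not contain enough eligible non-cases, which is why the sketch takes "up to" the requested number rather than failing.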
By processing a large number of predictors simultaneously, random forest provides a measure of each variable’s importance in the context of multivariate interactions with other predictors. Such interactions might go unnoticed in traditional subgroup analysis models, where the selection of variables and their interactions is driven by pre-specified rules regarding statistical significance. Random forest methodology employs a ‘systematic’ approach to the development of subgroups, which are constructed sequentially through repeated binary splits of the population of interest, one explanatory variable at a time. In other words, each ‘parent’ group is divided into two ‘child’ groups, with the objective of creating increasingly homogeneous subgroups. As a result, random forest variable importance may assign higher scores to variables that act through complex interactions and would be missed by parametric regression models.
The resulting algorithm will then be tested and validated on its ability to re-identify the diagnosed population.
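To make the idea of a binary split concrete, the sketch below finds the single threshold on one explanatory variable that divides a ‘parent’ group into the two most homogeneous ‘child’ groups. It is an illustration under stated assumptions: Gini impurity is used as the homogeneity measure (the protocol does not name one), the predictor `symptom_count` is hypothetical, and the data are synthetic. A full random forest would repeat this over many trees grown on bootstrap samples with random subsets of variables.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly homogeneous)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_binary_split(values, labels):
    """Find the threshold on one variable that minimises weighted child impurity."""
    n = len(labels)
    # Start from the parent's impurity: a split is only kept if it improves on it.
    best_threshold, best_impurity = None, gini(labels)
    # The largest value is skipped because it would leave the right child empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity

# Hypothetical predictor (a count of relevant symptom codes) and case status.
symptom_count = [0, 1, 1, 2, 4, 5, 6, 7]
is_case = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, impurity = best_binary_split(symptom_count, is_case)
```

In this toy example the split at `symptom_count <= 2` separates cases from non-cases completely, so both child groups have zero impurity.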
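One way to quantify how well an algorithm re-finds the diagnosed population is sketched below. The protocol does not state which performance measures will be used, so the choice of sensitivity and positive predictive value here is an assumption, as are the function name `refinding_metrics` and the patient identifiers.

```python
def refinding_metrics(flagged_ids, diagnosed_ids):
    """Compare patients flagged by the algorithm with those known to be diagnosed."""
    flagged, diagnosed = set(flagged_ids), set(diagnosed_ids)
    true_pos = len(flagged & diagnosed)
    # Sensitivity: share of diagnosed patients that the algorithm re-finds.
    sensitivity = true_pos / len(diagnosed) if diagnosed else 0.0
    # Positive predictive value: share of flagged patients who are diagnosed.
    ppv = true_pos / len(flagged) if flagged else 0.0
    return {"true_positives": true_pos, "sensitivity": sensitivity, "ppv": ppv}

# Synthetic identifiers, for illustration only.
metrics = refinding_metrics(
    flagged_ids=["a", "b", "c", "d"],
    diagnosed_ids=["a", "b", "e"],
)
```

Because these diseases are rare, positive predictive value would be expected to depend heavily on the flagging threshold, so both measures are worth reporting together.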

Health Outcomes to be Measured

Pompe disease; Fabry disease; Gaucher disease (Type I and III); survival; co-morbidities (respiratory, ocular, auditory, cardiovascular, musculoskeletal, renal, sleep disturbance, neurological, and gastrointestinal); wheelchair use; ventilator use; secondary care and primary care resource use.

Collaborators

Craig Currie - Chief Investigator - Cardiff University
Nick Denholm - Corresponding Applicant - Harvey Walsh Ltd
Ben Van Hout - Collaborator - Harvey Walsh Ltd
Clare Halcro - Collaborator - Genzyme Limited (UK)
Eleanor Saunders - Collaborator - Genzyme Limited (UK)
Kinga Malottki - Collaborator - Genzyme Limited (UK)
Matthew O'Connell - Collaborator - Harvey Walsh Ltd
Rachel Lawson - Collaborator - Genzyme Limited (UK)
Viktor Chirikov - Collaborator - OPEN Health Group

Former Collaborators

Michael Wallington - Collaborator - OPEN VIE