Investigating early diagnoses of patients with ankylosing spondylitis using machine learning: a predictive modelling study

Study type
Protocol
Date of Approval
Study reference ID
20_028
Lay Summary

Ankylosing spondylitis (AS) is a debilitating disease that affects the spine. Symptoms usually start in the teens/twenties. A form of arthritis, this disease causes inflammation in the spinal joints (or vertebrae), and may also affect other joints. In the more advanced cases, the inflammation may cause new bone formation where sections/joints of the spine fuse together. The symptoms of AS vary, but most people experience back pain and stiffness in the early stages of the disease.

The estimated delay between symptom onset and diagnosis of AS is approximately 7-12 years. The presence of common symptoms (such as back pain) and the absence of remarkable physical findings contribute to its under-diagnosis and delayed diagnosis (Fallahi & Jamshidi, 2016). This in turn places a social, economic, physical and psychological burden on patients and the healthcare system. Because electronic health records (EHRs) include a sequence of measurements (clinical visits) over a period of time, they contain important information about the progression of disease, which can be particularly useful for uncovering patterns in challenging and poorly diagnosed disorders such as AS.

The aim of this project is to aid early diagnosis of AS by analysing real-world evidence data to identify early indicators of the disease. Determining patterns across medical features (signs, symptoms, outpatient and emergency visits, procedures, referrals, tests) and using those patterns to support decision making at an early stage of the disease (e.g. a more rapid specialist referral by the GP) may prove highly beneficial for AS management in the future.

Technical Summary

Objectives:
Although ~200,000 people in the UK have ankylosing spondylitis (AS), a spinal inflammatory disease, it is an underdiagnosed condition (Dean et al., 2014). Symptoms usually start in the teens/twenties, but it can take ~10 years for patients to receive an accurate diagnosis and treatment plan. This is due to a number of factors, such as symptoms that are highly prevalent in the general population (like back pain), the gradual onset of the disease, and the lack of unique biomarkers and clear guidelines for rheumatology referrals. Delayed diagnosis is a major impediment to treatment, disease characterization, policy-making, and resource allocation. Analysing electronic health records via predictive modelling may help to overcome this.

Predictive models may be particularly useful for assimilating large volumes of information and detecting patterns across patient health records in order to provide valuable insight into disease management. We will use CPRD, Hospital Episode Statistics (HES) and Office for National Statistics (ONS) data to identify predictors among the AS patient population that differentiate them from a non-AS control population. HES and ONS data will be used in conjunction with CPRD to gather insights into the diagnostic journey of AS patients by analysing patterns in hospitalisations, outpatient consultations, referrals, clinical procedures performed, accident and emergency visits, cause of death and treatment specialty.

Methods:
Machine learning will be used to capture features and patterns in patient characteristics that differentiate AS patients from controls across the database, and then to predict an AS diagnosis based on these features. Accuracy, precision, and Receiver Operating Characteristic (ROC) curves will be computed for the trained and tested models.
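
The protocol does not state which software will be used to compute ROC curves. As an illustration only, the following minimal Python sketch shows how an ROC curve and its area under the curve (AUC) could be derived from predicted scores and true labels (1 = AS, 0 = control). The function names `roc_points` and `roc_auc` and the example data are hypothetical and are not taken from the study.

```python
def roc_points(y_true, scores):
    """Return ROC points as (false positive rate, true positive rate) pairs.

    y_true: iterable of 0/1 labels (1 = AS case, 0 = control; assumed coding).
    scores: iterable of model scores, higher meaning more likely AS.
    """
    y_true = list(y_true)
    scores = list(scores)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("Both cases and controls are needed for an ROC curve")

    # Sweep the decision threshold from the highest score to the lowest.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Patients with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Toy data for illustration, not study results.
    labels = [0, 0, 1, 1]
    predicted = [0.1, 0.4, 0.35, 0.8]
    curve = roc_points(labels, predicted)
    print(curve)
    print(roc_auc(curve))
```

An AUC of 0.5 corresponds to a model that ranks cases and controls no better than chance, and 1.0 to perfect separation.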

Data analysis:
The study cohort will comprise patients with a recorded diagnosis of AS between 01/01/2005 and 31/12/2018.

Health Outcomes to be Measured

The outcome of interest in this analysis is a precise and early diagnosis of ankylosing spondylitis (and axial spondyloarthritis, axSpA). Read codes for these diagnoses are provided in Appendix A. Since this is a predictive analysis study, outcomes to be measured include (but are not limited to) the accuracy, sensitivity, specificity, precision and recall, F1-score, and positive and negative predictive values of the model as applied to the categorized patient and control population.
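
For reference, the performance measures listed above can all be derived from the four cells of a confusion matrix. The following minimal Python sketch shows one way to compute them. It assumes binary labels (1 = AS, 0 = control), and the names `confusion_counts` and `diagnostic_metrics` are hypothetical, not part of the study protocol.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    # Return NaN rather than raising when a metric is undefined.
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard classification metrics from confusion-matrix counts."""
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),  # also called recall
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),          # also called precision
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Toy labels for illustration, not study data.
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    predicted = [1, 1, 0, 1, 0, 0, 0, 0]
    for name, value in diagnostic_metrics(*confusion_counts(truth, predicted)).items():
        print(f"{name}: {value:.3f}")
```

Because AS is rare relative to the control population, positive predictive value and F1-score are generally more informative than accuracy alone when classes are imbalanced.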

Collaborators

Matic Meglic - Chief Investigator - Novartis Pharma AG (Switzerland)
Shruti Narasimham - Corresponding Applicant - Novartis Ireland Limited
Abigail White - Collaborator - Novartis Pharmaceuticals UK Limited
Borja Mato - Collaborator - Novartis Pharma AG (Switzerland)
Chiara Perella - Collaborator - Novartis Pharma AG (Switzerland)
Jonathan Doogan - Collaborator - Novartis Pharma AG (Switzerland)
Mark Taylor - Collaborator - Novartis Pharmaceuticals UK Limited
Mark Tomlinson - Collaborator - Novartis Pharmaceuticals UK Limited
Paul Emery - Collaborator - University of Leeds
Paula Pamies - Collaborator - Novartis Pharmaceuticals UK Limited
Raj Sengupta - Collaborator - Royal National Hospital For Rheumatic Diseases
Ruediger Merkel - Collaborator - Novartis Pharma AG (Switzerland)

Former Collaborators

Brendan Lynch - Collaborator - Novartis Ireland Limited
Brian Buckley - Collaborator - Novartis Pharmaceuticals UK Limited

Linkages

2011 Rural-Urban Classification at LSOA level; HES Accident and Emergency; HES Admitted Patient Care; HES Outpatient; ONS Death Registration Data