Do features occurring prior to MS diagnosis vary according to background: a nested case-control study

Date of Approval
Application Number
Technical Summary

Previous work using large healthcare datasets has identified an “MS prodrome” up to 10 years prior to MS diagnosis. Nothing is known about how ethnicity and deprivation interact with or influence features associated with these earliest manifestations. Previous studies have matched and/or corrected for these factors, or have not examined them.

The specific aims of this research project are:
To identify specific constellations of symptoms and/or medical diagnoses occurring prior to MS onset.
To establish whether these features vary or interact differently according to gender, deprivation, ethnicity and/or urban/rural classification.
To use the above to identify potentially modifiable aspects of the MS prodrome, which may be limited to one or more groups.

This study will combine descriptive, hypothesis-testing and, ideally, hypothesis-generating analyses. We hope to generate a ‘fingerprint’ defining the earliest stages of MS that is robust to ethnicity, socio-economic status and rural/urban classification, and that could be used prospectively either to predict MS or to select high-risk participants for prevention/prediction cohorts.

All medical diagnoses and prescription data recorded as Read codes will be grouped according to clinical symptom constellation using established Read code dictionaries. The first recorded diagnosis or symptom in each group of diagnoses will be established for each record. Phenome-wide association testing will be used to determine the association of symptom groups, prescription medications, and diagnoses with a subsequent diagnosis of multiple sclerosis.
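The grouping step described above can be sketched as follows. This is a minimal illustration only: the codes, group names, the `CODE_TO_GROUP` lookup and the `first_event_per_group` function are hypothetical placeholders, not the study's actual dictionaries or pipeline.

```python
from datetime import date

# Hypothetical lookup from code to symptom group.
# These are placeholder strings, not real Read codes.
CODE_TO_GROUP = {
    "CODE_A": "fatigue",
    "CODE_B": "fatigue",
    "CODE_C": "bladder symptoms",
}


def first_event_per_group(events, code_to_group=CODE_TO_GROUP):
    """Return {(patient_id, group): earliest event date}.

    `events` is an iterable of (patient_id, code, event_date) tuples.
    Codes that are missing from the dictionary are ignored.
    """
    first = {}
    for patient_id, code, event_date in events:
        group = code_to_group.get(code)
        if group is None:
            continue
        key = (patient_id, group)
        # Keep only the earliest date seen for this patient and group.
        if key not in first or event_date < first[key]:
            first[key] = event_date
    return first


events = [
    (1, "CODE_B", date(2012, 5, 1)),
    (1, "CODE_A", date(2010, 3, 9)),
    (1, "CODE_C", date(2015, 1, 20)),
    (2, "CODE_Z", date(2011, 7, 4)),  # unmapped code, skipped
]
first = first_event_per_group(events)
```

Each (patient, group) pair then carries a single date, which is the exposure date that would feed into association testing.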

To determine whether specific prescriptions, symptoms, or diagnoses are associated with subsequent MS, a variety of statistical approaches will be used: multivariable Cox regression, multivariable logistic regression, and machine learning approaches (including penalised logistic regression, gradient-boosted trees, and random forest classifiers). For the standard statistical approaches (Cox and logistic regression), all regression analyses will control for age and sex as confounding covariables. All exposures with sufficient data quality will be tested for association.
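To illustrate what controlling for age and sex means for a single exposure, the sketch below computes a Mantel-Haenszel odds ratio pooled across sex and age-band strata. This is a deliberately simpler stand-in for the Cox and logistic regression models named above, not the study's method; the function name, the strata labels and the counts are all invented for illustration.

```python
from collections import defaultdict


def mantel_haenszel_or(records):
    """Pooled odds ratio across strata (Mantel-Haenszel estimator).

    `records` is an iterable of (stratum, exposed, is_case) tuples, where
    `stratum` is any hashable label such as (sex, age_band).
    """
    # Per stratum: a = exposed cases, b = exposed controls,
    #              c = unexposed cases, d = unexposed controls.
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for stratum, exposed, is_case in records:
        if exposed:
            index = 0 if is_case else 1
        else:
            index = 2 if is_case else 3
        tables[stratum][index] += 1

    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: no exposed controls or no unexposed cases")
    return numerator / denominator


def expand(stratum, a, b, c, d):
    """Turn 2x2 cell counts into individual records (illustrative helper)."""
    return ([(stratum, True, True)] * a + [(stratum, True, False)] * b
            + [(stratum, False, True)] * c + [(stratum, False, False)] * d)


# Invented counts for two strata defined by sex and age band.
records = (expand(("F", "30-39"), 10, 20, 5, 40)
           + expand(("M", "30-39"), 6, 15, 4, 30))
odds_ratio = mantel_haenszel_or(records)
```

A regression model achieves the same adjustment by including age and sex as covariates, and it extends more naturally to continuous covariates and to many exposures tested at once.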

Health Outcomes to be Measured

Primary outcome: multiple sclerosis
This primary outcome will be defined by the occurrence of at least one diagnostic code for multiple sclerosis, clinically isolated syndrome, or demyelinating disease, as used in other primary care data studies (see details below).

Secondary outcomes: MS age at first diagnostic code, frailty score (electronic frailty index as defined in [1])
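As a rough sketch of how the primary outcome and age at first diagnostic code could be derived for one patient, consider the following. The code list `MS_CODES` and the function `ms_outcome` are hypothetical, and the example assumes a full date of birth is available, which may not hold in practice if only year of birth is supplied.

```python
from datetime import date

# Placeholder outcome codes; the study's actual code list is not reproduced here.
MS_CODES = {"MS_CODE_1", "CIS_CODE_1", "DEMYEL_CODE_1"}


def ms_outcome(birth_date, coded_events, outcome_codes=MS_CODES):
    """Return (is_case, first_code_date, age_at_first_code_in_years).

    `coded_events` is an iterable of (code, event_date) for one patient.
    A patient counts as a case if at least one outcome code is recorded.
    """
    dates = [event_date for code, event_date in coded_events
             if code in outcome_codes]
    if not dates:
        return False, None, None
    first = min(dates)
    # Completed years: subtract one if the birthday has not yet
    # occurred in the year of the first outcome code.
    before_birthday = (first.month, first.day) < (birth_date.month, birth_date.day)
    age = first.year - birth_date.year - int(before_birthday)
    return True, first, age


case = ms_outcome(
    date(1980, 6, 15),
    [
        ("MS_CODE_1", date(2015, 3, 1)),
        ("OTHER_CODE", date(2000, 1, 1)),
        ("CIS_CODE_1", date(2012, 6, 14)),
    ],
)
non_case = ms_outcome(date(1975, 1, 1), [("OTHER_CODE", date(2001, 2, 3))])
```

The date of the first qualifying code would also serve as the index date before which prodromal features are sought.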


Ruth Dobson - Chief Investigator - Queen Mary University of London
Ruth Dobson - Corresponding Applicant - Queen Mary University of London
Benjamin Jacobs - Collaborator - Queen Mary University of London


HES Accident and Emergency;HES Admitted Patient Care;HES Outpatient;Patient Level Index of Multiple Deprivation;Practice Level Index of Multiple Deprivation;Pregnancy Register;Rural-Urban Classification