Identification and validation of novel and overlapping phenotypes in Asthma, Bronchiectasis and Chronic obstructive pulmonary disease using cluster analysis.

Study type
Protocol
Date of Approval
Study reference ID
22_001747
Lay Summary

Chronic obstructive pulmonary disease, asthma and bronchiectasis are all common obstructive respiratory diseases in which patients have various symptoms that make breathing difficult. In addition to day-to-day symptoms, patients are also at risk of chest infections. Recent research suggests that these diseases overlap in their symptoms and causes, and that each is not a single disease but a group of diseases. Diagnosing these diseases can be challenging and, once a diagnosis is made, each disease tends to be treated with a “one size fits all” approach.

Using health information recorded during GP and hospital visits, this study will describe subtypes of chronic obstructive pulmonary disease, asthma and bronchiectasis using an approach called cluster analysis. This approach puts patients into groups based on their individual clinical characteristics (such as their smoking status or the drugs they are taking). Patients in the same group are more similar to each other than to patients in other groups. The grouping is done by computer methods, which can group patients in more complex ways than researchers could manage by hand.

In this study the patient dataset is very large, and the most informative clinical characteristics will be selected using both the judgement of a respiratory specialist and statistical methods. Furthermore, the study will span many years in the lifetime of patients, to better capture their journey of managing and living with respiratory disease.

A better understanding of obstructive airways disease subtypes will help doctors to diagnose and treat patients with these diseases more accurately and efficiently.

Technical Summary

Background
Asthma, bronchiectasis and chronic obstructive pulmonary disease (COPD) are chronic, heterogeneous obstructive airways diseases. Within each disease, groups of shared clinical, pathophysiological and demographic characteristics are called phenotypes. Additionally, all three diseases can overlap. Little is currently known about the relationship between pathological features, clinical patterns of disease and response to treatment regimens. Understanding this relationship would enable precision medicine approaches to diagnosing, managing and treating these conditions.

Aims and objectives
The overarching objective of the proposed research is to identify novel respiratory disease subtypes using unsupervised machine learning (ML) / cluster analysis in electronic health records. We will achieve this through two interwoven aims:

1. Systematically apply unsupervised ML clustering methods to characterise and validate novel disease subtypes in patients with distinct or overlapping diagnoses of COPD, asthma and bronchiectasis;
2. Define longitudinal disease progression phenotypes by elucidating how patient subgroups evolve over time with regard to complications, disease progression and treatment.

Exposure
The primary exposure of interest which will be used to define the cohort is the diagnosis of chronic obstructive airways disease (asthma, bronchiectasis, or COPD).

Outcomes
The outcomes of interest include all-cause mortality, cardiovascular and respiratory-specific mortality, exacerbations in primary care, and exacerbations leading to hospitalisations.

Study design
This is a retrospective cohort study.

Methods
We will apply clustering algorithms using a set of core variables which cover a broad range of routine assessments including demographic information, variables linked to disease severity, variables indicative of impairment, physiological measures and confounders.
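As a rough illustration of what applying a clustering algorithm to a set of core variables involves, the sketch below standardises a few variables and runs a basic k-means procedure. It is a minimal sketch under stated assumptions, not the study's method: the protocol does not name a specific algorithm, k-means is swapped in purely as a familiar example, and the variable names, the choice of two clusters and the toy values are all hypothetical and invented for illustration.

```python
import random
import statistics


def standardise(rows):
    """Z-score each column so that no single variable dominates the distance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    sds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, sds))
        for row in rows
    ]


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100, seed=0):
    """Basic Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(
                    statistics.fmean(col) for col in zip(*members)
                )
    return labels, centroids


# Hypothetical toy patients (invented values, not study data):
# (age in years, FEV1 % predicted, exacerbations in the previous year)
toy_patients = [
    (45, 85, 0), (50, 80, 1), (48, 90, 0),
    (70, 40, 3), (75, 35, 4), (72, 45, 3),
]

labels, centroids = kmeans(standardise(toy_patients), k=2)
```

In practice the number of clusters, the handling of categorical variables and missing data, and the validation of the resulting clusters would all need to be decided explicitly; none of those choices is specified here.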

Main statistical tests
We will quantify the effect of cluster membership on disease outcomes by calculating hazard ratios using cluster-specific adjusted Cox regression models.
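For readers unfamiliar with how a hazard ratio arises from a Cox model, the standard proportional hazards form is sketched below. The coding is an assumption made for illustration only: cluster membership c is represented by indicator variables relative to a reference cluster (cluster 1), z stands for the adjustment covariates, and K is the number of clusters. The protocol does not state how the models will be parameterised.

```latex
h(t \mid c, \mathbf{z})
  = h_0(t)\,\exp\!\Big( \sum_{k=2}^{K} \beta_k \,\mathbb{1}[c = k]
    + \boldsymbol{\gamma}^{\top}\mathbf{z} \Big),
\qquad
\mathrm{HR}_k = \exp(\beta_k)
```

Under this coding, HR_k compares the hazard of the outcome in cluster k with that in the reference cluster, holding the adjustment covariates fixed; h_0(t) is the unspecified baseline hazard.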

Public Health benefit
Results from this study could lead to improved diagnosis, characterisation and management of asthma, bronchiectasis, COPD, and their overlap.

Health Outcomes to be Measured

The following outcomes will be measured: death (all-cause, respiratory, cardiovascular disease), and the number and rate of exacerbations in primary and secondary care.
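To make the distinction between the number and the rate of exacerbations concrete, the sketch below computes a crude rate as total events divided by total person-years of follow-up. This is one common definition and is assumed here; the protocol does not specify how rates will be calculated. The function name, the record layout and the toy values are hypothetical.

```python
DAYS_PER_YEAR = 365.25


def exacerbation_rate(records):
    """Crude rate: total exacerbations per person-year of follow-up.

    `records` is a list of (number_of_exacerbations, follow_up_days) pairs,
    one pair per patient.
    """
    total_events = sum(events for events, _ in records)
    total_years = sum(days for _, days in records) / DAYS_PER_YEAR
    if total_years <= 0:
        raise ValueError("total follow-up time must be positive")
    return total_events / total_years


# Hypothetical toy records (invented values, not study data).
toy_records = [(2, 365.25), (0, 730.5), (4, 365.25)]
rate = exacerbation_rate(toy_records)  # 6 events over 4 person-years
```

A count alone ignores how long each patient was observed, which is why the rate divides by follow-up time.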

Collaborators

Jennifer Quint - Chief Investigator - Imperial College London
Maria Pikoula - Corresponding Applicant - University College London (UCL)
Constantinos Kallis - Collaborator - Imperial College London
Spiros Denaxas - Collaborator - University College London (UCL)

Linkages

HES Accident and Emergency; HES Admitted Patient Care; ONS Death Registration Data; Patient Level Index of Multiple Deprivation