Artificial Intelligence and Multimorbidity: Clustering in Individuals, Space and Clinical Context (AIM-CISC)

Study type
Protocol
Date of Approval
Study reference ID
21_000542
Lay Summary

People are living longer. One consequence of this is that the number of people living with multiple long-term conditions (multimorbidity) is increasing. The exact patterns of conditions that people with multimorbidity have are highly variable. Research to understand these patterns is important. It can help us better understand how conditions are related to each other, and help ensure that healthcare systems are organised to deal with common patterns. This study will examine patterns of multimorbidity, and has five elements.

(1) We will use several different ways of identifying patterns of multimorbidity. We will see if they give us similar answers, and answers that make sense to patients and doctors.

(2) We will use other methods of identifying patterns of multimorbidity that look for conditions which together cause worse outcomes (for example, which together increase the risk of dying).

(3) We will compare the patterns identified in (1) and (2) in terms of which people have different patterns (for example, different age groups; men vs women), and in terms of which patterns are most strongly associated with worse outcomes.

(4) We will extend the analysis to examine change in patterns over time. This is more difficult because the number of possible patterns is much larger.

(5) We will examine methods used by other people for clustering to see if we get the same answers as them.

We will publish the results in a number of ways to ensure that as many people as possible hear about them.

Technical Summary

Multimorbidity is increasingly common, but is very heterogeneous. Previous research examining clustering of conditions in individuals has typically used single methods in single datasets, with limited consistency of cluster solutions between studies and minimal replication or validation. The aim of this study is to use machine learning to examine clustering of morbidities in individuals, with replication of cluster solutions across methods in CPRD and examination of associations of cluster membership with subsequent outcomes. Choice of conditions to cluster is driven by the recommendations of an HDR-UK Delphi consensus study.

Objective 1: We will use unsupervised cross-sectional clustering methods to identify morbidity clusters which are consistent (across methods), stable (to minor perturbation in the data) and explainable.
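
As an illustration of how consistency across methods might be quantified, the sketch below computes the adjusted Rand index (ARI) between the cluster labels that two methods assign to the same individuals. The protocol does not name an agreement metric, so the choice of ARI, the function name and the toy labels are assumptions made for illustration only. The same comparison could be repeated on perturbed or resampled data to gauge stability.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two cluster assignments.

    labels_a and labels_b hold one cluster label per individual, in the
    same order. Label values themselves do not matter, only the grouping.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same individuals")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # zero or one individual: trivially identical

    # Pairs of individuals grouped together by both methods, by A, and by B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both methods give a single cluster
    return (together_both - expected) / (maximum - expected)


# Hypothetical labels from two methods for six individuals.
method_1 = [0, 0, 1, 1, 2, 2]
method_2 = [1, 1, 0, 0, 2, 2]  # same grouping, different label names
ari_same_grouping = adjusted_rand_index(method_1, method_2)
```

An ARI near 1 indicates that the two methods group individuals almost identically, while values near 0 indicate agreement no better than chance.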

Objective 2: We will use supervised cross-sectional clustering methods to identify clusters that (in addition to being consistent, stable and explainable) may be more operationalisable in applied research because they are more strongly associated with particular subsequent adverse or other events.

Objective 3: We will examine associations between individual characteristics and morbidity cluster membership, and between cluster membership and subsequent adverse or other events.
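
A simple, unadjusted way to express an association of this kind is an odds ratio from a 2x2 table, for example between one individual characteristic and membership of one cluster. The sketch below is a minimal illustration that assumes a Wald (Woolf) 95% confidence interval. The protocol summary does not specify the statistical models to be used, and the function name and counts are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_ci(exposed_in, exposed_out, unexposed_in, unexposed_out, z=1.96):
    """Odds ratio of cluster membership with a Wald confidence interval.

    exposed_in / exposed_out: people with the characteristic who are in /
    not in the cluster. unexposed_in / unexposed_out: the same counts for
    people without the characteristic.
    """
    cells = (exposed_in, exposed_out, unexposed_in, unexposed_out)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")
    estimate = (exposed_in * unexposed_out) / (exposed_out * unexposed_in)
    # Standard error of the log odds ratio (Woolf's method).
    std_error = sqrt(sum(1 / count for count in cells))
    lower = exp(log(estimate) - z * std_error)
    upper = exp(log(estimate) + z * std_error)
    return estimate, lower, upper


# Hypothetical counts, for illustration only.
estimate, lower, upper = odds_ratio_ci(30, 70, 10, 90)
```

In practice, associations would normally be adjusted for other characteristics, which this sketch does not attempt.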

Objective 4: We will explore longitudinal trajectories of morbidity accrual for predicting future multimorbidity development, clustering these trajectories into groups that are similarly consistent, stable, explainable, and potentially more operationalisable, and comparing the identified cross-sectional and longitudinal clusters.

We will publish and disseminate our findings in a variety of ways to ensure we reach multiple audiences, and publish all our code to facilitate replication by others. Outside the scope of this application and working with other research collaborations, we will examine whether cluster solutions derived by us in CPRD are replicable in other datasets.

We have removed previous objective 5 (replication of analyses from other NIHR AI and Multimorbidity collaborations) and will submit separate protocols for this work.

Health Outcomes to be Measured

For the analysis to identify clusters: the outcomes are the emergent clusters of morbidities from unsupervised or supervised clustering methods.

Rationale: Morbidity data is used to classify individuals in terms of the presence of clusters of morbidities, and it is this classification/clustering which is the focus of analysis. A critical issue is the choice of conditions to examine, and how those conditions are defined in CPRD data using Read codes in GP data, ICD-10 codes in HES and ONS data, and other GP data such as laboratory findings or clinical values. We have led Health Data Research UK (HDR-UK) work developing consensus on choice of conditions to use in multimorbidity research [reference 6 and follow-up Delphi consensus study currently under review with a journal], and participate in HDR-UK and NIHR work in relation to codeset development and validation. Within the wider NIHR AI and Multiple Long-Term Conditions programme that funds this work, the individual collaboration PIs have agreed that all studies will use an identical core set of conditions and codesets to improve replicability, plus additional conditions using published codesets as appropriate to each collaboration’s purpose. Our choice of conditions is therefore driven by a structured and explicit process (appendix 1) and includes all the conditions recommended by the HDR-UK consensus study plus additional chronic conditions which are part of known clusters (eg autoimmune diseases) and/or are common and/or have high impact on the patient. The codesets we will use are those created by the CALIBER project at University College London with some modification where required, and some additional bespoke codesets.

For the analysis examining associations between individual characteristics and cluster membership, the outcomes are morbidity cluster membership.

Rationale: Understanding who experiences different clusters is important both for face validity (eg ‘atopy’ should be observable in younger people) and for exploring clinical implications.

For the analysis to examine associations between unsupervised cluster membership and subsequent adverse or other events, the outcomes are subsequent adverse or other events, namely mortality; hospital admission (any admission; any emergency admission; ambulatory care sensitive admission; admission with specified conditions); critical care admission; primary care utilisation; additional morbidity accrual; polypharmacy (prescribed 5+ long-term medicines; prescribed 10+; prescribed 15+).
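
The polypharmacy thresholds above can be sketched as simple flags on a count of distinct long-term medicines. This is a minimal illustration: how a "long-term medicine" is identified from prescribing data is not stated in this summary, so the function assumes its input has already been filtered, and the function and constant names are hypothetical.

```python
POLYPHARMACY_THRESHOLDS = (5, 10, 15)  # thresholds listed in the outcomes


def polypharmacy_flags(long_term_medicines):
    """Return one flag per threshold for a single person.

    long_term_medicines: iterable of medicine identifiers already judged
    to be long-term (an assumption; the selection rule is not shown here).
    Repeat prescriptions of the same medicine are counted once.
    """
    distinct_count = len(set(long_term_medicines))
    return {
        f"{threshold}+": distinct_count >= threshold
        for threshold in POLYPHARMACY_THRESHOLDS
    }


# Hypothetical person prescribed six distinct long-term medicines.
example_flags = polypharmacy_flags(["a", "b", "c", "d", "e", "f", "f"])
```

Because the thresholds are nested, anyone flagged at 15+ is also flagged at 10+ and 5+.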

Rationale: The selected outcomes have all been used in previous research and are recognised as important in this context. Measuring them provides comparability with previous research and helps evaluate whether observed clusters have face validity/importance.

Collaborators

Bruce Guthrie - Chief Investigator - University of Edinburgh
Bruce Guthrie - Corresponding Applicant - University of Edinburgh
Anirban Chakraborty - Collaborator - University of Edinburgh
Atul Anand - Collaborator - University of Edinburgh
Clare MacRae - Collaborator - University of Edinburgh
Guillermo Romero Moreno - Collaborator - University of Edinburgh
Imane Guellil - Collaborator - University of Edinburgh
Iris Ho - Collaborator - University of Edinburgh
Jacques Fleuriot - Collaborator - University of Edinburgh
Jake Palmer - Collaborator - University of Edinburgh
Kieran Richards - Collaborator - University of Edinburgh
Lucy Stirland - Collaborator - University of Edinburgh
Luna De Ferrari - Collaborator - University of Edinburgh
Marcus Lyall - Collaborator - University of Edinburgh
Maxmillan Ries - Collaborator - University of Edinburgh
Nazir Lone - Collaborator - University of Edinburgh
Paola Galdi - Collaborator - University of Edinburgh
Regina Prigge - Collaborator - University of Edinburgh
Sohan Seth - Collaborator - University of Edinburgh
Valerio Restocchi - Collaborator - University of Edinburgh

Former Collaborators

Chima Eke - Collaborator - University of Edinburgh
Marcus Lyall - Collaborator - University of Edinburgh

Linkages

HES Admitted Patient Care; ONS Death Registration Data; Patient Level Index of Multiple Deprivation; Practice Level Rural-Urban Classification