The development and internal validation of population clusters for multiple long-term conditions using artificial intelligence approaches

Study type
Protocol
Date of Approval
Study reference ID
21_001667
Lay Summary

An estimated 14 million people in England are living with multiple long-term health conditions (MLT-C). Efforts to improve care mainly focus on biological markers of disease such as blood pressure or cholesterol, without adequately addressing other factors that contribute to good health. A shift towards integrated care that considers the ‘whole person’ and their environment is essential in addressing the complex, diverse and individual needs of people living with MLT-C. One approach to delivering more personalised care is to ‘cluster’ or group people based on similarities in their medical and non-medical needs. This approach has been adopted in other countries but not in the UK due to uncertainty about how to develop clusters, and there is no evidence linking this approach to improved health or reduced costs.

In this study we aim to generate evidence on how to cluster people by health and social need using machine learning, i.e., we use a computer system to help us to identify and group people with similar needs together. We will use anonymised patient records to test machine learning and generate clusters that will be compared to those developed through patient/professional opinions to see which are better at predicting outcomes. We will study clusters to learn what happens over time in terms of health/social costs, and to understand the profiles of people within each cluster.

We will use this information to develop tailored approaches for clusters that join up health and social care, and improve the lives of people with MLT-C.

Technical Summary

Background: Multiple long-term health conditions (MLT-C) are increasingly prevalent and associated with high rates of morbidity, mortality, and health-care expenditure. Strategies to tackle this have primarily focused on addressing biological aspects of disease, but MLT-C are also the result of and associated with additional psycho-social, economic, and environmental barriers. A shift towards more personalised, holistic, and integrated care could be effective. This could be made more efficient by identifying groups of the population based on their health and social need. Evidence is needed on how to generate clusters based on health and social need and to quantify the impact of clusters on long-term health and costs.

Aim: To develop and internally validate population clusters that consider determinants of health and social care need for people with MLT-C using data-driven machine-learning methods compared to expert-driven approaches within primary care national databases, followed by evaluation of cluster trajectories and their association with health outcomes and costs.

Study Design: Retrospective open cohort

Setting: Primary care settings within the UK

Participants: Patients aged 18 years and over with at least two different incident chronic conditions (from a defined list of 59 conditions) recorded between 01-Jan-1997 and 31-Dec-2020.
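The eligibility rule can be expressed as a small function that returns the date a patient enters the cohort. This is a minimal Python sketch under stated assumptions: `CONDITION_LIST` is a hypothetical four-item stand-in for the protocol's 59 conditions, only records inside the study window are counted (a strict "incident" definition that excludes pre-1997 diagnoses would need additional history data), and entry is taken as the later of the second distinct condition and the 18th birthday. None of these choices is taken from the protocol itself.

```python
from __future__ import annotations

from datetime import date

# Study window stated in the protocol.
STUDY_START = date(1997, 1, 1)
STUDY_END = date(2020, 12, 31)

# Hypothetical stand-in for the protocol's list of 59 chronic conditions.
CONDITION_LIST = {"hypertension", "type 2 diabetes", "asthma", "depression"}


def eighteenth_birthday(birth: date) -> date:
    """Date of the 18th birthday; 29 February births turn 18 on 1 March."""
    try:
        return birth.replace(year=birth.year + 18)
    except ValueError:  # 29 February does not exist in the target year
        return date(birth.year + 18, 3, 1)


def mltc_index_date(birth: date, diagnoses: list[tuple[str, date]]) -> date | None:
    """Date a patient first meets the MLT-C definition, or None if never.

    Assumptions: only the first in-window record of each listed condition
    counts, and entry is the later of the second distinct condition and age 18.
    """
    first_seen: dict[str, date] = {}
    for condition, when in sorted(diagnoses, key=lambda d: d[1]):
        if condition in CONDITION_LIST and STUDY_START <= when <= STUDY_END:
            first_seen.setdefault(condition, when)
    if len(first_seen) < 2:
        return None  # fewer than two different listed conditions
    second_condition = sorted(first_seen.values())[1]
    entry = max(second_condition, eighteenth_birthday(birth))
    return entry if entry <= STUDY_END else None


# Usage: asthma recorded twice, hypertension in 2005, patient born in 1960.
dx = [("asthma", date(2001, 6, 1)), ("asthma", date(2003, 1, 1)),
      ("hypertension", date(2005, 3, 15))]
print(mltc_index_date(date(1960, 1, 1), dx))  # 2005-03-15
```

Repeated records of the same condition are collapsed to their first occurrence, so a patient with asthma recorded twice still has only one condition until a second, different one appears.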

Primary outcome: All-cause mortality

Methods: Two unsupervised clustering approaches will be used to develop (identify) clusters of patients with MLT-C within CPRD data: an expert-driven segmentation (using variables agreed on by stakeholders through a Modified Delphi study) and an entirely data-driven approach. Subsequently:
• the clusters identified by the two approaches will be characterised and compared.
• cluster trajectories over time will be examined.
• the association between clusters and health outcomes and costs will be assessed.
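The protocol does not name the algorithm behind the data-driven approach. As an illustration only, the sketch below uses k-means (Lloyd's algorithm with a deterministic farthest-first initialisation), written in plain Python so it runs without third-party packages; the three patient features and their values are hypothetical. The expert-driven segmentation would instead group patients on the Delphi-agreed variables and is not shown here.

```python
import math
import statistics


def standardise(rows: list[list[float]]) -> list[list[float]]:
    """Z-score each feature so variables on different scales weigh equally."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def kmeans(points: list[list[float]], k: int, n_iter: int = 100) -> list[int]:
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    # Initialisation: start at the first point, then repeatedly add the point
    # farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: list[int] = []
    for _ in range(n_iter):
        # Assignment step: each patient joins the nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: no patient changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [statistics.fmean(col) for col in zip(*members)]
    return labels


# Hypothetical features per patient: [age, number of conditions, IMD decile].
patients = [[45, 2, 9], [47, 2, 8], [50, 3, 9], [82, 6, 1], [85, 7, 2], [79, 6, 1]]
print(kmeans(standardise(patients), k=2))  # [0, 0, 0, 1, 1, 1]
```

Standardising first matters because k-means uses Euclidean distance, so an unscaled variable such as age would otherwise dominate counts of conditions or deprivation deciles.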

Outputs:
• Identify clusters of patients with MLT-C for external validation in other datasets.
• Identify patient cluster(s) associated with worse health/social care outcomes.

Health Outcomes to be Measured

Primary outcome: All-cause mortality
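In an open cohort, patients enter and leave at different times, so a natural descriptive first step for the primary outcome is a crude all-cause mortality rate per 1,000 person-years in each cluster. The sketch below assumes each record already carries a cluster label, entry and exit dates and a death flag; the record layout is hypothetical, and the protocol does not specify which statistical model will be used for the formal comparison between clusters.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable
from datetime import date


def mortality_rates(
    records: Iterable[tuple[str, date, date, bool]], per: float = 1000.0
) -> dict[str, float]:
    """Crude all-cause mortality rate per `per` person-years, by cluster.

    Each record is (cluster, entry_date, exit_date, died); exit_date is assumed
    to be the earliest of death, leaving the practice, or end of study.
    """
    deaths: dict[str, int] = defaultdict(int)
    person_years: dict[str, float] = defaultdict(float)
    for cluster, entry, exit_date, died in records:
        person_years[cluster] += (exit_date - entry).days / 365.25
        deaths[cluster] += int(died)
    return {c: per * deaths[c] / py for c, py in person_years.items() if py > 0}


# Usage: hypothetical records, each with exactly 4 years (1,461 days) of follow-up.
records = [
    ("A", date(2016, 1, 1), date(2020, 1, 1), False),
    ("A", date(2016, 1, 1), date(2020, 1, 1), True),
    ("B", date(2016, 1, 1), date(2020, 1, 1), True),
]
print(mortality_rates(records))  # {'A': 125.0, 'B': 250.0}
```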

Secondary outcomes:
• Development of additional MLT-C
• Cause-specific mortality
• Frailty score
• Costs, calculated in UK pounds. Monetary values will be transformed to the UK national level using the Hospital and Community Health Services (HCHS) Pay and Prices Index, and an additive method will be used to sum the annual costs of multiple complications. Costs will be discounted at 3.5% per annum in line with current guidelines. Costs include:
o Health and social service utilisation and cumulative costs of treating long-term conditions:
▪ Inpatient costs (admissions to hospital as a day case or as an inpatient for ≥1 night), outpatient and A&E costs. National tariff costs [52] from NHS Digital will be used for all hospital utilisation.
▪ Non-inpatient costs (costs of all GP contacts and outpatient clinics). For all other utilisation, including primary care and social care, PSSRU unit costs [53] will be used.
▪ Costs of referrals recorded in the notes (e.g. to social care or physiotherapy), nursing home/care home costs, and medication costs.
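The costing steps above (resource use multiplied by unit costs, an additive sum of annual costs, adjustment with a pay and prices index, and 3.5% annual discounting) can be sketched as follows. Only the 3.5% rate comes from the protocol. The unit costs and index values are placeholder numbers, not the published National tariff, PSSRU or HCHS figures; reading the index adjustment as re-expressing costs in a single base year's prices, and treating the first year of follow-up as undiscounted (year 0), are assumptions.

```python
from __future__ import annotations

DISCOUNT_RATE = 0.035  # 3.5% per annum, as stated in the protocol

# Placeholder unit costs in GBP (hypothetical, not National tariff/PSSRU values).
UNIT_COSTS = {"gp_contact": 40.0, "outpatient": 150.0, "inpatient_night": 400.0}

# Placeholder price index by year (hypothetical, not the published HCHS index).
PRICE_INDEX = {2018: 100.0, 2019: 102.0, 2020: 105.0}


def annual_cost(use: dict[str, int]) -> float:
    """Additive method: sum resource count x unit cost over all services."""
    return sum(count * UNIT_COSTS[item] for item, count in use.items())


def inflate(cost: float, price_year: int, base_year: int) -> float:
    """Re-express a cost from its price year in base-year pounds."""
    return cost * PRICE_INDEX[base_year] / PRICE_INDEX[price_year]


def discounted_total(yearly_costs: list[float], rate: float = DISCOUNT_RATE) -> float:
    """Present value of yearly costs; year 0 (first year) is not discounted."""
    return sum(cost / (1 + rate) ** t for t, cost in enumerate(yearly_costs))


# Usage: two years of hypothetical resource use, priced in 2018, reported in 2020 pounds.
years = [{"gp_contact": 5, "outpatient": 2}, {"gp_contact": 6, "inpatient_night": 3}]
costs = [inflate(annual_cost(u), price_year=2018, base_year=2020) for u in years]
print(round(discounted_total(costs), 2))  # 1985.87
```

Inflation is applied before discounting so that every year is first expressed in the same prices; discounting then reflects only the timing of costs.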

Collaborators

Hajira Dambha-Miller - Chief Investigator - University of Southampton
Hajira Dambha-Miller - Corresponding Applicant - University of Southampton
Beth Stuart - Collaborator - University of Southampton
Christos Chalitsios - Collaborator - University of Southampton
Francesco Zaccardi - Collaborator - University of Southampton
Hilda Hounkpatin - Collaborator - University of Southampton
Mazen Ahmed - Collaborator - University of Southampton
Michael Boniface - Collaborator - University of Southampton
Nazrul Islam - Collaborator - University of Southampton
Nusrat Khan - Collaborator - University of Southampton
Yvonne Nartey - Collaborator - University of Southampton
Zlatko Zlatev - Collaborator - University of Southampton

Former Collaborators

Mazen Ahmed - Collaborator - University of Southampton
Nusrat Khan - Collaborator - University of Southampton
Yvonne Nartey - Collaborator - University of Southampton

Linkages

HES Accident and Emergency; HES Admitted Patient Care; HES Outpatient; ONS Death Registration Data; Patient Level Index of Multiple Deprivation; Practice Level Index of Multiple Deprivation
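Linked datasets are typically joined to CPRD records using patient and practice identifiers. The sketch below shows one way the two deprivation linkages might be combined: patient-level IMD where available, with practice-level IMD as a fallback. The fallback rule and the field names `patid` and `pracid` are assumptions for illustration; the protocol lists both linkages without stating how they will be used together.

```python
from __future__ import annotations


def link_imd(
    patients: list[dict],
    patient_imd: dict[int, int],
    practice_imd: dict[int, int],
) -> list[dict]:
    """Attach an IMD value to each patient record.

    Hypothetical rule: use the patient-level IMD when linked, otherwise fall
    back to the IMD of the patient's practice; keys "patid"/"pracid" are assumed.
    """
    linked = []
    for p in patients:
        imd = patient_imd.get(p["patid"])
        source = "patient"
        if imd is None:
            imd = practice_imd.get(p["pracid"])
            source = "practice" if imd is not None else "missing"
        # Copy the record rather than mutating the caller's data.
        linked.append({**p, "imd": imd, "imd_source": source})
    return linked


# Usage: patient 1 has patient-level IMD, patient 2 only practice-level, patient 3 neither.
patients = [{"patid": 1, "pracid": 10}, {"patid": 2, "pracid": 10}, {"patid": 3, "pracid": 99}]
for row in link_imd(patients, patient_imd={1: 3}, practice_imd={10: 5}):
    print(row["patid"], row["imd"], row["imd_source"])
```

Recording the source of each value keeps it possible to run a sensitivity analysis restricted to patients with patient-level linkage.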