Unsupervised clustering and topological analysis of Type-2 Diabetes and prediabetes patients in CALIBER using clinical data from electronic medical records

Study type
Protocol
Date of Approval
Study reference ID
16_119
Lay Summary

Type-2 Diabetes Mellitus is a complex condition which is increasing in prevalence in the UK. Previous evidence suggests that when patients with Type-2 Diabetes are divided into subgroups, the groups may differ in their risk factors, complications and responses to treatment. By targeting treatments to a particular group of patients, we can potentially offer more personalised care, in which different groups are treated differently to achieve better outcomes.

To identify what characterises different groups of patients diagnosed with Type-2 Diabetes, we can use patient information collected from Electronic Health Records (EHRs) through CALIBER. CALIBER is a programme which includes EHRs from GP surgeries (CPRD), hospitals (HES), a registry of heart attacks (MINAP) and the Office for National Statistics (ONS). These sources of information are linked together to produce CALIBER.

By applying mathematical pattern-finding methods to EHRs, it is possible to identify patients who have similar patterns of disease and group them together. The groups are then analysed to find what makes them different, including differences in long-term health outcomes, complications and blood tests. Treatments can then be tailored to improve the long-term health outcomes of each group and reduce its specific complications.

Technical Summary

Precision medicine is an emerging approach to treatment that takes into account variability in the genetics and clinicopathological architecture of a disease when deciding on the clinical management of different patients. One recent study used a data-driven approach based on Electronic Health Records (EHRs) to understand Type-2 Diabetes by identifying distinct clusters of patients. Each cluster was characterised by a different clinicopathological phenotype as well as a distinct genetic architecture.

This project proposes a similar analysis, applying unsupervised topological analysis to the CALIBER dataset (CArdiovascular research using Linked Bespoke studies and Electronic health Records). CALIBER is a research platform of linked EHRs and administrative data that provides rich data on comorbidities and treatments, a large number of clinical variables for cluster analysis and a large sample of patients.

The identified clusters can then be compared on their clinical and demographic characteristics and visualised using topology. Multinomial logistic regression will be used to identify which clinical variables are most important in determining cluster membership, and Cox regression will be used to identify possible differences between clusters in prognosis and long-term outcomes such as cardiovascular disease. This would provide a clearer understanding of different groups of patients diagnosed with Type-2 Diabetes in the UK and could support their better management.
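As a rough illustration of the clustering and cluster-comparison steps described above, the following is a minimal sketch only. It uses plain k-means as a simple stand-in for the unsupervised topological analysis proposed in this protocol, and it runs on synthetic data rather than CALIBER records. The two variables (BMI and HbA1c), the group sizes and the function names `zscore_columns` and `kmeans` are all assumptions made for the example and are not taken from the protocol.

```python
import random
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardise each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c)) for c in cols]
    return [tuple((x - m) / s for x, (m, s) in zip(row, stats)) for row in rows]


def kmeans(points, k, iters=100, seed=0):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(mean(col) for col in zip(*members))
    return labels


# Hypothetical patients: (BMI in kg/m^2, HbA1c in mmol/mol).
# Synthetic values for illustration only, not CALIBER data.
rng = random.Random(42)
group_a = [(rng.gauss(24, 1.5), rng.gauss(50, 4)) for _ in range(100)]
group_b = [(rng.gauss(36, 1.5), rng.gauss(75, 4)) for _ in range(100)]
patients = group_a + group_b

# Cluster on standardised values so neither variable dominates the distance.
labels = kmeans(zscore_columns(patients), k=2)

# Compare the clusters on their original (unstandardised) characteristics.
summary = {}
for c in sorted(set(labels)):
    members = [p for p, lab in zip(patients, labels) if lab == c]
    summary[c] = {
        "n": len(members),
        "mean_bmi": round(mean(m[0] for m in members), 1),
        "mean_hba1c": round(mean(m[1] for m in members), 1),
    }
print(summary)
```

In an analysis of this kind, the resulting cluster labels would then serve as the outcome for the multinomial logistic regression and as the grouping variable for the Cox regression; those steps are not sketched here.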

Health Outcomes to be Measured

The outcome for objective one will be the identified clusters and their group membership. Subsequent outcomes to be compared include mortality and complications associated with Type-2 Diabetes (including kidney disease, diabetic retinopathy, diabetic neuropathy, microvascular complications and macrovascular complications). Macrovascular and microvascular complications have previously been phenotyped by Anoop et al. using CPRD data as part of CALIBER.

Collaborators

Juan Pablo Casas Romero - Chief Investigator - University College London (UCL)
Mustafa Ghafouri - Corresponding Applicant - University College London (UCL)
Anoop Shah - Collaborator - University College London (UCL)
David Prieto-Merino - Collaborator - London School of Hygiene & Tropical Medicine (LSHTM)
Jorge Garcia-Hernandez - Collaborator - University College London (UCL)
Juan M Garcia-Gomez - Collaborator - Technical University of Valencia
Ketan Patel - Collaborator - Astra Zeneca Inc - USA
Sajan Khosla - Collaborator - AstraZeneca Ltd - UK Headquarters
Spiros Denaxas - Collaborator - University College London (UCL)

Linkages

HES Admitted Patient Care; ONS Death Registration Data; Patient Level Index of Multiple Deprivation; Practice Level Index of Multiple Deprivation; MINAP