Unsupervised clustering and topological analysis of Type-2 Diabetes and prediabetes patients in CALIBER using clinical data from electronic medical records

Date of Approval
Application Number
Technical Summary

Precision medicine is a new approach to treatment that accounts for variability in the genetics and clinicopathological architecture of a disease when planning the clinical management of different patients. One recent study used a data-driven approach based on Electronic Health Records (EHRs) to understand Type-2 Diabetes by identifying distinct clusters of patients. Each cluster was characterised by different clinicopathological phenotypes as well as a distinct genetic architecture.

This project proposes a similar analysis using unsupervised topological analysis on the CALIBER dataset (CArdiovascular research using Linked Bespoke studies and Electronic health Records), a research platform of linked EHRs and administrative data that provides rich data on comorbidities and treatments, a large number of clinical variables for cluster analysis, and a large sample of patients.

The identified clusters can then be compared on their clinical and demographic characteristics and visualised using topology. Multinomial logistic regression will be used to identify which clinical variables are most important in determining cluster membership, and Cox regression will be used to identify possible differences between clusters in prognosis and long-term outcomes such as cardiovascular disease. This would support a clearer understanding and better management of different groups of patients diagnosed with Type-2 Diabetes in the UK.
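As an illustration of the Cox regression step only, the sketch below fits a one-covariate Cox model that compares the hazard of an outcome between two clusters. It is a minimal sketch under stated assumptions, not the protocol's analysis code: the follow-up times, event flags and cluster labels are invented toy values, the function name cox_log_hazard_ratio is hypothetical, the model has a single binary covariate, and ties are handled in the Breslow manner. A real analysis would use an established survival package and adjust for further covariates.

```python
import math

def cox_log_hazard_ratio(times, events, x, iters=25):
    """Estimate the log hazard ratio for one covariate by Newton-Raphson
    on the Cox partial log-likelihood (Breslow handling of ties)."""
    beta = 0.0
    for _ in range(iters):
        grad, hess = 0.0, 0.0
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue  # censored patients contribute only to risk sets
            # Risk set: everyone still under follow-up at time t_i.
            risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
            s0 = sum(math.exp(beta * x[j]) for j in risk)
            s1 = sum(x[j] * math.exp(beta * x[j]) for j in risk)
            s2 = sum(x[j] ** 2 * math.exp(beta * x[j]) for j in risk)
            grad += x[i] - s1 / s0                 # score contribution
            hess -= s2 / s0 - (s1 / s0) ** 2       # negative information
        if hess == 0.0:
            break  # no events observed, nothing to estimate
        step = grad / hess
        beta -= step
        if abs(step) < 1e-10:
            break
    return beta

# Toy data (assumed, for illustration only): years of follow-up,
# 1 = outcome observed / 0 = censored, and membership of a hypothetical
# "cluster B" (1) versus a reference cluster (0).
times        = [2, 3, 4, 5, 6, 7, 8, 9, 10, 12]
events       = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]
in_cluster_b = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]

beta = cox_log_hazard_ratio(times, events, in_cluster_b)
hazard_ratio = math.exp(beta)
print(f"log hazard ratio = {beta:.3f}, hazard ratio = {hazard_ratio:.2f}")
```

In this toy example the estimated hazard ratio is above 1 because the patients in the hypothetical cluster B tend to experience the outcome earlier. With more than two clusters, one indicator per non-reference cluster would be needed, which requires the multivariate form of the same likelihood.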

Health Outcomes to be Measured

The outcome for objective one will be the identified clusters and their group membership. Subsequent outcomes to be compared include mortality and complications associated with Type-2 Diabetes (including kidney disease, diabetic retinopathy, diabetic neuropathy, microvascular complications and macrovascular complications). Macrovascular and microvascular complications have previously been phenotyped by Anoop et al. using CPRD data as part of CALIBER.


Juan Pablo Casas Romero - Chief Investigator - University College London ( UCL )
Mustafa Ghafouri - Corresponding Applicant - University College London ( UCL )
Anoop Shah - Collaborator - University College London ( UCL )
David Prieto-Merino - Collaborator - London School of Hygiene & Tropical Medicine ( LSHTM )
Jorge Garcia-Hernandez - Collaborator - University College London ( UCL )
Juan M Garcia-Gomez - Collaborator - Technical University of Valencia
Ketan Patel - Collaborator - Astra Zeneca Inc - USA
Sajan Khosla - Collaborator - Astra Zeneca Ltd - UK Headquarters
Spiros Denaxas - Collaborator - University College London ( UCL )


HES Admitted Patient Care;ONS Death Registration Data;Patient Level Index of Multiple Deprivation;Practice Level Index of Multiple Deprivation;MINAP