Cardiometabolic disease prediction using general practice consultation pattern: Use of Machine Learning (ML)

Study type
Protocol
Date of Approval
Study reference ID
20_129
Lay Summary

Heart diseases and diabetes are leading causes of death and poor health in the UK and worldwide, and are often grouped under the umbrella term “cardiometabolic diseases”. Most medical care is reactive in nature, with a focus on treating cardiometabolic diseases rather than on their early identification and possible prevention. Research has shown that some common risk factors contribute to the development of cardiometabolic diseases, such as socioeconomic deprivation, smoking, obesity, lack of physical activity, poor dietary habits, high cholesterol and high blood pressure. However, it is unclear whether novel risk factors (beyond the recognised risk factors mentioned above), such as general practice (GP) consultation patterns or prescribing patterns, have a role in predicting cardiometabolic diseases. We want to apply machine learning methods to routinely collected data in GP records to identify new factors that help predict cardiometabolic disease early. Machine learning is the process of building computer algorithms that learn from sample data (training data) in order to make predictions. This in turn could help with the development of a prediction model for early identification of cardiometabolic diseases and potentially reduce the risk of associated complications.

In addition, understanding and communicating the risks of future adverse health events to patients is important. Equally important, especially for healthcare policy makers, is knowing how long people can expect to live and how much of their remaining lifetime will be spent in good health.

Technical Summary

Cardiometabolic disease is a broad umbrella term often used to describe a cluster of conditions with shared risk factors: diabetes, hypertension, coronary heart disease (CHD), atrial fibrillation (AF), heart failure (HF), stroke, chronic kidney disease (CKD) and vascular dementia. Cardiovascular disease is a leading cause of death in the UK, especially among those with diabetes (1). There is growing interest in primary prevention of cardiometabolic disease through early identification and risk assessment, thereby reducing the associated health burden.
Machine Learning (ML) is the practice of using algorithms to parse data, learn from it, and then make a determination or prediction about something in the world. We propose to apply various machine learning methods to general practice records available in CPRD to develop algorithms for risk prediction of incident cardiometabolic diseases. We will consider eight different cardiometabolic diseases as primary outcomes: diabetes, coronary heart disease, atrial fibrillation (AF), heart failure (HF), stroke, vascular dementia, hypertension and chronic kidney disease. We will use the following classes of machine learning algorithms: (1) support vector machines (linear and non-linear), (2) penalised logistic regression, (3) boosting ensemble methods, (4) tree-based ensemble methods and (5) neural network approaches. We will use internal cross-validation with various performance measures to evaluate the accuracy of prediction models derived from the different ML approaches. Where applicable, we will compare the risk prediction models developed in this work with existing models for cardiovascular risk prediction (e.g. Framingham scores https://www.framinghamheartstudy.org/fhs-risk-functions/ and QRISK scores https://www.qresearch.org/ ).
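
As an illustration of the internal cross-validation described above, the following is a minimal sketch in plain Python, not the study's actual pipeline. It fits one of the listed model classes (penalised logistic regression, here with an L2 penalty) on synthetic data and reports the area under the ROC curve (AUC) for each of five folds. The two predictors, the penalty strength, the learning rate and all function names are assumptions made for the example; the protocol does not specify them, and the AUC is only one of the possible performance measures.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic_l2(X, y, lam=0.1, lr=0.1, epochs=300):
    """Fit L2-penalised logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The penalty shrinks the weights only, not the intercept.
        w = [wj - lr * (gw / n + lam * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def auc(y_true, scores):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def cross_validated_auc(X, y, k=5, seed=0):
    """Return the held-out AUC for each fold of k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    results = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_logistic_l2([X[i] for i in train], [y[i] for i in train])
        scores = predict_proba(model, [X[i] for i in held_out])
        results.append(auc([y[i] for i in held_out], scores))
    return results

def make_synthetic(n=300, seed=1):
    """Generate toy data; both predictors are hypothetical and standardised."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        consults = rng.gauss(0, 1)  # hypothetical consultation rate
        scripts = rng.gauss(0, 1)   # hypothetical prescription count
        p = sigmoid(1.5 * consults + 0.5 * scripts - 0.5)
        X.append([consults, scripts])
        y.append(1 if rng.random() < p else 0)
    return X, y

X, y = make_synthetic()
fold_aucs = cross_validated_auc(X, y)
mean_auc = sum(fold_aucs) / len(fold_aucs)
print(f"per-fold AUC: {[round(a, 3) for a in fold_aucs]}, mean: {mean_auc:.3f}")
```

The same loop could wrap any of the other model classes, so that all approaches are scored on identical held-out folds. The sketch does not stratify folds by outcome, which would matter for rarer outcomes in real data.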

In addition, we will derive transition probabilities for use in a transition modelling framework covering first ‘competing’ cardiometabolic events and the secondary event states that carry the most complications and associated costs, and we will estimate life expectancies and quality-adjusted life expectancies. These will form the basis of a web-application tool for decision-making.
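
To show how transition probabilities translate into life expectancy and quality-adjusted life expectancy, the sketch below runs a simple discrete-time Markov cohort model in Python. It is an illustration under stated assumptions, not the study's model: the four states, the annual transition probabilities, the utility weights and the 60-year horizon are all hypothetical, the competing first events are collapsed into a single state, and no discounting or half-cycle correction is applied.

```python
STATES = ["event_free", "first_event", "secondary_event", "dead"]

# Hypothetical annual transition probabilities; each row sums to 1.
P = {
    "event_free":      {"event_free": 0.93, "first_event": 0.05, "secondary_event": 0.00, "dead": 0.02},
    "first_event":     {"event_free": 0.00, "first_event": 0.85, "secondary_event": 0.10, "dead": 0.05},
    "secondary_event": {"event_free": 0.00, "first_event": 0.00, "secondary_event": 0.88, "dead": 0.12},
    "dead":            {"event_free": 0.00, "first_event": 0.00, "secondary_event": 0.00, "dead": 1.00},
}

# Hypothetical quality-of-life weights (1 = full health, 0 = dead).
UTILITY = {"event_free": 0.85, "first_event": 0.70, "secondary_event": 0.55, "dead": 0.0}

def step(dist):
    """Advance the cohort distribution by one annual cycle."""
    new = {s: 0.0 for s in STATES}
    for src, mass in dist.items():
        for dst, p in P[src].items():
            new[dst] += mass * p
    return new

def expectancies(start="event_free", horizon=60):
    """Return (life expectancy, quality-adjusted life expectancy) in years.

    State occupancy is counted at the start of each cycle.
    """
    dist = {s: 1.0 if s == start else 0.0 for s in STATES}
    le = qale = 0.0
    for _ in range(horizon):
        le += 1.0 - dist["dead"]
        qale += sum(dist[s] * UTILITY[s] for s in STATES)
        dist = step(dist)
    return le, qale

le, qale = expectancies("event_free")
le_sec, qale_sec = expectancies("secondary_event")
print(f"from event-free: LE = {le:.2f} years, QALE = {qale:.2f} years")
print(f"from secondary event: LE = {le_sec:.2f} years, QALE = {qale_sec:.2f} years")
```

In practice the transition probabilities would be estimated from the linked data and would typically vary with age, sex and other covariates, rather than being fixed constants as here.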

Health Outcomes to be Measured

Primary Outcomes
Incidence of Atrial Fibrillation
Incidence of Diabetes
Incidence of Coronary Heart Disease
Incidence of Heart Failure
Incidence of Stroke (ischaemic and haemorrhagic)
Incidence of Vascular Dementia
Incidence of Chronic Kidney Disease
Incidence of Hypertension

Secondary Outcomes
Development of other cardiometabolic conditions after developing one of the eight conditions described above
All-cause mortality
Cardiovascular mortality

Collaborators

Bhautesh Jani - Chief Investigator - University of Glasgow
Bhautesh Jani - Corresponding Applicant - University of Glasgow
Ahmed Zoha - Collaborator - University of Glasgow
Ahsen Tahir - Collaborator - University of Glasgow
Barbara Nicholl - Collaborator - University of Glasgow
Christian Delles - Collaborator - University of Glasgow
Claudia Geue - Collaborator - University of Glasgow
Daniel Mackay - Collaborator - University of Glasgow
Desmond Campbell - Collaborator - University of Glasgow
Donald Lyall - Collaborator - University of Glasgow
Frances Mair - Collaborator - University of Glasgow
Frederick Ho - Collaborator - University of Glasgow
Giorgio Ciminata - Collaborator - University of Glasgow
Hasan Abbas - Collaborator - University of Glasgow
Jennifer Lees - Collaborator - University of Glasgow
Jill Pell - Collaborator - University of Glasgow
Jim Lewsey - Collaborator - University of Glasgow
Kia Dashtipour - Collaborator - University of Glasgow
Michael Sullivan - Collaborator - University of Glasgow
Muhammad Aurangzeb Khan - Collaborator - University of Glasgow
Naveed Sattar - Collaborator - University of Glasgow
Patrick Mark - Collaborator - University of Glasgow
Qammer Abbasi - Collaborator - University of Glasgow
Septiara Putri - Collaborator - University of Glasgow
Srinivasa Vittal Katikireddi - Collaborator - University of Glasgow

Former Collaborators

Septiara Putri - Collaborator - University of Glasgow
Claudia Geue - Collaborator - University of Glasgow
Desmond Campbell - Collaborator - University of Glasgow
Giorgio Ciminata - Collaborator - University of Glasgow

Linkages

HES Admitted Patient Care
ONS Death Registration Data
Patient Level Index of Multiple Deprivation