Use of machine learning to improve the prediction of 10-year Cardiovascular Disease risk using data from Electronic Health Records

Study type
Protocol
Date of Approval
Study reference ID
21_000346
Lay Summary

Cardiovascular disease (CVD) prediction algorithms, such as the QRISK score and the ACC/AHA ASCVD risk calculator, estimate a patient's 10-year risk of developing a cardiovascular event. QRISK and ACC/AHA are the algorithms currently used in UK and US primary care. Based on the algorithm's score and their own medical judgment, doctors decide whether to prescribe statins or blood pressure-lowering therapies to prevent CVD. These algorithms have been well validated in external studies. Both were developed using conventional methods such as regression models. This study aims to use deep learning models instead (deep learning is one approach to machine learning, which uses statistics to find hidden patterns in large amounts of data) to improve predictive performance further.
Several other machine learning models will also be investigated. The best-performing models will be considered as replacements for the current algorithms to improve practice in cardiovascular disease prediction. Recent studies have built several machine learning models in this area, but none has been fairly compared with the others or adopted by healthcare providers. Furthermore, an optimal cardiovascular disease prediction model may have the potential to be 'transferred' to the prediction of other diseases, which could save medical resources and substantially reduce research costs.

Technical Summary

This study focuses on improving cardiovascular risk prediction by applying deep learning to linked electronic health records data. Contemporary cardiovascular prediction models either use conventional methods such as Cox proportional hazards models (e.g. QRISK, Framingham risk score, SCORE, and ACC/AHA) or use more advanced techniques without sufficient model comparison and external validation. Risk scores such as QRISK and ACC/AHA are comparatively more robust and are widely used in UK and US clinical practice. This study will compare the models comprehensively to determine whether deep learning models truly improve accuracy. To that end, several previously studied models have been selected: naïve Bayes, decision tree, random forest, k-nearest neighbours, logistic regression, support vector machine, gradient boosting algorithms, and neural networks. The CPRD, HES and ONS datasets will provide the patients' baseline characteristics, inpatient and outpatient information, and death registrations. The outcome to be predicted is 10-year cardiovascular risk. Predictor variables will be selected from the well-accepted risk factors for cardiovascular disease and from variables shown in previous studies to be significantly associated with cardiovascular disease outcomes. Models based on the different approaches will be developed in the training dataset and evaluated in the testing dataset using the area under the receiver operating characteristic curve (AUC). The final models will be selected by comparing prediction performance. We will then investigate how the selected models perform across the different sub-types of cardiovascular disease, to inform discussion of whether the 'knowledge' they have learned could be transferred to the prediction of related diseases (e.g. vascular dementia).
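
The performance comparison described above can be illustrated with a minimal sketch. It assumes each candidate model outputs one risk score per patient in the testing dataset, and it computes the AUC with the rank-based (Mann-Whitney) formulation. The function names, model names, labels and scores are all hypothetical and made up for illustration; they are not study data or results, and the study itself may well use a library implementation instead.

```python
from typing import Dict, List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation.

    Equals the probability that a randomly chosen case (label 1) receives a
    higher score than a randomly chosen non-case (label 0); ties count 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_models(
    labels: Sequence[int], predictions: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (model name, AUC) pairs sorted from highest to lowest AUC."""
    results = [(name, auc(labels, s)) for name, s in predictions.items()]
    return sorted(results, key=lambda r: r[1], reverse=True)


# Made-up test-set outcomes (1 = CVD event within 10 years) and risk scores.
y_test = [0, 0, 1, 0, 1, 1, 0, 1]
predicted_risk = {
    "logistic_regression": [0.10, 0.40, 0.35, 0.20, 0.80, 0.60, 0.50, 0.70],
    "neural_network": [0.05, 0.30, 0.35, 0.15, 0.90, 0.65, 0.40, 0.75],
}

ranking = rank_models(y_test, predicted_risk)
for name, value in ranking:
    print(f"{name}: AUC = {value:.4f}")
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a small illustration but would need a sort-based computation on datasets of realistic size. AUC alone also says nothing about calibration, so a full comparison would likely report further measures.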

Health Outcomes to be Measured

1. Cardiovascular events:
• Coronary (Ischaemic) Heart Disease; chronic/unstable angina; myocardial infarction (MI)
• Cerebrovascular disease; transient ischaemic attack (TIA); stroke
• Peripheral arterial disease; aortic disease; abdominal aortic aneurysms (AAA)
2. Cardiovascular disease (CVD) mortality
3. Composite major cardiovascular disease event: a composite of cardiovascular death, fatal and non-fatal stroke, myocardial infarction, and heart failure
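
As an illustration of how the composite outcome in item 3 might be derived for the 10-year prediction window, the sketch below finds a patient's first qualifying event after a baseline date. The event labels, function names and window rules (events strictly after baseline, up to and including the 10-year anniversary) are assumptions made for this example; the protocol does not specify them, and real definitions would rely on the study's own code lists. Censoring, for example when a patient leaves a practice, is deliberately ignored here.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

# Hypothetical event labels standing in for the study's clinical code lists.
COMPOSITE_EVENTS = {"cv_death", "stroke", "myocardial_infarction", "heart_failure"}


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 Feb to 28 Feb when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def first_composite_event(
    baseline: date,
    events: Iterable[Tuple[date, str]],
    horizon_years: int = 10,
) -> Optional[Tuple[date, str]]:
    """Earliest composite event after baseline and within the horizon, else None."""
    end = add_years(baseline, horizon_years)
    eligible = [
        (when, kind)
        for when, kind in events
        if kind in COMPOSITE_EVENTS and baseline < when <= end
    ]
    return min(eligible) if eligible else None


# Made-up patient record: angina is not part of the composite, and the
# heart failure event falls outside the 10-year window.
baseline = date(2010, 1, 1)
patient_events = [
    (date(2012, 3, 1), "angina"),
    (date(2018, 1, 5), "myocardial_infarction"),
    (date(2015, 6, 10), "stroke"),
    (date(2022, 1, 1), "heart_failure"),
]
outcome = first_composite_event(baseline, patient_events)
print(outcome)
```

Returning the event date as well as its type keeps the result usable both as a binary 10-year label and as a time-to-event outcome.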

Collaborators

Vasa Curcin - Chief Investigator - King's College London (KCL)
Tianyi Liu - Corresponding Applicant - King's College London (KCL)
Abdel Douiri - Collaborator - King's College London (KCL)
M. Jorge Cardoso - Collaborator - King's College London (KCL)

Former Collaborators

Vasa Curcin - Collaborator - King's College London (KCL)

Linkages

HES Admitted Patient Care; HES Outpatient; ONS Death Registration Data; Patient Level Index of Multiple Deprivation; Practice Level Index of Multiple Deprivation