Use of machine learning to improve the prediction of 10-year cardiovascular disease risk using data from electronic health records

Date of Approval
Application Number
Technical Summary

This study aims to improve cardiovascular risk prediction by applying deep learning approaches to linked electronic health record data. Contemporary cardiovascular prediction models either use conventional survival-modelling methods such as Cox proportional hazards models (e.g. QRISK, the Framingham risk score, SCORE and ACC/AHA) or use more advanced techniques without sufficient model comparison and external validation. Risk scores such as QRISK and ACC/AHA are comparatively robust and have been widely used in clinical practice in the UK and the USA. This study will therefore compare models comprehensively to determine whether deep learning models genuinely improve predictive accuracy.

To this end, several previously studied models have been selected, including naïve Bayes, decision trees, random forests, k-nearest neighbours, logistic regression, support vector machines, gradient boosting algorithms and neural networks. CPRD, HES and ONS data will provide patients' baseline characteristics, inpatient and outpatient information, and death registrations, respectively. The outcome to be predicted is 10-year cardiovascular risk. Predictor variables will be selected on the basis of well-established cardiovascular risk factors and of variables that have shown a significant association with cardiovascular disease outcomes in previous studies.

Models based on each approach will be developed in a training dataset and evaluated in a separate testing dataset using the area under the receiver operating characteristic curve (AUC). The final model will be selected by comparing predictive performance across models. We will then investigate the performance of the selected model in the different subtypes of cardiovascular disease, to inform discussion of whether the learned 'knowledge' could be transferred to the prediction of related diseases (e.g. vascular dementia).
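The model-comparison step can be sketched as follows. This is a minimal illustration, not the study's analysis code: it assumes every candidate model has already produced a predicted 10-year risk for the same held-out test patients, and it treats the outcome as a binary "event within 10 years" flag, ignoring censoring. The function names `roc_auc` and `select_best_model` and the toy risks are hypothetical. AUC is computed here in its Mann-Whitney form: the probability that a patient who had an event receives a higher predicted risk than one who did not.

```python
from itertools import product


def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Counts, over all (event, non-event) patient pairs, how often the
    patient with the event received the higher predicted risk; ties
    count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def select_best_model(labels, predictions):
    """Return (best_model_name, {model_name: auc}) on the held-out test set."""
    aucs = {name: roc_auc(labels, risks) for name, risks in predictions.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs


# Toy test set (made-up values): 1 = CVD event within 10 years, 0 = no event.
y_test = [0, 0, 1, 0, 1, 1, 0, 0]
predicted_risks = {
    "logistic_regression": [0.05, 0.10, 0.40, 0.20, 0.35, 0.15, 0.08, 0.30],
    "random_forest": [0.10, 0.05, 0.60, 0.15, 0.55, 0.22, 0.20, 0.25],
}
best, aucs = select_best_model(y_test, predicted_risks)
print(best, aucs)
```

In a real analysis the same comparison would more likely use `sklearn.metrics.roc_auc_score(y_true, y_score)`, and a censoring-aware measure (such as Harrell's C-index) may be preferable when follow-up is incomplete.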

Health Outcomes to be Measured

1. Cardiovascular events:
• Coronary (ischaemic) heart disease: chronic or unstable angina; myocardial infarction (MI)
• Cerebrovascular disease: transient ischaemic attack (TIA); stroke
• Peripheral arterial disease; aortic disease, including abdominal aortic aneurysm (AAA)
2. Cardiovascular disease (CVD) mortality
3. Composite major cardiovascular disease event, comprising cardiovascular death, fatal and non-fatal stroke, myocardial infarction, and heart failure
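A composite outcome such as the one in item 3 is typically derived from the dates of its component events. The sketch below is only an illustration under stated assumptions: the component names in `COMPOSITE_COMPONENTS` are hypothetical labels (in practice each would be defined from CPRD, HES and ONS codes), events before the index date are assumed to be handled by exclusion criteria and are ignored here, and the composite is taken to occur at the earliest component event within the 10-year horizon.

```python
from datetime import date
from typing import Dict, Optional, Tuple

# Hypothetical component labels for the composite outcome.
COMPOSITE_COMPONENTS = ("cvd_death", "stroke", "myocardial_infarction", "heart_failure")
HORIZON_YEARS = 10  # 10-year prediction horizon


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def composite_event(
    index_date: date, event_dates: Dict[str, Optional[date]]
) -> Tuple[bool, Optional[date]]:
    """Return (event_within_horizon, date_of_first_component_event or None)."""
    # Keep only composite components that occurred on or after the index date.
    dates = [
        d for name, d in event_dates.items()
        if name in COMPOSITE_COMPONENTS and d is not None and d >= index_date
    ]
    first = min(dates) if dates else None
    within = first is not None and first <= add_years(index_date, HORIZON_YEARS)
    return within, first


# Example patient: angina is not part of this composite, so it is ignored.
patient = {
    "stroke": date(2012, 3, 1),
    "myocardial_infarction": date(2009, 6, 15),
    "heart_failure": None,
    "angina": date(2008, 1, 1),
}
print(composite_event(date(2005, 1, 1), patient))
```

For this example patient the first qualifying event is the myocardial infarction in June 2009, which falls inside the 10-year window.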


Vasa Curcin - Chief Investigator - King's College London (KCL)
Tianyi Liu - Corresponding Applicant - King's College London (KCL)
Abdel Douiri - Collaborator - King's College London (KCL)
M. Jorge Cardoso - Collaborator - King's College London (KCL)


HES Admitted Patient Care; HES Outpatient; ONS Death Registration Data; Patient Level Index of Multiple Deprivation; Practice Level Index of Multiple Deprivation