Cardiovascular risk prediction in daily practice: can machine-learning based risk prediction models improve the accuracy of risk prediction for individual patients?

Study type
Protocol
Date of Approval
Study reference ID
19_054
Lay Summary

Cardiovascular disease (CVD), a disease of the heart and the system that pumps blood around the body, is the leading cause of death globally. Risk prediction models such as QRISK, which use data collected by general practitioners, are widely used to predict the risk for individual patients. In the UK, statins (medicines that lower cholesterol and reduce the risk of heart attacks) are recommended if a patient's 10-year risk of developing CVD, as estimated with QRISK, is above 10%. Accurate risk prediction for all patients is therefore very important. Previous research in our group has found that risk prediction models such as QRISK do not predict the risk for individual patients consistently: the predicted risk can vary widely depending on simple changes to the model. This project will evaluate whether a newer class of risk prediction models, called machine learning models, can improve the accuracy and robustness of risk prediction for individuals, or whether they suffer from the same limitations as models such as QRISK. Machine learning is a method in which a computer analyses large amounts of data and finds patterns in these data. This project will extend a previously approved project that is evaluating the uncertainty of individual risk prediction in models that use routinely collected data.

Technical Summary

Cardiovascular disease (CVD) is the leading cause of death globally, and risk prediction models such as QRISK are used to predict risk for individual patients. This protocol will use inclusion criteria and outcome definitions similar to those of the previously approved ISAC agreement (17_125RMn2), but will fit models with machine-learning based methods rather than the Cox regression used in that study. The machine-learning methods considered in this protocol will be based on statistical learning theory rather than artificial intelligence, and have similarities to conventional statistical methods such as the Cox proportional hazards model. This study will replicate the modelling process used in a recent machine-learning paper that used CPRD data, and will compare the individual risk predictions and conventional performance of the machine-learning models with those of QRISK3. The objective of this study is to determine the level of uncertainty in the prediction of CVD risk for individual patients when using risk scores such as QRISK3. We will evaluate whether machine-learning methods based on statistical learning theory could improve the accuracy of risk prediction for individual patients. These methods, which will include support vector machines (SVM) and random forests, will be used to model patients' data and derive risk prediction models. We will replicate the tuning process of the recent paper to tune the machine-learning models. Performance metrics (calibration and discrimination) will be compared with those of QRISK3. To further assess the models' accuracy for individual risk prediction, practice variability will be incorporated into the models by including general practices as random effects. We will then compare individual risk predictions from the machine-learning models with those from the random effects models to assess whether this improves the accuracy of individual risk prediction in the context of heterogeneity in populations and data completeness between practices.
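
As an illustration only, the two conventional performance metrics named above could be computed along the following lines. This is a minimal sketch and not the study's analysis code: the function names and toy data are hypothetical, the outcome is treated as a simple binary event within 10 years, and censoring is ignored (a real survival analysis would need a censoring-aware measure such as Harrell's C).

```python
from typing import List, Sequence, Tuple


def c_statistic(risks: Sequence[float], events: Sequence[int]) -> float:
    """Discrimination: probability that a patient with an event has a higher
    predicted risk than a patient without one (ties count as one half)."""
    cases = [r for r, e in zip(risks, events) if e == 1]
    controls = [r for r, e in zip(risks, events) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def calibration_by_group(
    risks: Sequence[float], events: Sequence[int], n_groups: int = 10
) -> List[Tuple[float, float]]:
    """Calibration: (mean predicted risk, observed event proportion) for each
    group of patients, with groups formed by ordering on predicted risk."""
    if n_groups < 1 or len(risks) < n_groups:
        raise ValueError("need at least one patient per group")
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    size = len(order) // n_groups
    groups = []
    for g in range(n_groups):
        start = g * size
        # The last group absorbs any remainder.
        end = (g + 1) * size if g < n_groups - 1 else len(order)
        idx = order[start:end]
        mean_predicted = sum(risks[i] for i in idx) / len(idx)
        observed = sum(events[i] for i in idx) / len(idx)
        groups.append((mean_predicted, observed))
    return groups


# Hypothetical toy data: predicted 10-year risks and observed events (1 = CVD).
toy_risks = [0.05, 0.08, 0.12, 0.20, 0.30, 0.40]
toy_events = [0, 0, 1, 0, 1, 1]

print(c_statistic(toy_risks, toy_events))
print(calibration_by_group(toy_risks, toy_events, n_groups=2))
```

A well-calibrated model would show mean predicted risks close to the observed proportions in every group; the same two functions could be applied to each model's predictions in turn to compare them on a common footing.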

Health Outcomes to be Measured

1. Conventional model performance of QRISK3, machine-learning models and random effects machine-learning models.
2. Whether QRISK3, machine-learning models and random effects machine-learning models predict consistent CVD risks for the same individual patients.
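
One possible way to quantify the second outcome is sketched below, under stated assumptions: the function name, the summary measures and the toy data are hypothetical and are not taken from the protocol. The sketch compares the risks that two models assign to the same patients and counts how often the models disagree about the 10% treatment threshold quoted in the lay summary.

```python
from typing import Dict, Sequence

# 10-year CVD risk above which statins are recommended (from the lay summary).
TREATMENT_THRESHOLD = 0.10


def compare_individual_risks(
    risk_a: Sequence[float],
    risk_b: Sequence[float],
    threshold: float = TREATMENT_THRESHOLD,
) -> Dict[str, float]:
    """Summarise how consistently two models predict risk for the same patients.

    risk_a[i] and risk_b[i] must be the two models' predictions for patient i.
    """
    if len(risk_a) != len(risk_b) or len(risk_a) == 0:
        raise ValueError("both models need predictions for the same patients")
    n = len(risk_a)
    abs_diffs = [abs(a - b) for a, b in zip(risk_a, risk_b)]
    # A patient is discordant when only one model puts them above the threshold.
    discordant = sum(
        (a > threshold) != (b > threshold) for a, b in zip(risk_a, risk_b)
    )
    return {
        "mean_abs_diff": sum(abs_diffs) / n,
        "max_abs_diff": max(abs_diffs),
        "prop_discordant": discordant / n,
    }


# Hypothetical predictions for four patients from two models.
model_a = [0.05, 0.09, 0.12, 0.20]
model_b = [0.06, 0.15, 0.08, 0.22]
print(compare_individual_risks(model_a, model_b))
```

Two models can have very similar population-level discrimination and calibration while still disagreeing for a large share of individual patients, which is why a patient-level comparison of this kind is reported separately from the conventional metrics.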

Collaborators

Tjeerd van Staa - Chief Investigator - University of Manchester
Yan Li - Corresponding Applicant - University of Manchester
Darren Ashcroft - Collaborator - University of Manchester

Linkages

HES Admitted Patient Care;ONS Death Registration Data;Patient Level Townsend Score