Data-driven approaches for detecting fast progressors to end-stage renal disease (ESRD) among all stage chronic kidney disease (CKD) patients powered by a large electronic-health record (EHR) dataset

Study type
Protocol
Date of Approval
Study reference ID
22_001793
Lay Summary

Chronic kidney disease (CKD) is an insidious disease whose prevalence is rising sharply, with almost 850 million people affected worldwide. About 4% of CKD patients progress to end-stage renal disease (ESRD) and kidney failure, which is a leading cause of mortality and is associated with a high economic burden. One of the key challenges for researchers and physicians is the early detection of CKD patients who progress rapidly to ESRD, referred to as fast progressors. Early detection provides the foundation for further research and for implementing existing and future measures for timely intervention and better patient stratification, with the aim of delaying or, in early stages, even preventing the need for dialysis. This translates into improved quality of life and outcomes for CKD patients and substantial cost savings for the healthcare system. The goal of this study is to develop novel data-driven approaches that make full use of the information in large electronic health record (EHR) datasets and provide state-of-the-art detection of CKD fast progressors at all stages of the disease. In close collaboration with clinicians, we will develop a CKD risk prediction algorithm using a patient population from the UK. To assess the generalisability of the algorithm, its ability to predict the risk of CKD progression will be evaluated on a large, independent EHR dataset from the US. Findings from this study will create opportunities to use data-driven approaches that help physicians optimize the management of CKD patients.

Technical Summary

The study aims to develop a machine-learning (ML) approach based on laboratory test results, vital sign measurements, comorbidities, and medications to predict the risk of chronic kidney disease (CKD) progression to end-stage renal disease (ESRD) in the primary care setting. This is a retrospective cohort study that uses secondary data from the CPRD GOLD and CPRD Aurum databases in the UK to train and develop the ML models. The patient population of interest consists of patients with a CKD or ESRD diagnosis recorded in CPRD.

To define fast progressors, we will consider different time periods to ESRD onset that are relevant for the clinical application of the developed approaches. For example, a longer period to ESRD onset, such as 5 years, can enable timely preventive action. To build a CKD risk prediction algorithm that enables early detection of fast CKD progressors, multivariate machine learning models, namely gradient-boosted tree ensembles and neural-network-based approaches, will be developed and trained on the CPRD data. These approaches have been shown to capture complex non-linear dependencies effectively and to offer high prediction accuracy and generalization ability. In addition, the ML approaches will employ feature engineering that captures temporal information relevant to the problem, as well as automatic feature selection integrated into the cross-validation workflow, with the aim of identifying a minimal set of features sufficient for successful application in clinical practice.
In a detailed evaluation study, we will compare the predictive performance of our approaches with relevant state-of-the-art published benchmarks for CKD risk prediction on a large, independent EHR (electronic-health record) dataset from the US.
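
As an illustration of the temporal feature engineering mentioned above, the following is a minimal sketch of how an irregularly sampled laboratory series (for example eGFR) could be summarised before a prediction date. It is not the study's implementation: the function name `temporal_features`, the 365-day look-back window, and the chosen summary statistics are all assumptions made for this example.

```python
from datetime import date


def temporal_features(measurements, index_date, lookback_days=365):
    """Summarise an irregularly sampled lab series before index_date.

    measurements: list of (date, value) pairs, e.g. eGFR readings.
    Returns None if no measurement falls inside the look-back window.
    All names and the window length are hypothetical choices.
    """
    # Keep only readings taken on or before the index date and inside
    # the window, so that no future information leaks into the features.
    window = sorted(
        (d, v) for d, v in measurements
        if 0 <= (index_date - d).days <= lookback_days
    )
    if not window:
        return None

    values = [v for _, v in window]
    # Time axis in years relative to the index date (negative = past).
    times = [(d - index_date).days / 365.25 for d, _ in window]

    features = {
        "last": values[-1],
        "mean": sum(values) / len(values),
        "min": min(values),
        "count": len(values),
        "days_since_last": (index_date - window[-1][0]).days,
        "slope_per_year": None,
    }

    # Ordinary least-squares slope; needs at least two distinct dates.
    t_mean = sum(times) / len(times)
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx > 0:
        v_mean = features["mean"]
        sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
        features["slope_per_year"] = sxy / sxx
    return features
```

A negative `slope_per_year` on an eGFR series indicates declining kidney function over the window. Features of this kind could then be passed to any of the model families named above.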

Health Outcomes to be Measured

We will consider different, clinically relevant time frames to define disease progression to ESRD (fast progressors). For example, a short-term prediction of progression within 2 years can be useful to prepare the patient for a smooth transition to dialysis. A longer-term prediction of ESRD progression within 5 years can be used by nephrologists to focus prevention efforts on higher-risk patients, with the ambition of delaying or even avoiding dialysis and keeping these patients in higher kidney function stages. Progression to ESRD will mostly be determined via codes (see Appendix), possibly complemented by inference from consistently low estimated glomerular filtration rate (eGFR) values (based on at least two eGFR measurements one month apart).
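
The outcome definition above can be sketched in code. This is a hedged illustration only: the eGFR cut-off of 15 mL/min/1.73 m^2 (the KDIGO G5 boundary), the reading of "one month apart" as at least 30 days, the handling of censored follow-up, and the function names `infer_esrd_date` and `fast_progressor_label` are assumptions that the protocol summary does not specify.

```python
from datetime import date

ESRD_EGFR_THRESHOLD = 15.0  # mL/min/1.73 m^2; assumed cut-off, not from the protocol
MIN_GAP_DAYS = 30           # assumed interpretation of "one month apart"


def infer_esrd_date(egfr_series, threshold=ESRD_EGFR_THRESHOLD,
                    min_gap_days=MIN_GAP_DAYS):
    """Infer ESRD onset from consistently low eGFR values.

    egfr_series: list of (date, egfr) pairs.
    Returns the first date of a run of readings that all lie below the
    threshold and span at least min_gap_days, or None if no such run exists.
    """
    run_start = None
    for d, value in sorted(egfr_series):
        if value < threshold:
            if run_start is None:
                run_start = d
            if (d - run_start).days >= min_gap_days:
                return run_start
        else:
            # A reading at or above the threshold breaks the run.
            run_start = None
    return None


def fast_progressor_label(index_date, esrd_date, followup_end, horizon_years):
    """Label a patient who is ESRD-free at index_date.

    Returns True if ESRD occurs within the horizon, False if the patient
    was observed for the full horizon without ESRD, and None if follow-up
    ended too early to tell (censored).
    """
    horizon_days = horizon_years * 365.25
    if esrd_date is not None and 0 < (esrd_date - index_date).days <= horizon_days:
        return True
    if (followup_end - index_date).days >= horizon_days:
        return False
    return None
```

Calling `fast_progressor_label` with `horizon_years=2` and `horizon_years=5` yields the two example time frames described above. Returning `None` for censored patients keeps them from being silently counted as non-progressors.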

The outcomes are as follows:
Overall performance evaluation of the introduced algorithms to assess their ability to detect CKD fast progressors in time frames determined as relevant in clinical practice on a large, independent EHR dataset;
Disease stage specific performance evaluation;
Comparison with relevant, state-of-the-art reference algorithms in all evaluation settings;
Analysis of the feature importance for the identification of CKD fast progressors.
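
The overall and disease-stage-specific evaluations listed above could be computed along the following lines. This is a sketch under stated assumptions: the summary does not name a performance metric, so the area under the ROC curve (AUROC) is used here purely as an example, and the function names and stage labels such as "G3" are hypothetical.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic, with average ranks for ties.

    Returns None when only one class is present, since the AUROC is
    undefined in that case.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None

    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the block of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def auroc_by_stage(records):
    """records: iterable of (stage, label, score) triples.

    Returns a dict with one AUROC per stage plus an 'overall' entry.
    """
    groups = defaultdict(lambda: ([], []))
    for stage, label, score in records:
        for key in ("overall", stage):
            groups[key][0].append(label)
            groups[key][1].append(score)
    return {key: auroc(ys, ss) for key, (ys, ss) in groups.items()}
```

Reporting the per-stage values next to the overall value shows whether performance is driven by the late stages, where progression is easier to anticipate. The same grouping can be applied to the reference algorithms so that the comparison is made in identical evaluation settings.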

Collaborators

Carsten Danzer - Chief Investigator - Roche Diagnostics International Ltd
Christian Wohlfart - Corresponding Applicant - Roche Diagnostics International Ltd
Ashton Harper - Collaborator - Roche Diagnostics International Ltd
Christine Remy - Collaborator - Roche Diagnostics International Ltd
Giovanni d'Ario - Collaborator - Roche Diagnostics International Ltd
Martin Klammer - Collaborator - Roche Diagnostics International Ltd
Nicolas Sillitoe - Collaborator - Roche Diagnostics International Ltd

Former Collaborators

Jasmina Bogojeska - Collaborator - Roche Diagnostics International Ltd

Linkages

HES Admitted Patient Care