Validation of a standardised algorithm to estimate the prevalence and incidence of selected conditions in the Clinical Practice Research Datalink

Study type
Protocol
Date of Approval
Study reference ID
22_002001
Lay Summary

To understand the impact that diseases have within a population, we need to know how many people have the disease at any one time (prevalence) and how many people are newly diagnosed with the disease in a defined time period (incidence). Both measures are essential to allow the National Health Service and other organisations to plan service provision, so it is important that they are accurate and trustworthy. In this study we wish to compare estimates from previous studies that used routine United Kingdom electronic healthcare databases, such as CPRD, with estimates derived from our internal algorithm. A review of published studies that used CPRD to estimate prevalence and incidence will be undertaken. The relevant diseases will then be selected and the estimates compared.

Technical Summary

An internal algorithm has been developed to define a disease and generate epidemiology reports consisting of incidence and prevalence outputs. We wish to validate the algorithm's estimates of prevalence and incidence against published data and to report factors that should be considered when estimating these metrics. A systematic selection of scientific papers recorded on PubMed reporting prevalence and incidence measures from the Clinical/General Practice Research Datalink (CPRD/GPRD), The Health Improvement Network (THIN) or Q database from 1st January 2016 onwards produced 76 potential papers. All acceptable papers will be included in the study. From these studies, code lists, study periods and other constraints will be replicated using CPRD GOLD and Aurum to calculate each disease's incidence and prevalence based on our algorithm and assumptions. The start of CPRD follow-up will be defined as the later of the patient's registration date and the practice up-to-standard date (GOLD); the end of CPRD follow-up will be defined as the earliest of the patient's transfer-out date, date of death, and the last data-collection date for their practice. The presentation date will be defined as the date of the patient's first ever record with a code indicative of the disease. The prevalence of the condition will be defined as the proportion of a population who have the condition in a given year. Point prevalence will be calculated for chronic conditions, whereas period prevalence will be calculated for acute conditions. The incidence is defined as the proportion of a population who develop the condition within a particular time period. The estimates produced by our algorithm will be compared against those in each publication and their concordance evaluated. Plots will be used to compare visually the values produced by our algorithm with those collected from the studies. This will benefit public health by helping to ensure greater accuracy of estimates of incidence and prevalence using routine data sources.
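
The follow-up and outcome definitions above can be illustrated with a minimal Python sketch. It is not the internal algorithm itself: the `Patient` class, its field names and the three functions are hypothetical and do not correspond to CPRD column names. It also makes assumptions the summary does not state: the up-to-standard date is treated as optional (it applies to GOLD only), the prevalence denominator is taken as patients under follow-up on a chosen index date, and the incidence denominator is taken as patients under follow-up and disease-free at the start of the period.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Patient:
    # Hypothetical field names chosen for illustration only.
    registration: date
    practice_uts: Optional[date]     # practice up-to-standard date (GOLD only)
    transfer_out: Optional[date]
    death: Optional[date]
    last_collection: date            # last data-collection date for the practice
    first_diagnosis: Optional[date]  # presentation date: first ever disease code


def follow_up(p: Patient) -> Tuple[date, date]:
    """Start is the later of registration and up-to-standard date;
    end is the earliest of transfer-out, death and last collection."""
    start = max(d for d in (p.registration, p.practice_uts) if d is not None)
    end = min(d for d in (p.transfer_out, p.death, p.last_collection)
              if d is not None)
    return start, end


def under_follow_up(p: Patient, day: date) -> bool:
    start, end = follow_up(p)
    return start <= day <= end


def point_prevalence(patients: List[Patient], index: date) -> float:
    """Proportion of patients under follow-up on `index` whose
    presentation date falls on or before `index`."""
    denominator = [p for p in patients if under_follow_up(p, index)]
    cases = [p for p in denominator
             if p.first_diagnosis is not None and p.first_diagnosis <= index]
    return len(cases) / len(denominator) if denominator else float("nan")


def incidence_proportion(patients: List[Patient],
                         start: date, end: date) -> float:
    """Proportion of patients under follow-up and disease-free at `start`
    (an assumed denominator) who first present between `start` and `end`
    while still under follow-up."""
    denominator = [p for p in patients
                   if under_follow_up(p, start)
                   and (p.first_diagnosis is None or p.first_diagnosis >= start)]
    new_cases = [p for p in denominator
                 if p.first_diagnosis is not None
                 and p.first_diagnosis <= min(end, follow_up(p)[1])]
    return len(new_cases) / len(denominator) if denominator else float("nan")
```

Period prevalence for acute conditions would need a further rule for when an episode counts as active, which the summary does not specify, so it is left out of this sketch.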

Health Outcomes to be Measured

Prevalence, incidence and concordance of prevalence and incidence estimates

Collaborators

Craig Currie - Chief Investigator - Pharmatelligence Limited t/a Human Data Sciences
Benjamin Heywood - Corresponding Applicant - Pharmatelligence Limited t/a Human Data Sciences
Bethan Jones - Collaborator - Pharmatelligence Limited t/a Human Data Sciences
Christopher Morgan - Collaborator - Pharmatelligence Limited t/a Human Data Sciences

Former Collaborators

Craig Currie - Collaborator - Pharmatelligence Limited t/a Human Data Sciences