Reproducibility of estimates of prevalence of rare blood cancers from DARWIN EU in the Clinical Practice Research Datalink from primary care sources and assessing the impact of including hospital derived data

Study type
Protocol
Date of Approval
Study reference ID
23_003134
Lay Summary

To understand the impact that a disease has on a population, we need to know how many people have the disease at any one time (prevalence). Measures of prevalence are essential for the National Health Service and other organisations to plan service provision, so it is important that they are estimated accurately. In this study we will compare estimates of the prevalence of six rare blood cancers published by the Data Analysis and Real World Interrogation Network (DARWIN EU) with estimates derived using our internal algorithm. Compared with DARWIN EU, we aim to use a larger data source that is linked to hospital data in order to obtain more accurate estimates for each rare blood cancer. We will reproduce the DARWIN EU study as closely as possible by replicating its code lists, study periods and definitions of disease periods. Accurate prevalence estimates are vital for assessing disease burden and for informed public health decision-making.

Technical Summary

We have developed an internal algorithm to define a disease and generate epidemiology reports, including estimates of prevalence. To validate these estimates, we aim to reproduce a study conducted by DARWIN EU that analysed the prevalence of six rare blood cancers in Europe. The cancers of interest are acute lymphocytic leukaemia (ALL), acute myeloid leukaemia (AML), chronic lymphocytic leukaemia (CLL), diffuse large B-cell lymphoma (DLBCL), follicular lymphoma (FL) and multiple myeloma (MM). DARWIN EU used the Clinical Practice Research Datalink (CPRD) GOLD database for its lifelong, 5-year and 2-year prevalence estimates. To reproduce these estimates as consistently as possible, we will use the code lists provided for each cancer and replicate the study periods and other constraints described in the study. However, we will use a combined data source of CPRD GOLD and Aurum to replicate the estimates in primary care. We will also investigate prevalence using a larger data source that combines CPRD with linked Hospital Episode Statistics (HES), allowing for a wider scope and better estimation of prevalence for each rare blood cancer.

For 5-year and 2-year partial prevalence, a patient will be counted as a prevalent case for the corresponding duration after first diagnosis. We will define the prevalence of the condition as the proportion of the population who have the condition in a given year. Point prevalence will be calculated for each of the three disease periods (lifelong, 5-year and 2-year). We will then compare the values produced by our algorithm with those reported by DARWIN EU using a Z-test.
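
As an illustration of the definitions above, the following is a minimal Python sketch and not the study's internal algorithm. The function names, the handling of 29 February, and the choice of a pooled two-proportion Z-test are assumptions made for this example; the protocol states only that a Z-test will be used. All counts in the usage lines are invented for illustration.

```python
from datetime import date
from math import erfc, sqrt
from typing import Optional, Tuple


def is_prevalent(first_diagnosis: date, index_date: date,
                 window_years: Optional[int]) -> bool:
    """Return True if a patient counts as a prevalent case on index_date.

    window_years=None means lifelong prevalence; 5 or 2 gives partial
    prevalence (the patient counts only for that long after first diagnosis).
    """
    if first_diagnosis > index_date:
        return False  # not yet diagnosed on the index date
    if window_years is None:
        return True   # lifelong: prevalent from diagnosis onwards
    end_year = first_diagnosis.year + window_years
    try:
        window_end = first_diagnosis.replace(year=end_year)
    except ValueError:
        # Assumption: a 29 February diagnosis ends on 28 February.
        window_end = first_diagnosis.replace(year=end_year, day=28)
    return index_date < window_end


def point_prevalence(cases: int, population: int) -> float:
    """Proportion of the population who are prevalent cases."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-proportion Z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; test is undefined")
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


# Hypothetical counts, for illustration only.
ours = point_prevalence(cases=120, population=1_000_000)
z, p = two_proportion_z_test(120, 1_000_000, 80, 1_000_000)
print(f"prevalence={ours:.6f}, z={z:.2f}, p={p:.4f}")
```

The pooled form treats both estimates as independent samples, which would not hold where the two data sources share patients (for example, CPRD GOLD practices present in both analyses), so the result should be read with that caveat.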

By conducting this validation study, we aim to improve the accuracy of estimates of prevalence using routine data sources. This will ultimately benefit public health by ensuring more reliable and accurate information for epidemiological analysis and decision-making.

Health Outcomes to be Measured

Prevalence of six rare blood cancers: Acute lymphocytic leukaemia (ALL), Acute myeloid leukaemia (AML), Chronic lymphocytic leukaemia (CLL), Diffuse Large B-Cell Lymphoma (DLBCL), Follicular lymphoma (FL) and Multiple myeloma (MM).

Collaborators

Craig Currie - Chief Investigator - Pharmatelligence Limited t/a Human Data Sciences
Benjamin Heywood - Corresponding Applicant - Pharmatelligence Limited t/a Human Data Sciences
Christopher Morgan - Collaborator - Pharmatelligence Limited t/a Human Data Sciences
Elgan Mathias - Collaborator - Pharmatelligence Limited t/a Human Data Sciences
Leah Fisher - Collaborator - Pharmatelligence Limited t/a Human Data Sciences

Linkages

HES Admitted Patient Care