Producing epidemiologic estimates by cancer type in England: A cohort study using linked primary care to secondary data sources

Study type
Protocol
Date of Approval
Study reference ID
20_000070
Lay Summary

Electronic health records, such as those provided by the Clinical Practice Research Datalink (CPRD) in the UK, are a widely used resource for research on disease in large populations (epidemiology). However, there are concerns about the completeness of case ascertainment and the accuracy of recording of cancer diagnoses in CPRD in England (1). This could affect the number of patients identified as diagnosed within these databases and hence the estimates of epidemiological measures obtained using CPRD, such as incidence (the number of patients with new disease divided by the total population) (2). Since the introduction of international standards for data collection, it has been suggested that disease-specific registry data, such as the Cancer Registry (CR), together with hospital data, should ideally be used as the gold standard to identify cancer patients (3).
This study aims (1) to cross-validate cancer diagnoses between primary care data (CPRD GOLD and Aurum) and secondary administrative data sources [the Cancer Registries (CR) and Hospital Episode Statistics (HES)]; (2) to measure and compare incidence, prevalence and mortality by cancer type between the different UK data sources; (3) to cross-validate cancer-related deaths between primary care data (CPRD GOLD and Aurum) and the Office for National Statistics (ONS) Death Registration Data in England; (4) to estimate survival for people diagnosed between 2011 and 2018; and finally (5) to produce tables of multiplicative factors, calculated from epidemiological parameters derived from the linked datasets, and to explore the feasibility of using these factors to adjust future incidence and prevalence estimates based on CPRD data only.

Technical Summary

Patient demographics and route of diagnosis affect the accuracy of cancer diagnoses recorded in CPRD. To circumvent potential systematic deficiencies and biases in primary care diagnosis records, we will use a combination of the Cancer Registries and HES as the "gold standard" for diagnoses. We will link primary care data (CPRD GOLD and Aurum) with secondary administrative data sources and registries (HES, ONS and CR) to estimate differences and agreement in the identification of cancer cases between the data sources. We will then assess markers of diagnostic accuracy (i.e. sensitivity, specificity, and positive and negative predictive values), as well as the incidence and prevalence of cancers in each of the sources. Based on these estimates, we will explore the feasibility of calculating a single correction factor for each of the different measures of morbidity and diagnostic accuracy; these factors will be produced for each cancer indication and stratified by gender and age group. The corrected results could potentially be used in future CPRD research projects. Finally, survival will be estimated separately for each indication.
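The diagnostic accuracy markers named above can all be derived from a 2x2 cross-classification of patients by CPRD record against the reference standard (HES and CR combined). The following is a minimal sketch under stated assumptions, not the study's implementation: the class and function names are hypothetical, the counts are invented for illustration, and Cohen's kappa is included only as one common way to quantify "agreement", since the protocol does not name the agreement statistic.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Patient counts cross-classified by CPRD record vs. reference standard."""
    tp: int  # cancer recorded in CPRD and in the reference (HES/CR)
    fp: int  # cancer recorded in CPRD only
    fn: int  # cancer recorded in the reference only
    tn: int  # cancer recorded in neither source


def accuracy_measures(t: TwoByTwo) -> dict:
    """Return diagnostic accuracy measures for one 2x2 table."""
    n = t.tp + t.fp + t.fn + t.tn
    observed = (t.tp + t.tn) / n
    # Agreement expected by chance, from the marginal totals (used for kappa).
    expected = (
        (t.tp + t.fp) * (t.tp + t.fn) + (t.fn + t.tn) * (t.fp + t.tn)
    ) / n**2
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "agreement": observed,
        # Assumption: Cohen's kappa as the chance-corrected agreement measure.
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical counts for a single cancer type, for illustration only.
measures = accuracy_measures(TwoByTwo(tp=80, fp=10, fn=20, tn=890))
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

In practice the same calculation would be repeated per cancer type and per primary care source (GOLD or Aurum), with zero-count cells handled explicitly before dividing.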

Health Outcomes to be Measured

• Agreement, sensitivity, specificity, PPV and NPV of cancer diagnoses between CPRD Aurum and the combination of HES and CR
• Agreement, sensitivity, specificity, PPV and NPV of cancer diagnoses between CPRD GOLD and the combination of HES and CR
• Agreement, sensitivity, specificity, PPV and NPV of deaths between CPRD Aurum and the ONS death registry
• Agreement, sensitivity, specificity, PPV and NPV of deaths between CPRD GOLD and the ONS death registry
• Incidence, prevalence and mortality rates by cancer type
• A multiplicative adjustment factor to correct potential differences in the estimation of morbidity parameters, by cancer type, gender and age
• Survival probability estimates
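One way to read the multiplicative adjustment factor listed above is as the ratio of an estimate from the linked data to the corresponding CPRD-only estimate within a stratum, which is then multiplied into a future CPRD-only estimate. The sketch below illustrates that reading for incidence rates. It is an assumption about the form of the factor, not the protocol's specification; the function names, the stratum and all counts are hypothetical.

```python
def incidence_rate(cases: int, person_years: float, per: int = 100_000) -> float:
    """Incidence rate per `per` person-years."""
    return cases / person_years * per


def multiplicative_factor(rate_linked: float, rate_cprd_only: float) -> float:
    """Ratio of the linked-data estimate to the CPRD-only estimate."""
    if rate_cprd_only == 0:
        raise ValueError("CPRD-only rate is zero; factor is undefined")
    return rate_linked / rate_cprd_only


# Hypothetical stratum and counts, for illustration only.
stratum = ("colorectal cancer", "female", "60-69")
cprd_only = incidence_rate(cases=45, person_years=50_000)  # about 90 per 100,000
linked = incidence_rate(cases=60, person_years=50_000)     # about 120 per 100,000

# One factor per (cancer type, gender, age group) stratum.
factors = {stratum: multiplicative_factor(linked, cprd_only)}

# Applying the factor to a later, hypothetical CPRD-only estimate.
future_cprd_rate = 75.0
adjusted_rate = future_cprd_rate * factors[stratum]
print(f"factor={factors[stratum]:.3f}, adjusted rate={adjusted_rate:.1f}")
```

Keeping the factors in a table keyed by stratum mirrors the stratification by cancer type, gender and age described in the outcomes; whether a factor transfers to later periods is exactly the feasibility question the study sets out to explore.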

The types of cancer for which the epidemiological measures will be estimated are: multiple myeloma, acute myeloid leukemia, acute lymphoblastic leukemia, lung cancer, melanoma, colorectal cancer, prostate cancer, pancreatic, neuroendocrine, glioblastoma multiforme, gastric cancer, breast cancer, non-Hodgkin lymphoma, head and neck, kidney, bladder, uterus, ovarian, esophagus, and thyroid. A small number of these types have already been explored in the literature, so their inclusion here aims to compare our estimates with existing published findings. The additional types were chosen based on their high frequency in the population, their clinical relevance, and the need for further investigation of currently under-explored types.

Collaborators

George Kafatos - Chief Investigator - Amgen Ltd
Olga Archangelidi - Corresponding Applicant - Amgen Ltd
David Neasham - Collaborator - Amgen Ltd
Joe Maskell - Collaborator - Amgen Ltd

Linkages

HES Admitted Patient Care; NCRAS Cancer Registration Data; No additional NCRAS data required; ONS Death Registration Data