A comparison of cancer incidence and prevalence in CPRD GOLD and CPRD Aurum primary care databases

Study type
Protocol
Date of Approval
Study reference ID
20_196
Lay Summary

The Clinical Practice Research Datalink (CPRD) has two databases of primary care electronic health records (EHR). These are called CPRD GOLD and CPRD Aurum, and each contains data collected from a different brand of the software used by general practices to manage their patients’ computerised medical records.

Researchers may combine data from two or more databases, for example to study a larger number of patients who have a very rare disease or who are taking a new drug. Before doing so, it is important to ensure that the databases are similar, and to understand the reasons for any differences between them.

In this study we will investigate whether the recorded occurrence of common types of cancer, and of all cancers combined, is comparable in CPRD GOLD and CPRD Aurum, taking into account any differences between the patient populations, such as their age and regional distributions. We will also use linked hospitalisation and death registration data to find additional patients whose cancer was not recorded in the EHR database, and see how this affects the comparisons.

The results from this study will be used to plan further research investigating whether individuals who carry the gene for Huntington’s disease (which is very rare) have a lower risk of developing cancer.

Technical Summary

The Clinical Practice Research Datalink (CPRD) provides two primary care electronic health record (EHR) databases for observational research: CPRD GOLD, which uses data from the Vision GP software, and CPRD Aurum, which uses data from the EMIS Web GP software. Using both databases in a single study can increase statistical power, improve generalisability through a more representative population, or allow validation of findings in a second independent data source. However, it is important first to verify that the databases are comparable, and to understand the reasons for any differences.

We will use CPRD GOLD and CPRD Aurum to compare the prevalence and incidence of cancer at all sites combined, and at the 10 most common sites for males and females. The main analysis will ascertain incident and prevalent cases using the primary care data only, and will compare time trends from 1990 to 2019 inclusive, as well as patterns stratified by age, gender, region, and area-based deprivation.
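As an illustration of why stratified comparison matters, the sketch below contrasts crude and directly age-standardised incidence rates for two databases. It is a minimal example only: the protocol does not name a standardisation method, and the age bands, counts, person-years, and weights are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    age_band: str
    cases: int           # incident cancers recorded in this stratum
    person_years: float  # follow-up time at risk in this stratum


def crude_rate(strata, per=100_000):
    """Overall incidence rate per `per` person-years, ignoring age structure."""
    total_cases = sum(s.cases for s in strata)
    total_py = sum(s.person_years for s in strata)
    return per * total_cases / total_py


def standardised_rate(strata, weights, per=100_000):
    """Direct standardisation: weighted mean of the stratum-specific rates."""
    total_weight = sum(weights[s.age_band] for s in strata)
    weighted = sum(weights[s.age_band] * s.cases / s.person_years for s in strata)
    return per * weighted / total_weight


# Hypothetical data: identical age-specific rates, different age structures.
gold = [Stratum("0-49", 50, 100_000.0), Stratum("50+", 400, 50_000.0)]
aurum = [Stratum("0-49", 100, 200_000.0), Stratum("50+", 400, 50_000.0)]

# Hypothetical standard population weights (assumption, not from the protocol).
weights = {"0-49": 0.6, "50+": 0.4}

for name, strata in (("GOLD", gold), ("Aurum", aurum)):
    print(name, round(crude_rate(strata), 1), round(standardised_rate(strata, weights), 1))
```

In this made-up example the crude rates differ (300 versus 200 per 100,000 person-years) only because the second population is younger; the standardised rates are the same (about 350 per 100,000 person-years in both).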

In secondary analyses, comparisons will be repeated in a subset of ‘overlapping’ practices which initially contributed data to CPRD GOLD, and subsequently contributed data to the CPRD Aurum database. We will also use an interrupted time series (ITS) analysis to formally test whether recording patterns for incident cancers change after practices switch from Vision to EMIS Web GP software. If appropriate, we will use multi-level Poisson models to further assess the relative contributions of measured population characteristics and between-database variation.
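One common way to set up an ITS analysis is segmented regression, with terms for the baseline level, the underlying trend, a step change at the interruption, and a change in slope afterwards. The sketch below fits that model by ordinary least squares to a hypothetical, noise-free series of incidence rates. The protocol does not specify its model, and a count-based (for example Poisson) formulation may be more appropriate for real data; the function names, time units, and values here are assumptions for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def design_row(t, switch):
    """Columns: intercept, time, post-switch indicator, time since switch."""
    post = 1.0 if t >= switch else 0.0
    return [1.0, float(t), post, post * (t - switch)]


def fit_its(times, rates, switch):
    """Least-squares fit via the normal equations (X'X) beta = X'y."""
    rows = [design_row(t, switch) for t in times]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, rates)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical series: 24 periods, software switch at period 12.
times = list(range(24))
switch = 12
rates = [
    40.0 + 0.5 * t + (5.0 + 0.25 * (t - switch) if t >= switch else 0.0)
    for t in times
]

baseline, trend, level_change, slope_change = fit_its(times, rates, switch)
print(round(level_change, 3), round(slope_change, 3))
```

Because the series is constructed without noise, the fit recovers the step change (5.0) and slope change (0.25) that were built into it. With real data, the interest would be in the size and uncertainty of those two terms.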

We will repeat the main analyses using linked HES Admitted Patient Care (APC) and ONS death registration data to ascertain additional cancer outcomes from 2000 to 2019.
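A minimal sketch of how cases from several sources might be combined is shown below: each source contributes a first recorded cancer date per patient, and the earliest date across sources is kept. The patient identifiers, dates, and the "earliest date wins" rule are assumptions for illustration, not the protocol's stated case definition.

```python
from datetime import date


def first_diagnosis(*sources):
    """Merge {patient_id: date} mappings, keeping the earliest date per patient."""
    merged = {}
    for source in sources:
        for patient_id, recorded in source.items():
            if patient_id not in merged or recorded < merged[patient_id]:
                merged[patient_id] = recorded
    return merged


# Hypothetical first cancer records per patient in each data source.
primary_care = {"p1": date(2010, 5, 1), "p2": date(2012, 3, 9)}
hes_apc = {"p2": date(2012, 1, 20), "p3": date(2015, 7, 2)}
ons_death = {"p4": date(2018, 11, 30)}

combined = first_diagnosis(primary_care, hes_apc, ons_death)

# Patients found only through linkage, i.e. absent from the primary care record.
extra = sorted(set(combined) - set(primary_care))
print(extra)
```

Here two of the four patients are identified only through the linked data, and one patient already in primary care gains an earlier diagnosis date from the hospital record.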

The findings will inform a planned study of the association between Huntington’s disease and cancer risk, but will also be of wider value to other researchers wishing to conduct observational research involving cancer outcomes, or to combine the CPRD GOLD and CPRD Aurum databases to increase study power.

Health Outcomes to be Measured

Cancer incidence and prevalence for all sites combined (excluding non-melanoma skin cancer [NMSC]), and for the 10 most common sites for females and males (see Table 1).

Table 1: Ten most common cancer sites in females and males in England, 2017*
Rank | Females (ICD-10 code and site) | Males (ICD-10 code and site)
1 | C50 Malignant neoplasm of breast | C61 Malignant neoplasm of prostate
2 | C34 Malignant neoplasm of bronchus and lung | C34 Malignant neoplasm of bronchus and lung
3 | C18-C20 Malignant neoplasm of colon and rectum | C18-C20 Malignant neoplasm of colon and rectum
4 | C54 Malignant neoplasm of corpus uteri | C43 Malignant melanoma of skin
5 | C43 Malignant melanoma of skin | C82-C85 Non-Hodgkin's lymphoma
6 | C56 Malignant neoplasm of ovary | C67 Malignant neoplasm of bladder
7 | C82-C85 Non-Hodgkin's lymphoma | C20 Malignant neoplasm of rectum
8 | C25 Malignant neoplasm of pancreas | C64 Malignant neoplasm of kidney, except renal pelvis
9 | C64 Malignant neoplasm of kidney, except renal pelvis | C15 Malignant neoplasm of oesophagus
10 | C91-C95 Leukaemia | C91-C95 Leukaemia
* Based on cancer registrations

Collaborators

Rachael Williams - Chief Investigator - CPRD
Daniel Dedman - Corresponding Applicant - CPRD
Ian Douglas - Collaborator - London School of Hygiene & Tropical Medicine (LSHTM)
Krishnan Bhaskaran - Collaborator - London School of Hygiene & Tropical Medicine (LSHTM)
Liam Smeeth - Collaborator - London School of Hygiene & Tropical Medicine (LSHTM)
Michael Rawlins - Collaborator - London School of Hygiene & Tropical Medicine (LSHTM)
Nancy Wexler - Collaborator - Columbia University
Stephen Evans - Collaborator - London School of Hygiene & Tropical Medicine (LSHTM)

Linkages

HES Admitted Patient Care; ONS Death Registration Data; Patient Level Index of Multiple Deprivation; Practice Level Index of Multiple Deprivation