Anonymised primary care patient data can be individually linked to secondary care and other health and area-based datasets. This linkage enables CPRD to provide a fuller picture of the patient care record to support vital public health research, informing advances in patient safety and delivery of care. CPRD is expanding its healthcare data and research services to increase both the coverage of primary care data and the number of datasets that are linked and made available on a routine basis to the research community.
Linked datasets currently available include:
- COVID-19 data:
  - Second Generation Surveillance System (SGSS) COVID-19 positive virology test data
  - COVID-19 Hospitalisation in England Surveillance System (CHESS)
  - Intensive Care National Audit and Research Centre (ICNARC) data on COVID-19 intensive care admissions
- Small area level data:
  - Patient postcode linked deprivation measures
  - Practice postcode linked deprivation measures
  - Index of Multiple Deprivation (IMD)
  - Townsend Deprivation Index
  - Carstairs Index
  - Rural-Urban Classification
- Data from NHS Digital:
  - Hospital Episode Statistics (HES) Admitted Patient Care (HES APC) data
  - HES Outpatient (HES OP) data
  - HES Accident and Emergency (HES A&E) data
  - HES Diagnostic Imaging Dataset (HES DID)
  - Death registration data from the Office for National Statistics (ONS)
- Cancer data from NHS Digital National Disease Registration Service (NDRS) (formerly Public Health England (PHE)):
  - Cancer Registration
  - Systemic Anti-Cancer Therapy (SACT) Dataset
  - National Radiotherapy Dataset (RTDS)
Publication: Padmanabhan S, Carty L, Cameron E, Ghosh RE, Williams R, Strongman H. Approach to record linkage of primary care data from Clinical Practice Research Datalink to other health-related patient data: overview and implications. Eur J Epidemiol, 2018.
Availability of linked data
Linkage of CPRD primary care data with other patient level datasets is available for English practices that have consented to participate in the linkage scheme. Each individual GP practice participating in CPRD's collection of their primary care data can choose to revoke their consent for data collection at any point.
CPRD respects all patient opt-outs. Patients who have registered an opt-out will not be extracted for CPRD research or for data linkage.
We are working to release quarterly updates of priority linkages to support COVID-19 research. These priority linkages comprise the NHS Digital (formerly Public Health England (PHE)) Second Generation Surveillance System (SGSS) COVID-19 virology test data, COVID-19 Hospitalisation in England Surveillance System (CHESS), Intensive Care National Audit and Research Centre (ICNARC) data on COVID-19 intensive care admissions, Hospital Episode Statistics Admitted Patient Care, Office for National Statistics mortality data, and small area deprivation data.
The latest linked data comprise ONS deaths data (to 29/03/2021), HES APC (to 31/03/2021), SGSS and CHESS (to 23/02/2021), ICNARC data (to 17/03/2021), HES OP/DID (to 31/10/2020), HES A&E (to 31/03/2020), NCRAS cancer registrations/SACT/RTDS (to 31/12/2018) and small area data. A total of 9,315,232 acceptable patients in the CPRD GOLD July 2021 build and 38,416,860 acceptable patients in the CPRD Aurum June 2021 build are eligible for at least one linkage.
As standard, to ensure we honour patient opt-outs, we will supply the latest available linked data for each dataset. If you require data from a specific earlier linkage set, or are unsure about which source file should be used in your study, please contact us on firstname.lastname@example.org.
Access to linked data
Access to patient level data is dependent on approval of a study protocol via the Research Data Governance (RDG) process. All required linked data sources must be requested on the application form. Additionally, researchers who are first-time users of a linked dataset must contact the CPRD Observational Research Team to discuss their requirements before submitting their application. Linked data are only provided by CPRD as part of a data extract linked to CPRD primary care data.
CPRD-linked COVID-19 datasets comprise:
1. NHS Digital (formerly Public Health England (PHE)) Second Generation Surveillance System (SGSS) COVID-19 virology test data
2. PHE COVID-19 Hospitalisation in England Surveillance System (CHESS)
3. Intensive Care National Audit and Research Centre (ICNARC) data on COVID-19 intensive care admissions.
SGSS is the national laboratory reporting system used in England to capture routine laboratory data on infectious diseases and antimicrobial resistance. SARS-CoV-2 testing started in UK laboratories on 24/02/2020, with the SGSS data reflecting testing (swab samples, PCR test method) offered to those in hospital and NHS key workers (i.e. Pillar 1). The CPRD-SGSS linked data currently contain positive test results only.
Access to linked SGSS data is subject to prior approval. This dataset is not covered by existing licences, and data can only be released to organisations within the UK/EU/EEA.
The latest release of CPRD-SGSS data covers the period 01/03/2020 – 23/02/2021.
Note that these SGSS data will not be further updated as COVID-19 test data now reliably flow into the GP primary care record.
Please click on the link below to download the documentation relating to CPRD-SGSS data.
Download: SGSS documentation v1.3 (PDF, 172KB, 6 pages)
The former PHE established CHESS across all NHS Trusts in England on 15/03/2020 to collect epidemiological data on COVID-19 infection in persons requiring hospitalisation and ICU/HDU admission. Trends in hospital and critical care admission rates need to be interpreted in the context of testing recommendations, which changed over time.
Access to linked CHESS data is subject to prior approval. This dataset is not covered by existing licences, and data can only be released to organisations within the UK/EU/EEA.
The latest release of CPRD-CHESS data covers admissions to 23/02/2021.
Please click on the link below to download the documentation relating to CPRD-CHESS data.
Download: CHESS documentation v1.3 (PDF, 291KB, 11 pages)
Intensive Care National Audit and Research Centre (ICNARC) data on COVID-19 intensive care admissions
ICNARC is a national clinical audit covering all NHS adult, general intensive care and combined intensive care/high dependency units, and some additional specialist and non-NHS critical care units.
Data on patients critically ill with confirmed COVID-19 admitted to critical care units will be linked to the CPRD data.
The CPRD-ICNARC linked data comprise information on demographic variables (age, sex), admission/discharge, height/weight/BMI, clinical parameters (BP, hypertension, blood gas measurements, haemoglobin, platelet count, lactate, heart rate, respiratory rate etc.), coma, mortality prediction and physiology scores.
The latest release of CPRD-ICNARC data covers admissions to 17/03/2021.
Please click on the link below to download the documentation relating to CPRD-ICNARC data.
Download: ICNARC documentation v1.0 (PDF, 230KB, 8 pages)
Classifications based on the population characteristics of small areas or neighbourhoods (and the individuals who live there) are available for linkage to CPRD primary care data. CPRD has linked GP practice postcodes and eligible patient residence postcodes for both CPRD GOLD and CPRD Aurum to some of the most commonly requested area level data. These include several measures of area level deprivation, a rural-urban classification, and a Clinical Commissioning Group (CCG) pseudonym (practice level, England only). These measures can be used as a proxy for socio-demographic and socio-economic data, which are generally poorly recorded in the primary care data given that they do not directly relate to a patient's care.
For each measure the postcode of the practice or patient residence is mapped to lower layer Super Output Area (LSOA), SOA in Northern Ireland or datazone (DZ) in Scotland using a postcode lookup file.
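The two-step mapping described here (postcode to small area, then small area to measure) can be sketched as follows. This is a minimal illustration only: the postcodes, area codes, quintile values and function names are all made up for the example and do not reflect the contents or format of any real lookup file.

```python
def normalise_postcode(postcode):
    """Upper-case and strip all whitespace so spacing variants compare equal."""
    return "".join(postcode.split()).upper()


# Hypothetical lookup rows: postcode -> small-area code (LSOA, SOA or DZ).
postcode_lookup = {
    normalise_postcode("ZZ1 1AA"): "AREA_001",
    normalise_postcode("ZZ1 1AB"): "AREA_001",
    normalise_postcode("ZZ2 9XY"): "AREA_002",
}

# Hypothetical area-level measure (e.g. a deprivation quintile) per area code.
area_quintile = {"AREA_001": 2, "AREA_002": 5}


def link_measure(postcode):
    """Return the area-level measure for a postcode, or None if it is unmapped."""
    area = postcode_lookup.get(normalise_postcode(postcode))
    return area_quintile.get(area)
```

In this sketch, only the area-level measure is returned; neither the postcode nor the area code is carried forward.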
Download: Small area level data based on patient postcode documentation v3.3 (PDF, 249KB, 14 pages)
Download: Small area level data based on practice postcode documentation v3.4 (PDF, 277KB, 15 pages)
Patient postcode linked measures are available for patients in English practices that have consented to participate in the linkage scheme. The latest available patient postcode of residence is mapped to an LSOA boundary. The LSOA of residence then allows linkage to the following LSOA-level deprivation measures:
- 2019 English Index of Multiple Deprivation (composite and individual domains)
- Townsend Deprivation Index: calculated using unadjusted 2011 census data
- Carstairs Index using 2011 census data
Data are provided as quintiles, deciles or twentiles of the deprivation score to prevent disclosure of patient location. In order to prevent the possibility of deductive disclosure of a patient's area of residence, researchers will only be provided with one of the above linked datasets for any one study. Access is provided by CPRD subject to approval.
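As a rough illustration of how a continuous deprivation score becomes a quintile, decile or twentile, the sketch below ranks areas by score and splits the ranking into equal-sized groups. The area names, scores and the `quantile_groups` function are hypothetical, and the direction of the scale (lowest score placed in group 1) is an assumption made for the example; the documentation for each measure defines the actual convention.

```python
def quantile_groups(scores, n_groups):
    """Assign each area to a group 1..n_groups by the rank of its score.

    Assumption for this sketch: the lowest score falls in group 1.
    Ties are broken by area name so the result is deterministic.
    """
    ordered = sorted(scores, key=lambda area: (scores[area], area))
    total = len(ordered)
    return {
        area: (rank * n_groups) // total + 1
        for rank, area in enumerate(ordered)
    }


# Made-up scores for five hypothetical areas.
example_scores = {"A": 5.0, "B": 1.0, "C": 3.0, "D": 9.0, "E": 7.0}
example_quintiles = quantile_groups(example_scores, 5)
```

Supplying only the group number, rather than the score itself, is what limits the ability to infer a specific location from the linked value.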
The general practice postcode linkages are available for all practices in CPRD GOLD and CPRD Aurum and use the general practice postcode which is linked via LSOA, SOA in Northern Ireland and datazone (DZ) in Scotland.
The general practice postcode linkage includes a Clinical Commissioning Group (CCG) pseudonym (England only) and several well-known area-based measures of deprivation:
- 2019 English Index of Multiple Deprivation (composite and individual domains)
- 2020 Scottish Index of Multiple Deprivation (composite and individual domains)
- 2017 Northern Ireland Index of Multiple Deprivation (composite and individual domains)
- 2019 Welsh Index of Multiple Deprivation (composite and individual domains)
- Townsend Deprivation Index: calculated using unadjusted 2011 census data
- Carstairs Index: England, Wales and Scotland calculated using 2011 census data
As standard, the most recent national Indices of Deprivation are provided for each country. It is important to note that the IMD indices are not comparable between countries in the UK. Data are provided as quintiles or deciles of the deprivation score to prevent disclosure of patient location. In order to prevent the possibility of deductive disclosure of the location of a practice, researchers will only be provided with one practice level linkage for any one study. Access is provided by CPRD subject to approval.
It may be important to distinguish between rural and urban areas when investigating differences in social and economic characteristics of small areas. Populations can vary in their composition between urban and rural areas, as can access to services, employment and educational opportunities, and quality of life. The measures available for patient (England only) and practice postcode are:
- 2011 England and Wales Rural-Urban classification
- 2015 Northern Ireland Rural-Urban classification
- 2016 Scottish Rural-Urban classification
Access is provided by CPRD subject to approval.
For more information about data linkage and prices please contact CPRD Enquiries on email@example.com
NHS Digital has responsibility for standardising, collecting and publishing data and information from across the health and social care system in England.
CPRD linked data from NHS Digital include Hospital Episode Statistics (HES), a database containing details of all admissions, Accident and Emergency attendances and outpatient appointments at NHS hospitals in England, and ONS mortality data.
HES Admitted Patient Care (HES APC) data contains details of all admissions to, or attendances at English NHS healthcare providers. It includes private patients treated in NHS hospitals, patients resident outside of England and care delivered by treatment centres (including those in the independent sector) funded by the NHS. All NHS healthcare providers in England, including acute hospital trusts, primary care trusts and mental health trusts provide data.
HES APC data includes the complete set of hospital episode information (admission and discharge dates, diagnoses (identifying primary diagnosis), specialists seen under and procedures undertaken) for each linked patient with a hospitalisation record. In addition, Augmented care data (intensive and/or high dependency levels of care) and Maternity data are available.
Diagnostic data recorded in HES are coded using the International Classification of Diseases version 10 (ICD-10) coding frame; procedure information is coded using the UK Office of Population, Census and Surveys classification (OPCS) 4.6.
Requests for HES APC data access are subject to prior approval.
The latest release of HES APC data covers the period April 1997 to March 2021.
Please click on the link below to download the documentation which provides an overview of the HES APC data linked to CPRD primary care patients.
Download: HES APC documentation v2.8 (PDF, 254KB, 12 pages)
Download: HES APC data dictionary v2.7 (PDF, 317KB, 12 pages)
More information about HES APC data can be found in the data resource profile below, and from a number of recent concordance and validation studies.
Publication: Herbert A, Wijlaars L, Zylbersztejn A, Cromwell D, Hardelid P. Data Resource Profile: Hospital Episode Statistics Admitted Patient Care (HES APC). International Journal of Epidemiology, Volume 46, Issue 4, August 2017, Pages 1093–1093i.
Publication: Thorn JC, Turner EL, Hounsome L the CAP trial group, et al. Validating the use of Hospital Episode Statistics data and comparison of costing methodologies for economic evaluation: an end-of-life case study from the Cluster randomised triAl of PSA testing for Prostate cancer (CAP). BMJ Open 2016;6:e011063
Publication: Saine, ME et al. (2019). Concordance of hospitalizations between Clinical Practice Research Datalink and linked Hospital Episode Statistics among patients treated with oral antidiabetic therapies. Pharmacoepidemiol Drug Saf. issn: 1053-8569. doi: 10.1002/pds.4853
Publication: McDonald, L, CJ Sammon, et al. (2018). Under-recording of hospital bleeding events in UK primary care: a linked Clinical Practice Research Datalink and Hospital Episode Statistics study. Clin Epidemiol 10, pp. 1155–1168. issn: 1179-1349 (Print) 1179-1349. doi: 10.2147/clep.s170304.
Publication: Williams, R et al. (2018). Cancer recording in patients with and without type 2 diabetes in the Clinical Practice Research Datalink primary care data and linked hospital admission data: a cohort study. BMJ Open 8.5, e020827. issn: 2044-6055. doi: 10.1136/bmjopen-2017-020827.
HES Outpatient (HES OP) data are a collection of individual records of outpatient appointments occurring in England only. The data include information on the type of outpatient consultation, appointment dates, the main specialty and treatment specialty under which the patient was treated, referral source, waiting times, clinical diagnosis and procedures performed. HES OP data can be used to support health resource utilisation studies, clarify clinical health care pathways and enable variations in the uptake of services to be evaluated, for example by gender and age.
Access to linked HES OP data is subject to prior approval.
The latest release of HES OP data covers the period April 2003 to October 2020.
Please click on the link below to download the documentation relating to HES Outpatient data.
Download: HES OP documentation v2.0 (PDF, 286KB, 13 pages)
Useful information can be found in the following validation study on the coverage of HES OP resource-use data in comparison to medical records from a cluster randomised trial:
Publication: Thorn JC, Turner E, Hounsome L, Walsh E, Donovan JL, Verne J, Neal DE, Hamdy FC, Martin RM, Noble SM. Validation of the Hospital Episode Statistics Outpatient Dataset in England. Pharmacoeconomics, 34 (2), 161-8, Feb 2016.
HES Accident and Emergency (HES A&E) data consist of individual records of patient care administered in the accident and emergency setting in England. These data are a subset of national A&E data collected by NHS England to monitor the national standard that 95% of patients attending A&E should wait no longer than 4 hours from arrival to admission, transfer or discharge. A&E data are submitted by A&E providers of all types in England. Data collected include details about patients’ attendance, outcomes of attendance, waiting times, referral source, A&E diagnosis, A&E treatment (drugs prescribed are not recorded), A&E investigations and Health Resource Group. HES A&E may be used to clarify the health care pathway, to quantify health resource use and costs in the emergency setting, and to assess variations in the uptake of emergency services over time.
Access to HES A&E data is subject to prior approval.
The latest release of HES A&E data covers the period April 2007 to March 2020.
Note: The Emergency Care Data Set (ECDS) is a new national dataset for urgent and emergency care and replaced the HES A&E dataset across England from 2019-20 financial year. ECDS will enable more detailed analysis and enhanced understanding of emergency services, and linkage to CPRD primary care data is in progress.
Please click on the link below to download the documentation relating to HES Accident & Emergency data.
Download: HES A&E documentation v1.8 (PDF, 274KB, 13 pages)
The Diagnostic Imaging Dataset (DID) is a collection of detailed information about diagnostic imaging tests, such as x-rays and MRI scans, taken from NHS providers' radiological information systems. The DID includes information on imaging tests carried out from 1 April 2012 on NHS patients in England. It does not include the images that are produced as a result of these tests. The DID captures information about referral source and patient type, details of the test (type of test and body site), plus items about waiting times for each diagnostic imaging event, from time of test request through to time of reporting. The DID enables analysis of demographic and geographic variation in access to different test types and different providers.
The DID is routinely linked to Hospital Episode Statistics (HES) through NHS Digital. This existing HES DID dataset has now been linked to CPRD primary care data enabling users to analyse patient care pathways. Access to HES DID data is subject to prior approval.
The latest release of HES DID data covers the period April 2012 to October 2020.
Please click on the link below to download the documentation relating to the HES Diagnostic Imaging Dataset.
Download: HES DID documentation v1.7 (PDF, 254KB, 10 pages)
Death Registration data contains data from the Office for National Statistics (ONS) and includes information on the official date and causes of death (using ICD codes).
Access to ONS Death Registration data is subject to prior approval.
The latest release of ONS Death Registration Data covers the period 2 January 1998 to 29 March 2021.
Please note that late registration of some deaths means that the proportion of deaths captured is lower for the last year of the coverage period, and this proportion is likely to differ by age at death and cause of death. The effect is especially pronounced for the last 1-2 weeks of available death data, which show an undercount of the total number of deaths because these data do not capture deaths whose registration has been delayed (e.g. deaths referred to coroners in England, Wales and Northern Ireland, which cannot be registered until investigations have been concluded, and can result in delays of months or years).
Please click on the link below to download the documentation relating to ONS death registration data.
Download: ONS death registration data documentation v2.6 (PDF, 225KB, 14 pages)
For more information please refer to the ONS User guide to mortality statistics, the ONS analysis exploring the impact of registration delays on mortality statistics and the associated dataset used for this report.
Further details can be found in three studies investigating the impact of the choice of data source in estimating mortality.
Publication: Gallagher, AM et al. (2019). The accuracy of date of death recording in the Clinical Practice Research Datalink GOLD database in England compared with the Office for National Statistics death registrations. Pharmacoepidemiol Drug Saf. issn: 1053-8569. doi: 10.1002/pds.4747.
Publication: Harshfield, A et al. (2018). Do GPs accurately record date of death? A UK observational analysis. BMJ Support Palliat Care. issn: 2045-435x. doi: 10.1136/bmjspcare-2018-001514.
Publication: Gallagher, AM et al. (2016). The Impact of the Choice of Data Source in Record Linkage Studies Estimating Mortality in Venous Thromboembolism. PLoS One 11.2, e0148349. issn: 1932-6203. doi: 10.1371/journal.pone.0148349.
Cancer data from NHS Digital National Disease Registration Service (NDRS) (formerly Public Health England (PHE))
Cancer data are provided by NHS Digital National Disease Registration Service (NDRS) (formerly Public Health England (PHE)) via the National Cancer Registration and Analysis Service (NCRAS). Linked NCRAS CPRD datasets include Cancer Registration data, the Systemic Anti-Cancer Therapy (SACT) Dataset and the National Radiotherapy Dataset (RTDS).
Access to cancer data is subject to prior approval.
Please be aware that it currently takes around 12-18 months from protocol approval to delivery of cancer data, and that timelines for accessing the NCRAS data are largely out of CPRD control.
The data contain a record for each registrable tumour diagnosed or treated in England, of which the NCRAS has been notified. Cancers are coded using the International Classification of Diseases for Oncology, revision 3, 2011. They are also back mapped to the tenth revision of the International Classification of Diseases (ICD-10).
The latest release of NDRS cancer registration data covers the period January 1990 – December 2018.
Download: CPRD NCRAS documentation v10.1 update (PDF, 300KB, 15 pages)
Download: CPRD Cancer Registration dictionary v10.1 (PDF, 316KB, 26 pages)
More information about the cancer registration data can be found in the following data resource profile:
Publication: Henson KE, Elliss-Brookes L, Coupland VH, Payne E, Vernon S, Rous B, Rashbass J. Data Resource Profile: National Cancer Registration Dataset in England. International Journal of Epidemiology, dyz076.
Further details can be found in three studies comparing recording of cancer across data sources.
Publication: Strongman H, Williams R, Bhaskaran K. What are the implications of using individual and combined sources of routinely collected data to identify and characterise incident site-specific cancers? a concordance and validation study using linked English electronic health records data. BMJ Open 2020; 10:e037719. doi: 10.1136/bmjopen-2020-037719
Publication: Arhi, CS, A Bottle, et al. (2018). Comparison of cancer diagnosis recording between the Clinical Practice Research Datalink, Cancer Registry and Hospital Episodes Statistics. Cancer Epidemiol 57, pp. 148–157. issn: 1877-7821. doi: 10.1016/j.canep.2018.08.009.
Publication: Margulis, AV, J Fortuny, et al. (2018a). Validation of Cancer Cases Using Primary Care, Cancer Registry, and Hospitalization Data in the United Kingdom. Epidemiology 29.2, pp. 308–313. issn: 1044-3983. doi: 10.1097/ede.0000000000000786.
The SACT dataset covers chemotherapy treatment for all solid tumour and haematological malignancies, including those in clinical trials. Information is included about programme and regimen of treatment, and the outcome for each treatment. In the latest linkage release (set 19) SACT data are available for patients with tumours recorded in the cancer registration data from January 2014 to December 2018. Data prior to January 2014 are also available but should be used with caution due to incomplete ascertainment during this period.
Download: CPRD SACT dictionary v5.1 (PDF, 218KB, 14 pages)
More information about the SACT data can be found in the following data resource profile:
Publication: Bright CJ, Lawton S, Benson S, Bomb M, Dodwell D, Henson KE, McPhail S, Miller L, Rashbass J, Turnbull A, Smittenaar R. Data Resource Profile: The Systemic Anti-Cancer Therapy (SACT) Dataset. International Journal of Epidemiology, dyz137.
The RTDS dataset contains records of radiotherapy services provided since April 2009, including teletherapy and brachytherapy. All radiotherapy delivered in England to patients in NHS facilities, or in private facilities where delivery was funded by the NHS, is included. Brachytherapy delivered for the treatment of non-malignant disease, radiotherapy delivered using unsealed sources, and non-therapeutic exposures delivered using radiotherapy machines (e.g. imaging) are not included. In the latest linkage release (set 19) RTDS data is available for patients with tumours recorded in the cancer registration data from April 2012 to December 2018.
Download: CPRD RTDS dictionary v4.1 (PDF, 186KB, 9 pages)
The source data are provided to organisations that hold CPRD multi-study licences to enable researchers to ascertain which patients are eligible for linkage to each dataset and to clarify the coverage periods for each data source. The linkage eligibility file only includes patients from practices that have consented to take part in the linkage process. The file contains flags to indicate whether the patient is eligible for each individual linked data source. Some patients will not be eligible for any of the linked data sources, whereas others may be eligible for some/all of them. These data are provided so that multi-study licence users can determine the appropriate population to include in their study. The linkage coverage file indicates the start and end of coverage for each individual linked data source.
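A minimal sketch of how the eligibility flags and coverage periods might be used together to define a study population and follow-up window is shown below. The field names (`patid`, `hes_apc_e`, `death_e`), the sample rows and the helper functions are assumptions made for illustration and are not taken from the linkage source data documentation. The coverage dates follow the release periods stated on this page, with 1 April 1997 assumed as the HES APC start day.

```python
from datetime import date

# Hypothetical rows standing in for a linkage eligibility file:
# one row per patient, one 0/1 flag per linked data source.
eligibility = [
    {"patid": 1, "hes_apc_e": 1, "death_e": 1},
    {"patid": 2, "hes_apc_e": 1, "death_e": 0},
    {"patid": 3, "hes_apc_e": 0, "death_e": 0},
]

# Hypothetical coverage table: (start, end) of coverage per data source.
coverage = {
    "hes_apc": (date(1997, 4, 1), date(2021, 3, 31)),
    "death": (date(1998, 1, 2), date(2021, 3, 29)),
}


def eligible_patients(rows, flags):
    """Return patids whose flag equals 1 for every requested source."""
    return [r["patid"] for r in rows if all(r[f] == 1 for f in flags)]


def clip_to_coverage(start, end, sources):
    """Restrict a follow-up window to the overlap of the sources' coverage.

    Returns None when the window lies entirely outside the shared coverage.
    """
    lo = max([start] + [coverage[s][0] for s in sources])
    hi = min([end] + [coverage[s][1] for s in sources])
    return (lo, hi) if lo <= hi else None
```

For example, `eligible_patients(eligibility, ["hes_apc_e", "death_e"])` keeps only patients flagged for both sources, and `clip_to_coverage` then trims each patient's follow-up to the period in which both sources have data.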
Access to source data for CPRD GOLD and/or CPRD Aurum is available to nominated users only; for access, please contact us at firstname.lastname@example.org.
Download: Linkage source data documentation v1.20 (PDF, 200KB, 8 pages)
CPRD has developed a probabilistic mother-baby link algorithm, based on data recorded in the primary care medical record. This links likely mother-baby pairs within the CPRD GOLD database, based on family number plus maternity information from the mother’s primary care record, and the month of birth of newly registered babies.
Download: CPRD Mother-Baby Link Documentation v1.2 (PDF, 308KB, 6 pages)
CPRD Pregnancy Registers
The CPRD GOLD and CPRD Aurum Pregnancy Registers contain a list of all pregnancy episodes recorded in the corresponding primary care databases. The Pregnancy Registers are derived from the primary care data based on an algorithm developed for CPRD GOLD by CPRD and the London School of Hygiene and Tropical Medicine.
Each record within the Pregnancy Registers represents a unique pregnancy episode with a number of variables provided including details of the start and end of the pregnancy, trimester dates and the outcome of the pregnancy. There may be more than one episode per woman. In addition to this, live births in the CPRD GOLD Pregnancy Register are linked to the CPRD GOLD Mother-Baby Link so that researchers may access de-identified information on the resulting infants.
Download: CPRD GOLD Pregnancy Register documentation v1.1 (PDF, 175KB, 8 pages)
Download: CPRD Aurum Pregnancy Register documentation v1 (PDF, 120KB, 7 pages)
Publication: Minassian C, Williams R, Meeraus WH, Smeeth L, Campbell OMR, Thomas SL. Methods to generate and validate a Pregnancy Register in the UK Clinical Practice Research Datalink primary care database. Pharmacoepidemiol Drug Saf, Volume 28, Number 7, p.923-933 (2019)
Publication: Campbell J, Bhaskaran K, Thomas S, Williams R, McDonald HI, Minassian C. Investigating the optimal handling of uncertain pregnancy episodes in the CPRD GOLD Pregnancy Register: a methodological study using UK primary care data. BMJ Open 2022;12:e055773.