Using linked data

Learning objectives 

By the end of this module, the reader will have learnt:

What is CPRD linked data?

CPRD works with General Practitioner (GP) practices across the UK, who contribute anonymised Electronic Healthcare Record (EHR) primary care data. CPRD has established a number of standard linkages between these primary care data and secondary care and other health and area-based datasets. 

Why use linked data?

These linkages enable CPRD to provide a richer picture of the patient care journey to support vital public health research, informing advances in patient safety and delivery of care. Reasons for using linked data include:

  • There are gaps in the primary care data which linked data can fill,
  • Other data sources may provide a clearer and/or fuller picture of health care interactions,
  • Some diagnoses/procedures are routinely or only provided in hospital.

Linked data can help to answer questions such as:

  • What was the cause of death?
  • Was the patient hospitalised?
  • Did the patient have surgery?
  • Has the patient seen a specialist?
  • When was the patient diagnosed with cancer?
  • Is this result associated with level of deprivation?

What linked data is available?

The full list of standard linked datasets, together with documentation describing the structure of the data, the coding systems used, and the coverage period linked to CPRD primary care data, is available at www.cprd.com/linked-data. Some highlights are given below.

Hospital Episode Statistics (HES) Admitted Patient Care (APC) data

Admissions (including day cases) to English NHS health care providers, including:

  • All diagnoses recorded per hospitalisation,
  • Admission and discharge dates,
  • Specialists seen,
  • Procedures undertaken,
  • Maternity data,
  • Critical care data.

Note, these data do not include drugs administered in hospital.

Office for National Statistics (ONS) death registration data

All deaths in England must be reported to the General Register Office. There should be a medical certificate showing the cause of death, which is signed by a doctor. The data contain:

  • The underlying cause of death, as well as other contributing causes of death, recorded using ICD-9 (up to 2001) and ICD-10 codes.

Note, late registration for some deaths means that the proportion of deaths captured is lower for the last year of the coverage period, and is especially pronounced for the last 1-2 weeks of available death data.

Small Area Data

CPRD has linked GP practice postcodes and eligible patient residence postcodes for both CPRD GOLD and CPRD Aurum to some of the most commonly requested area level data. These include:

  • Index of Multiple Deprivation
  • Carstairs Index
  • Townsend Deprivation Index
  • Rural-Urban Classification
  • CCG flag

Details of all the linked data sources available are listed at www.cprd.com/linked-data.

How does the linkage work?

NHS Digital, the CPRD trusted third party for linkages, works to link all patients except those who, whilst registered at any UK GP practice, have opted out of or dissented from either providing data to CPRD for research or the disclosure of personal confidential data (such as the identifiers required for linkage) to NHS Digital. Patient-level linkage is currently enabled exclusively for patients registered at practices in England.

GP practices provide only the de-identified medical data to CPRD, whilst patient identifier information (e.g. NHS number, date of birth, sex, postcode) is provided to NHS Digital. The provider of the dataset to be linked does the same, separating the patient identifiers, which go to NHS Digital to carry out the linkage, from the de-identified data, which go to CPRD. NHS Digital then uses the patient identifiers to match a CPRD patient to the corresponding patient in the linked dataset. A schematic illustrating the linkage process and more detail is available at www.cprd.com/safeguarding-patient-data.
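The separation of identifiers from medical data described above can be illustrated with a minimal sketch. This is not the matching algorithm used by NHS Digital, which is not described here. The record layouts, the field names (cprd_id, hes_id, nhs_number, dob) and the exact-match rule on NHS number plus date of birth are all assumptions made purely for illustration.

```python
# Illustrative only: all identifiers, field names and records are made up,
# and exact matching on (NHS number, date of birth) is an assumed rule.

# Identifier files as the trusted third party would hold them.
# They contain no medical data.
gp_identifiers = [
    {"cprd_id": "P1", "nhs_number": "1111111111", "dob": "1950-01-01"},
    {"cprd_id": "P2", "nhs_number": "2222222222", "dob": "1982-06-15"},
]
hospital_identifiers = [
    {"hes_id": "H9", "nhs_number": "1111111111", "dob": "1950-01-01"},
]


def build_crosswalk(gp_records, linked_records):
    """Match the two identifier files and return only pseudonymised ID pairs."""
    # Index the linked dataset by the identifiers used for matching.
    index = {(r["nhs_number"], r["dob"]): r["hes_id"] for r in linked_records}
    crosswalk = []
    for r in gp_records:
        key = (r["nhs_number"], r["dob"])
        if key in index:
            # Only the two pseudonymised IDs leave the trusted third party.
            crosswalk.append({"cprd_id": r["cprd_id"], "hes_id": index[key]})
    return crosswalk


crosswalk = build_crosswalk(gp_identifiers, hospital_identifiers)
print(crosswalk)
```

In this sketch the crosswalk carries no NHS number or date of birth, so the party holding the de-identified medical data can join records across sources without ever seeing the identifiers.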

NHS Digital has responsibility for data and information from across the health and social care system in England, not the devolved nations. Linkage methods and access to data are different in Scotland, Wales and Northern Ireland, and linked data for patients in the devolved nations are not currently available via CPRD.

Who is eligible for linkage?

Patients are eligible for linkage if they are registered at a GP practice in England that has consented to take part in the linkage process, and they have not opted out of, or dissented from, the sharing of confidential patient information for planning and research.

  • Patients with a valid NHS number are eligible to be linked to HES, ONS death, COVID-19 and NCRAS data.
  • Patients with a valid postcode are eligible to be linked to the patient-level small area data.

Please note, a patient being eligible for linkage does not mean that the patient will have linked data. Amongst patients eligible for linkage to a given source, a lack of linked data may reflect there being no data for that patient in the source, or that linkage was not successful.
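The distinction between being eligible for linkage and having linked data can be seen in a small sketch. The patient IDs, the admission records and the field names below are hypothetical and do not reflect the layout of any real CPRD file.

```python
# Hypothetical data: a list of linkage-eligible patients and the linked
# hospital admission records received for them.
eligible_patients = ["P1", "P2", "P3"]
linked_admissions = [
    {"cprd_id": "P1", "admission_date": "2015-03-02"},
    {"cprd_id": "P1", "admission_date": "2017-11-20"},
    {"cprd_id": "P3", "admission_date": "2019-07-08"},
]


def admissions_per_eligible_patient(eligible, admissions):
    """Count linked admissions for every eligible patient, including zero."""
    counts = {patient_id: 0 for patient_id in eligible}
    for row in admissions:
        if row["cprd_id"] in counts:
            counts[row["cprd_id"]] += 1
    return counts


counts = admissions_per_eligible_patient(eligible_patients, linked_admissions)
no_linked_data = [patient_id for patient_id, n in counts.items() if n == 0]
print(counts)
print(no_linked_data)
```

Here patient P2 is eligible but has no linked records. From the data alone it is not possible to tell whether P2 was never admitted or whether linkage was unsuccessful, which is why starting from the eligibility list rather than from the linked records keeps such patients visible in the denominator.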

How is linked data updated?

Each linked data release provides an incremental update in linked data coverage for patients already included in CPRD GOLD and/or CPRD Aurum, and a full update (i.e. all historic data) for patients at newly contributing practices and for new patients joining an already contributing practice.

Updates to CPRD linked data are announced in the CPRD research bulletin, in the news section of the CPRD website at https://cprd.com/news, and on the linked data webpage at www.cprd.com/linked-data. Researchers who would like to receive the CPRD research bulletin can be added to the mailing list by emailing enquiries@cprd.com.

Requesting NCRAS linked data

For studies that request National Cancer Registration and Analysis Service (NCRAS) data, there are additional steps required to access these datasets prior to submission of the research application – a completed NCRAS Data Selection Form must first be approved by CPRD. This process is detailed in the NCRAS documentation available at www.cprd.com/linked-data#Cancer%20data.

Tips for studies using linked data

Researchers should:

  • Contact CPRD Enquiries at enquiries@cprd.com before submitting a research application if they: 
    • Intend to use NCRAS or COVID-19 data linkages.
    • Have never used linked data before.
    • Are unsure whether a data source has the data they need (note, there is lots of information about each linked dataset in the documentation available at www.cprd.com/linked-data).
  • Make it clear in the research application that they understand that not all CPRD patients are eligible for linkage and that the time periods for which linked datasets are available differ.
  • Describe why the linked data is beneficial to the study and how it will be used – research applications need to include a statement about how access to linked data for the study will benefit patients.


Next module: How to access CPRD data
