By the end of this module, the reader will have learnt:
- What is CPRD linked data?
- Why use linked data?
- What linked data is available?
- How does the linkage work?
- Who is eligible for linkage?
- How is linked data updated?
- Requesting NCRAS linked data
- Tips for studies using linked data
CPRD works with General Practitioner (GP) practices across the UK, who contribute anonymised Electronic Healthcare Record (EHR) primary care data. CPRD has established a number of standard linkages between these primary care data and secondary care and other health and area-based datasets.
These linkages enable CPRD to provide a richer picture of the patient care journey to support vital public health research, informing advances in patient safety and delivery of care. Reasons for using linked data include:
- There are gaps in the primary care data which linked data can fill,
- Other data sources may provide a clearer and/or fuller picture of health care interactions,
- Some diagnoses/procedures are routinely or only provided in hospital.
Linked data can help to answer questions such as:
- What was the cause of death?
- Was the patient hospitalised?
- Did the patient have surgery?
- Has the patient seen a specialist?
- When was the patient diagnosed with cancer?
- Is this result associated with level of deprivation?
The full list of standard linked datasets, including documentation describing the structure of the data, the coding systems used, and the coverage period linked to CPRD primary care data, is available at www.cprd.com/linked-data. Some highlights are given below.
Hospital Episode Statistics (HES) Admitted Patient Care (APC) data
Admissions (including day cases) to English NHS health care providers, including:
- All diagnoses recorded per hospitalisation,
- Admission and discharge dates,
- Specialists seen,
- Procedures undertaken,
- Maternity data,
- Critical care data.
Note, these data do not include drugs administered in hospital.
Office for National Statistics (ONS) death registration data
All deaths in England must be reported to the General Register Office. Each death should have a medical certificate, signed by a doctor, showing the cause of death. These data contain:
- The underlying cause of death, as well as other contributing causes of death, recorded using ICD-9 (up to 2001) and ICD-10 codes.
Note, late registration for some deaths means that the proportion of deaths captured is lower for the last year of the coverage period, and is especially pronounced for the last 1-2 weeks of available death data.
Small Area Data
CPRD has linked GP practice postcodes and eligible patient residence postcodes for both CPRD GOLD and CPRD Aurum to some of the most commonly requested area level data. These include:
- Index of Multiple Deprivation
- Carstairs Index
- Townsend Deprivation Index
- Rural-Urban Classification
- CCG flag
Details of all the linked data sources available are listed at www.cprd.com/linked-data.
NHS England, the CPRD Trusted Third Party for linkages, works to link all patients except for those who have opted out or dissented from providing data to CPRD for research or from disclosure of personal confidential data (such as the identifiers required for linkage) to NHS England whilst registered at any UK GP practice. Patient-level linkage is currently enabled exclusively for patients registered at practices in England.
GP practices provide only de-identified medical data to CPRD, whilst patient identifiers (e.g. NHS number, date of birth, sex, postcode) are provided to NHS England. The provider of the dataset to be linked does the same, sending the patient identifiers to NHS England to carry out the linkage and the de-identified data to CPRD. NHS England then uses the patient identifiers to match each CPRD patient to the corresponding patient in the linked dataset. A schematic illustrating the linkage process, with more detail, is available at www.cprd.com/safeguarding-patient-data.
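The separation of identifiers from medical data described above can be illustrated with a minimal Python sketch. This is not the NHS England matching algorithm: the record layout, the field names, the `match_identifiers` function and the rule of requiring exact agreement on NHS number, date of birth and sex are all assumptions made for illustration. The point it shows is that the matching step sees only identifiers and pseudonymised IDs, never medical data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentifierRecord:
    """Identifiers as seen by the trusted third party (hypothetical layout)."""
    pseudo_id: str      # pseudonymised ID; the only field passed back after matching
    nhs_number: str
    date_of_birth: str  # ISO format, e.g. "1980-01-31"
    sex: str
    postcode: str


def match_identifiers(primary_care_ids, linked_dataset_ids):
    """Pair pseudonymised IDs whose identifiers agree exactly.

    Assumption: an exact match on NHS number, date of birth and sex.
    The real linkage rules are not described in this module.
    """
    index = {
        (rec.nhs_number, rec.date_of_birth, rec.sex): rec.pseudo_id
        for rec in linked_dataset_ids
    }
    pairs = []
    for rec in primary_care_ids:
        key = (rec.nhs_number, rec.date_of_birth, rec.sex)
        if key in index:
            # Only the two pseudonymised IDs leave the matching step.
            pairs.append((rec.pseudo_id, index[key]))
    return pairs


# Made-up identifiers, for illustration only.
primary_care = [
    IdentifierRecord("gp-001", "1111111111", "1980-01-31", "F", "AB1 2CD"),
    IdentifierRecord("gp-002", "2222222222", "1975-06-15", "M", "EF3 4GH"),
]
hospital = [
    IdentifierRecord("hosp-900", "1111111111", "1980-01-31", "F", "AB1 2CD"),
]
lookup = match_identifiers(primary_care, hospital)
```

In this sketch the resulting `lookup` holds pairs of pseudonymised IDs only, which is what allows de-identified records from the two sources to be joined without the researcher seeing any patient identifiers.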
NHS England has responsibility for data and information from across the health and social care system in England, not the devolved nations. Linkage methods and access to data are different in Scotland, Wales and Northern Ireland, and linked data for patients in the devolved nations are not currently available via CPRD.
Eligible patients are those registered at GP practices in England that have consented to take part in the linkage process, provided the patient has not opted out of, or dissented from, the sharing of confidential patient information for planning and research. Within this group:
- Patients with a valid NHS number are eligible to be linked to HES, ONS death, COVID-19 and NCRAS data.
- Patients with a valid postcode are eligible to be linked to the patient-level small area data.
Please note that a patient being eligible for linkage does not mean that the patient will have linked data. Amongst patients eligible for linkage to a given source, an absence of linked data may mean either that the source holds no records for that patient or that linkage was not successful.
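The distinction between being eligible for linkage and actually having linked data can be made explicit when preparing a study cohort. The following is a minimal Python sketch under stated assumptions: the function name, the status labels and the use of plain sets of patient IDs are hypothetical and do not reflect the format of any CPRD file.

```python
def linkage_status(patient_ids, eligible_ids, ids_with_linked_records):
    """Classify each patient by eligibility and presence of linked records.

    All three arguments are collections of patient IDs (hypothetical inputs).
    """
    eligible = set(eligible_ids)
    linked = set(ids_with_linked_records)
    status = {}
    for pid in patient_ids:
        if pid not in eligible:
            status[pid] = "not eligible"
        elif pid in linked:
            status[pid] = "eligible, linked records found"
        else:
            # Either the source holds no records for this patient or the
            # linkage did not succeed; the two cases cannot be told apart here.
            status[pid] = "eligible, no linked records"
    return status


# Made-up IDs, for illustration only.
cohort = ["p1", "p2", "p3"]
status = linkage_status(cohort, eligible_ids={"p1", "p2"},
                        ids_with_linked_records={"p1"})
```

Keeping the third category separate avoids treating "no linked records" as "not eligible", which would misstate the study denominator.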
Each linked data release provides an incremental update in linked data coverage for patients already included in CPRD GOLD and/or CPRD Aurum, and a full update (i.e. all historic data) for patients at newly contributing practices and for new patients joining an already contributing practice.
Updates to CPRD linked data are announced in the CPRD research bulletin, on the CPRD website news page at https://cprd.com/news, and on the linked data webpage at www.cprd.com/linked-data. Researchers who would like to receive the CPRD research bulletin can be added to the mailing list by emailing firstname.lastname@example.org.
For studies that request National Cancer Registration and Analysis Service (NCRAS) data, there are additional steps required to access these datasets prior to submission of the research application – a completed NCRAS Data Selection Form must first be approved by CPRD. This process is detailed in the NCRAS documentation available at www.cprd.com/linked-data#Cancer%20data.
- Contact CPRD Enquiries at email@example.com before submitting a research application if you:
  - Intend to use NCRAS or COVID-19 data linkages.
  - Have never used linked data before.
  - Are unsure whether a data source has the data you need (note that detailed information about each linked dataset is given in the documentation available at www.cprd.com/linked-data).
- Make it clear that you understand that not all CPRD patients are eligible for linkage and that the coverage periods of the linked datasets differ.
- Describe why the linked data is beneficial to the study and how it will be used. Research applications need to include a statement about how access to linked data for the study will benefit patients.
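Because the time periods for which linked datasets are available differ, it can help to work out the period covered by every dataset a study intends to use. The following is a minimal Python sketch: the dataset names and dates are illustrative placeholders, not actual CPRD coverage periods (those are given in the documentation at www.cprd.com/linked-data), and `common_coverage` is a hypothetical helper.

```python
from datetime import date


def common_coverage(periods):
    """Return the (start, end) window covered by every dataset, or None.

    `periods` maps a dataset name to a (start_date, end_date) tuple.
    """
    start = max(s for s, _ in periods.values())
    end = min(e for _, e in periods.values())
    return (start, end) if start <= end else None


# Placeholder dates for illustration only; check the current documentation.
coverage = {
    "hospital_admissions": (date(2000, 1, 1), date(2020, 12, 31)),
    "death_registrations": (date(1998, 1, 1), date(2021, 3, 31)),
}
window = common_coverage(coverage)
```

A study period chosen within this window is covered by all of the requested sources, although eligibility for linkage still has to be checked patient by patient.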
Next module: How to access CPRD data