Guidance: Requesting linked data from CPRD

Version 1.4

Date: 25 November 2021

Access to linked data

Access to linked data is dependent on either an approved protocol or feasibility study application.

Following protocol approval:

  • For multi-study licence holders, all requests must be submitted through the linkage request service, including feasibility studies requiring linked data. For access to linked data that is not currently covered by an existing contract, an additional data access agreement will be required. Please contact the CPRD Contracts team (enquiries@cprd.com) for further information.
  • For single study datasets, linked data will be supplied alongside the primary care data through the CPRD dataset delivery service. The Observational Research (OR) team will be in touch separately. 

Types of linked data request

There are two types of linked data request:

  1. Linked data required in order to finalise the study population
  2. Linked data required for a defined study population

For type [1] requests, applicants must complete the CPRD ‘Linkage Request form’ and supply the code lists/definition of the events required in the data sources of interest, as outlined in the approved protocol. The data supplied at this stage will include only the patient identifier, code, and date. After finalising the study population, applicants will then need to make a type [2] request for all of the linked data approved in the protocol.

For type [2] requests, applicants must complete the CPRD ‘Linkage Request form’ and supply the list of patients for the study population. Applicants must also ensure that:

  • only the linked data sources approved in the protocol are requested, and all of them have been included, 
  • the database build and linkage set are listed,
  • the stratification for any small area data is included,
  • deduplication across CPRD GOLD and CPRD Aurum has been considered,
  • the number of patients in the request is consistent with the approved protocol,
  • if the study population is ≥600k, there is a plan for further data minimisation by completing the CPRD ‘Data Minimisation workbook’,
  • if this is a further request for a protocol that has already had linked data released, the CPRD ‘Request for Data Update form’ has been submitted and approved before completing this request for linked data.

The study population should be restricted to those eligible for the linked data requested. 

Submitting a request for linked data

The Data Minimisation workbook is available from CPRD (enquiries@cprd.com).

Lists of codes/patients should be provided as tab-delimited text files (.txt) and the request should be zipped into a single file. For zipped files >20MB, please contact CPRD (enquiries@cprd.com) for further advice.
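The packaging step above can be sketched in Python as follows. This is an illustrative sketch only, not a CPRD tool: the function names are hypothetical, and it assumes that 20MB means 20 × 1024 × 1024 bytes and that files are written without a header row unless one is supplied.

```python
import csv
import zipfile
from pathlib import Path

# Assumption: the 20MB limit in the guidance is read as binary megabytes.
MAX_ZIP_BYTES = 20 * 1024 * 1024


def write_tab_delimited(path, rows, header=None):
    """Write rows to a tab-delimited .txt file, one record per line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)


def zip_request(files, zip_path):
    """Zip the request files into a single archive.

    Returns the archive size in bytes and whether it exceeds the 20MB limit,
    in which case the guidance asks applicants to contact CPRD for advice.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file_path in files:
            # Store each file under its bare name, without directory paths.
            archive.write(file_path, arcname=Path(file_path).name)
    size = Path(zip_path).stat().st_size
    return size, size > MAX_ZIP_BYTES
```

For example, zip_request(["codes.txt", "linkage_request_form.docx"], "request.zip") would return the archive size and a flag showing whether it is over the limit; the file names here are placeholders.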

Requests for linked data must be submitted by either the Chief Investigator (CI) or a collaborator named on the approved protocol, copying the CI, by return to CPRD (enquiries@cprd.com).

All requests for linked data will be acknowledged by CPRD within 2 working days of receipt.

Linked data will be provided, by secure transfer, within 10 working days of receipt of a valid, approved request. If the application is incorrectly completed or the lists of codes/patients are not in the correct format, the request will not be processed until these issues are resolved, which may affect the timelines for data delivery. 

To ensure that requests are processed in an efficient and timely manner, please follow the guidance outlining the requirements, which differ by study population definition (Appendix 1), how to apply eligibility for linkage (Appendix 2) and how to prepare code lists (Appendix 3). It is the responsibility of the study team to undertake due diligence to ensure that:

  • the request is complete and correct
  • the delivered data is in line with the linkage request form 

Redelivery by CPRD may incur a charge. 

Contractual Acknowledgements at publication

Please note that when publishing research based on linked data, there are contractual requirements relating to acknowledgements. These are outlined in Appendix 4.

Appendix 1: Linkage request requirements

The requirements differ by study population definition. For each definition, the CPRD requirements and what CPRD will provide are listed below.

Study population definition: The study population will be based on primary care data only, but data from one or more linked data sources are required for these patients/practices. (type 2 linkage request)

  Type 2 linkage request:
    • CPRD requirements: Finalise the study population and provide the list of patients eligible for linkage to the data sources approved in the protocol (see Appendix 2). If the study requires practice level linked data, provide the list of practices included in the study.
    • What CPRD will provide: For study populations comprising <600K patients, all variables and event records from the approved linked data sources. For study populations comprising ≥600K patients, further data minimisation approaches (event/variable restriction) will be applied prior to data release, based on the completed Data Minimisation workbook.

Study population definition: The study population will be based on coded events from linked data only. Primary care data may be used to apply additional inclusion and exclusion criteria. (linked data will be provided in two stages)

  Stage 1: type 1 linkage request:
    • CPRD requirements: Provide the list of codes for the events of interest in the approved linked data sources (see Appendix 3).
    • What CPRD will provide: Only the relevant events of interest and limited data variables (patient identifier, code, and date) for the requested linked data sources, to enable finalisation of the study population.

  Stage 2: type 2 linkage request:
    • CPRD requirements: Finalise the study population and provide the list of patients eligible for linkage to the data sources approved in the protocol (see Appendix 2).
    • What CPRD will provide: For study populations comprising <600K patients, all variables and event records from the approved linked data sources. For study populations comprising ≥600K patients, further data minimisation approaches (event/variable restriction) will be applied prior to data release, based on the completed Data Minimisation workbook.

Study population definition: The study population will be based on coded events from both primary care and linked data sources. (linked data will be provided in two stages)

  Stage 1: type 1 linkage request:
    • CPRD requirements: Provide the list of codes for the events of interest in the approved linked data sources (see Appendix 3).
    • What CPRD will provide: Only the relevant events of interest and limited data variables (patient identifier, code, and date) for the requested linked data sources, to enable finalisation of the study population alongside the CPRD primary care data.

  Stage 2: type 2 linkage request:
    • CPRD requirements: Finalise the study population and provide the list of patients eligible for linkage to the data sources approved in the protocol (see Appendix 2).
    • What CPRD will provide: For study populations comprising <600K patients, all variables and event records from the approved linked data sources. For study populations comprising ≥600K patients, further data minimisation approaches (event/variable restriction) will be applied prior to data release, based on the completed Data Minimisation workbook.

Study population definition: The study population will be based on non-coded events from linked data, e.g. hospital admission dates, dates of death, socioeconomic data. (linked data will be provided in two stages)

  Stage 1: type 1 linkage request:
    • CPRD requirements: Provide the definition for the events of interest in the approved data sources.
    • What CPRD will provide: Only the relevant events of interest and limited data variables (patient identifier, code / requested field, and date) for the requested linked data sources, to enable finalisation of the study population.

  Stage 2: type 2 linkage request:
    • CPRD requirements: Finalise the study population and provide the list of patients eligible for linkage to the data sources approved in the protocol (see Appendix 2).
    • What CPRD will provide: For study populations comprising <600K patients, all variables and event records from the approved linked data sources. For study populations comprising ≥600K patients, further data minimisation approaches (event/variable restriction) will be applied prior to data release, based on the completed Data Minimisation workbook.


Appendix 2: How to apply eligibility for linkage

1. Request the following files from CPRD (enquiries@cprd.com):

  • The list of patient and practice files (CPRD Denominator files) for the primary care database build that you plan to use for your study (e.g. Aurum June 2021).
  • The linkage eligibility files (linkage_eligibility.txt and linkage_coverage.txt) and supporting documentation for the linkage set you plan to use for your study (e.g. linkage set 21).

Please note that for new research studies, CPRD will only provide linked data from the latest linkage set available. Earlier versions of linked data may be provided for ongoing studies, conditional on adequate justification. Please contact CPRD (enquiries@cprd.com) to confirm the latest version of linked data available.

2. Create a source population for the primary care database build by applying patient acceptability criteria for research and any relevant time constraints (e.g. removing patients that died before the start of your study).

3. Combine the source population from step 2 with the list of patients in the linkage eligibility file (linkage_eligibility.txt), excluding those patients who do not appear in both files.

4. For studies limited to those who are eligible for linkage: Refine the list of patients from step 3 to those who are eligible for linkage to the data source/s approved for your study. For example, to apply linkage eligibility for Hospital Episode Statistics (HES) Admitted Patient Care data and Office for National Statistics death registration data, you should retain those patients where variables hes_e AND death_e are both equal to 1. These patients are eligible for linkage to both data sources and can be considered as your source population.

5. Apply any further criteria based on events in primary care, then save your list of patients, including the relevant linkage flags (patid, hes_e, death_e), as a tab-delimited text file and email this together with your completed linkage request form to CPRD (enquiries@cprd.com). Please use the following naming convention: ‘protocol number_organisation name_patientlist.txt’ e.g. 21_100001_UniversityA_patientlist.txt.
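Steps 3 to 5 above can be sketched in Python as follows. This is a minimal illustration rather than a CPRD-supplied script. It assumes that linkage_eligibility.txt is tab-delimited with a header row containing the variables patid, hes_e and death_e, and that the source population from step 2 is already available as a list of patient identifiers. The function names are hypothetical, and any further primary care criteria in step 5 are left to the study team.

```python
import csv


def read_tab_delimited(path):
    """Read a tab-delimited file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def eligible_patients(source_patids, eligibility_rows, flags=("hes_e", "death_e")):
    """Steps 3 and 4: keep patients present in both inputs whose flags all equal 1."""
    source = set(source_patids)
    kept = []
    for row in eligibility_rows:
        in_both = row["patid"] in source
        # Values are read from text files, so flags are compared as strings.
        all_eligible = all(row[flag].strip() == "1" for flag in flags)
        if in_both and all_eligible:
            kept.append({"patid": row["patid"], **{flag: row[flag].strip() for flag in flags}})
    return kept


def patient_list_filename(protocol_number, organisation):
    """Step 5: build the file name from the naming convention in the guidance."""
    return f"{protocol_number}_{organisation}_patientlist.txt"


def write_patient_list(path, patients, flags=("hes_e", "death_e")):
    """Step 5: save patid and the linkage flags as a tab-delimited text file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["patid", *flags], delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(patients)
```

A study that needs only HES Admitted Patient Care data could pass flags=("hes_e",) instead. The flag variables for other linked data sources are described in the supporting documentation supplied with the linkage eligibility files.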

Appendix 3: How to prepare code lists 

Code lists should be provided to CPRD as tab-delimited text files. Each code list type should be provided in a separate file and each code should appear on a new line. Please see the table below for the coding frames and coding format found in CPRD linked data sources. Please ensure that all code lists are provided in the coding format shown below to avoid delays. All code lists should be submitted together with the completed CPRD Linkage Request form to enquiries@cprd.com.

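CPRD Linked Data Source: ONS Death Registration Data
  Coding Frame: ICD-9 / ICD-10
  Code Format: NNN, NNN.N, XNNN.N
  Code Example: 410, 410.1, E953.0

CPRD Linked Data Source: HES Admitted Patient Care, ONS Death Registration data
  Coding Frame: ICD-10
  Code Format: XNN, XNN.N
  Code Example: G00, G00.1

CPRD Linked Data Source: HES Outpatient data, HES Accident & Emergency
  Coding Frame: ICD-10
  Code Format: XNN, XNNN
  Code Example: G00, G001

CPRD Linked Data Source: HES Admitted Patient Care, HES Outpatient data
  Coding Frame: OPCS
  Code Format: XNN, XNNN
  Code Example: Q07, Q071

CPRD Linked Data Source: HES Accident & Emergency
  Coding Frame: A&E diagnosis/treatment
  Code Format: NN, NNN
  Code Example: 01, 201

CPRD Linked Data Source: HES Accident & Emergency
  Coding Frame: A&E investigations
  Code Format: NN
  Code Example: 02

CPRD Linked Data Source: HES Diagnostic Imaging Dataset
  Coding Frame: Imaging Code - NICIP
  Code Format: XXXX, XNXXX, XXXXX, XXXXXX
  Code Example: CART, C4DAC, CAAAG, CCHESB

CPRD Linked Data Source: HES Diagnostic Imaging Dataset
  Coding Frame: Imaging Code - SNOMED-CT
  Code Format: NN*
  Code Example: 10077008, 1051311000000104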
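Before submission, a code list can be checked against the code formats listed above. The Python sketch below is an illustration only. It assumes that N stands for a digit, X for an upper-case letter, and that NN* means a run of one or more digits. The guidance does not define this notation, so these readings and the function names are assumptions.

```python
import re


def template_to_regex(template):
    """Translate a code format such as 'XNN.N' into a compiled regular expression."""
    parts = []
    for char in template:
        if char == "N":
            parts.append(r"\d")       # assumed: N is a single digit
        elif char == "X":
            parts.append("[A-Z]")     # assumed: X is a single upper-case letter
        elif char == "*":
            parts.append("*")         # assumed: repeat the preceding character class
        else:
            parts.append(re.escape(char))  # literal characters such as '.'
    return re.compile("".join(parts))


def matches_any(code, templates):
    """Return True if the whole code matches at least one of the format templates."""
    return any(template_to_regex(t).fullmatch(code) for t in templates)


def invalid_codes(codes, templates):
    """Return the codes that match none of the templates, in their original order."""
    return [code for code in codes if not matches_any(code, templates)]
```

For example, invalid_codes(["G00", "G00.1", "G001"], ["XNN", "XNN.N"]) flags G001, which is written in the undotted format used for HES Outpatient data rather than the dotted format used for HES Admitted Patient Care.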

 

Appendix 4: Contractual Acknowledgements at publication

The statements below should be included in publications arising from the use of CPRD GOLD, CPRD Aurum and/or linked data.

Any Publication arising from the use of the following data sources should include the accompanying statement:

  • CPRD primary care data: “This study is based in part on data from the Clinical Practice Research Datalink obtained under licence from the UK Medicines and Healthcare products Regulatory Agency. The data is provided by patients and collected by the NHS as part of their care and support. The interpretation and conclusions contained in this study are those of the author/s alone”. The Customer will ensure that the description of the CPRD Database in any such Publication is accurate and current, and agrees to request publication of a correction to any published description which CPRD deems to be inaccurate, if so requested by CPRD;
  • Office for National Statistics (ONS) data: Acknowledge the ONS as the provider of the data and include the statement “The interpretation and conclusions contained in this study are those of the author/s alone”;
  • Hospital Episode Statistics (HES) and/or ONS data: “Copyright © (year), re-used with the permission of The Health & Social Care Information Centre. All rights reserved”. Users should ensure that the description of the HES data/ONS data in any such publication is accurate and current, and agree to request publication of a correction to any published description which CPRD or the linked data owner deems to be inaccurate, if so requested by CPRD or the linked data owner;
  • Public Health England (PHE) data: “Public Health England (year): [Title]. [Version]. [Publisher]. [Resource Type]”, e.g. Public Health England (2015): National Cancer Registration Data. (CAS Snapshot 15.01), Public Health England (dataset);
  • Office of Population Censuses and Surveys (OPCS) codes: "The OPCS Classification of Interventions and Procedures, codes, terms and text is Crown copyright (2016) published by Health and Social Care Information Centre, also known as NHS Digital and licenced under the Open Government Licence available at http://www.nationalarchives.gov.uk/doc/open-government-licence/open-government-licence.htm".
 
[Page last reviewed 16 December 2021]