Guidance: Requesting linked data from CPRD

Version 1.6

Date: 20 April 2022

Access to linked data

Access to linked data is dependent on either an approved protocol or feasibility study application.

Following protocol approval:

  • For multi-study licence holders, all requests must be submitted through the electronic Research Application Portal (eRAP). For access to linked data that is not currently covered by an existing contract, an additional data access agreement will be required. Please contact the CPRD Contracts team (enquiries@cprd.com) for further information.
  • For single study datasets and studies including NCRAS data, linked data will be supplied alongside the primary care data through the CPRD dataset delivery service. The Observational Research (OR) team will be in touch separately. 
  • For feasibility studies, the submission must be made using a Linked Data Request Form.

Types of linked data request

There are two types of linked data request:

  1. Linked data required in order to finalise the study population
  2. Linked data required for a defined study population

For type [1] requests, applicants must complete the CPRD Request for type 1 linked data using eRAP and supply the code lists/definition of the events required in the data sources of interest, as outlined in the approved protocol. The data supplied at this stage will include only the patient identifier, code, and date. After finalising the study population, applicants will then need to make a type [2] request for all of the linked data approved in the protocol.

For type [2] requests, applicants must complete the CPRD Request for type 2 linked data using eRAP and upload the list of patients for the study population.

For requests to re-deliver linked data (i.e. for a protocol that has already had linked data released), the CPRD ‘Request for Data Update form’ must be submitted and approved before completing this request for linked data.

The study population should be restricted to those eligible for the linked data requested. 

Linked data request forms

The Data Minimisation workbook is available from CPRD (enquiries@cprd.com).
 

Submitting a request for linked data

Lists of codes/patients should be provided as tab-delimited text files (.txt) and the request should be zipped into a single file. For zipped files >20MB, please contact CPRD (enquiries@cprd.com) for further advice.
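As a minimal sketch of the packaging step described above, the following Python snippet writes a list to a tab-delimited .txt file, zips the request files into a single archive and reports whether the archive is within the 20MB limit. The function names, the patid column header used in the usage note, and the reading of 20MB as 20 × 1024 × 1024 bytes are assumptions made for illustration, not CPRD specifications.

```python
import csv
import zipfile
from pathlib import Path

# Assumption: "20MB" is read as 20 * 1024 * 1024 bytes; the guidance does not define it.
MAX_ZIP_BYTES = 20 * 1024 * 1024


def write_tab_delimited(path, header, rows):
    """Write rows to a tab-delimited text file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


def zip_request(files, zip_path):
    """Zip the request files into one archive.

    Returns True if the archive is within the size limit, False if CPRD
    should be contacted for further advice.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file in files:
            archive.write(file, arcname=Path(file).name)
    return Path(zip_path).stat().st_size <= MAX_ZIP_BYTES
```

For example, write_tab_delimited("patients.txt", ["patid"], [["1001"], ["1002"]]) followed by zip_request(["patients.txt"], "request.zip") would produce a single zipped file ready for submission.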

Requests for linked data must be submitted by either the Chief Investigator (CI) or the Corresponding Applicant (CA).

Linked data will be provided, by secure transfer, within 10 working days of receipt of a valid, approved request. If the application is incorrectly completed or the lists of codes/patients are not in the correct format, the request will not be processed until these issues are resolved, which may affect the timelines for data delivery. 

To ensure that requests are processed in an efficient and timely manner, please follow the guidance outlining the requirements, which differ by study population definition (Appendix 1), how to apply eligibility for linkage (Appendix 2) and how to prepare code lists (Appendix 3). It is the responsibility of the study team to undertake due diligence to ensure that:

  • the request is complete and correct
  • the delivered data is in line with the completed request 

Redelivery by CPRD may incur a charge. 

Contractual Acknowledgements at publication

Please note that when publishing research based on linked data, there are contractual requirements relating to acknowledgements. These are outlined in Appendix 4.

Appendix 1: Linkage request requirements 

The requirements differ according to how the study population is defined. For each study population definition, the CPRD requirements and what CPRD will provide are set out below.

Study population definition: The study population will be based on primary care data only, but data from one or more linked data sources are required for these patients/practices (type 2 linkage request).

Type 2 linkage request:

  • CPRD requirements: Finalise the study population and provide the list of patients eligible for linkage to the data sources approved in the protocol (see Appendix 2). If the study requires practice level linked data, provide the list of practices included in the study.
  • What CPRD will provide: For study populations comprising <600K patients, all variables and event records from the approved linked data sources. For study populations comprising ≥600K patients, further data minimisation approaches (event/variable restriction) will be applied prior to data release, based on the completed Data Minimisation workbook.

Study population definition: The study population will be based on coded events from linked data only. Primary care data may be used to apply additional inclusion and exclusion criteria (linked data will be provided in two stages).

Stage 1: type 1 linkage request:

  • CPRD requirements: Provide the list of codes for the events of interest in the approved linked data sources (see Appendix 3).
  • What CPRD will provide: Only the relevant events of interest and limited data variables (patient identifier, code, and date) for the requested linked data sources, to enable finalisation of the study population.

Stage 2: type 2 linkage request:

  • CPRD requirements: Finalise the study population and provide the list of patients eligible for linkage to the data sources approved in the protocol (see Appendix 2).
  • What CPRD will provide: For study populations comprising <600K patients, all variables and event records from the approved linked data sources. For study populations comprising ≥600K patients, further data minimisation approaches (event/variable restriction) will be applied prior to data release, based on the completed Data Minimisation workbook.

Study population definition: The study population will be based on non-coded events from linked data, e.g. hospital admission dates, dates of death, socioeconomic data (linked data will be provided in two stages).

Stage 1: type 1 linkage request:

  • CPRD requirements: Provide the definition for the events of interest in the approved data sources.
  • What CPRD will provide: Only the relevant events of interest and limited data variables (patient identifier, code / requested field, and date) for the requested linked data sources, to enable finalisation of the study population.

Stage 2: type 2 linkage request:

  • CPRD requirements: Finalise the study population and provide the list of patients eligible for linkage to the data sources approved in the protocol (see Appendix 2).
  • What CPRD will provide: For study populations comprising <600K patients, all variables and event records from the approved linked data sources. For study populations comprising ≥600K patients, further data minimisation approaches (event/variable restriction) will be applied prior to data release, based on the completed Data Minimisation workbook.
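The decision logic in this appendix can be summarised in a short Python sketch. The function names and the basis labels are hypothetical and introduced only for illustration; the stage sequence and the 600K threshold (read here as 600,000 patients) are taken from the requirements above.

```python
# Threshold from the guidance: study populations of 600K patients or more
# have further data minimisation applied before release.
MINIMISATION_THRESHOLD = 600_000


def request_stages(basis):
    """Return the linkage request types needed for a study population basis.

    basis is a hypothetical label: "primary_care", "linked_coded"
    or "linked_non_coded".
    """
    if basis == "primary_care":
        return ["type 1"][:0] + ["type 2"]
    if basis in ("linked_coded", "linked_non_coded"):
        # Linked data are needed to finalise the population, so a type 1
        # request comes first, followed by a type 2 request.
        return ["type 1", "type 2"]
    raise ValueError(f"unknown study population basis: {basis!r}")


def needs_data_minimisation(n_patients):
    """True when event/variable restriction applies, based on the workbook."""
    return n_patients >= MINIMISATION_THRESHOLD
```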

 

Appendix 2: How to apply eligibility for linkage

1. Request the following files from CPRD (enquiries@cprd.com):

  • The list of patient and practice files (CPRD Denominator files) for the primary care database build that you plan to use for your study (e.g. Aurum June 2021).
  • The latest linkage eligibility files (GOLD/Aurum_enhanced_eligibility_[month]_[year].txt and linkage_coverage_[month]_[year].txt).

Supporting documentation for the linked data are available from https://www.cprd.com/linked-data.

Please note that for new research studies, CPRD will only provide the latest linked data available for each approved data source. Earlier versions of linked data may be provided for ongoing studies, conditional on adequate justification. Please contact CPRD (enquiries@cprd.com) to confirm the latest version of linked data available.

2. Create a source population for the primary care database build by applying patient acceptability criteria for research and any relevant time constraints (e.g. removing patients that died before the start of your study).

3. Combine the source population from step 2 with the list of patients in the linkage eligibility file (GOLD/Aurum_enhanced_eligibility_[month]_[year].txt), excluding those patients who do not appear in both files. 

4. For studies limited to those who are eligible for linkage: Refine the list of patients from step 3 to those who are eligible for linkage to the data source/s approved for your study. For example, to apply linkage eligibility for Hospital Episode Statistics (HES) Admitted Patient Care data and Office for National Statistics death registration data, you should retain those patients where variables hes_apc_e and ons_death_e are both equal to 1. These patients are eligible for linkage to both data sources and can be considered as your source population.

5. Apply any further criteria based on events in primary care, then save your list of patients including the relevant linkage flags (patid, hes_apc_e, ons_death_e) as a tab-delimited text file and email this together with your completed linkage request form to CPRD (enquiries@cprd.com). Please use the following naming convention: ‘protocol number_organisation name_patientlist.txt’ e.g. 21_100001_UniversityA_patientlist.txt.
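Steps 3 to 5 can be sketched in Python as follows. This is an illustrative outline only: it assumes the linkage eligibility file is tab-delimited with a header row containing patid, hes_apc_e and ons_death_e (the variables named in step 4), the function names are hypothetical, and any further primary care criteria from step 5 would need to be applied before saving.

```python
import csv


def eligible_patients(source_patids, eligibility_path,
                      flags=("hes_apc_e", "ons_death_e")):
    """Keep patients present in both the source population and the
    eligibility file (step 3) whose requested flags all equal 1 (step 4)."""
    source = set(source_patids)
    kept = []
    with open(eligibility_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            if row["patid"] in source and all(row[f] == "1" for f in flags):
                kept.append({"patid": row["patid"], **{f: row[f] for f in flags}})
    return kept


def patient_list_filename(protocol_number, organisation):
    """Build the file name following the naming convention in step 5."""
    return f"{protocol_number}_{organisation}_patientlist.txt"


def save_patient_list(rows, path, flags=("hes_apc_e", "ons_death_e")):
    """Save patid plus linkage flags as a tab-delimited text file (step 5)."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["patid", *flags],
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
```

Patient identifiers are compared as strings here, so the source population should be supplied in the same form as it appears in the eligibility file.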

Please note that to ensure provision of the latest available data per data source, and to honour patient opt-outs, the latest eligibility status per patient, for each requested linked data source, will be applied during the processing of a request for linked data. If an earlier source file was used to finalise the list of patients, this earlier eligibility information reflects indicative eligibility only. The linkage eligibility file reflecting patient eligibility status at the time the linkage was undertaken will be provided with the delivery of linked data. This file should be used to finalise the denominator populations and associated person-time as appropriate.

Appendix 3: How to prepare code lists 

Code lists should be provided to CPRD as tab-delimited text files. Each code list type should be provided in a separate file and each code should appear on a new line. Please see the table below for the coding frames and coding formats found in CPRD linked data sources. Please ensure that all code lists are provided in the coding format shown below to avoid delays. All code lists should be submitted together with the completed CPRD Linkage Request form to enquiries@cprd.com.
 

CPRD Linked Data Source | Coding Frame | Code Format | Code Example
ONS Death Registration Data | ICD-9 / ICD-10 | NNN, NNN.N, XNNN.N | 410, 410.1, E953.0
HES Admitted Patient Care; ONS Death Registration data | ICD-10 | XNN, XNN.N | G00, G00.1
HES Outpatient data; HES Accident & Emergency | ICD-10 | XNN, XNNN | G00, G001
HES Admitted Patient Care; HES Outpatient data | OPCS | XNN, XNNN | Q07, Q071
HES Accident & Emergency | A&E diagnosis/treatment | NN, NNN | 01, 201
HES Accident & Emergency | A&E investigations | NN | 02
HES Diagnostic Imaging Dataset | Imaging Code - NICIP | XXXX, XNXXX, XXXXX, XXXXXX | CART, C4DAC, CAAAG, CCHESB
HES Diagnostic Imaging Dataset | Imaging Code - SNOMED-CT | NN* | 10077008, 1051311000000104
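Before submitting, it may help to check each code list against the expected format. The sketch below is one way to do this in Python for a subset of the coding frames. It assumes that X denotes an uppercase letter, N denotes a digit, and NN* denotes a digit string of any length; these readings are inferred from the examples in the table rather than stated by CPRD, and the dictionary keys and function names are hypothetical.

```python
import re

# Assumption: X = uppercase letter, N = digit, "NN*" = digits of any length.
# Keys are hypothetical labels for a subset of the coding frames in the table.
FORMATS = {
    "ICD-10 (HES APC / ONS death)": ["XNN", "XNN.N"],
    "ICD-10 (HES OP / A&E)": ["XNN", "XNNN"],
    "OPCS": ["XNN", "XNNN"],
    "NICIP": ["XXXX", "XNXXX", "XXXXX", "XXXXXX"],
    "SNOMED-CT": ["NN*"],
}


def template_to_regex(template):
    """Translate a format template such as 'XNN.N' into a compiled regex."""
    if template == "NN*":
        return re.compile(r"[0-9]+")
    parts = {"X": "[A-Z]", "N": "[0-9]", ".": r"\."}
    return re.compile("".join(parts[char] for char in template))


def invalid_codes(codes, coding_frame):
    """Return the codes that match none of the formats for the coding frame."""
    patterns = [template_to_regex(t) for t in FORMATS[coding_frame]]
    return [code for code in codes
            if not any(p.fullmatch(code) for p in patterns)]
```

For example, under these assumptions invalid_codes(["G00", "G00.1", "G001"], "ICD-10 (HES APC / ONS death)") flags G001, because the undotted four-character form is the format listed for HES Outpatient and Accident & Emergency data rather than for HES Admitted Patient Care.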

 

Appendix 4: Contractual Acknowledgements at publication

The following statements should be included in publications arising from the use of CPRD GOLD, CPRD Aurum and/or linked data.

Any Publication arising from the use of the following data sources should include the accompanying statement:

  • CPRD primary care data: “This study is based in part on data from the Clinical Practice Research Datalink obtained under licence from the UK Medicines and Healthcare products Regulatory Agency. The data is provided by patients and collected by the NHS as part of their care and support. The interpretation and conclusions contained in this study are those of the author/s alone”. The Customer will ensure that the description of the CPRD Database in any such Publication is accurate and current, and agrees to request publication of a correction to any published description which CPRD deems to be inaccurate, if so requested by CPRD;
  • Office for National Statistics (ONS) data: Acknowledge the ONS as the provider of the data and include the statement “The interpretation and conclusions contained in this study are those of the author/s alone”;
  • Hospital Episode Statistics (HES), ONS and/or NCRAS data: “Copyright © (year), re-used with the permission of The Health & Social Care Information Centre. All rights reserved”. Users should ensure that the description of the HES/ONS/NCRAS data in any such publication is accurate and current, and agree to request publication of a correction to any published description which CPRD or the linked data owner deems to be inaccurate, if so requested by CPRD or the linked data owner;
  • Office of Population Censuses and Surveys (OPCS) codes: “The OPCS Classification of Interventions and Procedures, codes, terms and text is Crown copyright (2016) published by Health and Social Care Information Centre, also known as NHS Digital and licenced under the Open Government Licence available at http://www.nationalarchives.gov.uk/doc/open-government-licence/open-government-licence.htm”.

 

 

 
