Guidance: Requesting linked data from CPRD

Version 1.10

Date: 13 December 2023

Access to linked data

Access to linked data is dependent on either an approved protocol or feasibility study application.

Following protocol approval:

  • For multi-study licence holders, all requests must be submitted through the electronic Research Application Portal (eRAP). For access to linked data that is not currently covered by an existing contract, an additional data access agreement will be required. Please contact the CPRD Contracts team (enquiries@cprd.com) for further information.
  • For single study datasets and studies including NCRAS SACT or RTDS data, linked data will be supplied alongside the primary care data through the CPRD dataset delivery service. The Observational Research (OR) team will be in touch separately. 
  • For feasibility studies, requests must be submitted through the electronic Research Application Portal (eRAP).

Types of linked data request

There are two types of linked data requests:

  1. Linked data required in order to finalise the study population
  2. Linked data required for a defined study population

For type [1] requests, applicants must complete the CPRD Request for type 1 linked data using eRAP and supply the code lists or event definitions required for the data sources of interest, as outlined in the approved protocol. The data supplied at this stage will include only the patient pseudonym (i.e. patid), code, and date. After finalising the study population, applicants will then need to make a type [2] request for all the linked data approved in the protocol.
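Once type 1 data have been delivered, one possible way to use them when finalising the study population is to take each patient's earliest qualifying event as a candidate index date. The sketch below assumes the delivered file has been read into dictionaries with the patid, code and date variables described above, and that dates use a dd/mm/yyyy format; the function name first_event_dates and the date format are hypothetical, not part of any CPRD specification.

```python
from datetime import datetime


def first_event_dates(events, codes_of_interest, date_format="%d/%m/%Y"):
    """Return {patid: earliest event date} for events with a code of interest.

    `events` is an iterable of dicts with the keys patid, code and date (the
    limited variables supplied with a type 1 request). The default dd/mm/yyyy
    date format is an assumption; adjust it to match the delivered file.
    """
    wanted = set(codes_of_interest)
    first = {}
    for row in events:
        if row["code"] not in wanted:
            continue
        when = datetime.strptime(row["date"], date_format).date()
        patid = row["patid"]
        # Keep the earliest qualifying date seen so far for this patient.
        if patid not in first or when < first[patid]:
            first[patid] = when
    return first


events = [
    {"patid": "1001", "code": "G00", "date": "14/03/2019"},
    {"patid": "1001", "code": "G00.1", "date": "02/01/2018"},
    {"patid": "1002", "code": "I21", "date": "05/07/2020"},
]
print(first_event_dates(events, {"G00", "G00.1"}))
```

A tab-delimited delivery file could be fed in with `csv.DictReader(f, delimiter="\t")`; any further primary care inclusion or exclusion criteria would then be applied to the resulting patients before the type 2 patient list is prepared.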

For type [2] requests, applicants must complete the CPRD Request for type 2 linked data using eRAP and upload the list of patients for the study population.

If this is a further request to re-deliver linked data (i.e. for a protocol that has already had linked data released), the CPRD Data Update Request webform must be submitted and approved before completing this request for linked data.

The study population should be restricted to those eligible for the linked data requested. 

Submitting a request for linked data

Lists of codes/patients should be provided as tab-delimited text files (.txt), and all files for the request should be compressed into a single zip file. For zipped files larger than 20MB, please contact CPRD (enquiries@cprd.com) for further advice.
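As an illustration of the packaging rule above, the following minimal sketch bundles the .txt lists into one zip archive in memory and checks it against the 20MB threshold before submission. The helper names are hypothetical, and reading 20MB as 20 × 1024 × 1024 bytes is an assumption.

```python
import io
import zipfile

# 20MB threshold from the guidance; reading 1MB as 1024 * 1024 bytes is an assumption.
MAX_ZIP_BYTES = 20 * 1024 * 1024


def build_request_zip(files):
    """Zip {filename: tab-delimited text} into one archive and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, text in files.items():
            if not name.endswith(".txt"):
                raise ValueError(f"{name}: lists should be provided as .txt files")
            zf.writestr(name, text)
    return buffer.getvalue()


def needs_cprd_advice(zip_bytes):
    """True if the zipped request exceeds the 20MB threshold."""
    return len(zip_bytes) > MAX_ZIP_BYTES


payload = build_request_zip(
    {"21_100001_UniversityA_patientlist.txt": "patid\thes_apc_e\tons_death_e\n1001\t1\t1\n"}
)
if needs_cprd_advice(payload):
    print("Zipped request is over 20MB: contact enquiries@cprd.com for advice.")
else:
    print(f"Zipped request is {len(payload)} bytes")
```

Building the archive in memory keeps the size check separate from writing the file; the bytes can then be saved with `open(path, "wb")` and uploaded as usual.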

Requests for linked data must be submitted by either the Chief Investigator (CI) or the Corresponding Applicant (CA).

Linked data will be provided, by secure transfer, within 10 working days of receipt of a valid, approved request. If the application is incorrectly completed or the lists of codes/patients are not in the correct format, the request will not be processed until these issues are resolved, which may affect the timelines for data delivery. 

To ensure that requests are processed efficiently and on time, please follow the guidance on the request requirements, which differ by study population definition (Appendix 1), on how to apply eligibility for linkage (Appendix 2) and on how to prepare code lists (Appendix 3). It is the responsibility of the study team to undertake due diligence to ensure that:

  • the request is complete and correct
  • the delivered data is in line with the completed request 

Redelivery by CPRD may incur a charge. 


Appendix 1: Linkage request requirements 

Study population definition: The study population will be based on primary care data only, but data from one or more linked data sources are required for these patients/practices (type 2 linkage request).

Type 2 linkage request:
  • CPRD requirements: Finalise the study population and provide the list of patients eligible for linkage to the data sources approved in the protocol (see Appendix 2). If the study requires practice-level linked data, provide the list of practices included in the study.
  • What CPRD will provide: For study populations comprising ≤600K patients, all variables and event records from the approved linked data sources. For study populations comprising >600K patients, further data minimisation approaches (event/variable restriction) will be applied prior to data release, based on the completed Data Minimisation workbook.

Study population definition: The study population will be based on coded events from linked data only. Primary care data may be used to apply additional inclusion and exclusion criteria (linked data will be provided in two stages).

Stage 1: type 1 linkage request:
  • CPRD requirements: Provide the list of codes for the events of interest in the approved linked data sources (see Appendix 3).
  • What CPRD will provide: Only the relevant events of interest and limited data variables (patient pseudonym (i.e. patid), code, and date) for the requested linked data sources, to enable finalisation of the study population.

Stage 2: type 2 linkage request:
  • CPRD requirements: Finalise the study population and provide the list of patients eligible for linkage to the data sources approved in the protocol (see Appendix 2).
  • What CPRD will provide: For study populations comprising ≤600K patients, all variables and event records from the approved linked data sources. For study populations comprising >600K patients, further data minimisation approaches (event/variable restriction) will be applied prior to data release, based on the completed Data Minimisation workbook.

Study population definition: The study population will be based on coded events from both primary care and linked data sources (linked data will be provided in two stages).

Stage 1: type 1 linkage request:
  • CPRD requirements: Provide the list of codes for the events of interest in the approved linked data sources (see Appendix 3).
  • What CPRD will provide: Only the relevant events of interest and limited data variables (patient pseudonym (i.e. patid), code, and date) for the requested linked data sources, to enable finalisation of the study population alongside the CPRD primary care data.

Stage 2: type 2 linkage request:
  • CPRD requirements: Finalise the study population and provide the list of patients eligible for linkage to the data sources approved in the protocol (see Appendix 2).
  • What CPRD will provide: For study populations comprising ≤600K patients, all variables and event records from the approved linked data sources. For study populations comprising >600K patients, further data minimisation approaches (event/variable restriction) will be applied prior to data release, based on the completed Data Minimisation workbook.

Study population definition: The study population will be based on non-coded events from linked data, e.g. hospital admission dates, dates of death or socioeconomic data (linked data will be provided in two stages).

Stage 1: type 1 linkage request:
  • CPRD requirements: Provide the definition for the events of interest in the approved data sources.
  • What CPRD will provide: Only the relevant events of interest and limited data variables (patient pseudonym (i.e. patid), code / requested field, and date) for the requested linked data sources, to enable finalisation of the study population.

Stage 2: type 2 linkage request:
  • CPRD requirements: Finalise the study population and provide the list of patients eligible for linkage to the data sources approved in the protocol (see Appendix 2).
  • What CPRD will provide: For study populations comprising ≤600K patients, all variables and event records from the approved linked data sources. For study populations comprising >600K patients, further data minimisation approaches (event/variable restriction) will be applied prior to data release, based on the completed Data Minimisation workbook.


Appendix 2: How to apply eligibility for linkage

1. Request the following files from CPRD (enquiries@cprd.com):

  • The list of patient and practice files (i.e. CPRD Denominator files) for the primary care database build that you plan to use for your study (e.g. Aurum June 2021).
  • The latest linkage eligibility files (GOLD/Aurum_enhanced_eligibility_[month]_[year].txt and linkage_coverage_[month]_[year].txt).

Supporting documentation for the linked data is available from https://www.cprd.com/linked-data.

Please note that for new research studies, CPRD will only provide the latest linked data available for each approved data source. Earlier versions of linked data may be provided for ongoing studies, subject to adequate justification. Please contact CPRD (enquiries@cprd.com) to confirm the latest version of linked data available.

2. Create a source population for the primary care database build by applying patient acceptability criteria for research and any relevant time constraints (e.g. removing patients who died before the start of your study).
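Step 2 can be sketched as a simple filter over the patient Denominator file. This is a minimal illustration, not CPRD's method: the Aurum-style column names (patid, acceptable, cprd_ddate) and the dd/mm/yyyy date format are assumptions to check against the documentation for your database build, and source_population is a hypothetical helper.

```python
from datetime import date, datetime


def source_population(patients, study_start, date_format="%d/%m/%Y"):
    """Return patids that are acceptable for research and alive at study_start.

    Assumed columns: patid, acceptable ("1" = acceptable) and cprd_ddate
    (date of death, blank if none). Verify these against the Denominator
    file documentation for your database build.
    """
    kept = []
    for row in patients:
        if row["acceptable"].strip() != "1":
            continue  # fails the patient acceptability criteria
        death = row.get("cprd_ddate", "").strip()
        if death and datetime.strptime(death, date_format).date() < study_start:
            continue  # died before the study start date
        kept.append(row["patid"])
    return kept


patients = [
    {"patid": "1001", "acceptable": "1", "cprd_ddate": ""},
    {"patid": "1002", "acceptable": "0", "cprd_ddate": ""},
    {"patid": "1003", "acceptable": "1", "cprd_ddate": "15/08/2016"},
    {"patid": "1004", "acceptable": "1", "cprd_ddate": "03/02/2021"},
]
print(source_population(patients, date(2020, 1, 1)))  # → ['1001', '1004']
```

In practice the rows would come from the tab-delimited patient file via `csv.DictReader(f, delimiter="\t")`; other time constraints in the protocol (for example registration dates) would be added as further conditions in the same loop.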

3. Combine the source population from step 2 with the list of patients in the linkage eligibility file (GOLD/Aurum_enhanced_eligibility_[month]_[year].txt), excluding those patients who do not appear in both files. 
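Step 3 is effectively an inner join on patid. The sketch below is one way to express it, assuming the enhanced eligibility file is tab-delimited with a header row that includes a patid column; combine_with_eligibility is a hypothetical helper, and passing an open file object works because csv.DictReader accepts any iterable of lines.

```python
import csv


def combine_with_eligibility(source_patids, eligibility_lines):
    """Keep eligibility-file rows whose patid is also in the source population.

    `eligibility_lines` is any iterable of text lines from the tab-delimited
    eligibility file (an open file object works); its first line is assumed
    to be a header row containing a patid column.
    """
    source = set(source_patids)
    reader = csv.DictReader(eligibility_lines, delimiter="\t")
    # Inner join on patid: patients missing from either input are dropped.
    return [row for row in reader if row["patid"] in source]


eligibility_lines = [
    "patid\thes_apc_e\tons_death_e\n",
    "1001\t1\t1\n",
    "1002\t0\t1\n",
    "1005\t1\t1\n",
]
rows = combine_with_eligibility(["1001", "1002", "1004"], eligibility_lines)
print([row["patid"] for row in rows])  # → ['1001', '1002']
```

Keeping the eligibility rows (rather than just the patids) carries the linkage flags forward, so they are available for step 4 and for the patient list in step 5.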

4. For studies limited to those who are eligible for linkage: Refine the list of patients from step 3 to those who are eligible for linkage to the data source/s approved for your study. For example, to apply linkage eligibility for Hospital Episode Statistics (HES) Admitted Patient Care data and Office for National Statistics death registration data, you should retain those patients where variables hes_apc_e and ons_death_e are both equal to 1. These patients are eligible for linkage to both data sources and can be considered as your source population.
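Step 4 reduces to keeping rows where every requested flag equals 1. A minimal sketch, assuming each row is a dict keyed by the eligibility file's column names; restrict_to_linkage is a hypothetical helper, and its default flags are simply the HES Admitted Patient Care and ONS death example from the text.

```python
def restrict_to_linkage(rows, flags=("hes_apc_e", "ons_death_e")):
    """Keep rows where every requested linkage eligibility flag equals 1."""
    return [row for row in rows if all(row[flag].strip() == "1" for flag in flags)]


rows = [
    {"patid": "1001", "hes_apc_e": "1", "ons_death_e": "1"},
    {"patid": "1002", "hes_apc_e": "0", "ons_death_e": "1"},
]
print([row["patid"] for row in restrict_to_linkage(rows)])  # → ['1001']
```

For other data sources, pass the corresponding flag names from the eligibility file documentation.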

5. Apply any further criteria based on events in primary care, then save your list of patients, including the relevant linkage flags (e.g. patid, hes_apc_e, ons_death_e), as a tab-delimited text file and attach this to the Linked Data Request submitted on eRAP by the CI or CA. Please ensure that this list contains all patients for whom linked data is required. Please also use the following naming convention: ‘protocol number_organisation name_patientlist.txt’, e.g. 21_100001_UniversityA_patientlist.txt.
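The file naming and tab-delimited layout in step 5 can be sketched as follows. The helper names are hypothetical, and including a header line is an assumption, since the guidance does not state whether one is expected.

```python
import csv
import io


def patient_list_filename(protocol_number, organisation):
    """Build a name following 'protocol number_organisation name_patientlist.txt'."""
    return f"{protocol_number}_{organisation}_patientlist.txt"


def format_patient_list(rows, columns=("patid", "hes_apc_e", "ons_death_e")):
    """Render rows as tab-delimited text with a header line (header is an assumption)."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row[col] for col in columns])
    return out.getvalue()


rows = [{"patid": "1001", "hes_apc_e": "1", "ons_death_e": "1"}]
name = patient_list_filename("21_100001", "UniversityA")
print(name)  # → 21_100001_UniversityA_patientlist.txt
```

The text can then be written with `open(name, "w", newline="")`; adjust `columns` to the linkage flags for the data sources approved in your protocol.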

Please note that to ensure provision of the latest available data per data source, and to honour patient opt-outs, the latest eligibility status per patient for each requested linked data source will be applied when a request for linked data is processed. If an earlier source file was used to finalise the list of patients, this earlier eligibility information is indicative only. The linkage eligibility file reflecting patient eligibility status at the time the linkage was undertaken will be provided with the delivery of linked data; this file should be used to finalise the denominator populations and associated person-time as appropriate.
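Because eligibility may change between submission and delivery, it can help to compare the submitted list with the delivered eligibility file before computing denominators. A minimal sketch, assuming the delivered file has been read into dicts with the same flag columns; reconcile_with_delivered is a hypothetical helper.

```python
def reconcile_with_delivered(submitted_patids, delivered_rows, flags):
    """Split submitted patients by eligibility in the delivered eligibility file.

    Returns (kept, dropped): patients still eligible for every requested
    source, and patients who are not (e.g. after an opt-out or an
    eligibility update), each in their original order.
    """
    eligible = {
        row["patid"]
        for row in delivered_rows
        if all(row[flag].strip() == "1" for flag in flags)
    }
    kept = [p for p in submitted_patids if p in eligible]
    dropped = [p for p in submitted_patids if p not in eligible]
    return kept, dropped


submitted = ["1001", "1002", "1003"]
delivered = [
    {"patid": "1001", "hes_apc_e": "1", "ons_death_e": "1"},
    {"patid": "1002", "hes_apc_e": "1", "ons_death_e": "0"},
]
kept, dropped = reconcile_with_delivered(submitted, delivered, ("hes_apc_e", "ons_death_e"))
print(kept, dropped)  # → ['1001'] ['1002', '1003']
```

Patients absent from the delivered file are treated as not eligible, which is the conservative choice when finalising person-time.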

Appendix 3: How to prepare code lists 

Code lists should be provided to CPRD as tab-delimited text files. Each code list type should be provided in a separate file, with each code on a new line. The table below shows the coding frames and code formats used in CPRD linked data sources; in the Code Format column, N denotes a digit and X a letter. Please ensure that all code lists are provided in the code format shown to avoid delays. All code lists should be submitted together with the completed CPRD Linkage Request form to enquiries@cprd.com.

CPRD Linked Data Source | Coding Frame | Code Format | Code Example
ONS Death Registration data | ICD-9 / ICD-10 | NNN, NNN.N, XNNN.N | 410, 410.1, E953.0
HES Admitted Patient Care; ONS Death Registration data | ICD-10 | XNN, XNN.N | G00, G00.1
HES Outpatient data; HES Accident & Emergency | ICD-10 | XNN, XNNN | G00, G001
HES Admitted Patient Care; HES Outpatient data | OPCS | XNN, XNNN | Q07, Q071
HES Accident & Emergency | A&E diagnosis/treatment | NN, NNN | 01, 201
HES Accident & Emergency | A&E investigations | NN | 02
HES Diagnostic Imaging Dataset | Imaging Code - NICIP | XXXX, XNXXX, XXXXX, XXXXXX | CART, C4DAC, CAAAG, CCHESB
HES Diagnostic Imaging Dataset | Imaging Code - SNOMED-CT | NN* | 10077008, 1051311000000104
NCRAS Cancer Registration Tumour and Treatment data | ICD-9 / ICD-10 | NNN, NNNN, XNN, XNNN | 183, 1832, C54, C542
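Because code lists in the wrong format delay a request, it may help to check them against the table before submission. The sketch below is a hypothetical helper, not a CPRD tool: it turns the Code Format patterns into regular expressions, reading N as a digit, X as an upper-case letter and the * in NN* as "any number of further digits" (that last reading is an assumption), and writes each code list as one code per line.

```python
import re


def format_to_regex(code_format):
    """Translate a format such as 'XNN.N' into a compiled regular expression.

    N is read as a digit and X as an upper-case letter; '*' (as in 'NN*') is
    read as "repeat the previous character any number of times" (an assumption).
    """
    parts = []
    for ch in code_format:
        if ch == "N":
            parts.append(r"\d")
        elif ch == "X":
            parts.append("[A-Z]")
        elif ch == "*":
            parts.append("*")
        else:
            parts.append(re.escape(ch))  # literal characters such as '.'
    return re.compile("".join(parts))


def invalid_codes(codes, code_formats):
    """Return the codes that match none of the allowed formats."""
    patterns = [format_to_regex(f) for f in code_formats]
    return [c for c in codes if not any(p.fullmatch(c) for p in patterns)]


def format_code_list(codes):
    """One code per line, whitespace stripped, duplicates removed in order."""
    cleaned = dict.fromkeys(code.strip() for code in codes if code.strip())
    return "".join(f"{code}\n" for code in cleaned)


# ICD-10 codes for HES Admitted Patient Care use the XNN / XNN.N formats.
apc_codes = ["G00", "G00.1", "G001"]
print(invalid_codes(apc_codes, ["XNN", "XNN.N"]))  # → ['G001']
```

The same code can be valid in one source and invalid in another (G001 fits the HES Outpatient XNNN format but not the HES Admitted Patient Care one), so each code list would be checked against the formats for its own source and saved as its own file.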
