How to access CPRD data

Learning objectives

By the end of this module, the reader will have learnt:

How do I know whether CPRD data is suitable for my research?

There are a number of resources available to help researchers explore whether CPRD primary care data and linked data are suitable for their research:

  • The CPRD GOLD and CPRD Aurum Data Specifications describe the format of how CPRD primary care data is provided (including the file and table structure, and field formats), and the coding systems used – available at www.cprd.com/primary-care-data-public-health-research.
  • The CPRD GOLD and CPRD Aurum Data Resource Profile publications describe how CPRD GOLD and CPRD Aurum data are collected and made available for research, what type of data is available in these primary care databases recorded by GPs, and the strengths and limitations of these datasets – available at www.cprd.com/primary-care-data-public-health-research.
  • The CPRD GOLD and CPRD Aurum Code Browser Tools, developed by CPRD, are specific to the medical codes and product codes used in CPRD GOLD and CPRD Aurum primary care data. Researchers should explore the code browser tools to see whether their conditions or treatments of interest have codes which are recorded by GPs in their patients’ records. The code dictionaries are updated with each CPRD primary care database release to include any new codes introduced by the GP software systems. The tools can be provided upon request: please contact enquiries@cprd.com so that download credentials can be set up for you (credentials expire within 7 days).
  • The Linked Data Documentation for all the established CPRD linked datasets includes the source of each dataset, the format and structure of the data, the coding systems used, and the coverage period linked to CPRD primary care data – available at www.cprd.com/linked-data.
  • The Algorithm-Derived Data Documentation for the established derived datasets and value-added variables includes the format and structure of the data, and the methodology – available at www.cprd.com/cprd-algorithm-derived-data.
  • The ICD-10 Code Dictionary contains the codes used in some linked datasets (HES Admitted Patient Care data, HES Outpatient data, HES Accident & Emergency data, and ONS death registration data). This is not a CPRD resource but is freely available online. Researchers are advised to seek access to these codes via licence with NHS England or search the internet for free resources.  A free version of the ICD-10 dictionary can be found on the World Health Organization website www.who.int/classifications/icd/icdonlineversions/en/. The ICD-10 dictionary can also be downloaded from NHS England TRUD (https://isd.digital.nhs.uk/trud3/user/guest/group/0/home).
  • The CPRD Bibliography lists publications from studies using CPRD data. Researchers should refer to relevant CPRD-based publications in their area of research to estimate the numbers of patients expected in the CPRD databases. The searchable list of publications is updated monthly – available at www.cprd.com/bibliography.
  • CPRD may offer to run a Feasibility Count where an estimate of patients for a study cannot be obtained from freely available literature or the CPRD Bibliography. Only one free simple feasibility count is permitted per proposed research study, and requests are limited to a maximum of three criteria. Please note that counts based on entity types are not provided under this service. The Feasibility Count Request form can be provided upon request, along with the CPRD Code Browsers: please contact enquiries@cprd.com with details of your request so that CPRD can confirm whether a Feasibility Count can be provided.
  • Researchers can request a Feasibility Study if more detailed information is required to assess the feasibility of conducting a future study – more detail about Feasibility Studies, including guidance notes, is available at www.cprd.com/research-applications. Please note, there are fees and contractual procedures involved in this service.
     
  • CPRD Synthetic Datasets have been developed to enable researchers to explore the structure of CPRD primary care data without access to real patient-level data – more detail about these synthetic datasets and the request form are available at www.cprd.com/content/synthetic-data. Please note, there are fees and contractual procedures involved in this service.  

We also recommend that researchers liaise with UK clinicians to understand how the patients of interest are treated and managed, and how their data are recorded, within the UK healthcare system.

For researchers from an organisation that holds a CPRD Multi-Study Licence (MSL), we recommend discussing your research with your organisation’s nominated users or key licence contacts in the first instance, to understand the CPRD licence, how best to use CPRD data and submit research applications, and any processes specific to your organisation.  

What are the options for accessing CPRD data? 

There are two licence options for accessing CPRD primary care data and the majority of linked datasets, each with specific contract terms:

  1. Single study dataset licence – where a study dataset defined by an approved research application will be prepared by CPRD, and access granted to researchers via the CPRD Trusted Research Environment (TRE).
  2. Multi-study licence (MSL) – enables an organisation to conduct multiple studies within a 12-month period and for nominated users to access the primary care data directly. 

As a not-for-profit, cost-recovery UK government research service, CPRD must recoup the cost of delivering research services from data access licence fees. The fees for access to CPRD data (excluding VAT) are available at www.cprd.com/pricing. Official quotes and further information can be provided upon request by contacting enquiries@cprd.com.

Single study dataset licence

Organisations must submit a research application via www.erap.cprd.com. Applicants need to discuss their study protocol with a member of the CPRD Observational Research team by sending an email to enquiries@cprd.com along with their draft protocol. 

Once the research application is approved, the CPRD Contracts team will liaise with the applicant to manage the contractual arrangements, and the CPRD Observational Research team will liaise with the applicant to manage the definition of the study dataset to be prepared. Following agreement of the data specification, CPRD will make the final study dataset, containing all variables for the patients defined in the study population for all primary care and linked datasets requested in the approved protocol, available to the research applicants on the CPRD TRE. The fees will depend on what data types are requested, the duration of the study in the TRE workspace, and how many Virtual Machines (VMs) are required. 

Multi-study licence

A CPRD primary care multi-study licence (MSL) enables an organisation to have direct online access to the CPRD GOLD and CPRD Aurum primary care databases via nominated users, which may be the most cost-effective option if an organisation anticipates conducting multiple patient-anonymised observational research studies within a 12-month period. The MSL includes at least two nominated users (additional nominated users can be purchased for the Standard and Full MSL) and allows organisations to conduct preliminary internal exploration prior to submission of a research application.

Organisations that hold a CPRD primary care data MSL can also purchase MSLs for certain linked datasets for the same 12-month period. CPRD does not provide a complete version of the linked data; instead, organisations will receive individual study data cuts for each approved study.

This option enables organisations to explore CPRD primary care data directly, and to extract CPRD GOLD and CPRD Aurum data for research studies during the 12-month period (with the exception of studies requiring linked NCRAS SACT and/or RTDS data, which must be provided as a complete study dataset by CPRD). Linked and algorithm-derived data for approved studies must be requested from CPRD.

What do I need to access CPRD data?

Access to CPRD anonymised patient-level data via a single study dataset requires completion of the following steps:

  • Client approval: Organisations must first gain CPRD Client approval in order to access CPRD data (or CPRD Funder approval in order to fund research using CPRD data but not access data directly) – more detail about this requirement and the application forms are available at  www.cprd.com/Data-access.
  • Research Application approval: Following CPRD Client / Funder approval, researchers must register for an account on the CPRD electronic Research Application Portal (eRAP) at www.erap.cprd.com. Once their eRAP account is approved, researchers can draft and submit Research Applications to be reviewed via CPRD’s Research Data Governance (RDG) process. Information about the RDG process is available at www.cprd.com/research-applications. Guidance about completing and submitting a Research Application is linked from the footer of the eRAP website and is available at www.cprd.com/content/guidance-completion-cprd-research-data-governance-rdg-application. A decision on a research application can normally be expected approximately 4 weeks after the submission date.
  • Licence agreement: Once a CPRD Research Application is approved by RDG, a licence agreement must be put in place between CPRD and the Research Application Sponsor, and between the Sponsor and any collaborators accessing CPRD data. CPRD will get in touch to execute the appropriate contract templates, covering the terms and conditions for the data requested.
  • Data specification and provision: CPRD will then liaise with the Research Application Chief Investigator (CI) and Corresponding Applicant (CA) to agree on a data specification, specifying how the study population is to be defined. Once this has been agreed and signed off, CPRD will prepare and provide the final study dataset, containing all variables for the patients defined in the study population for the primary care and linked datasets justified in the approved protocol, on the CPRD TRE.

For studies that request National Cancer Registration and Analysis Service (NCRAS) data, there are additional steps required to access the SACT and/or RTDS datasets. Prior to submission of the research application, a completed NCRAS Data Selection Form must first be approved by CPRD. This process is detailed in the NCRAS documentation available at www.cprd.com/linked-data#Cancer%20data.

For organisations that hold a CPRD Multi-Study Licence (MSL), the licence agreement and data access steps are slightly different. Please see the training module For MSL holders for further detail.

When do you need ethics approval? 

Approval from an NHS Research Ethics Committee (REC) may be required if the proposed study is not purely observational. It is the CI’s responsibility to determine whether additional REC approval is required. Please consult the latest version of GAfREC Research Ethics Committee – Standard Operating Procedures for more guidance.

What are the key obligations for using CPRD data?

Protecting the confidentiality of patient data is paramount to CPRD. Researchers must adhere to robust terms and conditions governing how the anonymised data is used. The full terms and conditions governing use of CPRD data are detailed in the CPRD licence agreements; however, some of the key contractual obligations for researchers using CPRD data are listed here:

  • Respect patient confidentiality and the public ‘trust’ in what we do, for the benefit of all while protecting patients’ individual interests in privacy.
    • Ensure proper use and care of data to reduce the chance of data breaches.
    • Comply with Data Protection policy and ensure security measures are in place for CPRD data.
  • Researchers cannot attempt to re-identify, contact, or target patients or GP practices.
  • Researchers cannot merge CPRD data with other datasets or compare cases from different extracts.
  • Researchers cannot use CPRD data for ‘market analysis’, such as advertising campaigns or sales.
  • Report incidents and data breaches within 24 hours – let us know about any possible data loss, corruption, or re-identification.
  • Researchers cannot share log-ins for which they are personally responsible.
  • Researchers cannot share CPRD data outside of their organisation without an approved research application and the appropriate contractual agreements in place, e.g. with collaborators on the protocol.
  • Access to CPRD data is covered by a valid contractual agreement. Data can be retained for 12 months initially, after which they must either be destroyed, or an extension requested and approved by CPRD.
  • Any amendments to research applications must be submitted within 4 years of initial study protocol approval.

For researchers from an organisation that holds a CPRD Multi-Study Licence (MSL), there are more points specific to MSL agreements in the For MSL holders module, and we also recommend discussing your research with your nominated users or key licence contacts in the first instance, to understand the obligations detailed within your CPRD licence. 

Contractual acknowledgements at publication

Please note that when publishing research based on linked data, there are contractual requirements relating to acknowledgements stated in the CPRD data access agreements.

The following statements should be included in publications arising from the use of CPRD GOLD, CPRD Aurum and linked data.

  • CPRD primary care data: “This study is based in part on data from the Clinical Practice Research Datalink obtained under licence from the UK Medicines and Healthcare products Regulatory Agency. The data is provided by patients and collected by the NHS as part of their care and support. The interpretation and conclusions contained in this study are those of the author/s alone”. The Customer will ensure that the description of the CPRD Database in any such Publication is accurate and current, and agrees to request publication of a correction to any published description which CPRD deems to be inaccurate, if so requested by CPRD;
  • Hospital Episode Statistics (HES) and/or Office for National Statistics (ONS) data: “Copyright © (year), re-used with the permission of The Health & Social Care Information Centre. All rights reserved”. Users should ensure that the description of the HES/ONS data in any such publication is accurate and current, and agree to request publication of a correction to any published description which CPRD or the linked data owner deems to be inaccurate, if so requested by CPRD or the linked data owner;
  • National Cancer Registration and Analysis Service (NCRAS) data: “This work uses data that has been provided by patients and collected by the NHS as part of their care and support. The data are collated, maintained and quality assured by the National Disease Registration Service, which is part of NHS England.”

Publications should also include the study protocol number and preserve confidentiality at the reporting stage. The possibility of unintentional (deductive) disclosure arises when cells with small numbers of patients are quoted, so CPRD policy is that no reported cell should contain fewer than 5 events. If you are using CPRD data that has a Digital Object Identifier (DOI), please cite it in any publications. Further information is available at www.cprd.com/digital-object-identifiers-dois-datasets.
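The small-cell rule above can be checked mechanically before results leave the analysis environment. The following is a minimal Python sketch, not a CPRD tool: the function name `suppress_small_cells`, the `"<5"` display label, and the example counts are all hypothetical. It assumes a literal reading of the rule, so every count below 5 (including zero) is masked; whether true zeros may be reported, and how to stop masked cells being recovered from row or column totals, should be confirmed against the CPRD licence terms.

```python
from typing import Dict, Union

# Threshold taken from the text: no reported cell should contain <5 events.
THRESHOLD = 5


def suppress_small_cells(
    counts: Dict[str, int], threshold: int = THRESHOLD
) -> Dict[str, Union[int, str]]:
    """Return a copy of `counts` with every value below `threshold` masked.

    Assumption: counts below the threshold, including zero, are replaced
    by the string "<threshold" (for example "<5").
    """
    masked: Dict[str, Union[int, str]] = {}
    for label, n in counts.items():
        if n < 0:
            raise ValueError(f"negative count for cell {label!r}: {n}")
        masked[label] = n if n >= threshold else f"<{threshold}"
    return masked


if __name__ == "__main__":
    # Hypothetical cell counts for a results table.
    example = {"exposed": 132, "unexposed": 3, "missing": 0}
    print(suppress_small_cells(example))
```

This sketch only handles primary suppression. If a table also reports totals, a masked cell can sometimes be deduced by subtraction, so totals or a second cell may need masking as well.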


Next module: Defining your study population
