CPRD Trusted Research Environment

CPRD Trusted Research Environments (TREs) are considered the future method for access to healthcare data.

In recent years there has been a growing public awareness of data privacy and calls for greater transparency of data sharing, including assurances that healthcare data is safe, secure, and only used for its intended purposes.

CPRD has provided access to anonymised patient data for research for the benefit of public health for 35 years. We have an excellent track record in ensuring that data is safe, secure, and only used for its intended purposes, as recently validated by an external NHS Audit. To further strengthen the secure, safe use of data for research, we have embarked on the development of a CPRD TRE.

  • About Trusted Research Environments
  • The CPRD Trusted Research Environment (TRE)
  • How the CPRD Trusted Research Environment (TRE) works
  • Technical details
  • Onboarding to the TRE
  • Next steps

About Trusted Research Environments

Trusted Research Environments (TREs), also known as Secure Data Environments (SDEs) or Data Safe Havens, are highly secure computing environments that give approved researchers remote access to data for public health research.

TREs differ from widely used models of data access where researchers need to download data onto their computer to be able to use it for their analysis.

You can find out more about TREs in this document by HDR UK: https://www.hdruk.ac.uk/wp-content/uploads/2021/09/HDRUK_TRE-One-Pager.pdf

We are developing the CPRD TRE in line with the Five Safes framework, allowing approved researchers to access our data in a secure and controlled way.

The Five Safes are:

  1. Safe People – Researchers are trained and authorised to use the data safely
  2. Safe Projects – Research projects are approved by data owners for public good
  3. Safe Setting – A secure environment that prevents unauthorised use
  4. Safe Data – Data is treated to protect any confidentiality concerns
  5. Safe Outputs – Approved outputs that are non-identifiable

In response to patient feedback we have developed an “Airlock” which uses both human and automated checking to ensure any data leaving the TRE is a “Safe Output”. 

Find out more in the news article “Patients reviewed our CPRD Trusted Research Environment ‘airlock’ system”.

The CPRD Trusted Research Environment (TRE)

The CPRD TRE service gives approved researchers with approved projects secure access to CPRD healthcare data. All patient information in the TRE is anonymised, which means that any identifying (or personal) information such as names, addresses or NHS numbers is removed, so an individual cannot be identified.

How the CPRD Trusted Research Environment (TRE) works – a guide for potential users

Using Government Digital Standards (GDS), we have created a secure, scalable, reliable, and accessible service.
For a walkthrough of the service, see the training guide “CPRD TRE features: a guide for users”.

The technical components of the TRE

The CPRD TRE consists of a safe shared workspace that users securely connect to via a virtual machine. The workspace is protected from the internet and provides access to research data, code libraries and analytics tools.

A diagram of a shared workspace with three virtual machines that enables researchers to work together on the same data and analysis. The workspace contains data analysis and statistics tools and a dataset has been added via a secure airlock. There is a connection to GitHub to enable code libraries and scripts to be imported from GitHub. There is no online access. An automated checker and human checks are conducted in the output airlock to ensure all analysis is anonymised.

The Goldacre Review

In February 2021, Professor Ben Goldacre was commissioned by the UK Government to review safety and security in the use of health data for research and analysis.

The Goldacre Review – Better, broader, safer: using health data for research and analysis – was released in April 2022 and made 57 recommendations, focusing on TREs as the future direction of travel for stronger, more consistent management of and access to healthcare data.

In June 2022, the Department of Health and Social Care released the Data saves lives: reshaping health and social care with data policy paper, which supported the recommendations in the Goldacre Review.

In line with this UK-wide direction of travel, our major strategic aim is to move to a predominantly TRE-based model for data access and analysis.

Most research that uses our data will take place within the TRE, but there may be limited instances, such as patient-consented trials, where data can leave the TRE to be combined with other datasets for analysis.

What we have delivered

In April 2022 we began working on the first iteration of our TRE. This initial version allowed our researchers to log into a secure environment and use tools such as R, Python and Stata to analyse synthetic (artificial) data.

In April 2023, we tested a second iteration of the TRE with internal and external users. This version allowed researchers to log in to dedicated areas of the secure environment, called workspaces, and analyse anonymised real-world data.

Onboarding to the TRE

A.    Initial Application:

Timings are dependent on the applicant.

1.    Use the eRAP (electronic Research Application Portal) to fill out an application.

B.    Validation of New Protocols to first RDG Outcomes:

2.    The RDG reviews the research protocol and requests clarifications or provides approval.

3.    For full applications (such as a Single Study Licence, SSL), this step takes an average of 30 working days from submission of a valid application to the first outcome.

C.    Contract Process:

From mail out to signature: this step usually takes a minimum of 4 weeks.

4.    The client receives a contract template.
       a.    The contract includes agreements related to end users and data usage.

5.    The client legal team reviews the contract and returns it signed.

D.    Data Specification:

This step usually takes 8-12 weeks.

6.    The client is contacted within 15 days, as per the SLA, to start defining data requirements using the data specification form.

7.    The client completes the data specification form and the data specification is agreed. This takes 6 weeks on average.

E.    Data Delivery:

This phase spans a total of 6 weeks and includes concurrent activities:

8.    Data and training:
  • Workspace Owner and Super User Training on the TRE platform (4 weeks).
  • Data Cut: Extracting and preparing the data (within 30 days, as per the SLA).

F.    Workspace Creation:

This phase takes 1 to 2 weeks.

9.    Data imported into TRE and checked.

10.    Researcher access to data within the TRE.

11.    Research starts.

Remember that these timelines are averages, and actual durations may vary based on client-specific processes and our internal procedures. When planning research studies, consider these steps and their associated timeframes before data analysis can commence.
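
As a rough planning aid, the phase durations listed above can be added up. The sketch below is illustrative only and rests on several assumptions that the timeline does not state: that phases B to F run one after another, that a working week is 5 days (so 30 working days is 6 weeks), and that the contract phase takes exactly its stated 4-week minimum. Phase A is left out because its timing depends on the applicant. The names `Phase`, `PHASES` and `total_weeks` are hypothetical and are not part of any CPRD tool.

```python
from dataclasses import dataclass

WORKING_DAYS_PER_WEEK = 5  # assumption: a standard 5-day working week


@dataclass(frozen=True)
class Phase:
    name: str
    min_weeks: float
    max_weeks: float


# Durations taken from the onboarding steps above.
# Assumption: phases are strictly sequential, with no overlap between them.
PHASES = [
    # 30 working days on average, converted to weeks
    Phase("B. Validation to first RDG outcome",
          30 / WORKING_DAYS_PER_WEEK, 30 / WORKING_DAYS_PER_WEEK),
    # "a minimum of 4 weeks": the true upper bound is unknown, 4 is used for both
    Phase("C. Contract process", 4, 4),
    Phase("D. Data specification", 8, 12),
    # training and data cut run concurrently inside this 6-week phase
    Phase("E. Data delivery", 6, 6),
    Phase("F. Workspace creation", 1, 2),
]


def total_weeks(phases):
    """Return (lowest, highest) total duration in weeks for the given phases."""
    low = sum(p.min_weeks for p in phases)
    high = sum(p.max_weeks for p in phases)
    return low, high


if __name__ == "__main__":
    low, high = total_weeks(PHASES)
    print(f"Estimated time from valid application to research start: "
          f"{low:g} to {high:g} weeks")
    # prints: Estimated time from valid application to research start: 25 to 30 weeks
```

Under these assumptions the estimate is about 25 to 30 weeks after a valid application is submitted. A contract review that runs beyond the 4-week minimum, or any delay in phase A, would extend it.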

Next steps

From April 2024, CPRD Single Study Licence users will be the first to access the CPRD TRE.

From April 2025, RDG clients will be able to use the TRE, and from October 2025 onwards we will migrate our MSL clients to the new platform.

High-level roadmap milestones

Roadmap for TRE delivery up to Q4 2026

April 2024: First clients start the Single Study Licence onboarding process that will lead to using the TRE.

July 2024: Release of SQL dataset products for larger cohort sizes.

March 2025: Release of airlock improvements.

April 2025: Further development for MSL clients to conduct RDG studies.

October 2025: Migration of MSL clients to the TRE begins.

Our roadmap is subject to change in response to legislative, technological and client needs.

We will continue to update this page as we progress through the development stages.

Further information

GitHub - MHRA/cprd-oss-tre: An accelerator to help organizations build Trusted Research Environments on Azure.

Medicines and Healthcare products Regulatory Agency Delivery Plan 2021-2023

CPRD TRE features: a guide for users
