Data quality

CPRD collects data from contributing practices on a daily basis and integrates this with existing data to create releases for observational research.  

Before the data is made available for research, checks are carried out covering the integrity, structure and format of the data. Issues highlighted by the checks are reviewed and addressed before data is incorporated into the data release for researchers.  

We check: 

  • the volume of data downloaded against the volume supplied
  • that data volumes are in the expected range
  • that all data elements received are of the correct type, length and format
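
As an illustration only, checks of this kind could be sketched as below. This is not CPRD's implementation: the field names (patid, eventdate, value), the rules table and the function names are hypothetical, chosen to show what a volume check and a type and length check might look like.

```python
from datetime import date

# Hypothetical field rules: name -> (expected type, maximum length or None).
# These are illustrative assumptions, not CPRD's actual data specification.
FIELD_RULES = {
    "patid": (str, 20),
    "eventdate": (date, None),
    "value": (float, None),
}


def volume_matches(downloaded_count, supplied_count):
    """Return True when the number of records downloaded equals the number supplied."""
    return downloaded_count == supplied_count


def volume_in_range(count, low, high):
    """Return True when a record count lies within the expected range (inclusive)."""
    return low <= count <= high


def invalid_fields(record):
    """Return the names of fields whose type or length breaks FIELD_RULES."""
    problems = []
    for name, (expected_type, max_len) in FIELD_RULES.items():
        value = record.get(name)
        if not isinstance(value, expected_type):
            problems.append(name)  # missing or wrong type
        elif max_len is not None and len(value) > max_len:
            problems.append(name)  # too long
    return problems
```

A record that passes returns an empty list from invalid_fields; any field names returned would be reviewed before the data moves on.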

Our range of validation and quality checks includes:

  • Collection-level validation ensures integrity by checking that data received from practices contains only the expected data files and that all data elements are of the correct type, length and format. Duplicate records are identified and removed.  
  • Transformation-level validation checks referential integrity between records to ensure that no orphan records are included in the database (for example, that all event records link to a patient).  
  • Research-quality-level validation covers the actual content of the data. CPRD provides a patient-level data quality metric in the form of a binary ‘acceptability’ flag. This is based on the recording and internal consistency of key variables, including date of birth, practice registration date and transfer out date. 
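
For readers who want a concrete picture of duplicate removal and the orphan-record check, a minimal sketch follows. It is an assumption-laden illustration rather than CPRD's actual process: records are represented as dictionaries, and the patid field and both function names are hypothetical.

```python
def remove_duplicates(records, key_fields):
    """Keep the first occurrence of each record, comparing on key_fields only."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def find_orphans(events, patients):
    """Return event records whose patid (hypothetical field) matches no patient."""
    known_ids = {patient["patid"] for patient in patients}
    return [event for event in events if event["patid"] not in known_ids]
```

In this sketch, referential integrity holds when find_orphans returns an empty list, meaning every event record links to a patient.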

In addition to checks undertaken by the CPRD teams before the data is released, researchers using the data are advised to undertake study-specific checks themselves. 
 

Useful publications on the quality of CPRD data for research   

 

See also  

Using CPRD primary care data  

Safeguarding patient data   

CPRD database releases and their digital object identifiers (DOIs) 
