CPRD collects data from contributing practices on a daily basis and integrates this with existing data to create releases for observational research.
Before the data is made available for research, checks are carried out covering the integrity, structure and format of the data. Issues highlighted by the checks are reviewed and addressed before the data is incorporated into the release for researchers. These checks confirm that:
- the volume of data downloaded matches that supplied
- data volumes are in the expected range
- all data elements received are of the correct type, length and format
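As an illustration only, the three checks listed above could be sketched as follows. This is not CPRD's implementation: the `FieldSpec` class, the function names, the field names (`patid`, `eventdate`) and the date pattern are all hypothetical and chosen purely for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldSpec:
    """Hypothetical description of one expected data element."""
    name: str
    type_: type
    max_length: int
    pattern: Optional[str] = None  # optional regular expression for the format check


def volume_matches(supplied: int, downloaded: int) -> bool:
    """True when the number of records downloaded equals the number supplied."""
    return supplied == downloaded


def volume_in_range(count: int, low: int, high: int) -> bool:
    """True when a record count falls inside the expected range (inclusive)."""
    return low <= count <= high


def field_errors(record: dict, specs: list) -> list:
    """Return one message per element that fails the type, length or format check."""
    errors = []
    for spec in specs:
        value = record.get(spec.name)
        if not isinstance(value, spec.type_):
            errors.append(f"{spec.name}: wrong type")
        elif len(str(value)) > spec.max_length:
            errors.append(f"{spec.name}: too long")
        elif spec.pattern and not re.fullmatch(spec.pattern, str(value)):
            errors.append(f"{spec.name}: bad format")
    return errors


# Assumed specification and sample records, for illustration only.
specs = [
    FieldSpec("patid", int, 12),
    FieldSpec("eventdate", str, 10, r"\d{4}-\d{2}-\d{2}"),
]
good = {"patid": 1001, "eventdate": "2020-03-15"}
bad = {"patid": "1001", "eventdate": "15/03/2020"}

print(field_errors(good, specs))
print(field_errors(bad, specs))
```

Returning a list of messages rather than raising on the first failure mirrors the workflow described above, in which highlighted issues are reviewed and addressed before release.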
Our range of validation and quality checks includes:
- Collection-level validation ensures integrity by checking that data received from practices contains only the expected data files and that all data elements are of the correct type, length and format. Duplicate records are identified and removed.
- Transformation-level validation checks referential integrity between records to ensure that no orphan records are included in the database (for example, that every event record links to a patient).
- Research-quality-level validation covers the actual content of the data. CPRD provides a patient-level data quality metric in the form of a binary ‘acceptability’ flag. The flag is based on the recording and internal consistency of key variables, including date of birth, practice registration date and transfer-out date.
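The duplicate removal and referential-integrity steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not CPRD's pipeline: records are modelled as plain dictionaries, the linking key is assumed to be called `patid`, and the function names are hypothetical.

```python
def drop_duplicates(records: list) -> list:
    """Keep the first occurrence of each record; later exact copies are removed."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))  # order-independent fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def find_orphans(events: list, patients: list, key: str = "patid") -> list:
    """Return event records whose key does not match any patient record."""
    known = {p[key] for p in patients}
    return [e for e in events if e.get(key) not in known]


# Assumed sample data, for illustration only.
patients = [{"patid": 1}, {"patid": 2}]
events = [
    {"patid": 1, "code": "A"},
    {"patid": 1, "code": "A"},  # exact duplicate of the previous record
    {"patid": 3, "code": "B"},  # no patient with patid 3, so this is an orphan
]

unique_events = drop_duplicates(events)
orphans = find_orphans(unique_events, patients)
print(len(unique_events), orphans)
```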
In addition to checks undertaken by the CPRD teams before the data is released, researchers using the data are advised to undertake study-specific checks themselves.
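What a study-specific check looks like depends on the study. One possible sketch restricts a cohort to patients with the acceptability flag set and keeps only events recorded during each patient's registration period. The field names (`acceptable`, `regstart`, `regend`, `eventdate`), the function names and the follow-up rule are assumptions made for this example and should be replaced with those in the relevant data specification and study protocol.

```python
from datetime import date


def eligible_patients(patients: list) -> list:
    """Keep patients flagged acceptable whose registration dates are consistent."""
    kept = []
    for p in patients:
        if p["acceptable"] != 1:
            continue  # acceptability flag not set
        if p["regend"] is not None and p["regend"] < p["regstart"]:
            continue  # registration ends before it starts
        kept.append(p)
    return kept


def events_in_follow_up(events: list, patient: dict, study_end: date) -> list:
    """Return the patient's events dated within registration (or up to study end)."""
    start = patient["regstart"]
    end = patient["regend"] or study_end
    return [
        e for e in events
        if e["patid"] == patient["patid"] and start <= e["eventdate"] <= end
    ]


# Assumed sample data, for illustration only.
patients = [
    {"patid": 1, "acceptable": 1, "regstart": date(2010, 1, 1), "regend": None},
    {"patid": 2, "acceptable": 0, "regstart": date(2012, 5, 1), "regend": None},
]
events = [
    {"patid": 1, "eventdate": date(2009, 6, 1)},   # before registration
    {"patid": 1, "eventdate": date(2015, 3, 10)},  # within follow-up
]

cohort = eligible_patients(patients)
kept = events_in_follow_up(events, cohort[0], study_end=date(2020, 12, 31))
print([p["patid"] for p in cohort], kept)
```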
Useful publications on the quality of CPRD data for research
- Publication: Herrett E, Gallagher AM, Bhaskaran K, Forbes H, Mathur R, van Staa T, Smeeth L. Data Resource Profile: Clinical Practice Research Datalink (CPRD). Int J Epidemiol. 2015 Jun 6;44(3):827–36.
- Publication: Wolf A, Dedman D, Campbell J, Booth H, Lunn D, Chapman J, Myles P. Data resource profile: Clinical Practice Research Datalink (CPRD) Aurum. Int J Epidemiol. 2019 Dec 1;48(6):1740-1740g. doi: 10.1093/ije/dyz034.
- Publication: Jick SS, Hagberg KW, Persson R, Vasilakis-Scaramozza C, Williams T, Crellin E, Myles P. Quality and completeness of diagnoses recorded in the new CPRD Aurum Database: evaluation of pulmonary embolism. Pharmacoepidemiol Drug Saf. 2020 Sep;29(9):1134-1140. doi: 10.1002/pds.4996.
- Publication: Persson R, Vasilakis-Scaramozza C, Hagberg KW, Sponholtz T, Williams T, Myles P, Jick SS. CPRD Aurum database: Assessment of data quality and completeness of three important comorbidities. Pharmacoepidemiol Drug Saf. 2020 Nov;29(11):1456-1464. doi: 10.1002/pds.5135.