An assessment of CPRD Aurum data quality

Study type
Protocol
Date of Approval
Study reference ID
18_191
Lay Summary

Each new database must be evaluated to be sure that its data are of sufficiently high quality to be useful in medical research. The data generated by general practitioners (GPs) who keep their patients' medical records on computers can be very useful for medical research, but because these data are collected primarily for patient care, their use in research may be scientifically questionable without evidence of their quality. Since the data in the CPRD Aurum database were not originally designed to be used for research, it is particularly important to assess their quality and completeness. We propose a thorough data quality assessment study that uses multiple strategies to evaluate the different types of data collected in the database before they are used for research purposes. We will compare information in CPRD Aurum with information in a hospital database to see whether the data in Aurum are complete. We will also check whether the medications recorded in the electronic record match the diagnoses recorded by the GPs, and whether the recorded diagnoses are supported by the treatments that patients receive.

Technical Summary

We will assess the quality and completeness of the newly available CPRD Aurum data using several data quality assessment techniques based on published recommendations, including:
• Comparison with a gold standard: comparing hospitalisation-related data in CPRD Aurum with linked HES Admitted Patient Care records.
• Data element agreement and validity checks: examining several drug/lab value and drug/disease pairings for consistency.
• Data source agreement: assessing the completeness, correctness, concordance, and plausibility of breast cancer diagnoses in CPRD Aurum by comparison with previously published findings from CPRD GOLD.
• Element presence (to understand the availability and potential bias of key covariates): we will calculate the number of body mass index (BMI), smoking, and blood pressure (BP) records per patient by practice, both for all patients and restricted to patients with cardiovascular disease (CVD), a subset who should have more recordings of each of these variables. We will report the mean, median, and mode of each indicator for all patients in a practice versus patients with CVD. We will also calculate the proportion of patients receiving drug treatment for benign prostatic hyperplasia (BPH) who have an indication for the drug in their record, to assess the presence of the indication in the GP record.
• Data consistency over time: the total number of prescriptions and diagnoses by practice, by month or quarter.
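The element presence calculation described above could be sketched roughly as follows. This is a minimal illustration, not the study's actual analysis code: the input layout (`patients`, `records`), the function name `summarise`, and the choice to count patients with no recording as zero are all assumptions made for the example, and the data are invented.

```python
from collections import Counter, defaultdict
from statistics import mean, median, multimode

# Hypothetical toy data (not CPRD Aurum's real table layout):
# one (practice_id, patient_id, has_cvd) row per patient ...
patients = [
    ("P1", 1, True),
    ("P1", 2, False),
    ("P1", 3, False),
    ("P2", 4, True),
    ("P2", 5, True),
]
# ... and one (patient_id, measure) row per recorded measurement.
records = [
    (1, "bmi"), (1, "bmi"), (1, "bmi"),
    (2, "bmi"),
    (4, "bmi"), (4, "bmi"),
    (5, "bmi"), (5, "bmi"),
]


def summarise(patients, records, measure, cvd_only=False):
    """Summarise records per patient for one measure, grouped by practice.

    Assumption: a patient with no record of the measure contributes a
    count of 0 rather than being excluded.
    """
    # Number of records of this measure for each patient.
    per_patient = Counter(pid for pid, m in records if m == measure)

    # Collect the per-patient counts within each practice.
    by_practice = defaultdict(list)
    for practice, pid, has_cvd in patients:
        if cvd_only and not has_cvd:
            continue
        by_practice[practice].append(per_patient.get(pid, 0))

    # multimode returns every most-common value, so ties are kept.
    return {
        practice: {
            "mean": mean(counts),
            "median": median(counts),
            "mode": multimode(counts),
        }
        for practice, counts in by_practice.items()
    }


all_patients = summarise(patients, records, "bmi")
cvd_patients = summarise(patients, records, "bmi", cvd_only=True)
```

Comparing `all_patients` with `cvd_patients` practice by practice gives the "all patients versus patients with CVD" contrast. The same function could be called with other measure labels, such as smoking or BP, if the records were coded that way.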

Health Outcomes to be Measured

- Type II diabetes
- Breast cancer
- Pulmonary embolism
- Benign prostatic hyperplasia
- Myocardial infarction
- Rheumatoid arthritis
- Breast cancer and all cancers

Collaborators

Susan Jick - Chief Investigator - BCDSP - Boston Collaborative Drug Surveillance Program
Susan Jick - Corresponding Applicant - BCDSP - Boston Collaborative Drug Surveillance Program
Catherine Vasilakis-Scaramozza - Collaborator - BCDSP - Boston Collaborative Drug Surveillance Program
Eleanor Yelland - Collaborator - CPRD
Katrina Hagberg - Collaborator - BCDSP - Boston Collaborative Drug Surveillance Program
Puja Myles - Collaborator - CPRD
Rebecca Persson - Collaborator - BCDSP - Boston Collaborative Drug Surveillance Program

Former Collaborators

Elizabeth Crellin - Collaborator - The Health Foundation

Linkages

HES Admitted Patient Care