This resource module is for researchers from organisations that hold a CPRD Multi-Study Licence (MSL).
By the end of this module, the reader will have learnt:
- What are denominator data?
- Contents of the denominator data file pack
- Content of the denominator files
- When do we use denominator data?
- How to use denominator data
- How to access denominator data
Note: All patient level data examples used in this training pack are made for the purposes of this training and do not represent real patients.
What are denominator data?
Denominator data are files that CPRD can provide, on request, to nominated users from organisations that hold a CPRD MSL. They support the definition and extraction of CPRD primary care data under the contractual terms of the licence. Organisations that do not hold a CPRD MSL do not need these files, because CPRD extracts and provides the data agreed in the data specification for their study.
Contents of the denominator data file pack
CPRD provide a package of denominator files for each build of a primary care database. This is provided as a zip file that contains three files.
- Practice file: One row with supporting information for each practice in the database build.
- All Patient file: One row with supporting information for each patient in the database build (Note: Includes non-acceptable patients).
- Patient file: One row with supporting information for each acceptable patient in the database build.
‘Acceptable’ patients are those deemed to have a ‘research-quality’ record. Details of the acceptable flag definition can be found in the glossary terms for the CPRD primary care database at www.cprd.com/primary-care-data-public-health-research.
For research studies, most investigators choose to use only acceptable patients.
Content of the denominator files
The CPRD GOLD Practice file includes pracid, region, last collection date (lcd), and up-to-standard (uts) date. The CPRD UTS date is defined in the glossary terms for the CPRD primary care database at www.cprd.com/primary-care-data-public-health-research.
The CPRD Aurum Practice file includes pracid, region and last collection date (lcd). UTS is currently not populated in CPRD Aurum.
The CPRD GOLD All Patient file includes the following variables:
The CPRD GOLD Patient file includes the same variables but only includes patients who are acceptable (where accept == 1).
CPRD GOLD registration dates
CPRD GOLD contains two registration dates. These allow for patients to ‘come and go’ over time.
- CPRD GOLD first registration date (frd): The date the patient first registered with the GP practice. If the patient only has ‘temporary’ records, then this is the date of their first encounter with the practice. If the patient has ‘permanent’ records, it is the date of the first ‘permanent’ record (excludes preceding temporary records).
- CPRD GOLD current registration date (crd): The date the patient’s current period of registration with the GP practice began (i.e. most recent registration date). If there are no ‘transferred out periods’ in the patient’s record, then this is the same as the first registration date. If there are ‘transferred out periods’, this is the date of the first ‘permanent’ record after the latest transferred out period.
The CPRD Aurum All Patient file includes the following variables:
The CPRD Aurum Patient file includes the same variables but only includes patients who are acceptable (where acceptable == 1).
The CPRD death date is derived using an algorithm implemented by CPRD and is populated in both CPRD GOLD and CPRD Aurum.
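The relationship between the All Patient file and the Patient file can be sketched as a simple filter on the acceptability flag. This is a minimal sketch assuming the rows have already been read into Python dictionaries keyed by variable name; the function name `acceptable_only` and the example rows are hypothetical, and only the flag names `accept` (CPRD GOLD) and `acceptable` (CPRD Aurum) come from the text above. CPRD already supplies the filtered Patient file, so the sketch only makes the relationship explicit.

```python
def acceptable_only(rows, flag_column):
    """Keep rows whose acceptability flag equals 1.

    flag_column is "accept" for CPRD GOLD and "acceptable" for CPRD Aurum.
    Values read from text files may be strings, so they are cast to int.
    """
    return [row for row in rows if int(row[flag_column]) == 1]


# Hypothetical All Patient rows (not real patients).
gold_all_patients = [
    {"patid": "10010001", "accept": "1"},
    {"patid": "10110001", "accept": "0"},
    {"patid": "10210001", "accept": "1"},
]

gold_patients = acceptable_only(gold_all_patients, "accept")
print([row["patid"] for row in gold_patients])  # ['10010001', '10210001']
```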
When do we use denominator data?
These data are used to define individual patient registration time, to determine eligibility for inclusion in a study population, and to calculate denominators for estimating incidence and prevalence rates. Patient registration time is calculated by defining a start and end date for each patient in the source population, i.e. the pool of patients eligible for inclusion in the study population.
- To estimate point prevalence, we need the total number of patients in the source population on a specified date: patients who are ‘active’ or ‘currently registered’ on that date, i.e. whose ‘start’ is before, and whose ‘end’ is after, the specified date.
- To estimate incidence rates or period prevalence, we need the total number of days that each person was registered, counted as the number of days between the patient’s ‘start’ and ‘end’ dates.
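The two denominators described above can be sketched in Python, assuming each patient's start and end dates have already been defined. The patient records and function names are hypothetical. The strict inequalities follow the wording "start is before and end is after"; counting days as end minus start (without adding one for inclusive counting) is an assumption that each study should settle for itself.

```python
from datetime import date

# Hypothetical source population: (patid, start, end), with start and end
# already defined for each patient (not real patients).
patients = [
    ("A", date(2010, 1, 1), date(2015, 6, 30)),
    ("B", date(2012, 3, 1), date(2020, 12, 31)),
    ("C", date(2016, 1, 1), date(2018, 1, 1)),
]


def point_prevalence_denominator(patients, on):
    """Number of patients active on a date: start before it and end after it."""
    return sum(1 for _, start, end in patients if start < on < end)


def total_person_days(patients):
    """Sum of days between each patient's start and end date (never negative)."""
    return sum(max((end - start).days, 0) for _, start, end in patients)


print(point_prevalence_denominator(patients, date(2014, 1, 1)))  # 2
print(total_person_days(patients))  # 5964
```

Patient C is not counted on 1 January 2016 because its start equals, rather than precedes, that date.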
How to use denominator data
Definition of start and end dates - recommendation from CPRD researchers
|Start - maximum of:||
|End - minimum of:||
Calculating registration time - CPRD primary care data
Diagrams of periods of patient registration time with complete follow-up are very useful when developing code to operationalise study population definitions. They show each component that is taken into account when defining start and end dates.
Example: Calculating eligible person-time in a project using CPRD GOLD primary care data (no linked data). This example uses the February 2022 build of CPRD GOLD.
From top to bottom:
- CPRD GOLD primary care data is available from approximately 1988 to the end of January 2022.
- Study period, defined for each research project. Here defined as 01/01/1995 - 31/12/2020.
- UTS data for practice 10001 is available between the UTS date and the last collection date. If the UTS date is not being used, available follow-up would not be limited by a practice-level ‘start’ date.
- Current registration and transfer out dates are drawn for each patient separately.
The eligible person-time each person contributes to the total follow-up time is as follows:
Start: maximum (latest) of CRD, UTS, and study start date
End: minimum (earliest) of TOD, LCD, and study end date
Eligible person-time contributing to total person-time:

| patid | Start | End | Eligible person-time |
|---|---|---|---|
| 10010001 | UTS | TOD | UTS to TOD |
| 10210001 | CRD | TOD | CRD to TOD |
| 10310001 | CRD | Study end date | CRD to study end date |
| 10410001 | CRD | Study end date | None - CRD is after study end date |
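The start and end rules for this CPRD GOLD example can be sketched as a small function that takes the maximum of the start components and the minimum of the end components, returning nothing when the end does not fall after the start. This is a minimal sketch: the function and helper names are hypothetical, and all dates (including the UTS and last collection dates for practice 10001) are invented so that the four patients reproduce the rows of the table above. A missing transfer out date (`tod=None`) is assumed to mean the patient is still registered.

```python
from datetime import date
from typing import Optional, Tuple


def latest(*dates):
    """Maximum (latest) of the dates that are present."""
    return max(d for d in dates if d is not None)


def earliest(*dates):
    """Minimum (earliest) of the dates that are present."""
    return min(d for d in dates if d is not None)


def gold_eligible_period(crd, tod, uts, lcd, study_start, study_end) -> Optional[Tuple[date, date]]:
    """Eligible (start, end) for one CPRD GOLD patient, or None if there is none.

    tod is None when the patient has not transferred out;
    uts is None when the UTS date is not being used.
    """
    start = latest(crd, uts, study_start)
    end = earliest(tod, lcd, study_end)
    return (start, end) if start < end else None


# Hypothetical practice 10001 dates and study period (not real data).
UTS, LCD = date(1998, 6, 1), date(2022, 1, 15)
STUDY_START, STUDY_END = date(1995, 1, 1), date(2020, 12, 31)

# Hypothetical (crd, tod) per patient, chosen to mirror the table above.
registrations = {
    "10010001": (date(1990, 5, 1), date(2005, 3, 15)),
    "10210001": (date(2001, 2, 10), date(2010, 8, 20)),
    "10310001": (date(2008, 9, 1), None),
    "10410001": (date(2021, 4, 1), None),
}

for patid, (crd, tod) in registrations.items():
    print(patid, gold_eligible_period(crd, tod, UTS, LCD, STUDY_START, STUDY_END))
```

Passing `uts=None` reproduces the case where the UTS date is not used, so follow-up is not limited by a practice-level start date.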
Denominator data for projects including linked data
Where linked data is required to define exposure, outcomes, or key covariates, it is crucial to consider linkage eligibility and linked data coverage periods when defining the source population eligible for inclusion in the study population.
The source file, also referred to as the linkage eligibility file, includes all patients registered in English practices that had consented to take part in the linkage process before the data were transferred to the trusted third party for linkage. The file contains a flag for each linked data source indicating whether the patient could be linked (e.g. ons_e = 1/0). Not all patients in the linkage file will be eligible, for example because of an invalid NHS number.
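Restricting a cohort to patients eligible for one linked data source can be sketched as a lookup on that source's flag. This is a minimal sketch assuming the linkage eligibility file has been loaded into a dictionary mapping patid to its flags; the function name `linkage_eligible` and the example rows are hypothetical, and only the `ons_e` flag name comes from the text above. Patients absent from the file are treated as not eligible.

```python
def linkage_eligible(patids, eligibility, flag):
    """Keep patients flagged as eligible (flag == 1) for one linked data source.

    eligibility maps patid -> {flag name: value}, as read from the linkage
    eligibility (source) file. Patients missing from the file are excluded.
    """
    return [
        patid for patid in patids
        if int(eligibility.get(patid, {}).get(flag, 0)) == 1
    ]


# Hypothetical rows from the linkage eligibility file (not real patients).
eligibility = {
    "20020002": {"ons_e": "1"},
    "20120002": {"ons_e": "0"},  # in the file but could not be linked
}

print(linkage_eligible(["20020002", "20120002", "20220002"], eligibility, "ons_e"))
# ['20020002']
```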
The coverage period for linked data sets differs for each data source and must be considered when defining the period of patient registration time with complete follow-up. For example, if an outcome is most frequently recorded in an inpatient hospital setting but HES APC coverage is only available until March 2021, data after this date will be incomplete; including that follow-up is likely to result in misclassification bias because outcome events will be missed.
Example: Calculating eligible person-time in a project using CPRD Aurum primary care data and linked HES APC data from the January 2022 update. This example uses the February 2022 build of CPRD Aurum.
From top to bottom:
- CPRD Aurum primary care data is available from approximately 1988 to the end of January 2022.
- HES APC data are available from April 1997 to October 2020 in the most recent release used in this example.
- Study period, defined for each research project. Here defined as 01/01/2000 - 31/12/2020.
- Data for practice 20002 is available from the beginning of data collection to the last collection date.
- Current registration and transfer out dates are drawn for each patient.
The eligible person-time each patient contributes to the total follow-up time is as follows:
Start: maximum (latest) of regstartdate, HES APC coverage start, and study start date
End: minimum (earliest) of regenddate, HES APC coverage end, LCD, and study end date
Eligible person-time contributing to total person-time:

| patid | Start | End | Eligible person-time |
|---|---|---|---|
| 20020002 | Study start date | regenddate | Study start date to regenddate |
| 20120002 | regstartdate | regenddate | regstartdate to regenddate |
| 20220002 | regstartdate | HES APC end of coverage date | regstartdate to HES APC end of coverage date |
| 20320002 | Study start date | regenddate | None - regenddate is before study start date |
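The start and end rules for this CPRD Aurum example with linked data can be sketched by adding the linked data coverage start to the start comparison and the coverage end to the end comparison. This is a minimal sketch: the function name is hypothetical, and all dates are invented so that the four patients reproduce the rows of the table above. The coverage window follows the April 1997 to October 2020 example, but the exact days, the last collection date for practice 20002, and the convention that a missing `regenddate` (`None`) means the patient is still registered are assumptions.

```python
from datetime import date
from typing import Optional, Tuple


def aurum_linked_eligible_period(
    regstartdate, regenddate, lcd, linked_start, linked_end, study_start, study_end
) -> Optional[Tuple[date, date]]:
    """Eligible (start, end) for one CPRD Aurum patient with linked data, or None.

    regenddate is None while the patient is still registered.
    """
    start = max(regstartdate, linked_start, study_start)
    end = min(d for d in (regenddate, linked_end, lcd, study_end) if d is not None)
    return (start, end) if start < end else None


# Hypothetical dates (not real data); exact days are assumptions.
LCD = date(2022, 1, 15)
LINKED_START, LINKED_END = date(1997, 4, 1), date(2020, 10, 31)
STUDY_START, STUDY_END = date(2000, 1, 1), date(2020, 12, 31)

# Hypothetical (regstartdate, regenddate) per patient, mirroring the table above.
registrations = {
    "20020002": (date(1995, 6, 1), date(2008, 11, 30)),
    "20120002": (date(2003, 2, 1), date(2012, 7, 31)),
    "20220002": (date(2015, 5, 1), None),
    "20320002": (date(1990, 1, 1), date(1998, 3, 31)),
}

for patid, (regstart, regend) in registrations.items():
    print(patid, aurum_linked_eligible_period(
        regstart, regend, LCD, LINKED_START, LINKED_END, STUDY_START, STUDY_END))
```

Patient 20220002 shows why coverage matters: follow-up stops at the end of linked data coverage, not at the study end date.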
How to access denominator data
- Denominator data can be requested from email@example.com by nominated users from organisations that hold CPRD Multi-Study Licences. Researchers should contact the nominated users within their organisation to share these files.
- Requests should state the data source you are interested in (CPRD GOLD / CPRD Aurum) and the month and year of the build.
- Lookup files and the source data for linkage sets (coverage periods and eligibility) can also be requested in this way.