Denominator data

Learning objectives

This training module is for researchers from organisations that hold a CPRD Multi-Study Licence (MSL). 

By the end of this module, the reader will have learnt:

Note: All patient-level data examples used in this training pack were created for the purposes of this training and do not represent real patients.

 

What are denominator data?

Denominator data are files that CPRD provide to nominated users from organisations that hold a CPRD MSL, in order to support the definition and extraction of CPRD primary care data under the contractual terms of the licence. Organisations that do not hold a CPRD MSL do not require these files as CPRD will extract and provide the data agreed within the data specification for their study.

Contents of the denominator data file pack

CPRD provide a package of denominator files for each build of a primary care database. This is provided as a zip file that contains three files.

  • Practice file: One row with supporting information for each practice in the database build.
  • All Patient file: One row with supporting information for each patient in the database build (Note: Includes non-acceptable patients).
  • Patient file: One row with supporting information for each acceptable patient in the database build.

‘Acceptable’ patients are those deemed to have a ‘research-quality’ record. Details of the acceptable flag definition can be found in the glossary terms for the CPRD primary care database at www.cprd.com/primary-care-data-public-health-research

For research studies, most investigators choose to use only acceptable patients.

Content of the denominator files

The CPRD GOLD Practice file includes pracid, region, last collection date (lcd), and up-to-standard (uts) date. The CPRD UTS date is defined in the glossary terms for the CPRD primary care database at www.cprd.com/primary-care-data-public-health-research

An example of the CPRD GOLD Practice file (does not represent real patients).

 

The CPRD Aurum Practice file includes pracid, region and last collection date (lcd). UTS is currently not populated in CPRD Aurum.

An example of the CPRD Aurum Practice file (does not represent real patients).

 

The CPRD GOLD All Patient file includes the following variables:

An example of the CPRD GOLD All Patient file (does not represent real patients).

The CPRD GOLD Patient file includes the same variables but only includes patients who are acceptable (where accept == 1).
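
The relationship between the All Patient file and the Patient file can be sketched in a few lines of Python. This is a minimal illustration only: the rows are made up, and the tab delimiter and the `patid` column name are assumptions rather than a description of the delivered file format. Only the `accept` flag is taken from the text above.

```python
import csv
import io

# Made-up rows standing in for a CPRD GOLD All Patient file (not real patients).
# Assumptions: tab-delimited text, a `patid` column, and `accept` coded as 1/0.
all_patients_txt = "patid\taccept\n10010001\t1\n10110001\t0\n10210001\t1\n"

def acceptable_patients(handle):
    """Return only the rows flagged as acceptable (accept == 1)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if row["accept"] == "1"]

acceptable = acceptable_patients(io.StringIO(all_patients_txt))
print([row["patid"] for row in acceptable])
```

For a real file, the `io.StringIO` object would be replaced by an open file handle.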

 

CPRD GOLD registration dates

CPRD GOLD contains two registration dates. These allow for patients to ‘come and go’ over time.

  • CPRD GOLD first registration date (frd): The date the patient first registered with the GP practice. If the patient only has ‘temporary’ records, then this is the date of their first encounter with the practice. If the patient has ‘permanent’ records, it is the date of the first ‘permanent’ record (excludes preceding temporary records).
  • CPRD GOLD current registration date (crd): The date the patient’s current period of registration with the GP practice began (i.e. most recent registration date). If there are no ‘transferred out periods’ in the patient’s record, then this is the same as the first registration date. If there are ‘transferred out periods’, this is the date of the first ‘permanent’ record after the latest transferred out period.

 

The CPRD Aurum All Patient file includes the following variables:

An example of the CPRD Aurum All Patient file (does not represent real patients).

The CPRD Aurum Patient file includes the same variables but only includes patients who are acceptable (where acceptable == 1).

 

The CPRD death date is derived using an algorithm implemented by CPRD and is populated in both CPRD GOLD and CPRD Aurum.

 

When do we use denominator data?

These data are used to define individual patient registration time, to determine eligibility for inclusion in a study population, and to calculate denominators for incidence and prevalence rate estimation. Patient registration time is calculated by defining a start and end date for patients in the source population, or pool of patients eligible for inclusion in the study population.

  • To estimate point prevalence, we would need to know the total number of patients in the source population on a specified day, that is, patients who are ‘active’ or ‘currently registered’ on the specified date. These are patients whose ‘start’ is before, and whose ‘end’ is after, the specified date.
  • To estimate incidence rates or period prevalence, we would need to know the total number of days that a person was registered: count this as the number of days between the patient ‘start’ and ‘end’ dates.
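
The two denominators described above can be sketched as follows. The registration periods are made up, and treating a patient as active when the specified date falls exactly on their start or end date is an assumption of this sketch, not a CPRD rule.

```python
from datetime import date

# Made-up (start, end) registration periods, one per patient (not real patients).
periods = [
    (date(2010, 1, 1), date(2015, 6, 30)),
    (date(2012, 3, 15), date(2012, 9, 15)),
    (date(2016, 1, 1), date(2018, 12, 31)),
]

def active_on(periods, index_date):
    """Point prevalence denominator: patients registered on index_date.

    Assumption: the start and end dates themselves count as registered.
    """
    return sum(1 for start, end in periods if start <= index_date <= end)

def person_days(periods):
    """Incidence or period prevalence denominator: total days between start and end."""
    return sum((end - start).days for start, end in periods)

print(active_on(periods, date(2012, 6, 1)))
print(person_days(periods))
```

Whether the end date itself contributes a day of follow-up is a study design decision; this sketch counts the difference between the two dates.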

 

How to use denominator data

Definition of start and end dates - recommendation from CPRD researchers

 

CPRD GOLD

Start - maximum of:
  • Registration date (frd/crd)
  • Start of study period
  • Optionally include up-to-standard (uts) date
End - minimum of:
  • Transfer out date (tod)
  • Practice last collection date (lcd)
  • Death date (deathdate)
  • End of study period

CPRD Aurum

Start - maximum of:
  • Registration date (regstartdate)
  • Start of study period
End - minimum of:
  • Registration end date (regenddate)
  • Practice last collection date (lcd)
  • Death date (cprd_ddate)
  • End of study period
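
The recommended definitions can be expressed as a small helper. This is a sketch under stated assumptions, not CPRD's own code: the function names are hypothetical, missing dates are represented as `None` and ignored, and the GOLD registration date is passed as `crd` (a study could use `frd` instead).

```python
from datetime import date

def follow_up(start_dates, end_dates):
    """Return (latest start, earliest end), or None if the period is empty.

    Assumption: a date of None (for example, no transfer out or no death
    recorded) is simply left out of the comparison.
    """
    start = max(d for d in start_dates if d is not None)
    end = min(d for d in end_dates if d is not None)
    return (start, end) if start <= end else None

def gold_follow_up(crd, tod, lcd, deathdate, study_start, study_end, uts=None):
    # uts is optional, mirroring "optionally include up-to-standard (uts) date".
    return follow_up([crd, study_start, uts], [tod, lcd, deathdate, study_end])

def aurum_follow_up(regstartdate, regenddate, lcd, cprd_ddate, study_start, study_end):
    return follow_up([regstartdate, study_start],
                     [regenddate, lcd, cprd_ddate, study_end])

# Made-up dates for illustration only.
print(gold_follow_up(
    crd=date(1996, 3, 1), tod=date(2010, 5, 15), lcd=date(2022, 1, 31),
    deathdate=None, study_start=date(1995, 1, 1), study_end=date(2020, 12, 31),
    uts=date(1998, 6, 1),
))
```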

 

Calculating registration time - CPRD primary care data

Diagrams that visualise periods of patient registration time with complete follow-up are very useful when developing code to operationalise study population definitions. These involve drawing the individual components that will be taken into account when defining start and end dates.

Example: Calculating eligible person-time in a project using CPRD GOLD primary care data (no linked data). This example uses the February 2022 build of CPRD GOLD.

An example of calculating registration time in CPRD GOLD (does not represent real patients).

From top to bottom:

  • CPRD GOLD primary care data is available from approximately 1988 to the end of January 2022.
  • Study period, defined for each research project. Here defined as 01/01/1995 - 31/12/2020.
  • UTS data for practice 10001 is available between the UTS date and the last collection date. If the UTS date is not being used, available follow-up would not be limited by a practice-level ‘start’ date.
  • Current registration and transfer out dates are drawn for each patient separately.

 

The eligible person-time each person contributes to the total follow-up time is as follows:

Patient ID | Start: maximum (latest) of CRD, UTS, and study start date | End: minimum (earliest) of TOD, LCD, and study end date | Eligible person-time contributing to total person-time
10010001 | UTS | TOD | UTS to TOD
10210001 | CRD | TOD | CRD to TOD
10310001 | CRD | Study end date | CRD to study end date
10410001 | CRD | Study end date | None - CRD is after study end date
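
The CPRD GOLD example can be reproduced with the sketch below. The study period is the one stated above; every other date (the UTS and LCD of practice 10001, and each patient's CRD and TOD) is hypothetical and chosen only so that the output matches the pattern described. Death dates are left out because they do not feature in this example.

```python
from datetime import date

STUDY_START, STUDY_END = date(1995, 1, 1), date(2020, 12, 31)
# Hypothetical practice-level dates for practice 10001.
UTS, LCD = date(1998, 6, 1), date(2022, 1, 31)

# Hypothetical (crd, tod) pairs; tod is None if the patient has not transferred out.
patients = {
    "10010001": (date(1996, 3, 1), date(2010, 5, 15)),
    "10210001": (date(2001, 9, 10), date(2015, 2, 20)),
    "10310001": (date(2005, 4, 1), None),
    "10410001": (date(2021, 6, 1), None),
}

def eligible_period(crd, tod):
    """Return (start label, end label, eligible days) for one patient."""
    starts = {"CRD": crd, "UTS": UTS, "study start date": STUDY_START}
    ends = {"LCD": LCD, "study end date": STUDY_END}
    if tod is not None:
        ends["TOD"] = tod
    start_label = max(starts, key=starts.get)  # latest start component
    end_label = min(ends, key=ends.get)        # earliest end component
    days = (ends[end_label] - starts[start_label]).days
    return start_label, end_label, max(days, 0)  # 0 if start is after end

results = {patid: eligible_period(crd, tod) for patid, (crd, tod) in patients.items()}
for patid, (start_label, end_label, days) in results.items():
    print(patid, start_label, end_label, days)
```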

 

Denominator data for projects including linked data

Where linked data is required to define exposure, outcomes, or key covariates, it is crucial to consider linkage eligibility and linked data coverage periods when defining the source population eligible for inclusion in the study population.

The source file – also referred to as the linkage eligibility file – includes all patients who are registered in English practices that have consented to take part in the linkage process before the data was transferred to the trusted third party for linkage. The file contains a flag for each linked data source to indicate whether patients could be linked (e.g. ons_e = 1/0). Not all patients in the linkage file will be eligible, for example due to an invalid NHS number.

The coverage period for linked data sets will differ for each data source and must be considered when defining the period of patient registration time with complete follow-up. For example, if an outcome is most frequently recorded in an inpatient hospital setting but coverage of HES APC is only available until March 2021, data after this date will be incomplete and including this is likely to result in misclassification bias due to outcome events being missed.
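
Both considerations can be sketched together: restricting to patients whose linkage flag is set, then clipping follow-up to the coverage period. The rows, the `patid` column name, the coverage dates and the helper name are all hypothetical; only the `ons_e` flag name is taken from the text above.

```python
from datetime import date

# Made-up rows standing in for the linkage eligibility (source) file.
linkage_rows = [
    {"patid": "20020002", "ons_e": 1},
    {"patid": "20120002", "ons_e": 0},
    {"patid": "20220002", "ons_e": 1},
]

# Patients who could be linked to the data source of interest.
eligible_ids = {row["patid"] for row in linkage_rows if row["ons_e"] == 1}

def clip_to_coverage(start, end, coverage_start, coverage_end):
    """Restrict a follow-up period to the linked data coverage period.

    Returns None when the period falls entirely outside coverage.
    """
    clipped_start = max(start, coverage_start)
    clipped_end = min(end, coverage_end)
    return (clipped_start, clipped_end) if clipped_start <= clipped_end else None

# Hypothetical coverage period and follow-up period.
print(sorted(eligible_ids))
print(clip_to_coverage(date(2015, 1, 1), date(2021, 12, 31),
                       date(1997, 4, 1), date(2021, 3, 31)))
```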

 

Example: Calculating eligible person-time in a project using CPRD Aurum primary care data and linked HES APC data from the January 2022 update. This example uses the February 2022 build of CPRD Aurum.

An example of calculating registration time in CPRD Aurum (does not represent real patients).

From top to bottom:

  • CPRD Aurum primary care data is available from approximately 1988 to the end of January 2022.
  • HES OP data are available from April 1997 to October 2020 in the most recent release in this example.
  • Study period, defined for each research project. Here defined as 01/01/2000-31/12/2020.
  • Data for practice 20002 is available from the beginning of data collection to the last collection date.
  • Current registration and transfer out dates are drawn for each patient.

The eligible person-time each patient contributes to the total follow-up time is as follows:

Patient ID | Start: maximum (latest) of regstartdate, HES OP coverage start, and study start date | End: minimum (earliest) of regenddate, HES OP coverage end, LCD, and study end date | Eligible person-time contributing to total person-time
20020002 | Study start date | regenddate | Study start to regenddate
20120002 | regstartdate | regenddate | regstartdate to regenddate
20220002 | regstartdate | HES OP end of coverage date | regstartdate to HES OP end of coverage date
20320002 | Study start date | regenddate | None - regenddate is before study start date
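
The CPRD Aurum example can be sketched in the same way, with the linked data coverage period added to both the start and end comparisons. The study period and the coverage months are those stated above; the exact coverage days, the LCD of practice 20002, and each patient's regstartdate and regenddate are hypothetical.

```python
from datetime import date

STUDY_START, STUDY_END = date(2000, 1, 1), date(2020, 12, 31)
# Coverage months as stated in the example; the exact days are assumptions.
HES_START, HES_END = date(1997, 4, 1), date(2020, 10, 31)
LCD = date(2022, 1, 31)  # hypothetical last collection date for practice 20002

# Hypothetical (regstartdate, regenddate) pairs; None if still registered.
patients = {
    "20020002": (date(1995, 2, 1), date(2012, 7, 31)),
    "20120002": (date(2003, 5, 20), date(2016, 11, 30)),
    "20220002": (date(2008, 1, 15), None),
    "20320002": (date(1990, 1, 1), date(1998, 12, 31)),
}

def eligible_period(regstartdate, regenddate):
    """Return (start label, end label, eligible days) for one patient."""
    starts = {"regstartdate": regstartdate, "HES coverage start": HES_START,
              "study start date": STUDY_START}
    ends = {"HES coverage end": HES_END, "LCD": LCD, "study end date": STUDY_END}
    if regenddate is not None:
        ends["regenddate"] = regenddate
    start_label = max(starts, key=starts.get)  # latest start component
    end_label = min(ends, key=ends.get)        # earliest end component
    days = (ends[end_label] - starts[start_label]).days
    return start_label, end_label, max(days, 0)  # 0 if start is after end

results = {patid: eligible_period(*dates) for patid, dates in patients.items()}
for patid, (start_label, end_label, days) in results.items():
    print(patid, start_label, end_label, days)
```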

 

How to access denominator data

Nominated users can access the latest primary care data support files (i.e. code browser tools, denominator data, look-up files, and linkage source/eligibility files) on the Shared (L:) drive after logging into the CPRD Data Access Portal. Nominated users may share these files with colleagues who are authorised users within their organisation for the purposes of progressing a CPRD study.

Nominated users may request older versions of the support files from enquiries@cprd.com by specifying whether CPRD GOLD or CPRD Aurum files are required, and for which month and year build.
