The impact of the National Data Opt-out programme on representativeness of the Clinical Practice Research Datalink (CPRD) database.

Study type
Protocol
Date of Approval
Study reference ID
23_003171
Lay Summary

CPRD data have been used extensively for research informing public health policy, clinical guidelines, and drug safety. A key strength of these data is that they are representative of the UK population. In May 2018, a National Data Opt-out Programme (NDOP) was launched, whereby patients can request that their confidential patient information not be used beyond their personal care, for example for health research. Although CPRD uses only de-identified patient data, patients who choose to opt out will not be included in the CPRD datasets. Evidence to date suggests that people who opt out are more likely to be older, female, and less socially disadvantaged; as a result, there is concern that NDOP may lead to under-representation of sections of the population in CPRD data. This could compromise the representativeness of the CPRD database. Consequently, studies using these data might be biased, and policy decisions based on them may not fully reflect benefits and harms for all sections of society. CPRD is therefore proposing to undertake a data quality monitoring exercise to assess any potential impact of the NDOP on the representativeness of the CPRD database. The distribution of patient demographic characteristics, the proportion of people with key indicator conditions (such as atrial fibrillation, asthma, and chronic kidney disease), and deaths from specific causes (including cancers, circulatory diseases, and mental/behavioural disorders) will be plotted for each monthly database over the NDOP policy roll-out period (May 2018 to July 2022) and compared to national statistics to understand the representativeness of CPRD data over time.

Technical Summary

The National Data Opt-out Policy (NDOP) was launched in May 2018 to enable individuals to opt out of their confidential patient information being used for purposes beyond their individual care. Differential opt-out patterns by socio-demographic characteristics could introduce bias in routine healthcare data sources, like CPRD, that are used for research and healthcare planning, and could therefore impact on the generalisability of research findings that use these data.

The aim of the proposed work is to assess the representativeness of the CPRD primary care database over time. We will a) assess the CPRD patient distribution in terms of demographic characteristics (age, sex, and socioeconomic status), b) assess the percentage of CPRD patients across different geographies (regions, and rural/urban classification), and c) estimate the prevalence of key indicator conditions and cause-specific mortality.
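As an illustration of aims a) and b), the comparison of a CPRD distribution against a national reference can be sketched as a difference in percentage shares per group. This is a minimal sketch, not the study's method: the function names, the age bands, and all counts below are hypothetical and chosen only to make the arithmetic visible.

```python
def share_by_group(counts):
    """Convert raw counts per group into percentage shares of the total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {group: 100.0 * n / total for group, n in counts.items()}


def share_differences(cprd_counts, national_counts):
    """Percentage-point difference (CPRD minus national) for each group.

    A negative value means the group is under-represented in CPRD
    relative to the national reference.
    """
    cprd = share_by_group(cprd_counts)
    national = share_by_group(national_counts)
    return {g: cprd.get(g, 0.0) - national[g] for g in national}


# Hypothetical counts for one monthly build (not real CPRD or ONS figures).
cprd_counts = {"0-17": 200, "18-64": 600, "65+": 200}
national_counts = {"0-17": 2100, "18-64": 6000, "65+": 1900}

diffs = share_differences(cprd_counts, national_counts)
for group, diff in diffs.items():
    print(f"{group}: {diff:+.1f} percentage points")
```

Repeating this calculation for each monthly build would give a time series of percentage-point differences per group, which is one simple way to see whether a group drifts away from the national share over the roll-out period.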

The study will comprise a series of consecutive monthly cross-sectional descriptive reports. We will use data from each monthly database build, January 2018 to July 2022, with additional analyses in archived versions of the database from the preceding decade. Key outcomes will comprise Quality and Outcomes Framework (QOF) indicator conditions (including coronary heart disease, diabetes mellitus, and dementia), and all-cause and cause-specific mortality rates (including deaths from cancer, respiratory disease, and circulatory disease). Analyses will be undertaken in the entire CPRD database population, and in age- and/or sex-specific groups for certain outcomes.

Descriptive analyses will explore the distribution of key sociodemographic characteristics of the CPRD database population, by rural/urban classification and region, and compare it to national population statistics. The prevalence of selected QOF indicator conditions and cause-specific mortality rates will be calculated for the CPRD population and compared to published national annual QOF outcomes and ONS death registrations, respectively.
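The prevalence comparison described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the study's implementation: the record layout (`build`, `cases`, `registered`), the function names, and every number, including the national reference value, are hypothetical.

```python
def prevalence_percent(cases, denominator):
    """Point prevalence as a percentage of the registered population."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * cases / denominator


def compare_to_reference(builds, reference_percent):
    """Return (build label, prevalence %, difference from reference) per build."""
    rows = []
    for b in builds:
        prev = prevalence_percent(b["cases"], b["registered"])
        rows.append((b["build"], prev, prev - reference_percent))
    return rows


# Hypothetical monthly builds for a single indicator condition.
monthly_builds = [
    {"build": "2018-05", "cases": 1900, "registered": 100000},
    {"build": "2018-06", "cases": 1950, "registered": 100000},
    {"build": "2018-07", "cases": 2100, "registered": 105000},
]
national_reference = 2.0  # hypothetical published national prevalence, in %

rows = compare_to_reference(monthly_builds, national_reference)
for label, prev, diff in rows:
    print(f"{label}: {prev:.2f}% ({diff:+.2f} points vs national)")
```

In practice the national figure is published annually while the builds are monthly, so each build would need to be matched to the appropriate reporting year; that matching is left out of this sketch.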

Health Outcomes to be Measured

Selected Quality and Outcomes Framework (QOF) indicator conditions:
- Atrial fibrillation
- Coronary heart disease
- Heart failure
- Hypertension
- Peripheral arterial disease
- Stroke and transient ischaemic attack
- Asthma
- Chronic obstructive pulmonary disease (COPD)
- Cancer
- Chronic kidney disease
- Diabetes mellitus
- Palliative care
- Dementia
- Epilepsy
- Learning disabilities
- Mental health
- Osteoporosis
- Rheumatoid arthritis
- Obesity
Selected mortality outcomes:
- Age-standardised all-cause mortality rate (by sex and region)
- Age-specific mortality rates by sex
- Age-standardised cause-specific mortality rates, by sex, for:
  - Cancer
  - Respiratory disease
  - Circulatory disease
  - Mental and behavioural disorders
  - Diseases of the nervous system
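
The protocol summary does not state which standardisation method will be used for the age-standardised rates listed above. As a minimal sketch, assuming direct standardisation, the rate is a weighted average of age-specific rates, with weights taken from a standard population. The function name, age bands, and all figures below are hypothetical.

```python
def age_standardised_rate(deaths, person_years, standard_pop, per=100_000):
    """Directly age-standardised mortality rate.

    deaths, person_years, standard_pop: dicts keyed by the same age bands.
    Each age-specific rate is weighted by that band's share of the
    standard population; the result is expressed per `per` person-years.
    """
    total_standard = sum(standard_pop.values())
    if total_standard <= 0:
        raise ValueError("standard population must be positive")
    weighted_sum = 0.0
    for band, weight in standard_pop.items():
        if person_years[band] <= 0:
            raise ValueError(f"no person-years in age band {band}")
        age_specific_rate = deaths[band] / person_years[band]
        weighted_sum += age_specific_rate * weight
    return per * weighted_sum / total_standard


# Hypothetical two-band example (not real CPRD, ONS, or standard-population data).
deaths = {"0-64": 10, "65+": 90}
person_years = {"0-64": 10_000, "65+": 3_000}
standard_pop = {"0-64": 80_000, "65+": 20_000}

asr = age_standardised_rate(deaths, person_years, standard_pop)
crude = 100_000 * sum(deaths.values()) / sum(person_years.values())
print(f"Age-standardised rate: {asr:.1f} per 100,000")
print(f"Crude rate: {crude:.1f} per 100,000")
```

In this made-up example the study population is older than the standard population, so the crude rate exceeds the standardised one. Standardising CPRD and national rates to the same standard population removes that age-structure effect before the two are compared.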

Collaborators

Sonia Coton - Chief Investigator - CPRD
Sonia Coton - Corresponding Applicant - CPRD
Chisomo Mutafya - Collaborator - CPRD
Justin Chan - Collaborator - CPRD
Rachael Williams - Collaborator - CPRD

Linkages

ONS Death Registration Data; Patient Level Index of Multiple Deprivation; Practice Level Index of Multiple Deprivation; Rural-Urban Classification