Longitudinal Assessment of Ethnicity Completeness and Distributions in CPRD Aurum

Study type
Protocol
Date of Approval
Study reference ID
22_001738
Lay Summary

Reports have shown that ethnicity plays a role in patients’ health outcomes. As a result, ethnicity is an important factor in research studies. This study aims to describe the capture and distribution of ethnicity data in CPRD Aurum, a primary care database. Patients’ ethnicities will be reported overall and by patient factors: sex, region, age group, index of multiple deprivation and registration year. This will provide a picture of current ethnicity missingness patterns in the data source, showing how missing ethnicity records relate to patient factors. The findings of this study will help develop guidance and methods for handling missing ethnicity data in database studies. This will help researchers improve study results and provide further insights for understanding health inequalities related to ethnicity.

The common methods for handling missing ethnicity data include creating a combined missing or unknown category, excluding individuals with missing ethnicity, or excluding ethnicity from the analysis altogether. Multiple imputation is a popular statistical method for handling missing data in medical research. A limitation of this method for ethnicity in particular is that it assumes data are missing at random, whereas missingness in ethnicity data is not at random. When this assumption does not hold, the results may be misleading. One way to assess ethnicity data captured in databases is to compare distributions to a ‘gold standard’ benchmark such as census data from the Office for National Statistics (ONS). As a final step in this study, we plan to compare observed ethnicity distributions to ONS data.

Technical Summary

Ethnicity is an important factor in health outcomes research. Several methods have been developed to handle missing ethnicity data, including weighted imputation using population-level estimates and multiple imputation by chained equations. However, there is currently no best practice guidance for managing missing ethnicity data in UK primary care research studies.

This study has two objectives:
1. To describe ethnicity completeness in UK primary care by the following demographic factors: sex, region, age group, index of multiple deprivation (IMD) score and registration year
2. To compare missingness levels across the demographic factors to Office for National Statistics (ONS) data

The study period will run from 01 January 2000 to 31 December 2022; all previously registered and newly registered patients will be captured within this period. All patients in CPRD Aurum will be eligible for the study, with no restriction by disease or pre-specified exposure. Ethnicity completeness will be described overall and stratified by year, newly registered patients, and ethnic group. To compare levels of completeness, logistic regression analyses will be performed for the variables of interest. As a final step in this study, we plan to compare observed ethnicity distributions to ONS data, stratifying by region. The findings of this study will help develop guidance and methods for handling missing ethnicity data in database studies. This will help researchers improve study results and provide further insights for understanding health inequalities related to ethnicity. Improved data methods and more accurate study results will aid the development of health interventions that improve patients’ health outcomes.
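
The stratified completeness description and the regional benchmark comparison could be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout, field names and patient values are hypothetical (not the CPRD Aurum schema), and the benchmark proportions are invented placeholders, not ONS figures.

```python
from collections import Counter, defaultdict

# Hypothetical patient records; field names and values are illustrative only,
# not the CPRD Aurum schema. ethnicity=None marks a missing record.
patients = [
    {"region": "London", "sex": "F", "ethnicity": "Asian"},
    {"region": "London", "sex": "M", "ethnicity": None},
    {"region": "London", "sex": "F", "ethnicity": "White"},
    {"region": "London", "sex": "M", "ethnicity": "Black"},
    {"region": "North East", "sex": "F", "ethnicity": "White"},
    {"region": "North East", "sex": "M", "ethnicity": None},
]


def completeness_by(records, factor):
    """Proportion of records with ethnicity recorded, per level of `factor`."""
    total = Counter()
    recorded = Counter()
    for r in records:
        total[r[factor]] += 1
        if r["ethnicity"] is not None:
            recorded[r[factor]] += 1
    return {level: recorded[level] / total[level] for level in total}


def distribution_by_region(records):
    """Ethnicity distribution among patients with a recorded ethnicity, per region."""
    counts = defaultdict(Counter)
    for r in records:
        if r["ethnicity"] is not None:
            counts[r["region"]][r["ethnicity"]] += 1
    return {
        region: {eth: n / sum(c.values()) for eth, n in c.items()}
        for region, c in counts.items()
    }


# Placeholder benchmark proportions, invented for illustration (NOT real ONS data).
benchmark = {"London": {"Asian": 0.25, "Black": 0.25, "White": 0.5}}

completeness = completeness_by(patients, "region")
observed = distribution_by_region(patients)

# Observed minus benchmark proportion for each ethnic group in one region.
differences = {
    eth: observed["London"].get(eth, 0.0) - p
    for eth, p in benchmark["London"].items()
}
```

Passing a different factor name, for example `completeness_by(patients, "sex")`, gives the same summary for another stratification variable.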

Health Outcomes to be Measured

The primary outcome of this study is ethnicity recording. We plan to assess how the capture of this variable is affected by registration year and by demographic factors: age group, sex, region, and IMD score.
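
As a rough illustration of how recording can be compared between two levels of a factor: with a single binary covariate, the coefficient from a logistic regression equals the log odds ratio from the corresponding 2x2 table. The sketch below computes that unadjusted odds ratio with a Woolf (log-based) 95% confidence interval. The function name and all counts are hypothetical, and the protocol's actual models may adjust for several factors at once.

```python
import math


def odds_ratio_recorded(recorded_a, missing_a, recorded_b, missing_b):
    """Unadjusted odds ratio of ethnicity being recorded, group A versus group B,
    with a Woolf (log-based) 95% confidence interval."""
    or_value = (recorded_a / missing_a) / (recorded_b / missing_b)
    # Standard error of the log odds ratio from the four cell counts.
    se_log_or = math.sqrt(
        1 / recorded_a + 1 / missing_a + 1 / recorded_b + 1 / missing_b
    )
    lower = math.exp(math.log(or_value) - 1.96 * se_log_or)
    upper = math.exp(math.log(or_value) + 1.96 * se_log_or)
    return or_value, lower, upper


# Invented counts for illustration only: group A has 80 recorded and 20 missing,
# group B has 60 recorded and 40 missing.
or_value, lower, upper = odds_ratio_recorded(80, 20, 60, 40)
```

An odds ratio above 1 here would indicate that ethnicity is more likely to be recorded in group A than in group B.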

Collaborators

James Carpenter - Chief Investigator - London School of Hygiene & Tropical Medicine (LSHTM)
Esther Tolani - Corresponding Applicant - London School of Hygiene & Tropical Medicine (LSHTM)
Irene Petersen - Collaborator - University College London (UCL)
Rohini Mathur - Collaborator - London School of Hygiene & Tropical Medicine (LSHTM)

Linkages

Patient Level Index of Multiple Deprivation