An investigation of the completeness and accuracy of the recording of stroke events (including by subtype) in primary (CPRD) and secondary care databases (HES and ONS)

Date of Approval
Application Number
Lay Summary

This study will investigate how a diagnosis of stroke is recorded in the primary care Clinical Practice Research Datalink (CPRD) database. We are particularly interested in establishing what proportion of stroke events are recorded in primary care databases (completeness) and how specifically those events are coded in terms of the main types of stroke (diagnostic accuracy). We will do this by identifying the same stroke event in multiple data sources (CPRD, Hospital Episode Statistics (HES) and mortality data from the Office for National Statistics (ONS)) and comparing how it is recorded and coded in each.

Our proposed study will help to establish the best strategy for identifying individuals who have experienced a stroke in electronic healthcare records, including CPRD. This would inform the definitions and patient selection for future observational studies that rely on CPRD and other sources of healthcare data, including our own.

Technical Summary

The overall aim of this study is to assess the completeness and diagnostic validity of the recording of stroke, by subtype, in primary care using HES as a reference. To this end we will use HES in-patient data to construct a retrospective cohort of patients who have been hospitalised with a diagnosis of stroke. For those patients with linked CPRD-HES records, we will compare various aspects of the recording of individual stroke events across the two data sources. We will conduct a similar exercise using ONS data.

We will summarise the completeness of the recording of stroke events in CPRD using a Venn diagram of the events captured by each data source. We will also investigate the level of event date agreement across data sources. If we find that a substantial proportion of stroke events are unrecorded in CPRD, we will perform logistic regression analyses to establish whether factors such as age, sex, year of stroke and mortality are associated with the suboptimal recording of stroke in primary care. In addition, we will assess the diagnostic accuracy of CPRD stroke recording by subtype (ischaemic, subarachnoid haemorrhagic, intracerebral haemorrhagic, TIAs) and determine the sensitivity and positive predictive value (PPV) of different stroke code sets (for all stroke and for each stroke subtype).
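
The completeness and accuracy measures described above can be illustrated with a minimal sketch. It is not the study's analysis code: the function names, the patient identifiers and the example data are all hypothetical, and it works at patient level for simplicity, whereas the study compares individual stroke events. It counts the seven regions of a three-source Venn diagram and computes the sensitivity and PPV of a CPRD code set, treating HES as the reference.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Validity:
    sensitivity: float  # share of reference cases that the code set also flags
    ppv: float          # share of flagged cases that the reference confirms


def venn_counts(cprd: set[str], hes: set[str], ons: set[str]) -> dict[str, int]:
    """Count patients in each of the seven regions of a three-source Venn diagram."""
    return {
        "cprd_only": len(cprd - hes - ons),
        "hes_only": len(hes - cprd - ons),
        "ons_only": len(ons - cprd - hes),
        "cprd_hes": len((cprd & hes) - ons),
        "cprd_ons": len((cprd & ons) - hes),
        "hes_ons": len((hes & ons) - cprd),
        "all_three": len(cprd & hes & ons),
    }


def validity(flagged: set[str], reference: set[str]) -> Validity:
    """Sensitivity and PPV of `flagged` against `reference` (here, HES)."""
    true_positives = len(flagged & reference)
    sensitivity = true_positives / len(reference) if reference else float("nan")
    ppv = true_positives / len(flagged) if flagged else float("nan")
    return Validity(sensitivity, ppv)


# Hypothetical patient identifiers with a stroke recorded in each source.
cprd_stroke = {"p1", "p2", "p3", "p5"}
hes_stroke = {"p1", "p2", "p4", "p6"}
ons_stroke = {"p2", "p4", "p7"}

regions = venn_counts(cprd_stroke, hes_stroke, ons_stroke)
result = validity(cprd_stroke, hes_stroke)
print(regions)
print(result)
```

The same `validity` function could be applied once per subtype code set (for example, the patients flagged by an ischaemic stroke code list against those with a HES ischaemic stroke diagnosis) to obtain subtype-specific sensitivity and PPV.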

Health Outcomes to be Measured

Subarachnoid haemorrhage; intracerebral haemorrhage; cerebral infarction; stroke not specified as haemorrhage or infarction; transient cerebral ischaemic attacks (TIAs) and related syndromes.


Jennifer Quint - Chief Investigator - Imperial College London
Ann Morgan - Corresponding Applicant - London School of Hygiene & Tropical Medicine (LSHTM)


HES Admitted Patient Care; ONS Death Registration Data