An investigation of the reliability and granularity of the recording of interstitial lung diseases in primary and secondary care datasets

Study type
Protocol
Date of Approval
Study reference ID
20_000068
Lay Summary

Interstitial lung disease (ILD) is a collective term for a large group of conditions that are typically characterised by scarring (fibrosis) of the lungs. Idiopathic pulmonary fibrosis (IPF) is the most common type of ILD, with approximately 5,000 new cases recorded in the UK each year. There is evidence that the number of people with ILD, and IPF in particular, is increasing, but it is not clear whether this is a real increase or simply a reflection of better diagnosis or recording. The prognosis for people who develop an ILD such as IPF is extremely poor: average survival from diagnosis is just three years, and until recently there has been little in the way of effective treatment.
Some people with other ILDs, for example hypersensitivity pneumonitis and autoimmune-related ILDs, also suffer from progressive lung fibrosis similar to that seen in IPF, and have equally poor survival prospects. However, in general we know far less about the incidence, prevalence and patient outcomes for these rarer types of ILD. Research in this area is hampered by clinical challenges in diagnosing the different types of ILD, name changes and reclassifications over time, and uncertainties surrounding the quality of the recording of specific types of ILD in patients’ health records.
Through this study we hope to develop a better understanding of the way in which the different types of interstitial lung disease are recorded in primary healthcare databases and thereby support further research into the full spectrum of ILD and its consequences for patients.

Technical Summary

The validity of research based on CPRD data depends on the quality of the information recorded by GPs. Validation studies have suggested that clinical diagnoses of specific conditions are generally recorded accurately, as evidenced by high positive predictive value (PPV), sensitivity and specificity. This is especially true of chronic diseases that are managed in a primary care setting, but less so for acute conditions that require hospitalisation.
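
For reference, the three validation measures named above can be computed from a two-by-two comparison of recorded diagnoses against a reference standard. The following is a minimal sketch only: the class and function names are hypothetical, and the counts in the usage example are invented for illustration and are not results from this or any other study.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing recorded diagnoses with a reference standard."""
    true_pos: int   # coded as having the disease, confirmed by the reference
    false_pos: int  # coded as having the disease, not confirmed
    false_neg: int  # not coded, but disease present according to the reference
    true_neg: int   # not coded, disease absent according to the reference


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a measure is undefined.
    return numerator / denominator if denominator else math.nan


def validation_metrics(c: ConfusionCounts) -> dict:
    """Return PPV, sensitivity and specificity as proportions."""
    return {
        "ppv": _ratio(c.true_pos, c.true_pos + c.false_pos),
        "sensitivity": _ratio(c.true_pos, c.true_pos + c.false_neg),
        "specificity": _ratio(c.true_neg, c.true_neg + c.false_pos),
    }


# Illustrative counts only (hypothetical, not study data).
example = ConfusionCounts(true_pos=80, false_pos=20, false_neg=10, true_neg=890)
for name, value in validation_metrics(example).items():
    print(f"{name}: {value:.3f}")
```

Note that PPV can be estimated from a sample of coded cases alone, whereas sensitivity and specificity also require information on patients without the code.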

Previous work has established the reliability of a diagnosis of IPF in GP records (PPV=95%). Given that IPF and other ILDs are almost always diagnosed in secondary care, this finding is not unexpected and it is generally accepted that a patient with an ILD diagnosis in their primary care record will indeed have the disease. However, questions remain about the completeness of the capture of cases of ILD in primary care data, the granularity of the recording in terms of ILD subtype, and whether GP recording practices have altered over time.

This validation study is designed to complement efforts to better characterise the incidence, prevalence and mortality associated with a number of important ILD subtypes, data for which are lacking at the present time but which are needed in order to support further research into this group of lung diseases. In the first instance, we will conduct a descriptive analysis in order to inform our understanding of how ILD is coded in primary and secondary care. We will then carry out investigations to see how well we can identify cases of specific ILDs using simple diagnostic code lists, and/or whether we need to use combinations of codes to improve case capture. We will also explore the possibility of developing an algorithm to identify patients with progressive fibrosing ILD, this being a patient group of particular clinical significance.
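
The contrast between a simple diagnostic code list and a combination of codes can be sketched as follows. This is an illustration under stated assumptions, not the study's algorithm: the code values are hypothetical placeholders (not real Read or ICD-10 codes), the record layout of (patient ID, code) pairs is assumed, and the particular combination rule is one possible choice among many.

```python
from collections import defaultdict

# Hypothetical placeholder codes; real code lists would be clinically reviewed.
SPECIFIC_CODES = {"ILD_SUBTYPE_X"}         # codes naming the subtype directly
BROAD_CODES = {"ILD_UNSPECIFIED"}          # non-specific ILD codes
SUPPORTING_CODES = {"SUPPORTING_FEATURE"}  # codes suggestive of the subtype


def codes_by_patient(records):
    """Group (patient_id, code) pairs into a set of codes per patient."""
    grouped = defaultdict(set)
    for patient_id, code in records:
        grouped[patient_id].add(code)
    return grouped


def simple_cases(records):
    """Patients with at least one subtype-specific diagnostic code."""
    return {
        pid for pid, codes in codes_by_patient(records).items()
        if codes & SPECIFIC_CODES
    }


def combined_cases(records):
    """Patients with a specific code, or a broad code plus a supporting code."""
    return {
        pid for pid, codes in codes_by_patient(records).items()
        if codes & SPECIFIC_CODES
        or (codes & BROAD_CODES and codes & SUPPORTING_CODES)
    }


# Invented example records (hypothetical patients).
records = [
    (1, "ILD_SUBTYPE_X"),
    (2, "ILD_UNSPECIFIED"),
    (2, "SUPPORTING_FEATURE"),
    (3, "ILD_UNSPECIFIED"),
    (4, "UNRELATED"),
]
print(sorted(simple_cases(records)))    # [1]
print(sorted(combined_cases(records)))  # [1, 2]
```

In this toy example the combination rule captures one additional patient; whether such a rule improves case capture without admitting false positives is the kind of question the validation work would need to answer.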

Health Outcomes to be Measured

- Number (and frequency) of ILD code occurrences (Read codes in CPRD GOLD and Aurum, and ICD-10 codes in HES/ONS);
- Crude estimates for prevalence and incidence of all ILD, and specific ILD subtypes (based on case identification in CPRD, using clinical diagnostic codes only);
- Alternative strategies or algorithms for identifying cases of ILD by subtype in CPRD data, should reliance on simple lists of diagnostic clinical codes prove to be suboptimal.
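
The crude estimates listed above follow the standard definitions: point prevalence is the number of existing cases divided by the population at a given date, and the incidence rate is the number of new cases divided by the person-time at risk. A minimal sketch, with hypothetical function names, an assumed scaling of per 100,000, and invented numbers in the usage example:

```python
def crude_prevalence(prevalent_cases: int, population: int, per: int = 100_000) -> float:
    """Point prevalence: existing cases per `per` people in the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return prevalent_cases * per / population


def crude_incidence_rate(new_cases: int, person_years: float, per: int = 100_000) -> float:
    """Incidence rate: new cases per `per` person-years at risk."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return new_cases * per / person_years


# Invented numbers for illustration only (not study results).
print(crude_prevalence(prevalent_cases=50, population=100_000))
print(crude_incidence_rate(new_cases=12, person_years=48_000.0))
```

These are unadjusted figures; no standardisation for age, sex or calendar period is applied in this sketch.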

Collaborators

Jennifer Quint - Chief Investigator - Imperial College London
Ann Morgan - Corresponding Applicant - Imperial College London
Peter George - Collaborator - Royal Brompton Hospital
Rikisha Shah Gupta - Collaborator - Imperial College London

Linkages

HES Accident and Emergency; HES Admitted Patient Care; HES Diagnostic Imaging Dataset; HES Outpatient; ONS Death Registration Data