Development, external validation and health economic evaluation of primary care diagnostic prediction models for upper gastro-intestinal cancers

Study type
Protocol
Date of Approval
Study reference ID
23_002840
Lay Summary

Most patients with upper gastro-intestinal cancers have symptoms and visit their family doctor multiple times before being diagnosed. This suggests that their cancer could be picked up earlier, which would open more opportunities for treatment to help them live longer. However, many of the symptoms of upper gastro-intestinal cancer are common in patients who don’t have cancer, so it can be difficult to decide which patients should be referred to the hospital for urgent tests.

The aim of this study is to develop an accurate and cost-effective tool to help GPs spot earlier those patients who are at increased risk of having upper gastro-intestinal cancer. We will use anonymised primary care information and linked cancer registry, hospital, imaging and death registration information. We will develop statistical models which will tell us how likely cancer is in each patient. The models will include information on symptoms, tests and prescriptions. Taken together, this information could indicate whether cancer is likely to be present. In addition to traditional statistical methods, we will use newer ‘machine learning’ (artificial intelligence) methods to develop the models. This will allow us to take account of changes in a patient's information over time, which may be important. We will test how well the models work and explore whether they would be useful to doctors. Finally, we will explore the impact of using the models on health outcomes (e.g. how many patients are cured) and healthcare resources. This research has the potential for public benefit as the models could aid early detection of upper gastro-intestinal cancer.

Technical Summary

Outcomes for the commonest upper gastro-intestinal (UGI: oesophago-gastric, pancreatic, gallbladder and biliary tract) cancers remain poor. Most UGI cancer patients have symptoms and multiple GP consultations in the two years pre-diagnosis, suggesting earlier detection is possible, with opportunities to treat with curative intent and improve outcomes. However, many of the symptoms of UGI cancer are common in primary care and it can be difficult to determine which patients should be referred for urgent investigation.

Available diagnostic prediction models do not account for temporal changes in predictors and only provide risk estimates for individual UGI cancers. In this study, we will utilise primary care (CPRD) and linked cancer registry (NCRAS, SACT), hospital (HES), imaging (DID), deprivation (IMD) and mortality (ONS) data to develop and validate a diagnostic prediction model for UGI cancer. A linked CPRD Aurum dataset will be used to develop models using both traditional methods and machine learning approaches, which will allow us to take account of dynamic changes in predictors. Variable selection will be informed by a systematic literature review. We will empirically compare the performance of conventional vs machine learning techniques (including deep neural networks, support vector machines and random forests) to identify the best performing approach. Models will be externally validated using CPRD GOLD. We will assess model calibration and discrimination and compare model performance at a range of risk thresholds, including the NICE 3% risk threshold for urgent cancer referral. We will perform health economic analyses to compare model ‘action thresholds’ and determine how the models might be used to best effect within the diagnostic pathway. This project fits within a larger package of work which aims to develop an UGI multi-cancer early detection (MCED) Platform (CanDetect). The research has the potential for public benefit as the models could aid early detection of UGI cancer.
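
As an illustration of the performance measures named above, the following is a minimal sketch of how discrimination, calibration-in-the-large and performance at a 3% risk threshold could be computed for a set of predicted risks. It is not the study's analysis code. The function names, the toy data and the choice to treat a risk equal to the threshold as "flagged" are all assumptions made for the example.

```python
from typing import Dict, Optional, Sequence


def auc(outcomes: Sequence[int], risks: Sequence[float]) -> float:
    """Discrimination: probability that a randomly chosen cancer case has a
    higher predicted risk than a randomly chosen non-case (ties count 0.5)."""
    cases = [r for y, r in zip(outcomes, risks) if y == 1]
    non_cases = [r for y, r in zip(outcomes, risks) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in non_cases
    )
    return wins / (len(cases) * len(non_cases))


def observed_expected_ratio(outcomes: Sequence[int], risks: Sequence[float]) -> float:
    """Calibration-in-the-large: observed cancers divided by the sum of
    predicted risks. A value of 1 means the totals agree."""
    return sum(outcomes) / sum(risks)


def threshold_metrics(
    outcomes: Sequence[int], risks: Sequence[float], threshold: float = 0.03
) -> Dict[str, Optional[float]]:
    """Sensitivity, specificity and PPV when patients with risk >= threshold
    are flagged (the >= rule is an assumption of this sketch)."""
    tp = fp = fn = tn = 0
    for y, r in zip(outcomes, risks):
        flagged = r >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> Optional[float]:
        # Return None rather than dividing by zero when a cell is empty.
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


# Invented toy data: 1 = UGI cancer diagnosed, 0 = no cancer diagnosed.
outcomes = [0, 0, 0, 0, 1, 0, 1, 1]
risks = [0.005, 0.01, 0.02, 0.04, 0.02, 0.01, 0.06, 0.10]

print("AUC:", auc(outcomes, risks))
print("O/E ratio:", observed_expected_ratio(outcomes, risks))
print("At 3% threshold:", threshold_metrics(outcomes, risks, threshold=0.03))
```

Calling `threshold_metrics` with other values of `threshold` gives the same measures across a range of risk thresholds. A fuller calibration assessment would normally also include a calibration slope or plot, which this sketch omits.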

Health Outcomes to be Measured

The primary clinical outcome is a diagnosis of an UGI cancer, as recorded in NCRAS data, within the 24 months following study entry (defined below). Model performance for alternative follow-up periods and for early-stage cancer (stages I-II) will be explored in sensitivity analyses.
Further outcomes of interest to inform the health economic analysis will include cancer diagnostic tests; cancer stage at diagnosis; cancer treatments received; cancer recurrences; and cause-specific mortality.
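
The primary outcome definition can be sketched as a simple date-window rule. The example below is illustrative only and rests on assumptions: that each patient has a single study entry date and at most one registry diagnosis date, and that diagnoses on the entry date and on the last day of the window both count. The function names are hypothetical and the dates are invented.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day to the
    last day of the target month (e.g. 29 Feb + 24 months -> 28 Feb)."""
    month_index = d.year * 12 + (d.month - 1) + months
    year, month_zero_based = divmod(month_index, 12)
    month = month_zero_based + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def has_primary_outcome(
    study_entry: date,
    diagnosis_date: Optional[date],
    window_months: int = 24,
) -> bool:
    """True if a diagnosis date falls within the follow-up window after
    study entry. Inclusive boundaries are an assumption of this sketch."""
    if diagnosis_date is None:
        return False
    return study_entry <= diagnosis_date <= add_months(study_entry, window_months)


# Invented example dates.
entry = date(2018, 1, 15)
print(has_primary_outcome(entry, date(2019, 6, 1)))   # inside 24 months
print(has_primary_outcome(entry, date(2020, 1, 16)))  # one day outside
print(has_primary_outcome(entry, date(2019, 6, 1), window_months=12))
```

Changing `window_months` corresponds to the alternative follow-up periods mentioned for the sensitivity analyses.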

Collaborators

Garth Funston - Chief Investigator - Queen Mary University of London
Garth Funston - Corresponding Applicant - Queen Mary University of London
Borislava Mihaylova - Collaborator - Barts and the London Queen Mary's School of Medicine and Dentistry
Didjier Masangwi - Collaborator - Queen Mary University of London
Fiona Walter - Collaborator - Queen Mary University of London
Judith Offman - Collaborator - Queen Mary University of London
Kirsten Arendse - Collaborator - Queen Mary University of London
Laura Woods - Collaborator - Newcastle University
Nikki Yi Ting Yu - Collaborator - Queen Mary University of London
Oleg Blyuss - Collaborator - Barts and the London Queen Mary's School of Medicine and Dentistry
Pawandeep Virpal - Collaborator - Queen Mary University of London
Rohini Mathur - Collaborator - Queen Mary University of London
Runguo Wu - Collaborator - Barts and the London Queen Mary's School of Medicine and Dentistry
Tahania Ahmad - Collaborator - Queen Mary University of London
Tyler Saunders - Collaborator - Queen Mary University of London

Linkages

HES Admitted Patient Care; HES Diagnostic Imaging Dataset; HES Outpatient; NCRAS Cancer Registration Data; NCRAS Systemic Anti-Cancer Treatment (SACT) data; ONS Death Registration Data; Patient Level Index of Multiple Deprivation