A risk score for pancreatic cancer diagnosis using machine learning techniques applied to linked routine data: full case-control study and economic evaluation

Study type
Protocol
Date of Approval
Study reference ID
22_001741
Lay Summary

Most pancreatic cancer patients are diagnosed too late to be treated. Screening the whole population is neither practical nor appropriate, because so few people develop the disease.

We recently conducted a research study to examine whether pancreatic cancer patients could be diagnosed earlier using information from their GPs. We examined data on symptoms reported by pancreatic cancer patients in the 2 years before their diagnosis and compared these to those reported by similar patients who did not have pancreatic cancer. Using a type of artificial intelligence called ‘machine learning’, we were able to train a computer to spot which patients were at higher risk of pancreatic cancer, using their health records. This group of people could potentially be screened if an appropriate blood or urine test were available. These results were promising, but much more needs to be done to make the approach more successful and more cost-effective.

In this new study we will extend our original research project. We will use an improved approach, more patients and more up-to-date information. Once we have an improved score we will work out how many patients would have been recommended for screening had the tool been available in the past, how many cancers would have been diagnosed early if these individuals had been screened, and whether this tool is cost-effective for the NHS.

This study has the potential to help increase the number of pancreatic cancer patients diagnosed early, improving their options for treatment and survival.

Technical Summary

Most pancreatic cancer patients do not present with early alarm symptoms and are consequently diagnosed predominantly with late-stage disease, for which curative treatment is rarely possible. Substantial progress has recently been made in biomarker research and it is now a realistic possibility that a simple blood or urine test for the disease may soon be available. However, the only way such a test could be cost-effective for increasing early diagnosis within the NHS setting is if it were performed only on a high-risk sub-population (targeted screening). We have recently conducted a pilot study using CPRD linked to cancer registrations to evaluate whether such a sub-population can be determined from population-based sources using machine learning. We were able to identify 41-43% of patients 17-20 months prior to diagnosis (AUC 61-66%). Further work is now required to fine-tune the model parameters and test its utility in a ‘real world’ setting.

In this study we will first repeat our pilot study analyses using a larger number of much more up-to-date primary care records matched to population-based (rather than cancer patient) controls, working with clinicians and patients to refine our symptom lists and approach, as well as conducting stratified analyses for, amongst others, smokers and diabetic patients. Once the algorithm is established, we will test how it would have performed had it been applied in ‘real time’, estimating the number of patients who would have been recommended for biomarker testing and the number of cancers which later emerged amongst these patients (those which potentially could have been diagnosed earlier). Finally, we will conduct an economic evaluation of the cost-effectiveness of targeted screening using this algorithm within the NHS setting. This study has the potential to rapidly and significantly increase the proportion of pancreatic cancer patients diagnosed early.
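As a purely illustrative sketch of the performance measures mentioned above, the following Python code computes an AUC and, for a given risk-score threshold, the number of patients flagged for testing, the number of cancers among them, and the resulting sensitivity. It is not the study's algorithm: the function names `auc` and `flag_summary`, the threshold value and the synthetic scores are all assumptions made for this example, and no real patient data are involved.

```python
import random


def auc(case_scores, control_scores):
    """Area under the ROC curve: the probability that a randomly chosen
    case has a higher risk score than a randomly chosen control
    (ties count as half)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def flag_summary(scores, labels, threshold):
    """Summarise who would be recommended for biomarker testing.

    scores    -- risk score per patient (hypothetical model output)
    labels    -- 1 if the patient was later diagnosed with cancer, else 0
    threshold -- patients scoring at or above this value are flagged
    """
    flagged_labels = [y for s, y in zip(scores, labels) if s >= threshold]
    cancers_flagged = sum(flagged_labels)
    return {
        "n_flagged": len(flagged_labels),
        "cancers_among_flagged": cancers_flagged,
        "sensitivity": cancers_flagged / sum(labels),
    }


if __name__ == "__main__":
    # Synthetic scores only (assumption): cases score slightly higher on average.
    rng = random.Random(0)
    case_scores = [rng.gauss(0.55, 0.15) for _ in range(200)]
    control_scores = [rng.gauss(0.45, 0.15) for _ in range(800)]
    scores = case_scores + control_scores
    labels = [1] * len(case_scores) + [0] * len(control_scores)

    print("AUC:", round(auc(case_scores, control_scores), 3))
    print(flag_summary(scores, labels, threshold=0.5))
```

In this sketch, raising the threshold reduces the number of patients flagged for testing but also lowers sensitivity; an economic evaluation of targeted screening would need to weigh that trade-off.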

Health Outcomes to be Measured

In Phase 1: Probability of future primary pancreatic cancer diagnosis (ICD-10 code C25)

In Phase 2: Estimated proportion of pancreatic cancers which could be diagnosed early in 'real time'

In Phase 3: Resource use, healthcare costs and survival for the identified cases of pancreatic cancer

Collaborators

Laura Woods - Chief Investigator - London School of Hygiene & Tropical Medicine (LSHTM)
Laura Woods - Corresponding Applicant - London School of Hygiene & Tropical Medicine (LSHTM)
Ananya Malhotra - Collaborator - London School of Hygiene & Tropical Medicine (LSHTM)
Bernard Rachet - Collaborator - London School of Hygiene & Tropical Medicine (LSHTM)
Han-I Wang - Collaborator - University of York
Morghan Hartmann - Collaborator - London School of Hygiene & Tropical Medicine (LSHTM)

Former Collaborators

Arron Gosnell - Collaborator - London School of Hygiene & Tropical Medicine (LSHTM)

Linkages

HES Admitted Patient Care; NCRAS Cancer Registration Data; NCRAS Systemic Anti-Cancer Treatment (SACT) data; ONS Death Registration Data; Patient Level Index of Multiple Deprivation Domains; Rural-Urban Classification