A machine learning approach to classify types of headache from laboratory blood test results

Study type
Protocol
Date of Approval
Study reference ID
20_000173
Lay Summary

Headaches are among the most common disorders of the nervous system. Primary headaches, such as migraine and other non-life-threatening disorders, can cause substantial disability. Headache can also occur secondarily, as a symptom of another condition such as non-traumatic intracranial hemorrhage, which can be dangerous and life-threatening. For patients with secondary headaches, prognosis can depend on how quickly they are treated, so physicians must determine quickly which patients need an immediate referral to the emergency department, a non-urgent referral to a neurologist, or further, more invasive examinations such as neuroimaging and spinal tap. There is an unmet need in the emergency department for a less time-consuming triage tool that helps physicians quickly identify which patients are more likely to have secondary headaches and need urgent care.

It has been suggested that the right combination of parameters (counts and ratios of blood cells) from a routine laboratory blood test (e.g. complete blood count) can help to quickly differentiate patients with primary headaches from those with secondary headaches. The purpose of this proposed study is to analyze whether laboratory blood test results can support the distinction between, and classification of, primary and secondary headache, using data science techniques (machine learning) with CPRD data. The findings from this study will inform opportunities for using data science techniques to help physicians make faster and more informed decisions when distinguishing the two headache types, enabling more appropriate care.

Technical Summary

This is a retrospective cross-sectional study using secondary data from the CPRD GOLD and CPRD Aurum databases in the UK. The study aims to develop a machine-learning (ML) approach based on laboratory blood test results to aid in the classification of primary and secondary headaches for patients presenting with headache symptoms at primary care encounters.

The proposed study will include patients who visited general practitioner (GP) practices in the UK with a complaint of headache, received a specific diagnosis that classifies the headache as either primary or secondary, and had laboratory blood test results recorded in the CPRD system within one month after the first GP visit.
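The one-month laboratory window described above can be sketched as a simple date filter. This is a minimal illustration, not the study's implementation: the function names are hypothetical, "one month" is approximated as 30 days, and the window is assumed to include the day of the first GP visit. The protocol summary does not specify either detail.

```python
from datetime import date, timedelta

# Assumption: "one month" is approximated as 30 days.
WINDOW_DAYS = 30


def labs_in_window(first_visit, lab_dates, window_days=WINDOW_DAYS):
    """Return the lab dates on or up to window_days after the first GP visit."""
    end = first_visit + timedelta(days=window_days)
    return [d for d in lab_dates if first_visit <= d <= end]


def is_eligible(first_visit, lab_dates, window_days=WINDOW_DAYS):
    """A patient qualifies if at least one lab result falls inside the window."""
    return len(labs_in_window(first_visit, lab_dates, window_days)) > 0


if __name__ == "__main__":
    first = date(2020, 1, 10)
    labs = [date(2020, 1, 5), date(2020, 1, 20), date(2020, 3, 1)]
    print(labs_in_window(first, labs))
    print(is_eligible(first, labs))
```

Results recorded before the first visit or after the window closes are ignored, so a patient whose only blood test falls outside the window would not enter the study population under this sketch.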

In this study, the primary headaches will be classified according to diagnoses of migraine, tension-type headache and cluster headache. The secondary headaches will be classified according to diagnoses of ischemic stroke, cerebral venous thrombosis, hemorrhage, arteritis, and angiitis.
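The grouping above amounts to mapping each diagnosis to a binary label. The sketch below uses plain diagnosis names purely for illustration. In practice the mapping would be driven by clinical code lists, which this summary does not give, so the sets and the function name here are assumptions.

```python
# Diagnosis groups taken from the text; the plain-text names stand in
# for the clinical code lists a real analysis would use (assumption).
PRIMARY_DIAGNOSES = {
    "migraine",
    "tension-type headache",
    "cluster headache",
}
SECONDARY_DIAGNOSES = {
    "ischemic stroke",
    "cerebral venous thrombosis",
    "hemorrhage",
    "arteritis",
    "angiitis",
}


def classify_headache(diagnosis):
    """Return 'primary', 'secondary', or None for diagnoses outside both groups."""
    key = diagnosis.strip().lower()
    if key in PRIMARY_DIAGNOSES:
        return "primary"
    if key in SECONDARY_DIAGNOSES:
        return "secondary"
    return None  # no qualifying diagnosis: patient is not included


if __name__ == "__main__":
    for name in ["Migraine", "Arteritis", "Common cold"]:
        print(name, "->", classify_headache(name))
```

Returning None for any other diagnosis mirrors the protocol's rule that patients without a qualifying diagnosis are excluded rather than assigned to either class.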

We will first describe patients’ demographic and clinical characteristics (including laboratory blood test results) for the whole study population and for the primary and secondary headache groups separately.

Then, to build a predictive model that helps classify the two types of headache, we will examine two supervised ML algorithms, namely logistic regression and random forest. The parameters selected for the predictive model will include laboratory blood test results (counts of red blood cells, platelets, white blood cells, neutrophils, lymphocytes, monocytes, eosinophils and basophils, mean corpuscular volume, and hemoglobin) and ratios of these parameters. The performance of the two algorithms will be compared using standard classification metrics including confusion matrix, accuracy, balanced accuracy, average precision score, F1-score and area under the curve.
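Several of the metrics listed above follow directly from the confusion matrix. The sketch below computes the confusion counts, accuracy, balanced accuracy and F1-score in plain Python for a binary problem. It assumes secondary headache is coded as the positive class (1), a choice made here for illustration only. Model fitting, average precision and area under the curve are not shown, since they require predicted scores rather than hard labels.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for binary labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, balanced accuracy, precision, recall and F1."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy labels: 1 = secondary headache (assumed positive class), 0 = primary.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
    print(confusion_counts(y_true, y_pred))
    print(classification_metrics(y_true, y_pred))
```

Balanced accuracy averages sensitivity and specificity, which makes it more informative than plain accuracy when one headache class is much rarer than the other.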

It should be noted that the final predictive model will not be suitable for all patients with headache as those without a specific diagnosis will not be included.

Health Outcomes to be Measured

Primary causes of headaches including diagnoses of migraine, tension-type headache and cluster headache; secondary causes of headaches including diagnoses of ischemic stroke, cerebral venous thrombosis, hemorrhage, arteritis, and angiitis.

Only patients who presented with a headache symptom and received one of the specific diagnoses listed above, which classify the headache as either primary or secondary, will be included (please also see the following sections, "Definition of the Study population" and "operational definition of outcomes", for details). Patients with headache as a symptom of other causes, e.g. having a cold, drinking too much alcohol or dehydration, will not be included in this study. Potential bias will be considered and described in the section "Limitations of the study design, data sources, and analytic methods".

Collaborators

Carsten Magnus - Chief Investigator - F. Hoffmann - La Roche Ltd
Fei Yang - Corresponding Applicant - Roche
Benjamin Torben-Nielsen - Collaborator - F. Hoffmann - La Roche Ltd
Carsten Magnus - Collaborator - F. Hoffmann - La Roche Ltd
Iori Namekawa - Collaborator - Roche
Tong Meng - Collaborator - Roche Molecular Systems, Inc
Tony Chuang Liu - Collaborator - Alexion Pharma GmbH (Switzerland)

Former Collaborators

Emilie Dejean - Collaborator - Celgene Corporation
Emilie Déjean - Collaborator - Not from an Organisation
Marie Stobbe - Collaborator - F. Hoffmann - La Roche Ltd
Shu Wang - Collaborator - Genesis Research LLC
Simon Davidson - Collaborator - F. Hoffmann - La Roche Ltd
Chuang Liu - Collaborator - F. Hoffmann - La Roche Ltd