A data-driven approach for identifying falls subgroups through semantic similarity analysis

Date of Approval: 
2016-12-20
Lay Summary: 
Falls in the elderly are a serious health issue worldwide: 32% of adults aged over 65 fall at least once a year. These falls are often severe and can lead to major health problems, including disability and mortality. If we want to reduce the health impact of falls on people and the costs they impose on the health budget, we need to understand what makes people more likely to have a fall. The data contained within the Clinical Practice Research Datalink (CPRD) could help find these factors. The most common way to use CPRD has been to answer very specific questions, for example: does having a particular disease cause some patients to experience falls? But are there other, equally important questions that could be asked of the data that people haven't yet thought to ask? In this project we are using a new method to explore the CPRD data to find features or patterns that can predict which people are most likely to have a dangerous fall. Some of the patterns we find will already be known; however, this new strategy has the potential to find new and important associations.
Technical Summary: 
The objective of this study is to determine how to derive good hypotheses from large health datasets. Our primary question is: how can the data itself guide us towards the questions worth asking? Traditionally, population-based studies are used to test whether there is support for particular, pre-specified hypotheses. Instead, can we use CPRD data to identify what the good questions are? We are exploring a new strategy for diagnostic hypothesis formulation in the analysis of medical data. We plan to map CPRD data into a low-dimensional space using semantic similarity analysis, providing a representation of the data in which visualisation and clustering are much easier. Classic data mining techniques can then be used to generate hypotheses about which factors lead to falls and which conditions are caused by falls. Falls provide a particularly rich data set for this study: a fall is a relatively common diagnosis in a population that has significant contact with general practitioners (older people), making it a suitable case study to test the methodology.
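As a rough illustration of the similarity-then-clustering idea described above, the sketch below groups a handful of patient records by how many coded entries they share. Everything in it is an assumption made for illustration: the patient identifiers and code labels are invented placeholders (not real Read codes or CPRD data), Jaccard overlap stands in for the semantic similarity measure, which this summary does not specify, and the low-dimensional mapping step is omitted for brevity.

```python
from itertools import combinations

# Hypothetical patient records: each patient is a set of made-up code labels
# (placeholders only, not real Read codes).
patients = {
    "p1": {"FALL", "DIZZINESS", "HYPOTENSION"},
    "p2": {"FALL", "DIZZINESS", "SEDATIVE_RX"},
    "p3": {"FALL", "HIP_FRACTURE", "OSTEOPOROSIS"},
    "p4": {"FALL", "WRIST_FRACTURE", "OSTEOPOROSIS"},
}


def jaccard(a, b):
    """Fraction of codes shared by two records (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(records, threshold):
    """Group records connected by a chain of pairwise similarities >= threshold."""
    parent = {pid: pid for pid in records}

    def find(x):
        # Follow parent links to the group representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if jaccard(records[a], records[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for pid in records:
        groups.setdefault(find(pid), set()).add(pid)
    return sorted(sorted(g) for g in groups.values())


subgroups = cluster(patients, threshold=0.5)
print(subgroups)
```

With these toy records, patients who share a dizziness-related entry end up in one subgroup and those who share an osteoporosis-related entry in another; subgroups of this kind are what would then be inspected to suggest hypotheses about factors preceding or following a fall.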
Health Outcomes to be Measured: 
GP records of falls, as recorded via Read codes.
Application Number: 

Andy Brass - Chief Investigator - University of Manchester
Andy Brass - Corresponding Applicant - University of Manchester
Chris Todd - Collaborator - University of Manchester
Darren Ashcroft - Collaborator - University of Manchester
John Ainsworth - Collaborator - University of Manchester
Muhannad Almohaimeed - Collaborator - University of Manchester
Thamer Ba-Dhfari - Collaborator - University of Manchester
Tjeerd van Staa - Collaborator - University of Manchester