Databases of routinely collected data hold information recorded electronically by several hundred general practices for many millions of patients, stretching back two decades or more. Their use in research is increasing rapidly, and this work focuses on three under-developed aspects of database research:
1) Duplication of effort. We will create freely available computer software to automate a number of commonly used, time-consuming data-processing tasks, thereby speeding up processing and reducing project time-scales and costs.
2) Validity and replicability. In research using Primary Care Databases (PCDs) many key factors, such as the disease conditions a patient has and the treatments they are receiving, are defined using lists of relevant clinical codes. Very little detail about these lists is published, however, and each research group develops its own. We aim to increase consistency in how clinical code lists are developed and used across the research community.
3) Data quality. PCD-based research is relatively new and a number of issues around incorrect and missing data still need to be addressed. Methods will be developed to deal with these issues.
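To make the idea of a clinical code list in point 2 concrete, the following minimal sketch shows how a list of codes might be used to identify which patients have a condition and when it was first recorded. Everything in it is an illustrative assumption: the `ClinicalEvent` record, the `patients_with_condition` function and the placeholder codes (`CODE_A1` and so on) are hypothetical and do not correspond to any real coding scheme or to the software this project will produce.

```python
from dataclasses import dataclass

# Hypothetical code list: placeholder strings, not real clinical codes.
DIABETES_CODE_LIST = {"CODE_A1", "CODE_A2", "CODE_A3"}


@dataclass(frozen=True)
class ClinicalEvent:
    """One coded entry in a patient's record (assumed layout)."""
    patient_id: int
    code: str
    event_date: str  # ISO 8601 date, so string order equals date order


def patients_with_condition(events, code_list):
    """Map each patient with a matching code to their earliest matching date."""
    first_seen = {}
    for ev in events:
        if ev.code not in code_list:
            continue
        prev = first_seen.get(ev.patient_id)
        if prev is None or ev.event_date < prev:
            first_seen[ev.patient_id] = ev.event_date
    return first_seen


events = [
    ClinicalEvent(1, "CODE_A1", "2010-03-01"),
    ClinicalEvent(1, "CODE_A2", "2008-06-15"),
    ClinicalEvent(2, "CODE_Z9", "2011-01-20"),  # not in the code list
    ClinicalEvent(3, "CODE_A3", "2012-09-09"),
]

print(patients_with_condition(events, DIABETES_CODE_LIST))
# {1: '2008-06-15', 3: '2012-09-09'}
```

Because the result depends entirely on which codes are in the list, two research groups using different lists for the same condition can identify different patients from the same data, which is why consistency and transparency in code lists matter.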
Routinely collected electronic medical record (EMR) databases are rich sources of data for health research. Although they lack the rigour of Randomised Controlled Trials (RCTs) and may be affected by bias from uncontrolled factors, these databases allow the investigation of research questions that may not be feasible to address by other means. The UK leads the world in Primary Care Databases (PCDs), which collate data from the electronic records of patients registered with large numbers of general practices and benefit from the almost complete computerisation of UK primary care. Publications using PCDs, and the Clinical Practice Research Datalink (CPRD) in particular, attract global research interest; as the field develops, applications are becoming more sophisticated and the demands made on the data greater. This work focuses on three problematic or under-developed aspects of PCD-based research studies: a) reducing duplication of effort; b) increasing validity and reproducibility; c) improving analysis methodologies. To address these issues we will develop and maintain an online repository, and develop methods and software to perform power calculations, extract data and impute missing data. Our aim is to increase the efficiency and transparency with which PCD-based research is conducted and the validity of the findings resulting from that research.
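As an illustration of what a power calculation involves, the sketch below estimates the power of a two-sided comparison of two proportions using the standard textbook normal approximation. It is not the method this project will develop, which the summary does not specify; the function name `power_two_proportions` and the example values are assumptions chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power to detect p1 != p2 with equal group sizes.

    Uses the normal approximation and counts only rejections in the
    direction of the true difference (the opposite tail is negligible).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    # Standard error under the null (pooled) and under the alternative.
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    return z.cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)


# Hypothetical example: outcome rates of 10% and 15% in the two groups.
for n in (500, 2000):
    print(n, round(power_two_proportions(0.10, 0.15, n), 3))
```

With the very large patient numbers available in PCDs, power for common outcomes is often high, so calculations of this kind matter most for rare exposures or outcomes and for subgroup analyses.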
Health Outcomes to be Measured:
Asthma, atrial fibrillation, cancer, coronary heart disease, chronic kidney disease, chronic obstructive pulmonary disease, dementia, depression, diabetes mellitus, epilepsy, heart failure, hypertension, hypothyroidism, learning disability, osteoarthritis, osteoporosis, severe mental illness, stroke, and Body Mass Index