Validation of CPRD database transformed to Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM)

Study type
Protocol
Date of Approval
Study reference ID
19_044
Lay Summary

The Observational Medical Outcomes Partnership (OMOP) was a public-private partnership managed by the US not-for-profit organisation Foundation for the National Institutes of Health. The OMOP created infrastructure to map different types of data sources (mainly medical claims data and Electronic Medical Records) into a Common Data Model (CDM) format. At Amgen, we have transformed the CPRD GOLD database into a customised version of the OMOP v4 format. The transformation enables us to optimise use of the CPRD database alongside other large data sources. By using the OMOP version of CPRD instead of the data in the raw format, we are able to use similar program code to interrogate this database and others that have different structures in their native forms but have been transformed into a common structure. This allows us to develop standard programs and tools to produce information from those transformed databases.
Because we intend to publish future research that uses our OMOP version of the CPRD, we also aim to publish a methodological paper that those studies can reference. The paper will show that we have validated our OMOP version of the CPRD data and that the same results are produced whether the data are in the raw format or the OMOP format.

Technical Summary

To validate the OMOP version of the database against the raw CPRD, the number of unique data rows will be compared between the raw CPRD data and the equivalent OMOP CDM-transformed CPRD data for the following tables: Practice, Patient, Clinical, Referral, Test and Therapy.
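The row-count comparison described above could be sketched as follows. This is a minimal illustration, not the study's actual program: the function names, the representation of each table as a list of tuples, and the example data are all assumptions made for the sketch.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple  # hypothetical representation: one data row as a tuple of field values


def unique_row_count(rows: Iterable[Row]) -> int:
    """Count distinct rows, so exact duplicates are counted once."""
    return len(set(rows))


def compare_tables(
    raw: Dict[str, List[Row]], omop: Dict[str, List[Row]]
) -> Dict[str, Tuple[int, int, bool]]:
    """For each table name, return (raw count, OMOP count, counts match)."""
    report = {}
    for table in sorted(set(raw) | set(omop)):
        n_raw = unique_row_count(raw.get(table, []))
        n_omop = unique_row_count(omop.get(table, []))
        report[table] = (n_raw, n_omop, n_raw == n_omop)
    return report


# Illustrative data only; real inputs would be the six tables named above.
raw_tables = {"Patient": [(1, "M"), (2, "F"), (2, "F")], "Therapy": [(1, "drugA")]}
omop_tables = {"Patient": [(1, "M"), (2, "F")], "Therapy": []}
report = compare_tables(raw_tables, omop_tables)
```

In this made-up example the Patient counts agree after de-duplication, while the Therapy counts differ and would be flagged for investigation.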
In addition, estimates of lifetime point prevalence, period prevalence and incidence rates will be produced. We will produce these estimates in both the raw and OMOP versions of the database for twenty diseases and thirteen therapies chosen for their relevance to Amgen's products, for two lifestyle factors (smoking status and BMI) and for two lab results.
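The protocol summary does not state the exact formulas, so the sketch below assumes the standard epidemiological definitions: prevalence as cases divided by the eligible population (at a point in time or over a period), and incidence rate as new cases divided by person-time at risk. The function names and the per-1,000 person-years scaling are assumptions for illustration.

```python
def prevalence(cases: int, population: int) -> float:
    """Proportion of the eligible population classed as a case.

    Serves for point prevalence (cases at an index date) and for
    period prevalence (cases at any time in the period).
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population


def incidence_rate(new_cases: int, person_years: float, per: float = 1000.0) -> float:
    """New cases per `per` person-years at risk."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return new_cases * per / person_years


def estimates_match(raw_estimate: float, omop_estimate: float) -> bool:
    """Exact-match check between the raw and OMOP estimates."""
    return raw_estimate == omop_estimate
```

Because the aim is an exact match rather than a statistical comparison, the check uses strict equality. In practice it may be more robust to compare the integer numerators and denominators directly than the resulting floating-point values.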

We will produce two sets of estimates using the raw CPRD: one that includes information recorded before a patient's observation start date (the later of the up-to-standard (UTS) date and the current registration date (CRD)) and one that does not. The standard version 4 of the OMOP CDM does not include information recorded before a patient's observation start date. We expect to show that this information is necessary for calculating accurate incidence and prevalence estimates, and that the OMOP version of the database matches the raw CPRD exactly when this information is included.
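As a rough sketch of why records before the observation start date matter, the example below derives the observation start date as the later of the UTS and CRD dates and then finds a patient's first recorded event with and without the earlier records. The function names and dates are hypothetical and chosen only to illustrate the idea.

```python
from datetime import date
from typing import List, Optional


def observation_start(uts: date, crd: date) -> date:
    """Observation start is the later of the UTS and CRD dates."""
    return max(uts, crd)


def first_event(
    event_dates: List[date], obs_start: date, include_prior: bool
) -> Optional[date]:
    """Earliest event date, optionally ignoring events before obs_start."""
    eligible = [d for d in event_dates if include_prior or d >= obs_start]
    return min(eligible) if eligible else None


# Hypothetical patient with a diagnosis recorded before observation start.
obs = observation_start(uts=date(2005, 1, 1), crd=date(2007, 6, 15))
events = [date(2003, 3, 1), date(2010, 9, 9)]
with_prior = first_event(events, obs, include_prior=True)
without_prior = first_event(events, obs, include_prior=False)
```

With the earlier records included, this patient is an existing case from 2003. Without them, the 2010 record would look like a new diagnosis, which would inflate incidence and understate prevalence at the observation start date.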

No statistical tests will be carried out.

Health Outcomes to be Measured

Counts of rows in tables in OMOP and raw CPRD; estimates of lifetime point prevalence for diseases, prescriptions, lifestyle factors and tests; estimates of period prevalence for diseases, prescriptions, lifestyle factors and tests; and estimates of incidence rates for diseases, prescriptions, lifestyle factors and tests.

Collaborators

Joe Maskell - Chief Investigator - Amgen Ltd
Olga Archangelidi - Corresponding Applicant - Amgen Ltd
David Neasham - Collaborator - Amgen Ltd
George Kafatos - Collaborator - Amgen Ltd
Maurille Feudjo Tepie - Collaborator - Amgen Ltd