Evaluating the performance of cutting-edge statistical methods in reducing confounding bias using plasmode simulations of electronic health record data

Study type
Protocol
Date of Approval
Study reference ID
21_000484
Lay Summary

This study will focus on state-of-the-art statistical methods (High-Dimensional Propensity Scores, Targeted Maximum Likelihood Estimation, Neural Networks, Support Vector Machines, decision trees, and meta-classifiers) that aim to estimate treatment effects without bias. To evaluate their performance, a simulation method called plasmode will be used to replicate the structure of the electronic health record database and obtain realistic simulations under a chosen treatment effect.

Using electronic health record data raises specific issues because of their observational nature. In such studies, unlike in randomized trials, it is not possible to control the assignment of patients to each study group, which leads to selection bias and confounding. Specific statistical methods must be used to remove this bias, and propensity scores are the most commonly used method to handle confounding bias. However, the last decade has seen a growing number of algorithmic and machine learning methods that may remove confounding bias more effectively. To evaluate the performance of these methods on electronic health record data, the most suitable option is to use plasmode simulations, which maintain the original data structure so that the causal pathways between variables remain realistic.

The objective of this study is to evaluate the performance of these state-of-the-art statistical methods in estimating unbiased treatment effects using plasmode simulations of electronic health record data. If these methods are shown to perform better than the standard propensity score method, they could be used more systematically to study exposure effects in electronic health record data.

Technical Summary

The propensity score is the probability of receiving a given treatment conditional on baseline characteristics. This method is widely recommended to handle confounding bias in comparative research [1. Austin 2011]. Logistic regression is almost exclusively used to estimate the propensity score, and the baseline characteristics are generally selected by clinicians. Although this approach may prove accurate in many cases, residual bias may arise from these naïve modelling choices, and alternative methods may be more accurate.
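As an illustration of the standard approach described above, the following minimal sketch fits a logistic regression propensity score by gradient ascent using only the Python standard library. The toy cohort, the covariates, the learning rate, and all function names are assumptions made for this example; they are not taken from the protocol.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def linear_predictor(beta, xi):
    # beta[0] is the intercept, beta[1:] are the covariate coefficients.
    return beta[0] + sum(b * x for b, x in zip(beta[1:], xi))

def fit_logistic(X, t, lr=0.5, n_iter=500):
    """Fit P(T=1 | X) by batch gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, ti in zip(X, t):
            resid = ti - sigmoid(linear_predictor(beta, xi))
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Hypothetical toy cohort: two standardized baseline covariates.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
# Treatment is made more likely when the first covariate is high.
t = [1 if rng.random() < sigmoid(-0.5 + 1.0 * x[0]) else 0 for x in X]

beta = fit_logistic(X, t)
# Estimated propensity score for each patient.
ps = [sigmoid(linear_predictor(beta, xi)) for xi in X]
```

In practice a statistical package would be used for the fit; the point of the sketch is only that the propensity score is the fitted probability of treatment given the chosen covariates.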

The high-dimensional propensity score (hdPS) is an algorithm that automatically selects the covariates to be included in the propensity score model based on their potential for confounding, as quantified by Bross' formula [2. Schneeweiss 2009]. This method is advocated as a powerful way to select the right baseline covariates for eradicating confounding bias.
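The covariate prioritisation step can be sketched as follows. The bias multiplier below is the commonly cited form of Bross' formula for a binary covariate; inverting relative risks below 1 and ranking by the absolute log of the multiplier are assumptions of this sketch, and the candidate covariates and their values are hypothetical.

```python
import math

def bross_bias_multiplier(p_c1, p_c0, rr_cd):
    """Multiplicative bias from leaving a binary covariate unadjusted.

    p_c1, p_c0: prevalence of the covariate among exposed / unexposed.
    rr_cd: covariate-outcome relative risk.
    """
    if rr_cd < 1:
        # Assumption: invert protective associations so ranking is symmetric.
        rr_cd = 1.0 / rr_cd
    return (p_c1 * (rr_cd - 1) + 1) / (p_c0 * (rr_cd - 1) + 1)

def rank_covariates(candidates, k):
    """Return the k covariates with the largest |log(bias multiplier)|."""
    scored = [
        (name, abs(math.log(bross_bias_multiplier(p1, p0, rr))))
        for name, (p1, p0, rr) in candidates.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical candidates: (prevalence in exposed, prevalence in unexposed, RR).
candidates = {
    "diabetes_code": (0.30, 0.10, 2.0),
    "statin_rx": (0.25, 0.20, 1.5),
    "flu_vaccine": (0.40, 0.40, 0.8),
}
top = rank_covariates(candidates, k=2)
```

A covariate that is equally prevalent in both exposure groups, such as `flu_vaccine` here, has a multiplier of 1 and is ranked last regardless of its association with the outcome.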

Neural Networks, Support Vector Machines (SVM), decision trees (CART), and meta-classifiers are alternatives to logistic regression for estimating propensity scores; they rely on fewer assumptions and are expected to be at least as accurate [3. Westreich 2010].

Targeted Maximum Likelihood Estimation (TMLE) is a general algorithm for constructing doubly robust, semiparametric, efficient substitution estimators. It is based on targeted minimum loss-based estimation and uses machine learning algorithms to minimise the risk of model misspecification [4. Van der Laan 2011].
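To make the targeting step concrete, the block below writes out the textbook TMLE update for a simpler setting than the one in this protocol: a binary treatment A, a binary outcome Y, baseline covariates W, and the average treatment effect as the target. The notation is an assumption of this sketch; the protocol itself targets a hazard ratio for a time-to-event outcome, which requires a different fluctuation model.

```latex
\begin{aligned}
g(W) &= P(A = 1 \mid W), \qquad \bar{Q}(A, W) = E[Y \mid A, W] \\
H(A, W) &= \frac{A}{g(W)} - \frac{1 - A}{1 - g(W)} \\
\operatorname{logit} \bar{Q}^{*}(A, W) &= \operatorname{logit} \bar{Q}(A, W) + \hat{\varepsilon}\, H(A, W) \\
\hat{\psi} &= \frac{1}{n} \sum_{i=1}^{n} \left[ \bar{Q}^{*}(1, W_i) - \bar{Q}^{*}(0, W_i) \right]
\end{aligned}
```

Here the initial estimates of g and of the outcome regression may come from machine learning algorithms, and the fluctuation parameter is fitted by a logistic regression of Y on H with the initial logit as an offset. The estimator is consistent if either g or the outcome regression is estimated consistently, which is the sense in which it is doubly robust.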

To evaluate all these methods, we will replicate a previous study (ISAC protocol n°16_166R) whose main objective was to assess the effectiveness of initiating therapy with one or two drug classes in the management of hypertension [5. Marinier 2019]. New users initiating bi-therapy will be matched (1:2) to those initiating monotherapy using propensity scores built with the above methods, and the outcome of time to blood pressure control will be analysed with a Cox proportional hazards model. The exception is TMLE, which directly estimates the hazard ratio in a non-parametric model on the entire patient set.
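The matching step can be sketched as below. The protocol does not specify the matching algorithm, so greedy nearest-neighbour matching without replacement, the caliper of 0.05 on the propensity score scale, the direction of the ratio (two monotherapy initiators per bi-therapy initiator), and the patient identifiers are all assumptions of this example.

```python
def match_1_to_2(ps_treated, ps_control, caliper=0.05):
    """Greedy 1:2 nearest-neighbour matching without replacement.

    ps_treated, ps_control: dicts mapping patient id -> propensity score.
    Returns {treated_id: [control_id, control_id]}. Treated patients with
    fewer than two controls inside the caliper are left unmatched.
    """
    available = dict(ps_control)
    matches = {}
    # Process treated patients in order of increasing propensity score.
    for t_id, t_ps in sorted(ps_treated.items(), key=lambda kv: kv[1]):
        nearest = sorted(available, key=lambda c: abs(available[c] - t_ps))[:2]
        within = all(abs(available[c] - t_ps) <= caliper for c in nearest)
        if len(nearest) == 2 and within:
            matches[t_id] = nearest
            for c in nearest:
                del available[c]  # each control is used at most once
    return matches

# Hypothetical propensity scores for bi-therapy and monotherapy initiators.
bi = {"b1": 0.30, "b2": 0.62, "b3": 0.95}
mono = {"m1": 0.28, "m2": 0.33, "m3": 0.60, "m4": 0.65, "m5": 0.31}
pairs = match_1_to_2(bi, mono)
```

In this toy example `b3` stays unmatched because no two remaining monotherapy initiators lie within the caliper; the matched set would then be passed to the Cox model.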

Health Outcomes to be Measured

Because this is a methodological study, the focus will be on a single outcome, the time to blood pressure control, which was the primary outcome of the previous study ISAC n°16_166R [5. Marinier 2019].

Collaborators

Virginie SIMON - Chief Investigator - IRIS - Institut de Recherches Internationales Servier
Virginie SIMON - Corresponding Applicant - IRIS - Institut de Recherches Internationales Servier
Adrien Billaud - Collaborator - IRIS - Institut de Recherches Internationales Servier
Gauvain Youdom - Collaborator - IRIS - Institut de Recherches Internationales Servier
Jade Vadel - Collaborator - IQVIA Operations France SAS

Linkages

HES Admitted Patient Care; ONS Death Registration Data; Patient Level Index of Multiple Deprivation