Evaluating the performance of cutting-edge statistical methods in reducing confounding bias using plasmode simulations of electronic health record data

Date of Approval
Application Number
Technical Summary

The propensity score is the probability of receiving a given treatment conditional on baseline characteristics. Propensity score methods are widely recommended for handling confounding bias in comparative research [1. Austin 2011]. Logistic regression is used almost exclusively to estimate the propensity score, and the baseline characteristics are generally selected by clinicians. Although this approach may prove accurate in many cases, residual bias may arise from these naïve modelling choices, and alternative methods may be more accurate.
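As an illustration of the conventional approach described above, the following minimal sketch estimates propensity scores with a logistic regression fitted by plain gradient ascent. The covariates (a standardised age and a diabetes indicator), the simulated data, and the function names are all hypothetical and are not taken from the protocol; a real analysis would use an established statistical package.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, t, lr=0.5, n_iter=500):
    """Fit a logistic regression of treatment t on covariates X by batch
    gradient ascent on the mean log-likelihood.
    Returns [intercept, coef_1, ..., coef_p]."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, ti in zip(X, t):
            err = ti - sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            grad[0] += err
            for j, x in enumerate(xi):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Predicted probability of treatment for each row of X."""
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi))) for xi in X]

# Hypothetical simulated cohort: treatment depends on age and diabetes.
random.seed(0)
X, t = [], []
for _ in range(300):
    age_z = random.gauss(0, 1)                       # standardised age (assumed)
    diabetes = 1.0 if random.random() < 0.3 else 0.0  # binary comorbidity (assumed)
    p_treat = sigmoid(-0.5 + 0.8 * age_z + 1.0 * diabetes)
    X.append([age_z, diabetes])
    t.append(1 if random.random() < p_treat else 0)

beta = fit_logistic(X, t)
ps = propensity_scores(X, beta)
print("fitted coefficients:", [round(b, 2) for b in beta])
```

The fitted scores can then be used for matching, stratification, or weighting; the choice of covariates in `X` is exactly the modelling decision that the methods below try to improve on.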

The high-dimensional propensity score (hdPS) is an algorithm that automatically selects the covariates to include in the propensity score model according to their potential for confounding, as quantified by Bross’ formula [2. Schneeweiss 2009]. This method is advocated as a powerful way to select the baseline covariates needed to remove confounding bias.
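The covariate prioritisation step can be sketched as follows, assuming the usual form of the Bross bias multiplier for a binary covariate: with P_C1 and P_C0 the covariate prevalence among exposed and unexposed patients and RR_CD the covariate-outcome relative risk, the multiplier is (P_C1 (RR_CD - 1) + 1) / (P_C0 (RR_CD - 1) + 1), and candidates are ranked by the absolute value of its logarithm. The covariate names and numbers below are hypothetical.

```python
import math

def bross_bias_multiplier(p_c1, p_c0, rr_cd):
    """Multiplicative bias expected from leaving one binary covariate unadjusted.
    p_c1 / p_c0: covariate prevalence among exposed / unexposed patients.
    rr_cd: relative risk linking the covariate to the outcome."""
    if rr_cd < 1:
        # Protective covariates are handled by inverting the relative risk.
        rr_cd = 1.0 / rr_cd
    return (p_c1 * (rr_cd - 1) + 1) / (p_c0 * (rr_cd - 1) + 1)

def rank_covariates(candidates, top_k):
    """Rank candidates by |log(bias multiplier)| and keep the top_k.
    candidates maps a covariate name to (p_c1, p_c0, rr_cd)."""
    scored = [
        (name, abs(math.log(bross_bias_multiplier(p1, p0, rr))))
        for name, (p1, p0, rr) in candidates.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Hypothetical candidate codes with (P_C1, P_C0, RR_CD).
candidates = {
    "code_A": (0.40, 0.20, 2.0),  # imbalanced and strongly prognostic
    "code_B": (0.10, 0.10, 3.0),  # prognostic but balanced: multiplier is 1
    "code_C": (0.30, 0.25, 1.2),  # mildly imbalanced, weakly prognostic
}
selected = rank_covariates(candidates, top_k=2)
print([name for name, _ in selected])  # ['code_A', 'code_C']
```

A covariate that is balanced between treatment groups (`code_B`) contributes no expected bias however strongly it predicts the outcome, which is why it ranks last here.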

Neural networks, support vector machines (SVM), decision trees (CART), and meta-classifiers are alternatives to logistic regression for estimating propensity scores; they rely on fewer assumptions and are supposedly at least as accurate [3. Westreich 2010].

Targeted Maximum Likelihood Estimation (TMLE) is a general algorithm for constructing doubly robust, semiparametric, efficient substitution estimators. It combines targeted minimum loss-based estimation with machine learning algorithms to minimise the risk of model misspecification [4. Van der Laan 2011].

To evaluate these methods, we will replicate a previous study (ISAC protocol n°16_166R) whose main objective was to assess the effectiveness of initiating therapy with one or two drug classes in the management of hypertension [5. Marinier 2019]. New users initiating bi-therapy will be matched (1:2) to those initiating monotherapy using propensity scores built with each of the above methods, and the outcome, time to blood pressure control, will be analysed with a Cox proportional hazards model. The exception is TMLE, which directly estimates the hazard ratio in a non-parametric model on the entire patient set.
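The 1:2 matching step can be illustrated with a short sketch. The protocol summary does not state the matching algorithm, so greedy nearest-neighbour matching without replacement and the caliper of 0.05 on the propensity score are assumptions made for illustration only; the patient identifiers and scores are hypothetical.

```python
def match_1_to_k(treated, controls, k=2, caliper=0.05):
    """Greedy nearest-neighbour matching without replacement.
    treated, controls: dicts mapping patient id -> propensity score.
    Each treated patient is matched to k controls whose scores lie within
    the caliper; treated patients without k eligible controls stay unmatched.
    Returns a dict mapping treated id -> list of matched control ids."""
    available = dict(controls)
    matches = {}
    # Process treated patients in order of increasing propensity score.
    for tid, tps in sorted(treated.items(), key=lambda kv: kv[1]):
        nearest = sorted(available, key=lambda cid: abs(available[cid] - tps))[:k]
        chosen = [cid for cid in nearest if abs(available[cid] - tps) <= caliper]
        if len(chosen) == k:
            matches[tid] = chosen
            for cid in chosen:
                del available[cid]  # each control is used at most once
    return matches

# Hypothetical scores: bi-therapy initiators (treated) and monotherapy initiators (controls).
bitherapy = {"T1": 0.30, "T2": 0.62, "T3": 0.95}
monotherapy = {"C1": 0.28, "C2": 0.33, "C3": 0.60, "C4": 0.65, "C5": 0.50}

matched = match_1_to_k(bitherapy, monotherapy)
print(matched)  # {'T1': ['C1', 'C2'], 'T2': ['C3', 'C4']}
```

In this toy example `T3` remains unmatched because no monotherapy initiator has a comparable score, which shows how matching can shrink the analysed population, whereas TMLE is applied to the entire patient set.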

Health Outcomes to be Measured

As this is a methodological study, the focus will be on a single outcome, the time to blood pressure control, which was the primary outcome of the previous study ISAC n°16_166R [5. Marinier 2019].


Virginie SIMON - Chief Investigator - IRIS - Institut de Recherches Internationales Servier
Virginie SIMON - Corresponding Applicant - IRIS - Institut de Recherches Internationales Servier
Adrien Billaud - Collaborator - IRIS - Institut de Recherches Internationales Servier
Gauvain Youdom - Collaborator - IRIS - Institut de Recherches Internationales Servier
Jade Vadel - Collaborator - IQVIA Operations France SAS


HES Admitted Patient Care; ONS Death Registration Data; Patient Level Index of Multiple Deprivation