Machine Learning for Causal Inference: On the Use of Cross-fit Estimators
Background:
Modern causal inference methods allow machine learning to be used to weaken parametric modeling assumptions. However, the use of machine learning may result in complications for inference. Doubly robust cross-fit estimators have been proposed to yield better statistical properties.
Methods:
We conducted a simulation study to assess the performance of several different estimators for the average causal effect. The data generating mechanisms for the simulated treatment and outcome included log-transforms, polynomial terms, and discontinuities. We compared singly robust estimators (g-computation, inverse probability weighting) and doubly robust estimators (augmented inverse probability weighting, targeted maximum likelihood estimation). We estimated nuisance functions with parametric models and ensemble machine learning separately. We further assessed doubly robust cross-fit estimators.
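To make the cross-fitting idea concrete, the following is a minimal sketch of a two-fold cross-fit augmented inverse probability weighting (AIPW) estimator. It is not the simulation design or code from the study: the toy data generator (`simulate`, with a single binary confounder and a true effect of 1.0), the function names, and the use of simple stratified means in place of ensemble machine learning for the nuisance functions are all assumptions made for illustration. The confidence interval is a Wald-type interval based on the sample standard deviation of the AIPW scores.

```python
import random
from statistics import mean, stdev


def simulate(n, seed=0):
    """Toy data (hypothetical): binary confounder W, binary treatment A, outcome Y.
    The true average causal effect is 1.0 by construction."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < (0.7 if w else 0.3) else 0
        y = 1.0 * a + 2.0 * w + rng.gauss(0, 1)
        data.append((w, a, y))
    return data


def fit_nuisance(train):
    """Fit nuisance functions on the training fold only.
    Stratified means stand in for the ensemble learner (an assumption)."""
    # Propensity score: P(A = 1 | W = w)
    ps = {w: mean(a for (wi, a, _) in train if wi == w) for w in (0, 1)}
    # Outcome model: E[Y | A = a, W = w]
    om = {
        (a, w): mean(y for (wi, ai, y) in train if wi == w and ai == a)
        for a in (0, 1)
        for w in (0, 1)
    }
    return ps, om


def aipw_scores(test, ps, om):
    """Evaluate the AIPW score for each held-out observation."""
    scores = []
    for w, a, y in test:
        p, m1, m0 = ps[w], om[(1, w)], om[(0, w)]
        score = (m1 - m0) + a * (y - m1) / p - (1 - a) * (y - m0) / (1 - p)
        scores.append(score)
    return scores


def cross_fit_aipw(data, seed=0):
    """Two-fold cross-fit AIPW: nuisance models are fit on one half and
    evaluated on the other, then the roles are swapped."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    folds = [shuffled[:half], shuffled[half:]]
    scores = []
    for k in (0, 1):
        ps, om = fit_nuisance(folds[1 - k])  # fit on the other fold
        scores.extend(aipw_scores(folds[k], ps, om))
    est = mean(scores)
    se = stdev(scores) / len(scores) ** 0.5
    return est, (est - 1.96 * se, est + 1.96 * se)


if __name__ == "__main__":
    estimate, (lower, upper) = cross_fit_aipw(simulate(4000, seed=1))
    print(f"ACE estimate: {estimate:.3f}, 95% CI: ({lower:.3f}, {upper:.3f})")
```

The key design point is that no observation contributes to the nuisance models used to compute its own score; with flexible learners, that separation is what cross-fitting relies on. A single split, as here, depends on the random partition, so in practice the procedure is often repeated over several partitions and the results summarized.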
Results:
With correctly specified parametric models, all of the estimators were unbiased and confidence intervals achieved nominal coverage. When used with machine learning, the doubly robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
Conclusions:
Due to the difficulty of properly specifying parametric models in high-dimensional data, doubly robust estimators with ensemble learning and cross-fitting may be the preferred approach for estimation of the average causal effect in most epidemiologic studies. However, these approaches may require larger sample sizes to avoid finite-sample issues.

