Machine Learning for Causal Inference: On the Use of Cross-fit Estimators

Background: Modern causal inference methods allow machine learning to be used to weaken parametric modeling assumptions. However, the use of machine learning may result in complications for inference. Doubly robust cross-fit estimators have been proposed to yield better statistical properties.
Methods: We conducted a simulation study to assess the performance of several different estimators for the average causal effect. The data generating mechanisms for the simulated treatment and outcome included log-transforms, polynomial terms, and discontinuities. We compared singly robust estimators (g-computation, inverse probability weighting) and doubly robust estimators (augmented inverse probability weighting, targeted maximum likelihood estimation). We estimated nuisance functions with parametric models and ensemble machine learning separately. We further assessed doubly robust cross-fit estimators.
Results: With correctly specified parametric models, all of the estimators were unbiased and confidence intervals achieved nominal coverage. When used with machine learning, the doubly robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
Conclusions: Due to the difficulty of properly specifying parametric models in high-dimensional data, doubly robust estimators with ensemble learning and cross-fitting may be the preferred approach for estimation of the average causal effect in most epidemiologic studies. However, these approaches may require larger sample sizes to avoid finite-sample issues.
Ovid Technologies (Wolters Kluwer Health)
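
As a rough illustration of the kind of estimator the abstract describes, the sketch below applies augmented inverse probability weighting (AIPW) with simple K-fold cross-fitting to toy data. Everything in it is an assumption made for illustration: the data generating mechanism, the function names (simulate, fit_nuisance, cross_fit_aipw), the use of stratified means in place of ensemble machine learning, and the single K-fold split. It is not the simulation design or the cross-fitting procedure used in the study.

```python
import random
import statistics
from collections import defaultdict


def simulate(n, seed=0):
    """Toy data (hypothetical): binary confounder x, binary treatment a, outcome y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        p_treat = 0.7 if x == 1 else 0.3          # treatment depends on x
        a = 1 if rng.random() < p_treat else 0
        y = 1.0 * a + 2.0 * x + rng.gauss(0.0, 1.0)  # true average causal effect = 1.0
        rows.append((x, a, y))
    return rows


def fit_nuisance(train):
    """Stratified means stand in for the nuisance models (an assumption;
    the abstract uses parametric models or ensemble machine learning)."""
    treat = defaultdict(list)
    outcome = defaultdict(list)
    for x, a, y in train:
        treat[x].append(a)
        outcome[(x, a)].append(y)
    propensity = {x: statistics.fmean(v) for x, v in treat.items()}
    outcome_mean = {k: statistics.fmean(v) for k, v in outcome.items()}
    return propensity, outcome_mean


def cross_fit_aipw(rows, n_folds=5, seed=0):
    """AIPW where each unit's nuisance predictions come from models
    fit on the other folds only."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    scores = []
    for k in range(n_folds):
        held_out = set(folds[k])
        train = [rows[i] for i in idx if i not in held_out]
        propensity, outcome_mean = fit_nuisance(train)
        for i in folds[k]:
            x, a, y = rows[i]
            pi = propensity[x]
            m1 = outcome_mean[(x, 1)]
            m0 = outcome_mean[(x, 0)]
            # AIPW score: outcome-model contrast plus weighted residual corrections
            score = (m1 - m0) + a * (y - m1) / pi - (1 - a) * (y - m0) / (1 - pi)
            scores.append(score)
    estimate = statistics.fmean(scores)
    std_err = statistics.stdev(scores) / len(scores) ** 0.5
    ci = (estimate - 1.96 * std_err, estimate + 1.96 * std_err)
    return estimate, std_err, ci


if __name__ == "__main__":
    est, se, ci = cross_fit_aipw(simulate(4000, seed=1), n_folds=5, seed=1)
    print(f"estimate={est:.3f} se={se:.3f} ci=({ci[0]:.3f}, {ci[1]:.3f})")
```

With a large enough toy sample the estimate should land near the simulated effect of 1.0. The standard error here is taken from the sample variance of the per-unit scores, which is one common choice for AIPW and is assumed rather than drawn from the source.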

Related Results

Causal discovery and prediction: methods and algorithms
(English) This thesis focuses on the discovery of causal relations and on the prediction of causal effects. Regarding causal discovery, this thesis introduces a novel and generic m...
A Practical Guide to Causal Inference in Three-Wave Panel Studies
Causal inference from observational data poses considerable challenges. This guide explains an approach to estimating causal effects using panel data focussing on the three-wave pa...
Operational decision-making with machine learning and causal inference
Optimizing operational decisions, routine actions within some business or operational process, is a key challenge across a variety of domains and application areas. The increasing ...
Research Paradigms and the Strengthening of Causal Inference in Epidemiology
Changes in research paradigms and theories about disease causation have frequently led to refinements in frameworks for causal inference. Among the most promising paradigm shifts i...
Causal Inference and Scientific Paradigms in Epidemiology
This anthology of articles on causal inference and scientific paradigms in epidemiology covers several important topics including the search for causal explanations, the strengths ...
Generalized Inequalities to Optimize the Fitting Method for Track Reconstruction
A standard criterion in statistics is to define an optimal estimator as the one with the minimum variance. Thus, the optimality is proved with inequality among variances of competi...
Evolutionary Grammatical Inference
Grammatical Inference (also known as grammar induction) is the problem of learning a grammar for a language from a set of examples. In a broad sense, some data is presented to the ...
Reflections Of Zoltan P. Dienes On Mathematics Education
The name of Zoltan P. Dienes (1916- ) stands with those of Jean Piaget, Jerome Bruner, Edward Begle, and Robert Davis as legendary figures whose work left a lasting impression on th...
