A flexible approach for variable selection in large-scale healthcare database studies with missing covariate and outcome data
Abstract
Background
Prior work has shown that, when covariate and outcome data are missing at random (MAR), combining bootstrap imputation with tree-based machine learning variable selection methods can deliver performance close to that achievable on fully observed data. This approach, however, is computationally expensive, especially on large-scale datasets.
Methods
We propose an inference-based method, called RR-BART, which leverages the likelihood-based Bayesian machine learning technique, Bayesian additive regression trees, and uses Rubin’s rule to combine the estimates and variances of the variable importance measures on multiply imputed datasets for variable selection in the presence of MAR data. We conduct a representative simulation study to investigate the practical operating characteristics of RR-BART and compare it with the bootstrap imputation-based methods. We further demonstrate the methods via a case study of risk factors for 3-year incidence of metabolic syndrome among middle-aged women using data from the Study of Women’s Health Across the Nation (SWAN).
Results
The simulation study suggests that even in complex conditions of nonlinearity and nonadditivity with a large percentage of missingness, RR-BART can reasonably recover both the prediction and the variable selection performance achievable on the fully observed data. RR-BART matches the best performance that the bootstrap imputation-based methods can achieve with the optimal selection threshold value. In addition, RR-BART demonstrates a substantially stronger ability to detect discrete predictors. Furthermore, RR-BART offers substantial computational savings. When implemented on the SWAN data, RR-BART adds to the literature by selecting a set of predictors that had been less commonly identified as risk factors but had substantial biological justifications.
Conclusion
The proposed variable selection method for MAR data, RR-BART, offers both computational efficiency and good operating characteristics and is useful in large-scale healthcare database studies.
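The pooling step named in the Methods, Rubin’s rule applied to a variable importance measure across multiply imputed datasets, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `rubin_pool` and `is_selected`, the normal-approximation interval (a t reference distribution is the usual choice), and the selection `threshold` are all assumptions introduced here for illustration, and the abstract does not state the actual selection criterion.

```python
from statistics import NormalDist, mean, variance


def rubin_pool(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rule.

    estimates: one importance estimate per imputed dataset
    variances: the matching within-imputation variances
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)          # average of the per-imputation estimates
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # sample variance (denominator m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


def is_selected(estimates, variances, threshold, alpha=0.05):
    """Hypothetical rule: keep a predictor when the lower bound of the pooled
    (1 - alpha) interval exceeds `threshold`. Uses a normal approximation,
    which is a simplification assumed for this sketch.
    """
    pooled, total = rubin_pool(estimates, variances)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = pooled - z * total ** 0.5
    return lower > threshold


# Made-up importance estimates for one predictor across three imputations.
est = [0.10, 0.12, 0.14]
var = [0.0004, 0.0004, 0.0004]
pooled, total = rubin_pool(est, var)
```

The total variance adds the between-imputation spread, inflated by (1 + 1/M), to the average within-imputation variance, so disagreement among the imputed datasets widens the interval and makes selection more conservative.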
Springer Science and Business Media LLC

