Javascript must be enabled to continue!
GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles
View through CrossRef
SHapley Additive exPlanation (SHAP) values (Lundberg & Lee, 2017) provide a game theoretic interpretation of the predictions of machine learning models based on Shapley values (Shapley, 1953). While exact calculation of SHAP values is computationally intractable in general, a recursive polynomial-time algorithm called TreeShap (Lundberg et al., 2020) is available for decision tree models. However, despite its polynomial time complexity, TreeShap can become a significant bottleneck in practical machine learning pipelines when applied to large decision tree ensembles. Unfortunately, the complicated TreeShap algorithm is difficult to map to hardware accelerators such as GPUs. In this work, we present GPUTreeShap, a reformulated TreeShap algorithm suitable for massively parallel computation on graphics processing units. Our approach first preprocesses each decision tree to isolate variable sized sub-problems from the original recursive algorithm, then solves a bin packing problem, and finally maps sub-problems to single-instruction, multiple-thread (SIMT) tasks for parallel execution with specialised hardware instructions. With a single NVIDIA Tesla V100-32 GPU, we achieve speedups of up to 19× for SHAP values, and speedups of up to 340× for SHAP interaction values, over a state-of-the-art multi-core CPU implementation executed on two 20-core Xeon E5-2698 v4 2.2 GHz CPUs. We also experiment with multi-GPU computing using eight V100 GPUs, demonstrating throughput of 1.2 M rows per second—equivalent CPU-based performance is estimated to require 6850 CPU cores.
Title: GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles
Description:
SHapley Additive exPlanation (SHAP) values (Lundberg & Lee, 2017) provide a game theoretic interpretation of the predictions of machine learning models based on Shapley values (Shapley, 1953).
While exact calculation of SHAP values is computationally intractable in general, a recursive polynomial-time algorithm called TreeShap (Lundberg et al.
, 2020) is available for decision tree models.
However, despite its polynomial time complexity, TreeShap can become a significant bottleneck in practical machine learning pipelines when applied to large decision tree ensembles.
Unfortunately, the complicated TreeShap algorithm is difficult to map to hardware accelerators such as GPUs.
In this work, we present GPUTreeShap, a reformulated TreeShap algorithm suitable for massively parallel computation on graphics processing units.
Our approach first preprocesses each decision tree to isolate variable sized sub-problems from the original recursive algorithm, then solves a bin packing problem, and finally maps sub-problems to single-instruction, multiple-thread (SIMT) tasks for parallel execution with specialised hardware instructions.
With a single NVIDIA Tesla V100-32 GPU, we achieve speedups of up to 19× for SHAP values, and speedups of up to 340× for SHAP interaction values, over a state-of-the-art multi-core CPU implementation executed on two 20-core Xeon E5-2698 v4 2.
2 GHz CPUs.
We also experiment with multi-GPU computing using eight V100 GPUs, demonstrating throughput of 1.
2 M rows per second—equivalent CPU-based performance is estimated to require 6850 CPU cores.
Related Results
Applying gradient tree boosting to QTL mapping with Shapley additive explanations
Applying gradient tree boosting to QTL mapping with Shapley additive explanations
Abstract
Mapping quantitative trait loci (QTLs) is one of the major goals of quantitative genetics; however, identifying the interactions between QTLs (i.e., epistasis)...
Ensembles of Random SHAPs
Ensembles of Random SHAPs
The ensemble-based modifications of the well-known SHapley Additive exPlanations (SHAP) method for the local explanation of a black-box model are proposed. The modifications aim to...
The Use of Artificial-Intelligence-Based Ensembles for Intrusion Detection: A Review
The Use of Artificial-Intelligence-Based Ensembles for Intrusion Detection: A Review
In supervised learning-based classification, ensembles have been successfully employed to different application domains. In the literature, many researchers have proposed different...
Inter-specific variations in tree stem methane and nitrous oxide exchanges in a tropical rainforest
Inter-specific variations in tree stem methane and nitrous oxide exchanges in a tropical rainforest
<p>Tropical forests are the most productive terrestrial ecosystems, global centres of biodiversity and important participants in the global carbon and water cycles. T...
ADN program benchmarking using standardized exams for assessment and remediation
ADN program benchmarking using standardized exams for assessment and remediation
The purpose of this research investigation was to determine the correlational values between testing scores when utilizing the Assessment Technologies Institute™, LLC (ATI) standar...
Spatial patterns of argan-tree influence on soil quality of intertree areas in open woodlands of South Morocco
Spatial patterns of argan-tree influence on soil quality of intertree areas in open woodlands of South Morocco
Abstract. The endemic argan tree (Argania spinosa) populations in South Morocco are highly degraded due to overbrowsing, illegal firewood extraction and the expansion of intensive ...
The Sensitivity Feature Analysis for Tree Species Based on Image Statistical Properties
The Sensitivity Feature Analysis for Tree Species Based on Image Statistical Properties
While the statistical properties of images are vital in forestry engineering, the usefulness of these properties in various forestry tasks may vary, and certain image properties mi...
Superpixel Correlation for Explainable Image Classification
Superpixel Correlation for Explainable Image Classification
Abstract
Explainable AI (XAI) is essential for fostering trust in the predictions of deep neural networks (DNNs), especially within the domain of image classification. SH...

