Javascript must be enabled to continue!
On Heterogeneous Ensembles for Anomaly Detection: Ask Three Experts
View through CrossRef
Algorithms for anomaly detection differ in how they conceptualize anomalies. When the nature and distribution of anomalies is unknown, combining diverse approaches should therefore improve robustness. However, heterogeneous ensembles remain uncommon in both studies and practice. We investigate heterogeneous ensembles and focus on three often overlooked questions: (a) how much does combining methods improve detection? (b) how do score normalization and aggregation affect ensembles? (c) does the evaluation metric influence optimal design? Our study considers structured tabular data with semantically independent features, providing key principles for designing effective ensembles. Experiments with 14 established algorithms on real and synthetic data show that small heterogeneous ensembles outperform individual methods (including deep learning), homogeneous ensembles, and larger ensembles. Linear normalization and mean aggregation offer stable performance across sizes and metrics. Optimal ensemble size depends on the evaluation criterion: top-focused metrics (e.g., Average Precision) favor small ensembles, while ranking metrics (e.g., AUROC) prefer slightly larger ones. Based on these findings, we recommend a heterogeneous ensemble of three complementary methods as a robust default. Given that the best individual method is dataset-dependent, this dataset-agnostic ensemble provides practical relevance. Overall, this work addresses a key gap in anomaly detection research and provides guidance for its application.
Title: On Heterogeneous Ensembles for Anomaly Detection: Ask Three Experts
Description:
Algorithms for anomaly detection differ in how they conceptualize anomalies.
When the nature and distribution of anomalies is unknown, combining diverse approaches should therefore improve robustness.
However, heterogeneous ensembles remain uncommon in both studies and practice.
We investigate heterogeneous ensembles and focus on three often overlooked questions: (a) how much does combining methods improve detection? (b) how do score normalization and aggregation affect ensembles? (c) does the evaluation metric influence optimal design? Our study considers structured tabular data with semantically independent features, providing key principles for designing effective ensembles.
Experiments with 14 established algorithms on real and synthetic data show that small heterogeneous ensembles outperform individual methods (including deep learning), homogeneous ensembles, and larger ensembles.
Linear normalization and mean aggregation offer stable performance across sizes and metrics.
Optimal ensemble size depends on the evaluation criterion: top-focused metrics (e.
g.
, Average Precision) favor small ensembles, while ranking metrics (e.
g.
, AUROC) prefer slightly larger ones.
Based on these findings, we recommend a heterogeneous ensemble of three complementary methods as a robust default.
Given that the best individual method is dataset-dependent, this dataset-agnostic ensemble provides practical relevance.
Overall, this work addresses a key gap in anomaly detection research and provides guidance for its application.
Related Results
Ensembles of ensembles of ensembles: On using low-dimensional nonlinear systems to design climate prediction experiments
Ensembles of ensembles of ensembles: On using low-dimensional nonlinear systems to design climate prediction experiments
<p>The challenges of climate prediction are varied and complex. On the one hand they include conceptual and mathematical questions relating to the consequences of mod...
Bifix codes, Combinatorics on Words and Symbolic Dynamical Systems
Bifix codes, Combinatorics on Words and Symbolic Dynamical Systems
Codes bifixes, combinatoire des mots et systèmes dynamiques symboliques
L'étude des ensembles de mots complexité linéaire joue un rôle très important dans la théori...
Effectiveness of Tree-based Ensembles for Anomaly Discovery: Insights, Batch and Streaming Active Learning
Effectiveness of Tree-based Ensembles for Anomaly Discovery: Insights, Batch and Streaming Active Learning
Anomaly detection (AD) task corresponds to identifying the true anomalies among a given set of data instances. AD algorithms score the data instances and produce a ranked list of c...
A systematic survey: role of deep learning-based image anomaly detection in industrial inspection contexts
A systematic survey: role of deep learning-based image anomaly detection in industrial inspection contexts
Industrial automation is rapidly evolving, encompassing tasks from initial assembly to final product quality inspection. Accurate anomaly detection is crucial for ensuring the reli...
The Use of Artificial-Intelligence-Based Ensembles for Intrusion Detection: A Review
The Use of Artificial-Intelligence-Based Ensembles for Intrusion Detection: A Review
In supervised learning-based classification, ensembles have been successfully employed to different application domains. In the literature, many researchers have proposed different...
MedicalCLIP: Anomaly-Detection Domain Generalization with Asymmetric Constraints
MedicalCLIP: Anomaly-Detection Domain Generalization with Asymmetric Constraints
Medical data have unique specificity and professionalism, requiring substantial domain expertise for their annotation. Precise data annotation is essential for anomaly-detection ta...
Labeling Expert: A New Multi-Network Anomaly Detection Architecture Based on LNN-RLSTM
Labeling Expert: A New Multi-Network Anomaly Detection Architecture Based on LNN-RLSTM
In network edge computing scenarios, close monitoring of network data and anomaly detection is critical for Internet services. Although a variety of anomaly detectors have been pro...
Anomaly Detection in Multivariate Time Series Using Uncertainty Estimation
Anomaly Detection in Multivariate Time Series Using Uncertainty Estimation
Abstract
Today’s industrial machines are equipped with several sensors that detect environmental changes and generate time series. One challenging task is the detection o...

