
Outlier detection for mixed data

Abstract: Outlier detection is crucial in sectors such as finance, insurance, medicine, and IT security, where it helps to identify suspicious behavior and to make statistical models more robust. A common challenge arises when the data include both numerical and categorical attributes, because most traditional outlier detection methods apply only to numerical data. To overcome this limitation, this study proposes applying Factor Analysis of Mixed Data (FAMD) to transform both types of attributes into numerical Principal Components (PCs). Traditional outlier detection methods are then applied to these PCs. The proposed method is evaluated on classic datasets from the supervised classification literature and on two simulated datasets contaminated by four types of outliers: (a) global outliers, which deviate significantly from most data points; (b) local outliers, which are not necessarily extreme values but are abnormal within their specific context or neighborhood; (c) rare outliers, which have unexpected categories compared to the typical data distribution; and (d) mixed outliers, which can be both global and rare, or both local and rare.
Springer Science and Business Media LLC
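
The two-step idea in the abstract (FAMD first, then a numerical outlier detector on the PCs) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the usual FAMD preprocessing (standardized numerical columns, indicator columns scaled by 1/sqrt(category proportion) and centered, then PCA via SVD), and it swaps in a squared Mahalanobis-type distance on the PCs as one possible "traditional" detector, since the abstract does not say which detectors were used. The function names, the toy data, and the choice of four components are all hypothetical.

```python
import numpy as np

def famd_components(numeric, categorical, n_components):
    """FAMD-style projection of mixed data onto principal components.

    numeric: (n, p) array of numerical attributes (no constant columns assumed).
    categorical: list of length-n label arrays, one per categorical attribute.
    """
    numeric = np.asarray(numeric, dtype=float)
    # Standardize numerical attributes to zero mean and unit variance.
    blocks = [(numeric - numeric.mean(axis=0)) / numeric.std(axis=0)]
    for column in categorical:
        labels = np.asarray(column)
        levels = np.unique(labels)
        indicators = (labels[:, None] == levels[None, :]).astype(float)
        proportions = indicators.mean(axis=0)
        # Scale each indicator by 1/sqrt(proportion), then center it.
        scaled = indicators / np.sqrt(proportions)
        blocks.append(scaled - scaled.mean(axis=0))
    x = np.hstack(blocks)
    # PCA of the preprocessed matrix via singular value decomposition.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    pcs = u[:, :n_components] * s[:n_components]
    eigenvalues = s[:n_components] ** 2 / x.shape[0]
    return pcs, eigenvalues

def mahalanobis_scores(pcs, eigenvalues):
    # PCs are uncorrelated, so the squared Mahalanobis distance is a
    # sum of squared coordinates, each divided by its component variance.
    return np.sum(pcs ** 2 / eigenvalues, axis=1)

# Toy data (illustrative only): 200 regular rows plus two planted outliers.
rng = np.random.default_rng(0)
numeric = rng.normal(size=(200, 2))
category = rng.choice(["a", "b"], size=200)
numeric = np.vstack([numeric, [[8.0, 8.0]], [[0.0, 0.0]]])  # row 200: global outlier
category = np.concatenate([category, np.array(["a", "z"])])  # row 201: rare category

# 2 numerical columns + (3 levels - 1) gives 4 non-trivial components here.
pcs, eigenvalues = famd_components(numeric, [category], n_components=4)
scores = mahalanobis_scores(pcs, eigenvalues)
top_two = np.argsort(scores)[-2:]
print("highest-scoring rows:", sorted(top_two.tolist()))
```

Any other detector that expects numerical input (for example a density-based or isolation-based method) could replace `mahalanobis_scores`, because the PCs are ordinary numerical features. A distance of this kind mainly targets global and rare outliers; local outliers would call for a neighborhood-based detector.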

Related Results

Investigating Outlier Detection Techniques Based on Kernel Rough Clustering
Background: Data quality is crucial to the success of big data analytics. However, the presence of outliers affects data quality and data analysis. Employing effective outlier dete...
A Monte Carlo-Based Outlier Diagnosis Method for Sensitivity Analysis
An iterative outlier elimination procedure based on hypothesis testing, commonly known as Iterative Data Snooping (IDS) among geodesists, is often used for the quality control of t...
A Monte Carlo-Based Outlier Diagnosis Method for Sensitivity Analysis
An iterative outlier elimination procedure based on hypothesis testing, commonly known as Iterative Data Snooping (IDS) among geodesists, is often used for the quality control of m...
Outlier Detection and Correction for the Deviations of Tooth Profiles of Gears
To decrease the influence of outlier on the measurement of tooth profiles, this paper proposes a method of outlier detection and correction based on the grey system theory. After s...
Improving Human Mobility Forecasts: A Study on Outlier Correction with Multi-Agent Techniques
This research is part of designing and implementing an AutoML platform for time series forecasting. Having discussed missing value imputation and outlier detection...
Outlier detection and rejection in scatterplots: Do outliers influence intuitive statistical judgments?
According to a growing body of research, human adults are remarkably accurate at extracting intuitive statistics from graphs, such as finding the best-fitting regression line throu...
Multi-View Clustering-Based Outlier Detection for Converter Transformer Multivariate Time-Series Data
Online monitoring systems continuously collect massive multivariate time-series data from converter transformers. Accurate outlier detection in these data is essential for identify...
A Systematic Literature Review on Outlier Detection in Wireless Sensor Networks
A wireless sensor network (WSN) is defined as a set of spatially distributed and interconnected sensor nodes. WSNs allow one to monitor and recognize environmental phenomena such a...