Investigating Outlier Detection Techniques Based on Kernel Rough Clustering
Background:
Data quality is crucial to the success of big data analytics, but outliers degrade both data quality and the analyses built on it. Effective outlier detection techniques that remove dirty data can improve data quality and yield more accurate analytical insights. Data uncertainty remains a significant challenge for outlier detection methods, which need further refinement in the era of big data.
Objective:
Unsupervised outlier detection that combines clustering with an outlier scoring scheme is a current research hotspot. However, hard clustering fails when dealing with abnormal patterns that behave in uncertain and unexpected ways, whereas rough boundaries help identify more accurate cluster structures. This article therefore extends clustering with uncertainty-aware soft clustering based on rough set theory and designs appropriate scoring schemes to capture abnormal instances, with the aim of detecting outliers in uncertain, nonlinear, complex data.
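The idea of a rough boundary can be illustrated with a minimal sketch, assuming the distance-ratio rule commonly used in rough k-means variants. The function name `rough_assign`, the `ratio` threshold, and the toy points are hypothetical and are not taken from the article. A point that is clearly closest to one center goes into that cluster's lower approximation, while a point that is nearly equidistant from several centers is kept only in their upper approximations (the boundary).

```python
import math

def rough_assign(points, centers, ratio=1.3):
    """Assign points to lower/upper approximations of clusters.

    `ratio` is an assumed threshold: a center counts as "near" a point
    if its distance is at most `ratio` times the nearest-center distance.
    Returns (lower, upper), where lower[i] is a cluster index or None
    and upper[i] is the list of cluster indices the point may belong to.
    """
    lower, upper = [], []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        d_min = min(dists)
        near = [j for j, d in enumerate(dists) if d <= ratio * d_min]
        upper.append(near)
        # Exactly one candidate cluster: membership is certain.
        lower.append(near[0] if len(near) == 1 else None)
    return lower, upper

centers = [(0.0, 0.0), (4.0, 0.0)]
points = [(0.0, 0.0), (4.0, 0.0), (2.0, 0.0)]
lower, upper = rough_assign(points, centers)
# The midpoint (2.0, 0.0) has no certain cluster and lies in the boundary.
```

A hard clustering would force the midpoint into one cluster. The rough assignment keeps its uncertainty explicit, which a later scoring step can take into account.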
Methods:
This paper proposes an outlier detection algorithm based on Kernel Rough Clustering and describes its workflow. Its detection accuracy is then compared with that of five popular existing methods on synthetic and real-world datasets. The results show that the proposed method achieves higher detection accuracy.
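The abstract does not specify the kernel or the scoring scheme, so the following is only a sketch under stated assumptions: an RBF kernel with a hypothetical `gamma`, clusters that are already given, and a score equal to the squared feature-space distance from a point to its nearest cluster mean. The distance is computed with the standard kernel trick, so the feature map never has to be evaluated explicitly. All function names here are illustrative.

```python
import math

def rbf(a, b, gamma=0.5):
    """RBF kernel; gamma is an assumed bandwidth parameter."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def kernel_dist_sq(x, members, kernel=rbf):
    """Squared distance in feature space between x and the mean of `members`.

    ||phi(x) - mean||^2 = K(x, x) - (2/m) * sum_j K(x, z_j)
                          + (1/m^2) * sum_j sum_l K(z_j, z_l)
    """
    m = len(members)
    cross = sum(kernel(x, z) for z in members) / m
    within = sum(kernel(u, v) for u in members for v in members) / (m * m)
    return kernel(x, x) - 2.0 * cross + within

def outlier_scores(points, clusters, kernel=rbf):
    """Score each point by its distance to the nearest cluster mean."""
    return [min(kernel_dist_sq(p, c, kernel) for c in clusters) for p in points]

clusters = [
    [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)],
    [(5.0, 5.0), (5.1, 4.8), (4.9, 5.2)],
]
points = [(0.1, 0.1), (5.0, 5.0), (2.5, 2.5)]
scores = outlier_scores(points, clusters)
# Index of the point with the highest score, i.e. the strongest outlier candidate.
most_anomalous = max(range(len(points)), key=scores.__getitem__)
```

In this toy data the point between the two clusters receives the highest score. A complete method would also need to update the clusters iteratively and to choose a threshold or a top-n cutoff for flagging outliers, neither of which is shown here.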
Results:
The proposed method improved detection precision and recall. Its detection accuracy was superior to that of the popular methods, indicating that it is effective at identifying outliers.
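For reference, precision and recall for an outlier detector can be computed as in the sketch below. The function name and the index sets are made-up illustrations, not results from the article.

```python
def precision_recall(true_outliers, flagged):
    """Precision and recall of flagged indices against known outliers."""
    true_outliers, flagged = set(true_outliers), set(flagged)
    tp = len(true_outliers & flagged)  # correctly flagged outliers
    precision = tp / len(flagged) if flagged else 0.0
    recall = tp / len(true_outliers) if true_outliers else 0.0
    return precision, recall

# Hypothetical example: 4 true outliers, 3 flagged, 2 of them correct.
p, r = precision_recall(true_outliers={3, 7, 9, 12}, flagged={3, 7, 8})
```

Precision measures how many flagged instances are real outliers, and recall measures how many real outliers were found, so reporting both avoids rewarding a detector that simply flags very few or very many points.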
Conclusion:
Compared with popular methods, the proposed method has a slight advantage in detection accuracy and is an effective option for outlier detection.
Bentham Science Publishers Ltd.

