Correlation and Probability Based Similarity Measure for Detecting Outliers in Categorical Data

Determining the similarity or distance between data objects is an important part of many research fields, such as statistics, data mining, and machine learning. Many measures are available in the literature for defining the distance between two numerical data objects. Defining such a metric for two categorical data objects is difficult because categorical values are not ordered, and only a few measures are available in the literature for finding similarities among categorical data objects. This paper presents a comparative evaluation of various similarity measures for categorical data and introduces a novel similarity measure for categorical data based on occurrence frequency and correlation. We evaluated the performance of these similarity measures on the outlier detection task in data mining using real-world data sets. Experimental results show that the proposed similarity measure outperforms the existing similarity measures in detecting outliers in categorical data sets.
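To make the idea of frequency-aware similarity on unordered values concrete, here is a minimal Python sketch. It is not the measure proposed in the paper: the abstract gives no formula, and the correlation component is not modeled at all. The sketch assumes a simple occurrence-frequency weighting (matches score 1, mismatches score lower when the values involved are rare) and a nearest-neighbour outlier score; the function names, the choice of `k`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def value_frequencies(rows):
    """Count how often each value occurs in each attribute (column)."""
    n_attrs = len(rows[0])
    return [Counter(row[k] for row in rows) for k in range(n_attrs)]


def overlap_similarity(x, y):
    """Baseline: fraction of attributes on which x and y agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def frequency_similarity(x, y, freqs, n):
    """Assumed frequency weighting: a match scores 1, a mismatch scores
    1 / (1 + log(n/f(a)) * log(n/f(b))), so mismatches on rare values
    count as more dissimilar than mismatches on common values."""
    total = 0.0
    for k, (a, b) in enumerate(zip(x, y)):
        if a == b:
            total += 1.0
        else:
            total += 1.0 / (
                1.0 + math.log(n / freqs[k][a]) * math.log(n / freqs[k][b])
            )
    return total / len(x)


def outlier_scores(rows, k=2):
    """Outlier score = 1 - mean similarity to the k most similar other rows."""
    n = len(rows)
    freqs = value_frequencies(rows)
    scores = []
    for i, x in enumerate(rows):
        sims = sorted(
            (frequency_similarity(x, y, freqs, n)
             for j, y in enumerate(rows) if j != i),
            reverse=True,
        )
        scores.append(1.0 - sum(sims[:k]) / k)
    return scores


# Toy data (hypothetical): the last row disagrees with the rest on most attributes.
rows = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("red", "small", "round"),
    ("blue", "large", "square"),
]
scores = outlier_scores(rows, k=2)
most_outlying = max(range(len(rows)), key=scores.__getitem__)
print(most_outlying, [round(s, 3) for s in scores])
```

On this toy data the row that differs from the others on rare values receives the highest score. A comparison like the one the abstract describes would swap `frequency_similarity` for other measures and compare the resulting outlier rankings on labeled data sets.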
Blue Eyes Intelligence Engineering and Sciences Publication - BEIESP

Related Results

Analysis of a Similarity Measure for Non-Overlapped Data
A similarity measure is a measure evaluating the degree of similarity between two fuzzy data sets and has become an essential tool in many applications including data mining, patte...
Similarity Search with Data Missing
Similarity search is a fundamental research problem with broad applications in various research fields, including data mining, information retrieval, and machine learning. The core...
An Improved Innovation Robust Outliers Detection Method for Airborne Array Position and Orientation Measurement System
The airborne array position and orientation measurement system (array POS) is a key device for high-resolution multi-dimensional real-time imaging motion compensation of military r...
Using covariance weighted euclidean distance to assess the dissimilarity between integral experiments
Integral experiments, especially criticality experiments, help a lot in designing either a new nuclear reactor or a criticality assembly. The calculation uncertainty of the integral para...
Research Note: A Study of Outliers of International Tourism Statistics
As international tourism is an industry that is easily impacted by external shocks, there is always structural mutation of the time series related with it, which causes the existen...
SNOMED CT Primitive Concept Similarity Measure by Concept Name Text Similarity Approach
In the last few years, Concept Similarity Measures (CSMs) have become important for biomedical ontologies in order to find adaptable treatments from the conceptually similar disease...
A Method for Detecting Abnormal Changes in the Temperature Field of Grain Bulk Based on HSV Features of Cloud Maps
Highlights: Abnormal grain temperature changes were detected by calculating the similarity of HSV features in cloud maps. The F-measures were higher for the improved method than for m...
Improved Cosine Similarity Measures for q-Rung Orthopair Fuzzy Sets
In this paper, we introduce some novel cosine similarity measures for \(q\)-rung orthopair fuzzy sets (\(q\)-ROFSs), which capture both direction and magnitude aspects of fuzzy set...