Correlation and Probability Based Similarity Measure for Detecting Outliers in Categorical Data

Determining the similarity or distance among data objects is an important part of many research fields, such as statistics, data mining, and machine learning. Many measures are available in the literature to define the distance between two numerical data objects. Defining such a metric for two categorical data objects is difficult because categorical values are not ordered, and only a few distance measures are available in the literature for finding similarities among categorical data objects. This paper presents a comparative evaluation of various similarity measures for categorical data and introduces a novel similarity measure for categorical data based on occurrence frequency and correlation. We evaluated the performance of these similarity measures in the context of the outlier detection task in data mining, using real-world data sets. Experimental results show that the proposed similarity measure outperforms the existing similarity measures in detecting outliers in categorical data sets.
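
To make the idea of a categorical similarity measure concrete, here is a minimal sketch in Python. It is not the measure proposed in the paper, which this page does not specify. It assumes the simple overlap measure (the fraction of matching attributes) and a hypothetical frequency-weighted variant in which agreeing on a rare value counts more than agreeing on a common one. The outlier score (mean dissimilarity to the k most similar other records), the function names, and the toy records are all illustrative assumptions.

```python
from collections import Counter


def overlap_similarity(x, y):
    """Fraction of attributes on which two categorical records agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def value_frequencies(records):
    """One Counter per attribute, counting how often each value occurs."""
    return [Counter(column) for column in zip(*records)]


def frequency_weighted_similarity(x, y, freqs, n):
    """Hypothetical variant: a match on value v of attribute k scores
    1 - freq_k(v) / n, so matches on rare values weigh more."""
    total = 0.0
    for k, (a, b) in enumerate(zip(x, y)):
        if a == b:
            total += 1.0 - freqs[k][a] / n
    return total / len(x)


def outlier_scores(records, similarity, k=2):
    """Score each record by its mean dissimilarity to its k most similar
    other records (an assumed, simple nearest-neighbour style score)."""
    scores = []
    for i, x in enumerate(records):
        sims = sorted(
            (similarity(x, y) for j, y in enumerate(records) if j != i),
            reverse=True,
        )
        nearest = sims[:k]
        scores.append(1.0 - sum(nearest) / len(nearest))
    return scores


# Toy data (made up for illustration): the last record shares almost
# nothing with the others.
records = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
]

freqs = value_frequencies(records)
n = len(records)

overlap_scores = outlier_scores(records, overlap_similarity)
weighted_scores = outlier_scores(
    records, lambda x, y: frequency_weighted_similarity(x, y, freqs, n)
)

most_outlying_overlap = max(range(n), key=overlap_scores.__getitem__)
most_outlying_weighted = max(range(n), key=weighted_scores.__getitem__)
```

On this toy data both measures rank the last record as the most outlying. On real data sets the choice of measure can change the ranking, which is the kind of difference the comparative evaluation described in the abstract examines.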

Blue Eyes Intelligence Engineering and Sciences Publication - BEIESP

Roy Thomas*, J.E. Judith

International Journal of Innovative Technology and Exploring Engineering

2020

Related Results

A similarity measure is a measure evaluating the degree of similarity between two fuzzy data sets and has become an essential tool in many applications including data mining, patte...

Similarity Search with Data Missing

Similarity search is a fundamental research problem with broad applications in various research fields, including data mining, information retrieval, and machine learning. The core...

An Improved Innovation Robust Outliers Detection Method for Airborne Array Position and Orientation Measurement System

The airborne array position and orientation measurement system (array POS) is a key device for high-resolution multi-dimensional real-time imaging motion compensation of military r...

Using covariance weighted euclidean distance to assess the dissimilarity between integral experiments

Integral experiments, especially criticality experiments, help considerably in designing either a new nuclear reactor or a criticality assembly. The calculation uncertainty of the integral para...

Bagan Kendali Robust Multivariat untuk Pengamatan Individual (Robust Multivariate Control Chart for Individual Observations)

The most widely used control chart in multivariate process control is the T2 Hotelling control chart. There are 2 kinds of T2 Hotelling control chart, namely T2 Hotelling...

Research Note: A Study of Outliers of International Tourism Statistics

As international tourism is an industry that is easily impacted by external shocks, there is always structural mutation of the time series related with it, which causes the existen...

SNOMED CT Primitive Concept Similarity Measure by Concept Name Text Similarity Approach

In the last few years, Concept Similarity Measures (CSMs) have become important for biomedical ontologies in order to find adaptable treatments from the conceptually similar disease...

A Method for Detecting Abnormal Changes in the Temperature Field of Grain Bulk Based on HSV Features of Cloud Maps

Highlights: Abnormal grain temperature changes were detected by calculating the similarity of HSV features in cloud maps. The F-measures were higher for the improved method than for m...
