Leveraging Edit Distance to Reveal Hidden Patterns in Sequences of Sets

Abstract
In this paper, we analyze the edit-distance-based approach to the classification of sequences of sets. Our goal is to push the edit distance measure to its limits to see how weak a signal it can detect when applied to sequences of sets. This is a thorough experimental study that examines various aspects of the measure in isolation. To achieve this, we needed precise control over the characteristics of the experimental data, so we also propose a flexible dataset generator capable of controlling the main properties of the sequences-of-sets model, which we make publicly available as an online tool. To put our analysis in context, in each experiment we compare the edit distance approach against a standard bag-of-words approach. Our study uncovers a wide range of findings (from trivial to surprising), which we discuss in detail in the paper and use as the basis for general guidelines on the classification of sequences of sets. Among other things, we find that edit distance is in fact able to capture all the main characteristics of sequences of sets, even the most subtle ones. Moreover, the proposed dataset generator proved to be a powerful tool with applications well beyond the scope of this paper: it can be used to create benchmarks for any data processing algorithm involving sequences of sets.
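To make the two measures being compared more concrete, here is a minimal Python sketch. It assumes a Levenshtein-style edit distance in which each set is treated as a single symbol, inserting or deleting a set costs 1, and substituting one set for another costs their Jaccard distance. The abstract does not state the paper's actual cost function, so these costs and all function names below (`jaccard_distance`, `set_sequence_edit_distance`, `bag_of_words`, `bow_l1_distance`) are hypothetical. The bag-of-words baseline simply counts item occurrences across the whole sequence and compares the counts with an L1 distance, which is likewise an illustrative choice rather than the paper's setup.

```python
from collections import Counter
from typing import FrozenSet, Sequence

# A sequence of sets, e.g. the items bought on successive shopping visits.
SetSeq = Sequence[FrozenSet[str]]


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def set_sequence_edit_distance(x: SetSeq, y: SetSeq) -> float:
    """Levenshtein-style edit distance where whole sets are the symbols.

    Hypothetical costs (not taken from the paper):
      insertion or deletion of a set = 1,
      substitution of set a by set b = jaccard_distance(a, b).
    """
    n, m = len(x), len(y)
    # dp[i][j] = minimum cost of transforming x[:i] into y[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = float(i)  # delete all of x[:i]
    for j in range(1, m + 1):
        dp[0][j] = float(j)  # insert all of y[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + 1.0,  # delete x[i-1]
                dp[i][j - 1] + 1.0,  # insert y[j-1]
                dp[i - 1][j - 1] + jaccard_distance(x[i - 1], y[j - 1]),  # substitute
            )
    return dp[n][m]


def bag_of_words(x: SetSeq) -> Counter:
    """Order-insensitive baseline: count how often each item occurs overall."""
    return Counter(item for s in x for item in s)


def bow_l1_distance(x: SetSeq, y: SetSeq) -> int:
    """L1 distance between the item-count vectors of two sequences."""
    bx, by = bag_of_words(x), bag_of_words(y)
    return sum(abs(bx[k] - by[k]) for k in bx.keys() | by.keys())
```

For A = {a, b} and C = {c}, the sequences [A, C] and [C, A] contain exactly the same items, so `bow_l1_distance` returns 0, whereas `set_sequence_edit_distance` returns 2.0 because the sets appear in a different order. This is the kind of order-dependent signal that a bag-of-words representation discards by construction.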

Related Results

Persons and Their Private Personas: Living with Yourself
Public life is usually understood to be whatever we do or say in our formal and professional relationships. At the workplace, at the doctor’s office or at the café, we need to make...
Online Education
Online education is considered the latest generation in the practice of distance education. As described by professionals in the field, distance education is a form of teaching and...
GEDAN: Learning the Edit Costs for Graph Edit Distance
Graph Edit Distance (GED) is defined as the minimum cost transformation of one graph into another and is a widely adopted metric for measuring the dissimilarity between graphs. The...
Frames with desired angle properties
[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT REQUEST OF AUTHOR.] The purpose of this dissertation is to study frames with desired angle properties. More precisely, we study ...
Let’s Edit: Using Wikipedia Edit-a-thons as Vehicles for Information Literacy
This article explores the integration of Wikipedia into information literacy instruction through the use of edit-a-thons, highlighting its potential despite historical skepticism f...
Quantitative Analysis of Shallow Earthquake Sequences and Regional Earthquake Behavior: Implications for Earthquake Forecasting
This study is a quantitative investigation and characterization of earthquake sequences in the Central Volcanic Region (CVR) of New Zealand, and several regions in New Zea...
Interactive Music Distance Education Platform Based on RBF Algorithm
INTRODUCTION: Since the 21st century, Internet technology has been developing rapidly, and the field of education has gradually broken through the traditional offline teaching mode...
