
Leveraging Edit Distance to Reveal Hidden Patterns in Sequences of Sets

Abstract

In this paper, we analyze the edit-distance-based approach to classifying sequences of sets. Our goal is to push the edit distance measure to its limits and determine how weak a signal it can detect when applied to sequences of sets. The paper is a thorough experimental study that examines various aspects of the measure in isolation. This required precise control over the characteristics of the experimental data, so we also propose a flexible dataset generator that controls the main properties of the sequence-of-sets model; we make it publicly available as an online tool. To put the results in context, each experiment compares the edit distance approach with a standard bag-of-words approach. The study yields a wide range of findings, from trivial to surprising, which we discuss in detail and use to derive general guidelines for classifying sequences of sets. Among other things, we find that edit distance can capture all the main characteristics of sequences of sets, even the most subtle ones. Moreover, the proposed dataset generator proved to be a powerful tool with applications well beyond the scope of this paper: it can be used to create benchmarks for any data processing algorithm that involves sequences of sets.
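
The abstract does not spell out how edit distance is defined on sequences of sets. As a rough illustration only, the following minimal sketch assumes a Levenshtein-style recurrence in which inserting or deleting a whole set costs 1 and substituting one set for another costs their Jaccard distance. These costs, the toy data, and the helper names `jaccard_distance`, `set_edit_distance`, and `nearest_neighbour_label` are hypothetical choices, not the authors' definitions. A 1-nearest-neighbour rule turns the distance into a classifier.

```python
from typing import FrozenSet, Hashable, List, Sequence, Tuple

ItemSet = FrozenSet[Hashable]


def jaccard_distance(a: ItemSet, b: ItemSet) -> float:
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def set_edit_distance(x: Sequence[ItemSet], y: Sequence[ItemSet]) -> float:
    """Levenshtein-style distance between two sequences of sets.

    Assumed (hypothetical) costs: inserting or deleting a set costs 1,
    substituting one set for another costs their Jaccard distance in [0, 1].
    """
    n, m = len(x), len(y)
    # dp[i][j] holds the distance between the prefixes x[:i] and y[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = float(i)  # delete all of x[:i]
    for j in range(1, m + 1):
        dp[0][j] = float(j)  # insert all of y[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + 1.0,  # delete x[i-1]
                dp[i][j - 1] + 1.0,  # insert y[j-1]
                dp[i - 1][j - 1] + jaccard_distance(x[i - 1], y[j - 1]),  # substitute
            )
    return dp[n][m]


def nearest_neighbour_label(
    query: Sequence[ItemSet],
    labelled: List[Tuple[Sequence[ItemSet], str]],
) -> str:
    """1-NN classification: return the label of the closest training sequence."""
    _, label = min(labelled, key=lambda pair: set_edit_distance(query, pair[0]))
    return label


if __name__ == "__main__":
    def s(*items: Hashable) -> ItemSet:
        return frozenset(items)

    train = [
        ([s("a", "b"), s("c"), s("d")], "A"),  # hypothetical toy data
        ([s("d"), s("c"), s("a", "b")], "B"),  # same sets, reversed order
    ]
    query = [s("a"), s("c"), s("d", "e")]
    print(nearest_neighbour_label(query, train))  # prints A (distances 1.0 vs 2.0)
```

Because the assumed Jaccard cost never exceeds 1, partially overlapping sets are aligned by substitution rather than by a deletion plus an insertion (cost 2). Because the alignment preserves order, reordering the same sets can change the distance, which is exactly the kind of signal an order-blind representation misses.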
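
The abstract names a standard bag-of-words approach as the baseline but does not describe how it is built. One plausible version flattens each sequence into item counts and compares the counts by cosine distance. Both choices and the helper names `bag_of_words` and `cosine_distance` are assumptions for illustration. The sketch highlights what such a baseline cannot see: it discards the order of the sets, so two sequences containing the same sets in reverse order look identical.

```python
from collections import Counter
from math import sqrt
from typing import FrozenSet, Hashable, Sequence

ItemSet = FrozenSet[Hashable]


def bag_of_words(seq: Sequence[ItemSet]) -> Counter:
    """Count how often each item occurs, ignoring set boundaries and order."""
    bag: Counter = Counter()
    for itemset in seq:
        bag.update(itemset)  # each set adds 1 for every distinct item it contains
    return bag


def cosine_distance(p: Counter, q: Counter) -> float:
    """1 - cosine similarity of two count vectors (assumed baseline metric)."""
    if not p and not q:
        return 0.0  # two empty bags are treated as identical
    dot = sum(count * q[item] for item, count in p.items())
    norm = sqrt(sum(c * c for c in p.values())) * sqrt(sum(c * c for c in q.values()))
    if norm == 0.0:
        return 1.0  # exactly one of the bags is empty
    return 1.0 - dot / norm


if __name__ == "__main__":
    forward = [frozenset({"a", "b"}), frozenset({"c"}), frozenset({"d"})]
    backward = list(reversed(forward))
    # Same items, opposite order: the bag-of-words view cannot tell them apart.
    print(cosine_distance(bag_of_words(forward), bag_of_words(backward)))  # prints 0.0
```

Any signal carried purely by the order of the sets is therefore invisible to this baseline, while an order-preserving edit distance can still pick it up.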

Related Results

Online Education
Online education is considered the latest generation in the practice of distance education. As described by professionals in the field, distance education is a form of teaching and...
Quantitative Analysis of Shallow Earthquake Sequences and Regional Earthquake Behavior: Implications for Earthquake Forecasting
This study is a quantitative investigation and characterization of earthquake sequences in the Central Volcanic Region (CVR) of New Zealand, and several regions in New Zea...
Let’s Edit: Using Wikipedia Edit-a-thons as Vehicles for Information Literacy
This article explores the integration of Wikipedia into information literacy instruction through the use of edit-a-thons, highlighting its potential despite historical skepticism f...
Distance learning in professional education: topical issues
The importance and necessity of introducing distance learning is due to the global situation with coronavirus infection since the beginning of 2020, which resulted in the emergency...
Fuzzimetric Sets: An Integrated Platform for Both Types of Interval Fuzzy Sets
Type-2 sets are the generalized “fuzzified” sets that can be used in the fuzzy system. Unlike type-1 fuzzy sets, Type-2 allow the fuzzy sets to be “fu...
Preparing Faculty for Distance Learning Teaching
Due to the recent development of delivery and communication technology and the success of distance learning, educational organizations are starting to use distance teaching to reac...
Figs S1-S9
Fig. S1. Consensus phylogram (50 % majority rule) resulting from a Bayesian analysis of the ITS sequence alignment of sequences generated in this study and reference sequences from...
