Leveraging Edit Distance to Reveal Hidden Patterns in Sequences of Sets
Abstract
In this paper, we analyze the edit-distance-based approach to classification of sequences of sets.
Our goal is to push the edit distance measure to its limits to see just how weak a signal it can detect when applied to sequences of sets.
This is a thorough experimental study that explores various aspects of the measure in isolation.
To achieve this, we needed precise control over the characteristics of the experimental data.
That is why we also propose a flexible dataset generator capable of controlling the main properties of the sequences of sets model, which we make publicly available as an online tool.
To give our analysis better context, in each experiment we evaluate the edit distance approach against a standard bag of words approach.
Our study uncovers a wide range of findings, from trivial to surprising, which we discuss in detail and use to derive general guidelines for the classification of sequences of sets.
Among other things, we find that edit distance is in fact able to capture all the main characteristics of sequences of sets, even the most subtle ones.
Moreover, the proposed dataset generator proved to be a very powerful tool with much broader applications than the scope of this paper and can be used to create benchmarks for any data processing algorithms involving sequences of sets.
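To make the contrast between the two approaches concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes a Levenshtein-style dynamic program over sequence elements, with the Jaccard distance between two sets as the substitution cost and a fixed insertion/deletion cost of 1. The abstract does not specify these costs, so the cost choices and the names `jaccard_distance`, `edit_distance`, and `bag_of_words` are illustrative assumptions.

```python
from collections import Counter
from typing import FrozenSet, Hashable, Sequence

SetSeq = Sequence[FrozenSet[Hashable]]


def jaccard_distance(a: FrozenSet[Hashable], b: FrozenSet[Hashable]) -> float:
    """Assumed substitution cost: 0.0 for equal sets, 1.0 for disjoint sets."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def edit_distance(s: SetSeq, t: SetSeq, indel: float = 1.0) -> float:
    """Levenshtein-style edit distance between two sequences of sets.

    The insertion/deletion cost `indel` is an assumption of this sketch.
    """
    # prev[j] holds the distance between the first i-1 sets of s
    # and the first j sets of t.
    prev = [j * indel for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * indel]
        for j in range(1, len(t) + 1):
            curr.append(min(
                prev[j] + indel,                                  # delete s[i-1]
                curr[j - 1] + indel,                              # insert t[j-1]
                prev[j - 1] + jaccard_distance(s[i - 1], t[j - 1]),  # substitute
            ))
        prev = curr
    return prev[-1]


def bag_of_words(s: SetSeq) -> Counter:
    """Order-free baseline: count every item across all sets in the sequence."""
    return Counter(item for element in s for item in element)


# Two sequences containing the same sets in a different order.
a = [frozenset({"a", "b"}), frozenset({"c"})]
b = [frozenset({"c"}), frozenset({"a", "b"})]

same_bag = bag_of_words(a) == bag_of_words(b)
distance = edit_distance(a, b)
```

Under these assumed costs, `a` and `b` have identical bag-of-words representations, yet their edit distance is 2.0. Ordering is one kind of signal that an order-aware measure can register and an order-free baseline cannot.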
Springer Science and Business Media LLC

