Search engine for discovering works of Art, research articles, and books related to Art and Culture
ShareThis
Javascript must be enabled to continue!

Parameterized Strings: Algorithms and Applications

View through CrossRef
The parameterized string (p-string), a generalization of the traditional string, is composed of constant and parameter symbols. A parameterized match (p-match) exists between two p-strings if the constants match exactly and there exists a bijection between the parameter symbols. Historically, p-strings have been employed in source code cloning, plagiarism detection, and structural similarity between biological sequences. By handling the intricacies of the parameterized suffix, we can efficiently address complex applications with data structures also reusable in traditional matching scenarios. In this dissertation, we extend data structures for p-strings (and variants) to address sophisticated string computations.;We introduce a taxonomy of classes for longest factor problems. Using this taxonomy, we show an interesting connection between the parameterized longest previous factor (pLPF) and familiar data structures in string theory, including the border array, prefix array, longest common prefix array, and analogous p-string data structures. Exploiting this connection, we construct a multitude of data structures using the same general pLPF framework.;Before this dissertation, the p-match was defined predominately by the matching between uncompressed p-strings. Here, we introduce the compressed parameterized pattern match to find all p-matches between a pattern and a text, using only the pattern and a compressed form of the text. We present parameterized compression (p-compression) as a new way to losslessly compress data to support p-matching. Experimentally, it is shown that p-compression is competitive with standard compression schemes. Using p-compression, we address the compressed p-match independent of the underlying compression routine.;Currently, p-string theory lacks the capability to support indeterminate symbols, a staple essential for applications involving inexact matching such as in music analysis. In this work, we propose and efficiently address two new types of p-matching with indeterminate symbols. (1) We introduce the indeterminate parameterized match (ip-match) to permit matching with indeterminate holes in a p-string. We support the ip-match by introducing data structures that extend the prefix array. (2) From a different perspective, the equivalence parameterized match (e-match) evolves the p-match to consider intra-alphabet symbol classes as equivalence classes. We propose a method to perform the e-match using the p-string suffix array framework, i.e. the parameterized suffix array (pSA) and parameterized longest common prefix array (pLCP). Historically, direct constructions of the pSA and pLCP have suffered from quadratic time bounds in the worst-case. Here, we introduce new p-string theory to efficiently construct the pSA/pLCP and break the theoretical worst-case time barrier.;Biological applications have become a classical use of p-string theory. Here, we introduce the structural border array to provide a lightweight solution to the biologically-oriented variant of the p-match, i.e. the structural match (s-match) on structural strings (s-strings). Following the s-match, we show how to use s-string suffix structures to support various pattern matching problems involving RNA secondary structures. Finally, we propose/construct the forward stem matrix (FSM), a data structure to access RNA stem structures, and we apply the FSM to the detection of hairpins and pseudoknots in an RNA sequence.;This dissertation advances the state-of-the-art in p-string theory by developing data structures for p-strings/s-strings and using p-string/s-string theory in new and old contexts to address various applications. Due to the flexibility of the p-string/s-string, the data structures and algorithms in this work are also applicable to the myriad of problems in the string community that involve traditional strings.
West Virginia University Libraries
Title: Parameterized Strings: Algorithms and Applications
Description:
The parameterized string (p-string), a generalization of the traditional string, is composed of constant and parameter symbols.
A parameterized match (p-match) exists between two p-strings if the constants match exactly and there exists a bijection between the parameter symbols.
Historically, p-strings have been employed in source code cloning, plagiarism detection, and structural similarity between biological sequences.
By handling the intricacies of the parameterized suffix, we can efficiently address complex applications with data structures also reusable in traditional matching scenarios.
In this dissertation, we extend data structures for p-strings (and variants) to address sophisticated string computations.
;We introduce a taxonomy of classes for longest factor problems.
Using this taxonomy, we show an interesting connection between the parameterized longest previous factor (pLPF) and familiar data structures in string theory, including the border array, prefix array, longest common prefix array, and analogous p-string data structures.
Exploiting this connection, we construct a multitude of data structures using the same general pLPF framework.
;Before this dissertation, the p-match was defined predominately by the matching between uncompressed p-strings.
Here, we introduce the compressed parameterized pattern match to find all p-matches between a pattern and a text, using only the pattern and a compressed form of the text.
We present parameterized compression (p-compression) as a new way to losslessly compress data to support p-matching.
Experimentally, it is shown that p-compression is competitive with standard compression schemes.
Using p-compression, we address the compressed p-match independent of the underlying compression routine.
;Currently, p-string theory lacks the capability to support indeterminate symbols, a staple essential for applications involving inexact matching such as in music analysis.
In this work, we propose and efficiently address two new types of p-matching with indeterminate symbols.
(1) We introduce the indeterminate parameterized match (ip-match) to permit matching with indeterminate holes in a p-string.
We support the ip-match by introducing data structures that extend the prefix array.
(2) From a different perspective, the equivalence parameterized match (e-match) evolves the p-match to consider intra-alphabet symbol classes as equivalence classes.
We propose a method to perform the e-match using the p-string suffix array framework, i.
e.
the parameterized suffix array (pSA) and parameterized longest common prefix array (pLCP).
Historically, direct constructions of the pSA and pLCP have suffered from quadratic time bounds in the worst-case.
Here, we introduce new p-string theory to efficiently construct the pSA/pLCP and break the theoretical worst-case time barrier.
;Biological applications have become a classical use of p-string theory.
Here, we introduce the structural border array to provide a lightweight solution to the biologically-oriented variant of the p-match, i.
e.
the structural match (s-match) on structural strings (s-strings).
Following the s-match, we show how to use s-string suffix structures to support various pattern matching problems involving RNA secondary structures.
Finally, we propose/construct the forward stem matrix (FSM), a data structure to access RNA stem structures, and we apply the FSM to the detection of hairpins and pseudoknots in an RNA sequence.
;This dissertation advances the state-of-the-art in p-string theory by developing data structures for p-strings/s-strings and using p-string/s-string theory in new and old contexts to address various applications.
Due to the flexibility of the p-string/s-string, the data structures and algorithms in this work are also applicable to the myriad of problems in the string community that involve traditional strings.

Related Results

Design of Casing Strings
Design of Casing Strings
Abstract Considerable economy can be effected by designing each casing string individually for the particular set of conditions involved. The paper discusses meth...
Tree Pattern Matching
Tree Pattern Matching
Most of this book is about stringology, the study of strings. So why this chapter on trees? Why not graphs or geometry or something else? First, trees generalize strings in a very ...
Does bumblebee preference of continuous over interrupted strings in string-pulling tasks indicate means-end comprehension?
Does bumblebee preference of continuous over interrupted strings in string-pulling tasks indicate means-end comprehension?
Abstract Bumblebees (Bombus terrestris) have been shown to engage in string-pulling behavior to access rewards. The objective of this study was to elucidate whether bumblebees disp...
Integrating quantum neural networks with machine learning algorithms for optimizing healthcare diagnostics and treatment outcomes
Integrating quantum neural networks with machine learning algorithms for optimizing healthcare diagnostics and treatment outcomes
The rapid advancements in artificial intelligence (AI) and quantum computing have catalyzed an unprecedented shift in the methodologies utilized for healthcare diagnostics and trea...
Drill-String Torsional Vibration Suppression Using GA Optimized Controllers
Drill-String Torsional Vibration Suppression Using GA Optimized Controllers
Abstract Failure of drill-strings is very costly in terms of money and time and occurs more frequently than oil companies would like to see. There are many reason...
Fixed-Parameter Algorithms for Closed World Reasoning
Fixed-Parameter Algorithms for Closed World Reasoning
Closed world reasoning and circumscription are essential tasks in AI. However, their high computational complexity is a serious obstacle for their practical application. In this wo...
A comprehensive review of post-quantum cryptography: Challenges and advances
A comprehensive review of post-quantum cryptography: Challenges and advances
One of the most crucial measures to maintain data security is the use of cryptography schemes and digital signatures built upon cryptographic algorithms. The resistance of cryptogr...
Modeling Hybrid Metaheuristic Optimization Algorithm for Convergence Prediction
Modeling Hybrid Metaheuristic Optimization Algorithm for Convergence Prediction
The project aims at the design and development of six hybrid nature inspired algorithms based on Grey Wolf Optimization algorithm with Artificial Bee Colony Optimization algorithm ...

Back to Top