
Enhancing Graph-based Machine Learning through Lyndon Partial Words

Objectives: This study integrates the combinatorial properties of Lyndon partial words with Graph-Based Machine Learning (GBML) to develop an innovative approach for sequence analysis. The research is particularly aimed at addressing challenges in fields like bioinformatics and natural language processing (NLP), where incomplete or fragmented data often hinder effective analysis. By leveraging the minimality and primitiveness inherent to Lyndon partial words, this study seeks to provide a robust framework for modeling and analyzing such data.

Methods: Graphs were constructed from Lyndon partial words, where nodes represent unique partial words or their conjugates, and edges signify relationships such as lexicographical proximity or shared substrings. These graphs were subjected to advanced GBML techniques, including community detection algorithms to uncover clusters of related patterns, and similarity analysis to measure structural and semantic relationships. Data preprocessing ensured the accurate representation of partial words while maintaining their combinatorial integrity.

Findings: The integration of Lyndon partial words into GBML demonstrates significant potential in pattern recognition and structural analysis, particularly for datasets characterized by fragmentation or incompleteness. The constructed graphs effectively capture underlying relationships and patterns, aiding in the discovery of meaningful insights in sequence data. This novel framework enables improved modeling of real-world scenarios, such as identifying recurring motifs in biological sequences or understanding linguistic variations in incomplete text datasets.

Novelty: By combining the theoretical elegance of Lyndon partial words with the computational power of GBML, this study introduces a novel methodology for tackling incomplete data in sequence analysis. The approach highlights the adaptability of combinatorial constructs for solving practical problems, offering new avenues for research in data-intensive domains like bioinformatics and NLP. The framework also underscores the importance of interdisciplinary solutions in advancing machine learning applications for complex and fragmented datasets.
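
The graph construction described under Methods can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the ordering on holes, the edge rule, or the community detection algorithm. The sketch assumes that a hole is written `?` and orders before every letter, that a Lyndon partial word is one strictly smaller than all of its proper rotations under that order, that edges join words whose length-2 substrings have a Jaccard similarity at or above a chosen threshold, and that connected components serve as a crude stand-in for community detection. All names (`HOLE`, `is_lyndon`, `build_graph`, `components`) and the parameters `threshold` and `k` are hypothetical.

```python
from itertools import combinations

HOLE = "?"  # assumed hole symbol; the source does not fix a notation


def sort_key(word):
    # Assumption: a hole orders before every letter.
    return [(0, "") if c == HOLE else (1, c) for c in word]


def rotations(word):
    """All conjugates (cyclic rotations) of word, starting with word itself."""
    return [word[i:] + word[:i] for i in range(len(word))]


def is_lyndon(word):
    """True if word is strictly smaller than each of its proper rotations."""
    if not word:
        return False
    key = sort_key(word)
    return all(key < sort_key(r) for r in rotations(word)[1:])


def substrings(word, k=2):
    """Set of length-k substrings; holes are compared literally (assumption)."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}


def jaccard(a, b, k=2):
    """Jaccard similarity of the length-k substring sets of a and b."""
    sa, sb = substrings(a, k), substrings(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_graph(words, threshold=0.5, k=2):
    """Adjacency sets over the unique Lyndon words in `words`."""
    nodes = [w for w in dict.fromkeys(words) if is_lyndon(w)]
    adj = {w: set() for w in nodes}
    for a, b in combinations(nodes, 2):
        if jaccard(a, b, k) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def components(adj):
    """Connected components, used here in place of community detection."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adj[node] - seen)
        groups.append(group)
    return groups


if __name__ == "__main__":
    graph = build_graph(["aab", "aabb", "?ab", "abb", "aba"])
    print(components(graph))
```

In this toy input, `aba` is dropped because its rotation `aab` is smaller, and `?ab` stays isolated because it shares only one bigram with each other word. A faithful treatment of partial words would let a hole match any letter when comparing substrings, and a real pipeline would likely swap the component step for a modularity-based or label-propagation method from a graph library.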

Science Research Society

R. Krishna Kumari

Communications on Applied Nonlinear Analysis

2025


