Saraiki Language Hybrid Stemmer Using Rule-Based and LSTM-Based Sequence-To-Sequence Model Approach
Stemming, the conversion of a word to its root form, is an important task in natural language processing (NLP) and an integral part of the linguistic pre-processing in NLP applications. It reduces inflected word forms to their root word. Much work has been done on stemming for national and regional languages such as English, French, Arabic, German, Urdu, and Hindi, but many regional languages still lack NLP-based digital resources. Saraiki is one of the most widely spoken regional languages in Pakistan, used by almost eighty million people, yet very few digital resources are available to support NLP for it. This research proposes a hybrid stemmer for Saraiki words. The hybrid stemmer combines two hundred prefix and postfix rules with a long short-term memory (LSTM) based sequence-to-sequence model that converts Saraiki words into their stems. First, the Saraiki text was pre-processed and the rule set was implemented. Second, the LSTM-based sequence-to-sequence model was deployed to stem Saraiki words correctly. Finally, the stemmer's performance was evaluated by measuring the accuracy of the stems produced by the rule set and by the LSTM-based sequence-to-sequence model. In the experiments, the rule set achieved a stemming accuracy of 68.53%, while the LSTM-based sequence-to-sequence model stemmed 93.0% of words correctly. This work contributes to the regional linguistic field by introducing a stemmer for the Saraiki language.
Corresponding author: mubasher@isp.edu.pk
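The rule-based stage and the accuracy evaluation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the affix lists are hypothetical English-like placeholders (the paper's two hundred Saraiki rules are not given here), the minimum stem length is an assumed guard against over-stemming, and accuracy is assumed to mean the percentage of words whose predicted stem exactly matches a gold stem.

```python
from typing import Callable, Iterable, Sequence, Tuple

# Hypothetical placeholder affixes, for illustration only.
# They are NOT the paper's Saraiki prefix/postfix rules.
PREFIX_RULES: Tuple[str, ...] = ("un", "re")
POSTFIX_RULES: Tuple[str, ...] = ("ing", "ed", "s")
MIN_STEM_LEN = 3  # assumed guard so short words are not stripped away


def strip_affixes(
    word: str,
    prefixes: Sequence[str] = PREFIX_RULES,
    postfixes: Sequence[str] = POSTFIX_RULES,
) -> str:
    """Remove at most one prefix and one postfix, trying longer affixes first."""
    stem = word
    for prefix in sorted(prefixes, key=len, reverse=True):
        if stem.startswith(prefix) and len(stem) - len(prefix) >= MIN_STEM_LEN:
            stem = stem[len(prefix):]
            break
    for postfix in sorted(postfixes, key=len, reverse=True):
        if stem.endswith(postfix) and len(stem) - len(postfix) >= MIN_STEM_LEN:
            stem = stem[: -len(postfix)]
            break
    return stem


def stem_accuracy(
    pairs: Iterable[Tuple[str, str]],
    stemmer: Callable[[str], str] = strip_affixes,
) -> float:
    """Percentage of (word, gold_stem) pairs the stemmer reproduces exactly."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    correct = sum(1 for word, gold in pairs if stemmer(word) == gold)
    return 100.0 * correct / len(pairs)


# Toy gold data (made up for this sketch).
GOLD = [
    ("replaying", "play"),
    ("walks", "walk"),
    ("red", "red"),
    ("unites", "unite"),  # the toy rules over-stem this word
]
```

Calling `stem_accuracy(GOLD)` scores the toy rules against the toy gold stems. Because `stem_accuracy` accepts any `stemmer` callable, the same function could score a learned model (such as the LSTM-based sequence-to-sequence model the abstract mentions) wrapped as a word-to-stem function, which is one way to compare the two stages on equal terms.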
University of Management and Technology

