RAG Based QA for Low Resource Languages
Abstract
Question Answering (QA) has long been an important research direction in Natural Language Processing (NLP) and artificial intelligence. Most current fine-tuned large language models (LLMs) aim to improve performance on NLP tasks, including question answering, by training on new datasets, because the existing models do not give accurate results. Fine-tuning has a large impact when adapting an LLM to a specific domain. However, fine-tuning is limited by the availability of labeled data. To address this problem, we use retrieval-augmented generation (RAG) with LLMs on question answering tasks over different documents. As the base model we use Samuael/llama-2-7b-tebot-amharic, a publicly available autoregressive LLM from Hugging Face that has been fine-tuned for question answering, and we combine it with RAG techniques. We use RAG because it augments the large language model with knowledge drawn from documents in formats such as text, DOC, XML, HTML, and PDF. We also fine-tune with the LoRA method on an Amharic dataset (AmharicInstructiondataset) from Hugging Face, a collection of more than 100,000 records across different domains. Fine-tuning in AI is the process of adjusting the weights and parameters of a pre-trained model on new data to improve its performance on a specific task [2]. Experimental results on 50 test sets for named entity recognition and question answering tasks show superior performance compared with general LLMs. We call the fine-tuned version of Samuael/llama-2-7b-tebot-amharic llama-2-AmLLM; it is optimized for question answering. After fine-tuning, the model achieves a BLEU score of 0.4432 on the given test set, significantly exceeding the previous state of the art for this task.
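The retrieve-then-generate flow described in the abstract can be illustrated with a minimal sketch. The abstract does not say which retriever, chunking scheme, or prompt template the authors use, so everything below is an assumption made for illustration: the bag-of-words cosine scoring stands in for whatever retriever the paper uses, and the passages, the prompt layout, and the function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) are hypothetical. The final generation step, in which the prompt would be passed to the LLM, is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (\\w is Unicode-aware in Python 3)."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question.

    Assumption: simple bag-of-words cosine similarity, used here only as a
    stand-in for the (unspecified) retriever in the paper.
    """
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q_vec, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, contexts):
    """Assemble a prompt that places retrieved context before the question.

    Assumption: this template is hypothetical, not the one used by the authors.
    """
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Context:\n{context_block}\n\nQuestion: {question}\nAnswer:"


# Hypothetical passages standing in for text extracted from source documents.
passages = [
    "Addis Ababa is the capital city of Ethiopia.",
    "Amharic is written in the Ge'ez script.",
    "Lake Tana is the source of the Blue Nile.",
]
question = "What is the capital city of Ethiopia?"

top = retrieve(question, passages, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a full pipeline, `prompt` would be sent to the fine-tuned model for generation. Keeping retrieval separate from generation is what lets new documents be added without retraining, which is the benefit the abstract attributes to RAG when labeled data is scarce.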
Springer Science and Business Media LLC

