RAG Based QA for Low Resource Languages

Abstract: Question Answering (QA) has long been an important research direction in Natural Language Processing (NLP) and artificial intelligence. Most current fine-tuned large language models (LLMs) aim to improve performance on NLP tasks, including question answering, by training on new datasets, because the existing models do not give accurate results. Fine-tuning has a great impact when adapting an LLM to a specific domain. However, fine-tuning is limited by the availability of labeled data. To address this problem, we use retrieval-augmented generation (RAG) with LLMs for question answering over different documents. As the base model we use Samuael/llama-2-7b-tebot-amharic, a publicly available autoregressive language model from Hugging Face that has been fine-tuned for question answering, and we combine it with RAG techniques. We use RAG because it augments the large language model with knowledge from documents in different formats, such as text, doc, xml, html, and pdf. We also fine-tune the model with the LoRA method on the Amharic instruction dataset (AmharicInstructiondataset) from Hugging Face, which contains more than 100,000 records across different domains. Fine-tuning in AI is the process of adjusting the weights and parameters of a pre-trained model on new data to improve its performance on a specific task [2]. Experimental results on 50 test sets for named entity recognition and question answering tasks show superior performance compared to general LLMs. We call the fine-tuned version of Samuael/llama-2-7b-tebot-amharic llama-2-AmLLM; it is optimized for question answering. After fine-tuning, the model achieves a BLEU score of 0.4432 on the given test set, significantly exceeding the previous state of the art for this task.
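The abstract does not describe the retrieval pipeline in detail. As a rough illustration of the retrieve-then-prompt idea behind RAG, the sketch below ranks document chunks against a question and places the best match in the prompt. Everything in it is an assumption made for illustration: the bag-of-words cosine scoring stands in for whatever retriever the authors used, the function names `retrieve` and `build_prompt` are hypothetical, the prompt template is invented, and the sample chunks are English placeholders rather than data from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on Unicode word characters.
    return re.findall(r"\w+", text.lower())


def vectorize(text):
    # Bag-of-words term counts (an assumed stand-in for a real embedding model).
    return Counter(tokenize(text))


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    # Return the k chunks most similar to the question.
    q = vectorize(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    # Prepend the retrieved chunks to the question (hypothetical template).
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Placeholder chunks; in practice these would be extracted from
# text, doc, xml, html, or pdf files.
chunks = [
    "Addis Ababa is the capital city of Ethiopia.",
    "Amharic is written in the Ge'ez (Ethiopic) script.",
    "Lake Tana is the source of the Blue Nile.",
]
question = "What is the capital city of Ethiopia?"
prompt = build_prompt(question, chunks, k=1)
print(prompt)
```

The resulting prompt would then be passed to the language model for generation. That step is omitted here because it depends on the model and serving setup, which the abstract does not specify.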

Springer Science and Business Media LLC

Berhanu Bogale, Tesfa Tegegne, Solomon Teferra, Gebeyehu Belay

2024
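The abstract reports a BLEU score of 0.4432 but does not say which BLEU variant or tokenization was used. For readers unfamiliar with the metric, the following is a minimal sketch of sentence-level BLEU with a single reference, assuming whitespace tokenization, uniform weights over 1- to 4-grams, and no smoothing. These choices and the function names are assumptions for illustration, not the authors' evaluation setup.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    # Count all contiguous n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    # Sentence-level BLEU against one reference, without smoothing.
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


score = bleu("the cat sat on the mat", "the cat sat on a mat")
print(round(score, 4))
```

Corpus-level BLEU, which published results usually report, pools the n-gram counts over all test sentences before taking the precisions, so it is not the average of per-sentence scores like the one computed here.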

Related Results

Abstract Objectives To develop and evaluate JADE, a proof-of-concept retrieval-augmented generation (RAG) diagnostic assi...

Kra-Dai Languages

Kra-Dai (also called Tai-Kadai and Kam-Tai) is a family of approximately 100 languages spoken in Southeast Asia, extending from the island of Hainan, China, in the east to the Indi...

Natural language processing applications for low-resource languages

Abstract Natural language processing (NLP) has significantly advanced our ability to model and interact with human language through technology. However, these advancements have disp...

Physiological V(D)J Recombination is Mediated by RAG Scanning of Loop-extruded Chromatin

Abstract RAG endonuclease initiates V(D)J recombination by cleaving paired V, D, and J gene segments flanked by complementary recombination signal sequences (RSSs). ...

SMART RESPONSE: RAG ENHANCED QUESTION ANSWERING MODEL

'Smart Response: RAG Enhanced Question Answering Model' aims to revolutionize question answering systems by integrating Retrieval-Augmented Generation (RAG). RAG synergizes a retrie...

A Systematic Literature Review of Retrieval-Augmented Generation Implementation for Enhancing Large Language Models in Education

The rapid advancement of Large Language Models (LLM) has led to the creation of increasingly adaptive intelligent learning systems. However, many educational implementations of LLM...

Mande Languages

Mande is a mid-range language family in Western Sub-Saharan Africa that includes 60 to 75 languages spoken by 30 to 40 million people. According to the glottochronological data, it...

Khoisan Languages

The languages traditionally referred to as “Khoisan” languages are spoken in southern and eastern Africa, specifically in the Republic of South Africa, Namibia, Botswana, Angola, a...
