Comparative Evaluation of Zero-Shot and Few-Shot Performance of Large Language Models in Low-Resource Language Machine Translation
Large language models (LLMs) translate high-resource languages well, yet their effectiveness on low-resource languages under different prompting conditions remains insufficiently understood. This study presents a comparative evaluation of four LLMs (GPT-4, GPT-3.5-Turbo, LLaMA-2-70B, and BLOOM-176B), with NLLB-200-3.3B as a supervised baseline, across ten translation directions spanning four resource levels. Using the FLORES-200 devtest set as the primary benchmark and NTREX-128 for cross-validation, we assess zero-shot, one-shot, five-shot, and eight-shot configurations with BLEU, chrF++, and COMET-22. Our results reveal three principal findings. First, the few-shot advantage is most pronounced for low-resource languages: GPT-4 gains an average of 5.3 BLEU points when moving from zero-shot to five-shot prompting on low-resource pairs. Second, one-shot prompting consistently degrades performance below the zero-shot baseline, with an average reduction of 1.4 BLEU points across low-resource directions. Third, the supervised NLLB-200 baseline outperforms all zero-shot LLMs on eight of ten directions, while five-shot GPT-4 narrows this gap to within 1.0 BLEU on mid-resource pairs. These findings provide empirical guidance for practitioners selecting prompting strategies for LLM-based translation in resource-constrained settings.
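To make the zero-shot, one-shot, five-shot, and eight-shot configurations concrete, the following is a minimal sketch of how a k-shot translation prompt could be assembled. The template wording, the function name `build_prompt`, and the English-French demonstration pairs are illustrative assumptions; the abstract does not specify the prompt format or the language pairs used in the study.

```python
from typing import List, Sequence, Tuple


def build_prompt(
    src_lang: str,
    tgt_lang: str,
    source: str,
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble a k-shot translation prompt, where k = len(examples).

    With no examples this is a zero-shot prompt. The template is a
    hypothetical one chosen for illustration, not the study's own.
    """
    lines: List[str] = [
        f"Translate the following text from {src_lang} to {tgt_lang}."
    ]
    # Each demonstration is a (source sentence, reference translation) pair.
    for ex_src, ex_tgt in examples:
        lines.append(f"{src_lang}: {ex_src}")
        lines.append(f"{tgt_lang}: {ex_tgt}")
    # The sentence to translate comes last, with the target slot left open.
    lines.append(f"{src_lang}: {source}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)


# Hypothetical demonstration pool; a real setup would presumably draw
# demonstrations from a held-out split rather than the test set.
demo_pool = [
    ("Good morning.", "Bonjour."),
    ("Thank you.", "Merci."),
]

zero_shot = build_prompt("English", "French", "Where is the station?")
one_shot = build_prompt("English", "French", "Where is the station?", demo_pool[:1])
```

Varying the length of the slice taken from `demo_pool` (0, 1, 5, or 8 pairs) yields the four prompting conditions while the instruction line and the test sentence stay fixed.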
Journal of Global Engineering Review

