Buryat historical sources: digital infrastructure of machine translation

This study addresses a vast yet still underexplored corpus of Buryat historical sources written in Old Written Mongolian and preserved in academic and archival institutions in Russia. These documents reflect all aspects of Buryat society from the 18th to the early 20th century, and bringing them into scientific and public circulation requires modern digital methods. The most promising approach is a comprehensive digital humanities strategy that includes digitizing the sources; entering their full text in romanized transliteration; creating a Mongolian-Russian text corpus with metatextual, structural, and morphological markup according to TEI standards; translating the sources; and forming a balanced parallel Mongolian-Russian corpus. Special attention was paid to the genre, chronological, and territorial representativeness of the corpus data. The prepared digital materials serve as a foundation for applying machine learning methods to optical text recognition for Mongolian script and to machine translation. To assess the prospects of machine translation for Buryat historical sources, computational experiments were conducted with transformer models trained from scratch and with large pre-trained multilingual models (mBART-50, mT5). The scientific novelty of the work lies in the creation of the first methodologically grounded digital platform for studying Buryat historical sources, adapted to the characteristics of the local rendition of Old Written Mongolian. The work built the competencies needed to organize the complete cycle of digital processing of historical sources, from the digitization of archival documents to the formation of a balanced parallel corpus. For the first time, an online corpus of unique historical texts, equipped with analytical tools, has been assembled and published, together with specialized datasets for training AI models. It was established that, for low-resource language pairs, the most effective strategy is to fine-tune pre-trained multilingual models rather than to modify neural network architectures. The research lays the groundwork for creating comprehensive tools for digitizing Buryat written heritage, which will open new perspectives for historical, linguistic, and cultural studies.
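As an illustration of the step from TEI-style markup to a parallel corpus, the following minimal sketch pairs transliterated source segments with their Russian translations and serializes them as JSON Lines for model training. It is not the project's actual schema or tooling: the `div`/`s` layout, the use of `corresp` for alignment, the `cmg-Latn` language tag, the function names, and the placeholder segment strings are all assumptions made for this example, and the fragment is not a complete, validated TEI document.

```python
import json
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical TEI-style fragment. Segment texts are placeholders,
# not real corpus data; the layout is an assumption for illustration.
SAMPLE_TEI = f"""<TEI xmlns="{TEI_NS}">
  <text>
    <body>
      <div type="source" xml:lang="cmg-Latn">
        <s xml:id="s1">SOURCE_SEGMENT_1</s>
        <s xml:id="s2">SOURCE_SEGMENT_2</s>
      </div>
      <div type="translation" xml:lang="ru">
        <s corresp="#s1">TARGET_SEGMENT_1</s>
        <s corresp="#s2">TARGET_SEGMENT_2</s>
      </div>
    </body>
  </text>
</TEI>"""


def extract_pairs(tei_xml: str) -> list[dict]:
    """Align translation segments with source segments via corresp -> xml:id."""
    root = ET.fromstring(tei_xml)
    ns = {"tei": TEI_NS}

    # Index source segments by their xml:id.
    sources = {}
    for seg in root.findall(".//tei:div[@type='source']/tei:s", ns):
        seg_id = seg.get(f"{{{XML_NS}}}id")
        sources[seg_id] = (seg.text or "").strip()

    # Keep only translation segments that point at a known source segment.
    pairs = []
    for seg in root.findall(".//tei:div[@type='translation']/tei:s", ns):
        ref = (seg.get("corresp") or "").lstrip("#")
        if ref in sources:
            pairs.append(
                {"id": ref, "src": sources[ref], "tgt": (seg.text or "").strip()}
            )
    return pairs


def to_jsonl(pairs: list[dict]) -> str:
    """Serialize aligned pairs as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in pairs)


if __name__ == "__main__":
    print(to_jsonl(extract_pairs(SAMPLE_TEI)))
```

Keeping the alignment in the markup rather than in line order means unaligned or untranslated segments are simply skipped, so a partially translated document can still contribute pairs to the training set.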
