Spoken Corpora of Slavic Languages
Abstract: Spoken corpora are collections of transcribed and annotated audio and/or video recordings of languages or language varieties. The aim of this paper is to present an overview of 51 spoken corpora currently available for Slavic languages and dialects, in particular Belarusian, Bulgarian, Croatian, Czech, Polish, Russian, Slovak, Slovenian, Trasianka, and Ukrainian/Rusyn. We identify three groups of corpora according to the type of lect: corpora of standard languages (spoken mainly in an urban environment and existing in both written and oral form), dialects (spoken mainly in a rural environment and unwritten), and bilingual varieties (by which we mean varieties spoken as an L2 by people with different L1s, as well as all varieties that evolved in a multilingual environment). We survey the corpora in terms of text registers, transcription, and principles of linguistic and extralinguistic annotation. In conclusion, we suggest a list of features that linguists should take into consideration when developing a spoken corpus. Many spoken corpora are currently being created for various Slavic lects, and their developers may use this overview as a source of information on different designs and solutions.
Related Results
Kra-Dai Languages
Kra-Dai (also called Tai-Kadai and Kam-Tai) is a family of approximately 100 languages spoken in Southeast Asia, extending from the island of Hainan, China, in the east to the Indi...
UMA PROPOSTA DE WORKFLOW PARA CONSTRUÇÃO DE CORPUS DIGITAL EM LÍNGUA DE SINAIS
The sign language corpora currently available in linguistic research and on open-access websites consist of a video-recording module, since the ...
A Taste for Corpora
The eleven contributions to this volume, written by expert corpus linguists, tackle corpora from a wide range of perspectives and aim to shed light on the numerous linguistic and p...
A Research Overview of Corpus-Assisted Enhancement of English Writing Proficiency
With the deep development of big data and artificial intelligence technologies, corpus research has garnered increasing attention and recognition. Initially, corpora were collectio...
Žanrovska analiza pomorskopravnih tekstova i ostvarenje prijevodnih univerzalija u njihovim prijevodima s engleskoga jezika
Genre implies formal and stylistic conventions of a particular text type, which inevitably affects the translation process. This „force of genre bias“ (Prieto Ramos, 2014) has been...
Dot the pill down: Investigating the linguistic needs of foreign rugby players and lexicon of spoken rugby discourse
Traditionally a sport which is played predominantly in English-speaking countries such as New Zealand, England, and Australia, rugby is gaining in popularity in other coun...
Phonetic Corpora
Technological advancements in recording, storage, and processing have reshaped the study of spoken language data over the past decades. Phonetic corpora are emerging as a key metho...
Etnolingvistiniai santykiai priešistorinėje Šiaurės rytų Europoje
ETHNOLINGUISTIC SITUATION IN THE PREHISTORIC NORTH-EAST EUROPE
The facts known so far allow us to state that in the period between the disintegration of the Indo-European community an...

