Boosting Speech-to-Text software potential
The article explores ways to improve the efficiency and accuracy of Speech-to-Text (STT) input. The study is motivated by the growing popularity of STT software among professional translators, in line with the general trend of abandoning typing in favor of speech-to-text applications. Arguing that the effectiveness of such programs depends on their accuracy, the researchers analyze the major linguistic and technical factors that affect the quality of computer-assisted speech transcription. The hypothesis is then tested in an experiment. Drawing on numerical and performance data and on a breakdown of errors into categories intended to trace their origins, the article examines various approaches to dictation in combination with several hardware options and configurations. The findings underpin recommendations for improving STT performance with the Dragon software. The authors conclude that STT accuracy can be raised to as much as 99 percent by adjusting the program profile to the speaker's phonetic features, including accent, adding the most complex and rare vocabulary to the dictionary beforehand, and fine-tuning the input hardware. Other noteworthy results include ways to overcome the most difficult transcription challenges, such as proper names, place names, and abbreviations.
Belgorod National Research University
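The abstract reports accuracy of up to 99 percent but does not say how that figure is computed. As an illustration only, the sketch below assumes one common convention, word accuracy defined as 1 minus the word error rate (WER), where WER is the word-level edit distance between a reference transcript and the STT output divided by the reference length. The function names `word_error_rate` and `word_accuracy` are hypothetical and are not taken from the article or from the Dragon software.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: case-insensitive comparison and whitespace tokenization;
    the article does not specify its own normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - WER, floored at 0 because WER can exceed 1."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


if __name__ == "__main__":
    dictated = "adjust the profile to the speaker accent"
    transcribed = "adjust the profile to the speakers accent"
    print(f"word accuracy: {word_accuracy(dictated, transcribed):.3f}")
```

Under this convention, 99 percent accuracy corresponds to roughly one word-level error per hundred dictated words.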
Related Results
Perception advantages of foreign directed speech
Foreign directed speech (FDS) is a listener directed speech style used when native speakers interact with non-native listeners of a language. This study considers if native and non...
Developmental Links Between Speech Perception in Noise, Singing, and Cortical Processing of Music in Children with Cochlear Implants
The perception of speech in noise is challenging for children with cochlear implants (CIs). Singing and musical instrument playing have been associated with improved auditory skill...
Surrogate Speech of the Asante Ivory Trumpeters of Ghana
Surrogate speech is a phonological system by which word tones of a spoken language are represented in tones produced on a musical instrument. Ethnomusicologists regard this as a mu...
Free Software Beyond Radical Politics: Negotiations of Creative and Craft Autonomy in Digital Visual Media Production
Free software development and the technological practices of hackers have been broadly recognised as fundamental for the formation of political cultures that foster democracy in th...
Speech in “Paradise Lost”
In the sixteenth and seventeenth centuries several treatises (religious, philosophical, and rhetorical) discussed the Fall of Man as involving a corruption ...
An overview of Microsoft’s Whistler text-to-speech system
The data-driven approach can significantly facilitate the process of creating text-to-speech (TTS) systems for a new language, a new voice, or a new style. As such, Whistler TTS en...
Selective auditory attention modulates cortical responses to sound location change for speech in quiet and in babble
Listeners use the spatial location or change in spatial location of coherent acoustic cues to aid in auditory object formation. From stimulus-evoked onset responses in norm...
Extraction of Color Information and Visualization of Color Differences between Digital Images through Pixel-by-Pixel Color-Difference Mapping
A novel method of extracting color information on a pixel-by-pixel basis or by the average of the regions of interest (ROIs) from digital images is proposed and demonstrated using ...