Search engine for discovering works of Art, research articles, and books related to Art and Culture
ShareThis
Javascript must be enabled to continue!

Speaker-Aware Simulation Improves Conversational Speech Recognition

View through CrossRef
Abstract Automatic speech recognition (ASR) for conversational speech remains challenging due to the limited availability of large-scale, well-annotated multi-speaker dialogue data and the complex temporal dynamics of natural interactions. Speaker-aware simulated conversations (SASC) offer an effective data augmentation strategy by transforming single-speaker recordings into realistic multi-speaker dialogues. However, prior work has primarily focused on English data, leaving questions about the applicability to lower-resource languages. In this paper, we adapt and implement the SASC framework for Hungarian conversational ASR. We further propose C-SASC, an extended variant that incorporates pause modeling conditioned on utterance duration, enabling a more faithful representation of local temporal dependencies observed in human conversation while retaining the simplicity and efficiency of the original approach. We generate synthetic Hungarian dialogues from the BEA-Large corpus and combine them with real conversational data for ASR training. Both SASC and C-SASC are evaluated extensively under a wide range of simulation configurations, using conversational statistics derived from CallHome, BEA-Dialogue, and GRASS corpora. Experimental results show that speaker-aware conversational simulation consistently improves recognition performance over naive concatenation-based augmentation. While the additional duration conditioning in C-SASC yields modest but systematic gains--most notably in character-level error rates--its effectiveness depends on the match between source conversational statistics and the target domain. Overall, our findings confirm the robustness of speaker-aware conversational simulation for Hungarian ASR and highlight the benefits and limitations of increasingly detailed temporal modeling in synthetic dialogue generation.
Springer Science and Business Media LLC
Title: Speaker-Aware Simulation Improves Conversational Speech Recognition
Description:
Abstract Automatic speech recognition (ASR) for conversational speech remains challenging due to the limited availability of large-scale, well-annotated multi-speaker dialogue data and the complex temporal dynamics of natural interactions.
Speaker-aware simulated conversations (SASC) offer an effective data augmentation strategy by transforming single-speaker recordings into realistic multi-speaker dialogues.
However, prior work has primarily focused on English data, leaving questions about the applicability to lower-resource languages.
In this paper, we adapt and implement the SASC framework for Hungarian conversational ASR.
We further propose C-SASC, an extended variant that incorporates pause modeling conditioned on utterance duration, enabling a more faithful representation of local temporal dependencies observed in human conversation while retaining the simplicity and efficiency of the original approach.
We generate synthetic Hungarian dialogues from the BEA-Large corpus and combine them with real conversational data for ASR training.
Both SASC and C-SASC are evaluated extensively under a wide range of simulation configurations, using conversational statistics derived from CallHome, BEA-Dialogue, and GRASS corpora.
Experimental results show that speaker-aware conversational simulation consistently improves recognition performance over naive concatenation-based augmentation.
While the additional duration conditioning in C-SASC yields modest but systematic gains--most notably in character-level error rates--its effectiveness depends on the match between source conversational statistics and the target domain.
Overall, our findings confirm the robustness of speaker-aware conversational simulation for Hungarian ASR and highlight the benefits and limitations of increasingly detailed temporal modeling in synthetic dialogue generation.

Related Results

Funkcije komunikacijski relevantne šutnje u njemačkome
Funkcije komunikacijski relevantne šutnje u njemačkome
Additionally, this chapter presents research of silence with review of main aspects of papers in the field of conversational analysis, ethnography of communication and metaphor of ...
Speaker Verification and Identification
Speaker Verification and Identification
A speaker recognition system verifies or identifies a speaker’s identity based on his/her voice. It is considered as one of the most convenient biometric characteristic for human m...
Quarantine Powers, Biodefense, and Andrew Speaker
Quarantine Powers, Biodefense, and Andrew Speaker
In January 2007, “Andrew Speaker (“Speaker”) underwent a chest X-ray and CT scan, which revealed an abnormality in his lungs.” However, tests results indicated that he did not ha...
Analyzing Noise Robustness of Cochleogram and Mel Spectrogram Features in Deep Learning Based Speaker Recogntion
Analyzing Noise Robustness of Cochleogram and Mel Spectrogram Features in Deep Learning Based Speaker Recogntion
Abstract The performance of speaker recognition is very well in a clean dataset or without mismatch between training and test set. However, the performance is degraded with...
Multimodal Emotion Recognition and Human Computer Interaction for AI-Driven Mental Health Support (Preprint)
Multimodal Emotion Recognition and Human Computer Interaction for AI-Driven Mental Health Support (Preprint)
BACKGROUND Mental health has become one of the most urgent global health issues of the twenty-first century. The World Health Organization (WHO) reports tha...
Fusion of Cochleogram and Mel Spectrogram Features for Deep Learning Based Speaker Recognition
Fusion of Cochleogram and Mel Spectrogram Features for Deep Learning Based Speaker Recognition
Abstract Speaker recognition has crucial application in forensic science, financial areas, access control, surveillance and law enforcement. The performance of speaker reco...
State-of-the-art in Open-domain Conversational AI: A Survey
State-of-the-art in Open-domain Conversational AI: A Survey
We survey SoTA open-domain conversational AI models with the purpose of presenting the prevailing challenges that still exist to spur future research. In addition, we provide stati...

Back to Top