A study of prosodic features for Indonesian speech recognition
Utterance-type information has been used in spoken dialogue systems, speech recognition systems, and machine translation. In a typical spoken dialogue system, a user can ask a question or give information to the system. The system, in turn, should be capable of recognizing the user's intention in order to respond correctly. This dissertation proposes an automatic utterance-type recognizer that distinguishes declarative questions from statements in Indonesian speech. Because utterances of these two types contain the same words in the same order and differ only in intonation, classifying them requires not only a word recognizer but also an intonation recognizer. The utterance-type recognizer is first designed on the basis of the Fujisaki model, using a combination of Fujisaki model parameters as features to recognize the two utterance types. The best performance of the Fujisaki-model-based recognizer is achieved using a combination of a fraction value of F_b (F_b/100), the amplitude of the last accent command, and the magnitude of the last phrase command as the input to the neural networks. However, the Fujisaki parameter extractor is too complicated to implement in an automatic recognition system. Therefore, the utterance-type recognizer is instead developed using the polynomial coefficients of the pitch contour of the sentence's final word. The automatic utterance-type recognizer using polynomial expansion consists of a pitch contour extractor, a normalizer, a feature extractor, a classifier, and an automatic utterance segmentation module. The pitch contour of each utterance type is analyzed to investigate the final word of the two utterance types. To create the automatic utterance segmentation module, an Indonesian acoustic model is designed.
The evaluation confirms that the method using the final word and polynomial expansion is effective in distinguishing declarative questions from statements in Indonesian speech.
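The polynomial-expansion features described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the abstract does not state the polynomial order, the normalization, or the classifier, so the function name `final_word_pitch_features`, the cubic order, the mean-removed log-F0 normalization, and the rescaling of time to [-1, 1] are all assumptions made for illustration. Two synthetic contours, one rising and one falling, stand in for real final-word pitch contours.

```python
import numpy as np


def final_word_pitch_features(f0_hz, degree=3):
    """Fit a polynomial to a normalized pitch contour; return its coefficients.

    f0_hz:  F0 values in Hz for the voiced frames of the final word.
    degree: polynomial order (hypothetical; not given in the abstract).
    """
    f0 = np.asarray(f0_hz, dtype=float)
    if f0.ndim != 1 or f0.size <= degree:
        raise ValueError("need a 1-D contour with more than `degree` frames")
    if np.any(f0 <= 0):
        raise ValueError("expects voiced frames only (F0 > 0)")

    # Assumed normalization: mean-removed log-F0, so the features describe
    # the contour's shape rather than the speaker's overall pitch level.
    log_f0 = np.log(f0)
    log_f0 = log_f0 - log_f0.mean()

    # Assumed time normalization: map frames onto [-1, 1] so that words of
    # different durations yield comparable coefficients.
    t = np.linspace(-1.0, 1.0, f0.size)

    # Least-squares fit; coefficients are returned lowest order first.
    return np.polynomial.polynomial.polyfit(t, log_f0, degree)


# Synthetic stand-ins for final-word contours (not real speech data).
frames = np.linspace(0.0, 1.0, 40)
rising = 180.0 * np.exp(0.3 * frames)
falling = 180.0 * np.exp(-0.3 * frames)

q = final_word_pitch_features(rising)
s = final_word_pitch_features(falling)
print("rising contour coefficients: ", np.round(q, 4))
print("falling contour coefficients:", np.round(s, 4))
```

In this toy case the two contours differ mainly in the sign of the first-order coefficient, which is the kind of shape information a classifier could use as input. Real contours would first need pitch extraction and final-word segmentation, which the sketch does not attempt.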

