
A neural network based lexical stress pattern classifier

Background and Objectives: In dysprosodic speech, the prosody does not match the expected intonation pattern and can result in robotic-sounding speech, with each syllable produced with equal stress. These errors are manifested through inconsistent lexical stress as measured by perceptual judgments and/or acoustic variables. Lexical stress is produced through variations in syllable duration, peak intensity and fundamental frequency. The presented technique automatically evaluates the unequal lexical stress patterns Strong-Weak (SW) and Weak-Strong (WS) in American English continuous speech production, using a multi-layer feed-forward neural network with seven acoustic features chosen to target the lexical stress variability between two consecutive syllables.

Methods: The speech corpus used in this work is the PTDB-TUG. Five female and three male speakers were chosen to form a training set, and one female and one male speaker for testing. The CMU pronouncing dictionary, with lexical stress levels marked, was used to assign a stress level to each syllable of every word in the speech corpus. Lexical stress is phonetically realized through the manipulation of signal intensity, the fundamental frequency (F0) and its dynamics, and the syllable/vowel duration. Seven measures were calculated for each syllable in the collected speech: nucleus duration, syllable duration, mean pitch, maximum pitch over the nucleus, the peak-to-peak amplitude integral over the syllable nucleus, mean energy, and maximum energy over the nucleus. As lexical stress errors are identified by evaluating the variability between consecutive syllables in a word, we computed the pairwise variability index (PVI) for each acoustic measure. The PVI for any acoustic feature f_i is given by:

$$\mathrm{PVI}_i = \frac{f_{i1} - f_{i2}}{(f_{i1} + f_{i2})/2} \qquad (1)$$

where f_{i1} and f_{i2} are the values of acoustic feature f_i for the first and second syllables, respectively.
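The PVI of Eq. (1) can be illustrated with a minimal Python sketch. The feature names, the function names `pvi` and `pvi_vector`, and the handling of a zero denominator are assumptions made for illustration; the abstract does not describe the authors' implementation.

```python
from typing import Dict, List

# Hypothetical names for the seven acoustic measures listed in the abstract.
FEATURES = (
    "nucleus_duration",
    "syllable_duration",
    "mean_pitch",
    "max_pitch_nucleus",
    "amplitude_integral_nucleus",
    "mean_energy",
    "max_energy_nucleus",
)


def pvi(f1: float, f2: float) -> float:
    """Eq. (1): difference between two syllables divided by their mean."""
    mean = (f1 + f2) / 2.0
    if mean == 0.0:
        # Assumption: a pair with zero mean is treated as showing no variability.
        return 0.0
    return (f1 - f2) / mean


def pvi_vector(syll1: Dict[str, float], syll2: Dict[str, float]) -> List[float]:
    """One PVI per acoustic measure for a pair of consecutive syllables."""
    return [pvi(syll1[name], syll2[name]) for name in FEATURES]


# Illustrative values only, not measurements from the corpus.
first = {name: 3.0 for name in FEATURES}
second = {name: 1.0 for name in FEATURES}
print(pvi_vector(first, second))  # every entry is 1.0 here
```

For non-negative measures the index lies between -2 and 2. A positive value means the first syllable carries the larger value of that measure, and a negative value means the second does, so the sign of each PVI is what separates an SW-like pair from a WS-like pair.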
A multi-layer feed-forward neural network consisting of input, hidden and output layers was used to classify the stress patterns of the words in the database.

Results: The presented system had an overall accuracy of 87.6%. It correctly classified 92.4% of the SW stress patterns and 76.5% of the WS stress patterns.

Conclusions: A feed-forward neural network was used to classify SW and WS stress patterns in American English continuous speech with an overall accuracy of 87.6%.
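As a rough illustration of the kind of classifier described, the sketch below trains a one-hidden-layer feed-forward network on seven PVI-like inputs. Everything beyond "input, hidden and output layers" is an assumption: the class name `TinyMLP`, the hidden-layer size, the tanh and sigmoid activations, the plain stochastic gradient descent, and above all the synthetic toy data, which stands in for real PVI vectors and has no connection to the PTDB-TUG corpus or to the reported results.

```python
import math
import random


def sigmoid(x: float) -> float:
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """Hypothetical network: 7 PVI inputs -> tanh hidden layer -> P(SW)."""

    def __init__(self, n_in=7, n_hidden=5, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, target, lr=0.1):
        h, y = self.forward(x)
        d_out = y - target  # cross-entropy gradient at the output unit
        for j, hj in enumerate(h):
            # Back-propagate through tanh using the not-yet-updated w2[j].
            d_hid = d_out * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * d_out * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hid * xi
            self.b1[j] -= lr * d_hid
        self.b2 -= lr * d_out

    def predict(self, x):
        return "SW" if self.forward(x)[1] >= 0.5 else "WS"


def toy_data(n, rng):
    """Synthetic stand-in: SW pairs lean positive, WS pairs lean negative."""
    data = []
    for _ in range(n):
        label = rng.choice([1, 0])  # 1 = SW, 0 = WS
        center = 0.4 if label else -0.4
        data.append(([rng.gauss(center, 0.5) for _ in range(7)], label))
    return data


rng = random.Random(1)
train, test = toy_data(300, rng), toy_data(100, rng)
net = TinyMLP()
for _ in range(30):  # 30 passes over the toy training set
    for x, t in train:
        net.train_step(x, t)

hits = sum(net.predict(x) == ("SW" if t else "WS") for x, t in test)
accuracy = hits / len(test)
print(f"toy accuracy: {accuracy:.2f}")
```

The accuracy printed here only reflects how separable the toy data is and says nothing about the 87.6% reported above. With real data, the inputs would be the seven per-word PVI values and the labels would come from the dictionary stress markings.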

Related Results

Classification of Bisyllabic Lexical Stress Patterns Using Deep Neural Networks
Background and Objectives: As English is a stress-timed language, lexical stress plays an important role in the perception and processing of speech by native speakers. Incorrect st...
Lukijat sanaston monimuotoisuutta määrittämässä [Readers Defining Lexical Diversity]
The article examines how lexical diversity, that is, the lexical variety of a text, is constructed. The aim is to present lexical diversity research meth...
Numeral Classifiers Used in the Cookbooks
This article aims to describe the numeral classifiers used in cookbooks. The data were collected through observation of the cookbooks. Throu...
The Lexical Bias Effect during Speech Production in the First and Second Language
The lexical bias effect is the tendency for people to make phonological speech errors that result in existing words. Several studies have argued that this effect arises from a comb...
Modified neural networks for rapid recovery of tokamak plasma parameters for real time control
Two modified neural network techniques are used for the identification of the equilibrium plasma parameters of the Superconducting Steady State Tokamak I from external magnetic mea...
Depth-aware salient object segmentation
Object segmentation is an important task which is widely employed in many computer vision applications such as object detection, tracking, recognition, and ret...
An Analysis of the Impact of Deviatoric Stress and Spherical Stress on the Stability of Surrounding Rocks in Roadway
In this study, a detailed analysis was conducted to evaluate the impacts of the deviatoric stress component and spherical stress component on the stability of surrounding rocks in ...
PENGETAHUAN MAHASISWA TATA BUSANA TENTANG ZERO WASTE PATTERN [Fashion Design Students' Knowledge of Zero Waste Patterns]
Textile waste is one of the 2nd largest types of waste in the world. The increasing amount of textile waste will have an impact on the environment. There has not been much developm...
