An overview of Microsoft’s Whistler text-to-speech system

The data-driven approach can significantly facilitate the process of creating text-to-speech (TTS) systems for a new language, a new voice, or a new style. As such, the Whistler TTS engine was designed to benefit from automatically constructed model parameters. Efforts to improve Whistler with the use of additional training data and better learning algorithms that make full use of these data will be reviewed. Training data have been augmented for a number of speakers. To better use these data, the hidden Markov model speech recognition system has been used to segment the training corpora and select more representative acoustic units. The classification and regression tree was used for both grapheme-to-phoneme conversion and unseen triphone generalization. Speech signal reconstruction was based on the mixed excitation source-filter model that leads to better compression of the acoustic inventory. A number of ways to smooth the spectral parameters were also studied to minimize the concatenation distortion. To improve automatically extracted prosodic templates, the learning process was refined with an analysis-by-synthesis approach. However, coverage remains a challenge for the data-driven approach in making Whistler produce synthetic speech that resembles the original speaker. This is especially true for the prosody model.

Acoustical Society of America (ASA)

X. D. Huang

The Journal of the Acoustical Society of America

2002
