Producing Speech Output

08/28/2007

The Speech Engine Services (SES) engines interpret the Speech Synthesis Markup Language (SSML) that a speech application sends and produce the audio that is played to the user. SES uses two engines to produce this output: a prompt engine and a text-to-speech (TTS) synthesis engine.

Prompt Engine

The SES prompt engine matches the text in the SSML against a database of prerecorded .wav files, called prompts. (This database is a component of the application.)

To generate speech, the prompt engine searches the prompt database for a match to the text it receives from the application. It may concatenate several prompts to produce the complete output. If the prompt engine cannot match a part of the text to a prerecorded prompt, it sends that word or phrase to the text-to-speech synthesis engine for processing.
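The matching and fallback behavior described above can be sketched roughly as follows. This is a minimal illustration only, not the SES implementation: the function plan_output, the dictionary PROMPT_DB, the .wav file names, and the greedy longest-phrase-first strategy are all assumptions made for the example.

```python
def plan_output(text, prompt_db):
    """Split text into ("prompt", wav_file) and ("tts", phrase) segments.

    prompt_db is a hypothetical mapping of lowercase phrases to .wav files.
    """
    words = text.lower().split()
    segments = []
    unmatched = []  # consecutive words that have no recorded prompt
    i = 0
    while i < len(words):
        # Try the longest phrase starting at word i first, then shorter ones.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in prompt_db:
                # Flush pending unmatched words to TTS before the prompt.
                if unmatched:
                    segments.append(("tts", " ".join(unmatched)))
                    unmatched = []
                segments.append(("prompt", prompt_db[phrase]))
                i = j
                break
        else:
            # No prompt starts at this word, so it falls back to TTS.
            unmatched.append(words[i])
            i += 1
    if unmatched:
        segments.append(("tts", " ".join(unmatched)))
    return segments


# Hypothetical prompt database: phrase -> prerecorded .wav file.
PROMPT_DB = {
    "your flight departs at": "departs.wav",
    "thank you": "thanks.wav",
}

plan = plan_output("Your flight departs at gate twelve thank you", PROMPT_DB)
for kind, value in plan:
    print(kind, value)
```

In this sketch the two recorded phrases are played from their .wav files, and the unmatched words "gate twelve" are handed to the TTS engine between them.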

Note: The prompt engine provided with Microsoft Speech Server (MSS) supports only SSML, not SAPI TTS markup. The SSML supported by MSS and implemented in the Microsoft Speech Application SDK Version 1.1 (SASDK) is based on the World Wide Web Consortium Speech Synthesis Markup Language Specification Version 1.0 (W3C SSMLS) Working Draft of April 5, 2002.
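For orientation, the following sketch builds a minimal SSML document of the kind an application might pass to the engines. It assumes the speak root element with version and xml:lang attributes as defined by W3C SSML 1.0; the subset that MSS actually accepts may differ, so check the SASDK documentation. The helper name build_ssml is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_ssml(text, lang="en-US"):
    """Wrap plain text in a minimal SSML <speak> element (hypothetical helper)."""
    # Attribute names follow W3C SSML 1.0; MSS support is an assumption here.
    speak = ET.Element("speak", {"version": "1.0", "xml:lang": lang})
    speak.text = text  # ElementTree escapes characters such as & and <
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Your flight departs at gate twelve.")
print(ssml)
```

Building the markup with an XML library rather than string concatenation keeps the document well-formed when the prompt text contains reserved characters.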

Text-to-Speech Synthesis Engine

SpeechWorks Speechify, the text-to-speech synthesis engine used by SES, is provided by ScanSoft, Inc.

When the prompt engine passes text phrases not found in the prompt database to Speechify, the engine uses speech-synthesis techniques to approximate the audio stream of a human voice reading the source text. For U.S. English, "Jill" is the female voice (the default), and "Tom" is the male voice. For information about how to change the default voice, see Modifying the Default Voice. Installing additional language packs provides additional voices. For Spanish (U.S.), "Javier" is the male voice and "Paulina" is the female voice. For French (Canada), "Felix" is the male voice.

By default, the text-to-speech volume (amplitude) is set to 30 percent of the maximum volume. You can adjust this value through Speechify. For example, you may need to lower the volume if it is set so high that a speech echo produces an unintentional barge-in (an interruption of a system prompt).

Note: If you modify the TTS volume, you may need to change the volume in your prompt databases to match. See "Managing Prompt Databases" in the Speech Application SDK Help.

For more information on Speechify, see the Speechify User's Guide. Speechify and the user's guide are installed by default during MSS Setup. To access them, click Start, point to All Programs, and then click Speechify.

See Also

Processing Speech Recognition