Microsoft Speech Platform

Initialize a Voice

To perform speech synthesis (TTS, text-to-speech) in the Microsoft Speech Platform, you first initialize a voice. A voice is an instance of a TTS engine that uses an installed Runtime Language to perform speech synthesis. A Runtime Language is represented in the registry by a token. See Speech Platform Overview for information about downloading Runtime Languages.

To initialize a TTS voice in the Speech Platform, you query the registry for the desired voice token, select the voice token, create a voice object, and set the voice token that voice object will use.

The Speech Platform provides helper functions that reduce the number of steps necessary to initialize a voice. You can use one or more helper functions to find, select, and create a voice using any of the following processes:

Enumerate voice tokens that match specified attributes
Find a single voice token that best matches specified attributes
Select the default voice token
Create a voice from the default voice token

Enumerate voice tokens that match specified attributes

SpEnumTokens returns a token enumerator containing all tokens from a specified category that match the specified required and optional attributes. When you specify the Voices category, SpEnumTokens returns a list of voices ordered with the best matches listed first. In the following snippet, Language=409 is a required attribute and Gender=Female is an optional attribute.

HRESULT hr = S_OK;
CComPtr<IEnumSpObjectTokens> cpIEnum;
CComPtr<ISpObjectToken> cpToken;
CComPtr<ISpVoice> cpVoice;
// Enumerate voice tokens that speak US English, preferably in a female voice.
hr = SpEnumTokens(SPCAT_VOICES, L"Language=409", L"Gender=Female;", &cpIEnum);
// Get the best matching token (the first one in the enumerator).
if (SUCCEEDED(hr))
{
    hr = cpIEnum->Next(1, &cpToken, NULL);
}
// Create a voice object.
if (SUCCEEDED(hr))
{
    hr = cpVoice.CoCreateInstance(CLSID_SpVoice);
}
// Set the voice to the token we just found.
if (SUCCEEDED(hr))
{
    hr = cpVoice->SetVoice(cpToken);
}
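
The difference between required and optional attributes can be sketched outside of the Speech Platform. The following is a minimal, portable illustration, not the Speech Platform's actual implementation: Token, ParseAttributes, Matches, and RankTokens are hypothetical names, and the scoring rule (one point per matched optional attribute) is an assumption made only to show the idea of "filter by required, order by optional".

```cpp
#include <algorithm>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a voice token: a name plus attribute key/value pairs.
struct Token {
    std::string name;
    std::map<std::string, std::string> attrs;
};

using Attr = std::pair<std::string, std::string>;

// Split "Key=Value;Key2=Value2" into pairs; empty segments are ignored.
std::vector<Attr> ParseAttributes(const std::string& spec) {
    std::vector<Attr> out;
    std::istringstream in(spec);
    std::string item;
    while (std::getline(in, item, ';')) {
        if (item.empty()) continue;
        const std::string::size_type eq = item.find('=');
        if (eq == std::string::npos) {
            out.emplace_back(item, "");  // bare key such as "VendorPreferred"
        } else {
            out.emplace_back(item.substr(0, eq), item.substr(eq + 1));
        }
    }
    return out;
}

// True if the token carries the attribute; a bare key only needs to exist.
bool Matches(const Token& token, const Attr& attr) {
    const auto it = token.attrs.find(attr.first);
    return it != token.attrs.end() &&
           (attr.second.empty() || it->second == attr.second);
}

// Keep tokens that satisfy every required attribute, then order them so that
// tokens matching more optional attributes come first (assumed scoring rule).
std::vector<Token> RankTokens(const std::vector<Token>& tokens,
                              const std::string& required,
                              const std::string& optional) {
    const std::vector<Attr> req = ParseAttributes(required);
    const std::vector<Attr> opt = ParseAttributes(optional);

    std::vector<std::pair<int, Token>> scored;
    for (const Token& token : tokens) {
        bool meetsRequired = true;
        for (const Attr& a : req) {
            if (!Matches(token, a)) { meetsRequired = false; break; }
        }
        if (!meetsRequired) continue;

        int score = 0;
        for (const Attr& a : opt) {
            if (Matches(token, a)) ++score;
        }
        scored.emplace_back(score, token);
    }

    // stable_sort keeps the original order among tokens with equal scores.
    std::stable_sort(scored.begin(), scored.end(),
                     [](const std::pair<int, Token>& l,
                        const std::pair<int, Token>& r) {
                         return l.first > r.first;
                     });

    std::vector<Token> ranked;
    for (const auto& entry : scored) ranked.push_back(entry.second);
    return ranked;
}
```

With a required string of "Language=409" and an optional string of "Gender=Female;", a token that lacks Language=409 is dropped entirely, while a male US English token is kept but ranked after a female one.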

Find a single voice token that best matches specified attributes

SpFindBestToken returns a single token from a specified category, in this case the Voices category, that best matches the specified attributes. In the following snippet, Language=409 is a required attribute and VendorPreferred is an optional attribute.

HRESULT hr = S_OK;
// Find the best token to use for a voice that speaks US English,
// preferring the vendor's preferred voice if one is marked.
CComPtr<ISpObjectToken> cpVoiceToken;
hr = SpFindBestToken(SPCAT_VOICES, L"Language=409", L"VendorPreferred", &cpVoiceToken);
// Create a voice and set its token to the one we just found.
CComPtr<ISpVoice> cpVoice;
if (SUCCEEDED(hr))
{
    hr = cpVoice.CoCreateInstance(CLSID_SpVoice);
}
if (SUCCEEDED(hr))
{
    hr = cpVoice->SetVoice(cpVoiceToken);
}
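
The value in Language=409 is a locale identifier (LCID) written in hexadecimal: 0x0409 is US English. As a small sketch of how such an attribute string could be built for another language, the helper below formats an LCID as hex without leading zeros. LanguageAttribute is a hypothetical name, not a Speech Platform function, and the use of uppercase hex digits is an assumption.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: format an LCID as a "Language=<hex>" attribute string,
// for example 0x0409 (US English) becomes L"Language=409".
std::wstring LanguageAttribute(unsigned lcid) {
    std::wostringstream out;
    out << L"Language=" << std::hex << std::uppercase << lcid;
    return out.str();
}
```

The result can be passed as the required-attributes argument, for example SpFindBestToken(SPCAT_VOICES, LanguageAttribute(0x040C).c_str(), NULL, &cpVoiceToken) for French (France).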

Select the default voice token

SpGetDefaultTokenFromCategoryId creates a token object from the default token in a specified category, in this case the Voices category.

HRESULT hr = S_OK;
CComPtr<ISpObjectToken> cpVoiceToken;
if (SUCCEEDED(hr))
{
    hr = SpGetDefaultTokenFromCategoryId(SPCAT_VOICES, &cpVoiceToken);
}
// Create a voice.
CComPtr<ISpVoice> cpVoice;
if (SUCCEEDED(hr))
{
    hr = cpVoice.CoCreateInstance(CLSID_SpVoice);
}
// Set the voice to the default token we just retrieved.
if (SUCCEEDED(hr))
{
    hr = cpVoice->SetVoice(cpVoiceToken);
}

Setting the default voice token

SpGetDefaultTokenFromCategoryId gets the default token from a specified category, in this case Voices. The following example first sets a French-speaking token (Hortense) as the default for the Voices category using ISpObjectTokenCategory::SetDefaultTokenId, then retrieves that default token and creates a voice from it.

HRESULT hr = S_OK;
CComPtr<ISpObjectToken> cpVoiceToken;
CComPtr<ISpObjectTokenCategory> cpTokenCat;
// This is the category for which we want to set the default token.
if (SUCCEEDED(hr))
{
    hr = SpGetCategoryFromId(SPCAT_VOICES, &cpTokenCat);
}
// Set the default token for the VOICES category to Hortense (French).
// Backslashes in the token ID must be escaped in a C++ string literal.
if (SUCCEEDED(hr))
{
    hr = cpTokenCat->SetDefaultTokenId(L"HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Speech Server\\v11.0\\Voices\\Tokens\\TTS_MS_fr-FR_Hortense_11.0");
}
// Get the token we just set as the default.
if (SUCCEEDED(hr))
{
    hr = SpGetDefaultTokenFromCategoryId(SPCAT_VOICES, &cpVoiceToken);
}
// Create a voice.
CComPtr<ISpVoice> cpVoice;
if (SUCCEEDED(hr))
{
    hr = cpVoice.CoCreateInstance(CLSID_SpVoice);
}
// Set the voice to the retrieved token.
if (SUCCEEDED(hr))
{
    hr = cpVoice->SetVoice(cpVoiceToken);
}
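
A token ID is a registry path, so it contains backslashes that an ordinary C++ string literal would treat as escape characters. One way to keep such paths readable is a raw string literal. The sketch below assumes the token ID has the form category path, then \Tokens\, then the token name, as in the Hortense ID above; kVoicesCategory and MakeTokenId are hypothetical names introduced for illustration.

```cpp
#include <string>

// Category path taken from the token ID used in the example above.
// The raw string literal LR"(...)" keeps each backslash as written.
const std::wstring kVoicesCategory =
    LR"(HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech Server\v11.0\Voices)";

// Hypothetical helper: build "<category>\Tokens\<tokenName>".
std::wstring MakeTokenId(const std::wstring& category,
                         const std::wstring& tokenName) {
    return category + LR"(\Tokens\)" + tokenName;
}
```

The resulting string can then be passed to SetDefaultTokenId through c_str(), for example MakeTokenId(kVoicesCategory, L"TTS_MS_fr-FR_Hortense_11.0").c_str().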

Each of the examples above is ready to speak text after you set the output and give a speak command, as follows:

// Set the output to the default audio device.
if (SUCCEEDED(hr))
{
    hr = cpVoice->SetOutput(NULL, TRUE);
}
// Speak a string directly.
if (SUCCEEDED(hr))
{
    hr = cpVoice->Speak(L"Hello world.", SPF_DEFAULT, NULL);
}

Note: Setting the output to the default audio device is useful for debugging. Typically, a production server application will write to a stream.