Note

Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.

Microsoft Speech Platform

Initialize a Voice

To perform speech synthesis (TTS, text-to-speech) in the Microsoft Speech Platform, you first initialize a voice. A voice is an instance of a TTS engine that uses an installed Runtime Language to perform speech synthesis. A Runtime Language is represented in the registry by a token. See Speech Platform Overview for information about downloading Runtime Languages.

To initialize a TTS voice in the Speech Platform, you query the registry for the desired voice token, select the voice token, create a voice object, and set the voice token that the voice object will use.

The Speech Platform provides helper functions that reduce the number of steps necessary to initialize a voice. You can use one or more helper functions to find, select, and create a voice using any of the following processes:

  • Enumerate voice tokens that match specified attributes
  • Find a single voice token that best matches specified attributes
  • Select the default voice token
  • Create a voice from the default voice token

Enumerate voice tokens that match specified attributes

SpEnumTokens returns a token enumerator containing all tokens from a specified category that match the specified required attributes. When you query the Voices category, SpEnumTokens orders the list so that the voices that best match the optional attributes are listed first. In the following snippet, Language=409 (US English) is a required attribute and Gender=Female is an optional attribute.

`

HRESULT hr = S_OK;
CComPtr<IEnumSpObjectTokens> cpIEnum;
CComPtr<ISpObjectToken> cpToken;
CComPtr<ISpVoice> cpVoice;

// Enumerate voice tokens that speak US English, preferring a female voice.
hr = SpEnumTokens(SPCAT_VOICES, L"Language=409", L"Gender=Female;", &cpIEnum);

// Get the best matching token. Next returns S_FALSE (a success code) and
// leaves cpToken NULL if no token matched, so treat that case as a failure.
if (SUCCEEDED(hr))
{
    hr = cpIEnum->Next(1, &cpToken, NULL);
    if (hr == S_FALSE)
    {
        hr = E_FAIL;
    }
}

// Create a voice object.
if (SUCCEEDED(hr))
{
    hr = cpVoice.CoCreateInstance(CLSID_SpVoice);
}

// Set the voice to the token we just found.
if (SUCCEEDED(hr))
{
    hr = cpVoice->SetVoice(cpToken);
}

`
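
The difference between required and optional attributes can be pictured with a small in-memory model. The sketch below is hypothetical: `Token`, `EnumTokens`, and the token names used with it are illustrative, and this is not how the Speech Platform implements SpEnumTokens. It keeps only the tokens that carry every required attribute and then orders them so that tokens matching more optional attributes come first.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: a token is a name plus attribute name/value pairs.
// This is an illustration, not the Speech Platform's matching code.
using Attributes = std::map<std::string, std::string>;

struct Token {
    std::string name;
    Attributes attrs;
};

// Number of `wanted` attributes that the token carries with the same value.
int CountMatches(const Token& token, const Attributes& wanted) {
    int matches = 0;
    for (const auto& kv : wanted) {
        auto it = token.attrs.find(kv.first);
        if (it != token.attrs.end() && it->second == kv.second) {
            ++matches;
        }
    }
    return matches;
}

// Keep tokens that satisfy every required attribute, then order them so that
// tokens matching more optional attributes come first (ties keep input order).
std::vector<Token> EnumTokens(const std::vector<Token>& all,
                              const Attributes& required,
                              const Attributes& optional) {
    std::vector<Token> result;
    for (const auto& token : all) {
        if (CountMatches(token, required) == static_cast<int>(required.size())) {
            result.push_back(token);
        }
    }
    std::stable_sort(result.begin(), result.end(),
                     [&optional](const Token& a, const Token& b) {
                         return CountMatches(a, optional) > CountMatches(b, optional);
                     });
    return result;
}
```

With hypothetical tokens `US-Male` (Language=409, Gender=Male), `US-Female` (Language=409, Gender=Female), and `FR-Female` (Language=40C, Gender=Female), a query with required Language=409 and optional Gender=Female returns `US-Female` and then `US-Male`; `FR-Female` is dropped because it fails the required attribute.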

Find a single voice token that best matches specified attributes

SpFindBestToken returns a single token from a specified category, in this case the Voices category, that best matches the specified attributes. In the following snippet, Language=409 (US English) is a required attribute and VendorPreferred is an optional attribute.

`

HRESULT hr = S_OK;
CComPtr<ISpObjectToken> cpVoiceToken;
CComPtr<ISpVoice> cpVoice;

// Find the best token for a voice that speaks US English, preferring the
// vendor-preferred voice.
hr = SpFindBestToken(SPCAT_VOICES, L"Language=409", L"VendorPreferred", &cpVoiceToken);

// Create a voice object.
if (SUCCEEDED(hr))
{
    hr = cpVoice.CoCreateInstance(CLSID_SpVoice);
}

// Set the voice to the token we just found.
if (SUCCEEDED(hr))
{
    hr = cpVoice->SetVoice(cpVoiceToken);
}

`
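
SpEnumTokens and SpFindBestToken both take attributes as strings such as `Language=409`, `Gender=Female;`, or `VendorPreferred`. The sketch below uses a hypothetical helper, not part of the Speech Platform API, to split such a string into name/value pairs on `;` and `=`, treating a bare name such as `VendorPreferred` as an attribute with an empty value. It also shows that the Language value is hexadecimal: `409` is 0x409, which is 1033 in decimal and identifies English (United States).

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper, not part of the Speech Platform API: split an attribute
// string such as "Language=409;Gender=Female;" into name/value pairs.
// Pairs are separated by ';'; a name without '=' (for example
// "VendorPreferred") is stored with an empty value.
std::map<std::string, std::string> ParseAttributes(const std::string& text) {
    std::map<std::string, std::string> attrs;
    std::istringstream in(text);
    std::string item;
    while (std::getline(in, item, ';')) {
        if (item.empty()) {
            continue;  // skip empty items, e.g. from ";;"
        }
        const auto eq = item.find('=');
        if (eq == std::string::npos) {
            attrs[item] = "";
        } else {
            attrs[item.substr(0, eq)] = item.substr(eq + 1);
        }
    }
    return attrs;
}

// The Language value is hexadecimal: "409" is 0x409, or 1033 in decimal.
unsigned long LanguageValueToNumber(const std::string& hexValue) {
    return std::stoul(hexValue, nullptr, 16);
}
```

For example, `LanguageValueToNumber("40C")` yields 1036, the decimal form of the French (France) language identifier.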

Select the default voice token

SpGetDefaultTokenFromCategoryId creates a token object from the default token in a specified category, in this case the Voices category.

`

HRESULT hr = S_OK;
CComPtr<ISpObjectToken> cpVoiceToken;
CComPtr<ISpVoice> cpVoice;

// Get the default token for the Voices category.
if (SUCCEEDED(hr))
{
    hr = SpGetDefaultTokenFromCategoryId(SPCAT_VOICES, &cpVoiceToken);
}

// Create a voice object.
if (SUCCEEDED(hr))
{
    hr = cpVoice.CoCreateInstance(CLSID_SpVoice);
}

// Set the voice to the default token.
if (SUCCEEDED(hr))
{
    hr = cpVoice->SetVoice(cpVoiceToken);
}

`
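
The snippets in this article follow the same error-handling pattern: `hr` starts as `S_OK`, and each later call is guarded by `if (SUCCEEDED(hr))`, so the first failure skips every remaining step and is the value left in `hr`. The portable sketch below imitates that pattern with hypothetical names (`Hr`, `Succeeded`, `RunGuardedSteps`) rather than the Windows definitions. As on Windows, a value counts as success when it is zero or positive, so an `S_FALSE`-like value of 1 does not stop the sequence.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the Windows types and macros. On Windows, HRESULT
// is a 32-bit signed value and SUCCEEDED(hr) is true when hr >= 0, so both
// S_OK (0) and S_FALSE (1) count as success while E_FAIL is negative.
using Hr = std::int32_t;
constexpr Hr kOk = 0;                                // like S_OK
constexpr Hr kFalse = 1;                             // like S_FALSE
constexpr Hr kFail = static_cast<Hr>(0x80004005u);   // like E_FAIL

constexpr bool Succeeded(Hr hr) { return hr >= 0; }

// Runs a sequence of steps whose results are given in `results`, guarding each
// step with Succeeded(hr) the way the snippets guard each call. Returns the
// final hr and reports how many steps actually ran.
Hr RunGuardedSteps(const std::vector<Hr>& results, int& stepsRun) {
    Hr hr = kOk;
    stepsRun = 0;
    for (Hr stepResult : results) {
        if (Succeeded(hr)) {
            hr = stepResult;  // "call" the step
            ++stepsRun;
        }
    }
    return hr;
}
```

With step results `{kOk, kFail, kOk}`, only the first two steps run and the function returns `kFail`, which mirrors how a failed CoCreateInstance would prevent SetVoice from being called.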

Set the default voice token

SpGetDefaultTokenFromCategoryId gets the default token from a specified category, in this case Voices. The following example first makes a French-speaking token (Hortense) the default for the Voices category using ISpObjectTokenCategory::SetDefaultTokenId, and then retrieves that token with SpGetDefaultTokenFromCategoryId.

`

HRESULT hr = S_OK;
CComPtr<ISpObjectToken> cpVoiceToken;
CComPtr<ISpObjectTokenCategory> cpTokenCat;
CComPtr<ISpVoice> cpVoice;

// This is the category for which we want to set the default token.
if (SUCCEEDED(hr))
{
    hr = SpGetCategoryFromId(SPCAT_VOICES, &cpTokenCat);
}

// Set the default token for the Voices category to Hortense (French).
// Backslashes in the token ID must be escaped in a C++ string literal.
if (SUCCEEDED(hr))
{
    hr = cpTokenCat->SetDefaultTokenId(L"HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Speech Server\\v11.0\\Voices\\Tokens\\TTS_MS_fr-FR_Hortense_11.0");
}

// Get the token we just set as the default.
if (SUCCEEDED(hr))
{
    hr = SpGetDefaultTokenFromCategoryId(SPCAT_VOICES, &cpVoiceToken);
}

// Create a voice object.
if (SUCCEEDED(hr))
{
    hr = cpVoice.CoCreateInstance(CLSID_SpVoice);
}

// Set the voice to the retrieved token.
if (SUCCEEDED(hr))
{
    hr = cpVoice->SetVoice(cpVoiceToken);
}

`
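
The token ID passed to SetDefaultTokenId is a registry path, so its backslashes must be escaped in an ordinary C++ string literal. With single backslashes, `\v` would become a vertical-tab character, and `\S`, `\M`, and `\T` are not standard escape sequences. A raw string literal (`LR"(...)"`) avoids escaping entirely. The sketch below checks that both spellings produce the same ID and stores it in a hypothetical in-memory `TokenCategory` class that mimics only the set-then-get sequence of the example. It is not `ISpObjectTokenCategory`, and the rule that only a registered token can become the default is an assumption of the sketch.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>

// The same token ID spelled two ways: with escaped backslashes and as a raw
// string literal, which needs no escaping.
const std::wstring kHortenseEscaped =
    L"HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Speech Server\\v11.0\\Voices\\Tokens\\TTS_MS_fr-FR_Hortense_11.0";
const std::wstring kHortenseRaw =
    LR"(HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech Server\v11.0\Voices\Tokens\TTS_MS_fr-FR_Hortense_11.0)";

// Hypothetical in-memory category: it only remembers which token IDs exist
// and which one is the default. It does not touch the registry.
class TokenCategory {
public:
    void AddToken(const std::wstring& id) { tokens_.insert(id); }

    // In this sketch, only a registered token can become the default.
    bool SetDefaultTokenId(const std::wstring& id) {
        if (tokens_.count(id) == 0) {
            return false;
        }
        defaultId_ = id;
        return true;
    }

    std::optional<std::wstring> GetDefaultTokenId() const { return defaultId_; }

private:
    std::set<std::wstring> tokens_;
    std::optional<std::wstring> defaultId_;
};
```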

After any of the examples above, the voice is ready to speak text once you set its output and call Speak, as follows:

`

// Set the output to the default audio device.
if (SUCCEEDED(hr))
{
    hr = cpVoice->SetOutput(NULL, TRUE);
}

// Speak a string directly.
if (SUCCEEDED(hr))
{
    hr = cpVoice->Speak(L"Hello world.", SPF_DEFAULT, NULL);
}

`

Note: Setting the output to the default audio device is useful for debugging. Typically, a production server application will write to a stream.