Use personal voice (preview) in your application
Note
Personal voice for text to speech is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
You can use the speaker profile ID for your personal voice to synthesize speech in any of the 91 languages supported across 100+ locales. A locale tag isn't required. Personal voice uses automatic language detection at the sentence level.
Integrate personal voice in your application
You need to use speech synthesis markup language (SSML) to use personal voice in your application. SSML is an XML-based markup language that provides a standard way to mark up text for the generation of synthetic speech. SSML tags are used to control the pronunciation, volume, pitch, rate, and other attributes of the speech synthesis output.
The `speakerProfileId` property in SSML specifies the speaker profile ID for your personal voice. The voice name is specified in the `name` property in SSML. For personal voice, the voice name must be one of the supported base model voice names. To get a list of supported base model voice names, use the BaseModels_List operation of the custom voice API.
Note
Voice names labeled with `Latest`, such as `DragonLatestNeural` or `PhoenixLatestNeural`, are updated from time to time; their performance might vary with updates as ongoing improvements roll out. If you want a fixed version, select one labeled with a version number, such as `PhoenixV2Neural`.
- `DragonLatestNeural` is a base model with superior voice cloning similarity compared to `PhoenixLatestNeural`.
- `PhoenixLatestNeural` is a base model with more accurate pronunciation and lower latency than `DragonLatestNeural`.
- The `Dragon` model doesn't support the `<lang xml:lang>` element in SSML.
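As a minimal sketch of calling the BaseModels_List operation mentioned above, the snippet below builds (but doesn't send) the HTTP request. The host, `customvoice/basemodels` path, `api-version` value, and region are assumptions based on the general custom voice API pattern; check the REST API reference for the exact values.

```python
import urllib.request

REGION = "eastus"  # hypothetical region; use your Speech resource's region
API_VERSION = "2024-02-01-preview"  # assumed api-version; verify against the reference


def build_base_models_request(key: str) -> urllib.request.Request:
    """Build (without sending) a GET request for the BaseModels_List operation."""
    url = (
        f"https://{REGION}.api.cognitive.microsoft.com"
        f"/customvoice/basemodels?api-version={API_VERSION}"
    )
    # The Speech resource key is passed in the Ocp-Apim-Subscription-Key header.
    return urllib.request.Request(url, headers={"Ocp-Apim-Subscription-Key": key})


req = build_base_models_request("your-speech-resource-key")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would return a JSON list of base models, from which you can pick a supported voice name such as `DragonLatestNeural`.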
Here's example SSML in a request for text to speech with the voice name and the speaker profile ID.
```xml
<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='http://www.w3.org/2001/mstts' xml:lang='en-US'>
    <voice name='DragonLatestNeural'>
        <mstts:ttsembedding speakerProfileId='your speaker profile ID here'>
            I'm happy to hear that you find me amazing and that I have made your trip planning easier and more fun. 我很高兴听到你觉得我很了不起,我让你的旅行计划更轻松、更有趣。Je suis heureux d'apprendre que vous me trouvez incroyable et que j'ai rendu la planification de votre voyage plus facile et plus amusante.
        </mstts:ttsembedding>
    </voice>
</speak>
```
You can use the SSML via the Speech SDK or REST API.
- Real-time speech synthesis: Use the Speech SDK or REST API to convert text to speech.
- When you use the Speech SDK, don't set an endpoint ID, just as with prebuilt voices.
- When you use the REST API, use the prebuilt neural voices endpoint.
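The steps above can be sketched with the Python Speech SDK. The helper below builds the personal voice SSML; the speaker profile ID, key, and region are placeholders, and the SDK call is a sketch of the usual `speak_ssml_async` pattern rather than a verified end-to-end sample.

```python
def build_personal_voice_ssml(text: str, speaker_profile_id: str,
                              base_voice: str = "DragonLatestNeural") -> str:
    """Wrap text in the SSML structure required for personal voice synthesis."""
    return (
        "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' "
        "xmlns:mstts='http://www.w3.org/2001/mstts' xml:lang='en-US'>"
        f"<voice name='{base_voice}'>"
        f"<mstts:ttsembedding speakerProfileId='{speaker_profile_id}'>"
        f"{text}"
        "</mstts:ttsembedding></voice></speak>"
    )


def synthesize(ssml: str, key: str, region: str) -> None:
    """Send the SSML to the service with the Speech SDK (no endpoint ID is set)."""
    # Imported here so build_personal_voice_ssml works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.SpeechConfig(subscription=key, region=region)
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=config)
    result = synthesizer.speak_ssml_async(ssml).get()
    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print("Synthesis completed.")


ssml = build_personal_voice_ssml(
    "Hello from my personal voice.", "your-speaker-profile-id")
print(ssml)
```

Because personal voice detects the language per sentence, the `text` argument can mix languages without any `<lang>` tags.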