Authoring Speech experiences

APPLIES TO: Composer v1.x and v2.x

Bots can communicate over speech-based channels, including Telephony, which enables a bot to work in contact center and IVR scenarios, and Direct Line Speech, which enables speech experiences in Web Chat or via embedded devices.

Bots can use text-to-speech (also known as speech synthesis, and referred to as speech in this article) to convert text to human-like synthesized speech. Text is converted to a phonemic representation (the individual components of speech sounds), which is then converted to waveforms that are output as speech. Composer uses Speech Synthesis Markup Language (SSML), an XML-based markup language that lets developers specify how input text is converted into synthesized speech. SSML gives developers the ability to customize different aspects of speech, such as pitch, pronunciation, rate of speech, and more.
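
For example, a minimal SSML fragment that lowers the pitch and slows the speaking rate of a response might look like the following sketch (the voice name and values are illustrative):

    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
      <!-- Select a voice font, then adjust pitch and rate for the enclosed text -->
      <voice name="en-US-AriaNeural">
        <prosody pitch="-5%" rate="-10%">Welcome! How can I help you today?</prosody>
      </voice>
    </speak>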

This modality lets developers create bots in Composer that not only respond visually with text, but also audibly with speech. Using the speech middleware and SSML tags, bot developers and designers can create bots with a variety of voices in different languages.

Add Speech components to your bot responses

It's important to ensure that bot responses are optimized for the channels they will be available on. For example, a welcome message written in text along with an Adaptive Card attachment will not be suitable when sent via a speech-capable channel. For this reason, bot responses can contain both text and speech variants, with the speech variant used by the channel when required.
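
Under the hood, Composer stores each response as a language generation (LG) structured template, with separate Text and Speak properties; here's a minimal sketch, where the template name and content are illustrative:

    > Sketch of a response as stored in a dialog's .lg file.
    # WelcomeUser
    [Activity
        Text = Welcome! Here's a menu of everything I can help with.
        Speak = Welcome!
    ]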

Using the response editor, bot developers can easily add speech components to bots and customize them with SSML tags.

To add speech to a bot, complete the following steps:

  1. Open a bot project and add a Send a response action to one of your dialogs. Enter text in the Text box for a fallback text response.

  2. Now click the + next to Text. You will see three options: Speech, Attachments, and Suggested Actions. Select Speech.

  3. When speech is added, you will see Input hint: accepting next to Response variations. Select Input hint: accepting to see all of the available input hints:

    • Accepting: Indicates that your bot is passively ready for input but is not awaiting a response from the user (this is the default value if no specific InputHint value is set).
    • Ignoring: Indicates that your bot isn't ready to receive input from the user.
    • Expecting: Indicates that your bot is actively awaiting a response from the user.

    For more information, see the Bot Framework SDK article Add input hints to messages with the Bot Connector API.

  4. You can add SSML tags to your speech component to customize your speech output. Select SSML tag in the command bar to see the SSML tag options.

    Composer supports the following SSML tags (an example fragment follows these steps):

    • break: Inserts pauses (or breaks) between words, or prevents pauses that the text-to-speech service would otherwise add automatically.
    • prosody: Specifies changes to pitch, contour, range, rate, duration, and volume for the text-to-speech output.
    • audio: Allows you to insert MP3 audio into an SSML document.

    For more information, see the Improve synthesis with Speech Synthesis Markup Language (SSML) article.
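
For instance, the Speech field of a response combining these tags might contain a fragment like this sketch (the audio URL is a placeholder, and Composer wraps the fragment in the required speak and voice elements described below):

    Please hold <break time="500ms" /> while I check that for you.
    <prosody rate="slow">I found three matching results.</prosody>
    <!-- Insert a pre-recorded MP3 clip; this URL is hypothetical -->
    <audio src="https://example.com/sounds/chime.mp3" />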

For speech responses to work correctly on some channels, including the Telephony and Direct Line Speech channels, certain SSML tags must be present:

  • speak: Required to enable the use of SSML tags.
  • voice: Defines the voice font that will be used when responses are read out by the Telephony or Direct Line Speech channels.
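
A complete spoken response wrapped in these required elements looks roughly like this (the voice name is illustrative):

    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
      <!-- The voice element selects the voice font for everything it encloses -->
      <voice name="en-US-AriaNeural">Thanks for calling. How can I help you today?</voice>
    </speak>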

Tip

Visit the language and voice support for the Speech Service documentation to see a list of supported voice fonts. It's recommended that you use neural voice fonts, where available, as these sound particularly human-like.

Composer makes it as easy as possible for bot builders to develop speech applications by automatically including these SSML tags on all outgoing responses. You can modify the related properties in the Composer runtime settings.

To access the speech related settings, complete the following steps:

  1. Open a Composer bot project and select Project Settings in the navigation pane on the left.

  2. Select Advanced Settings View (json) to show the JSON view of the project settings. There are two relevant speech sections, shown below.

Speech middleware settings:

  • "voiceFontName": "en-US-AriaNeural": Determines the voiceFontName your bot will use to speak, and the default is en-US-AriaNeural. You can customize this using any of the available voices and locales appropriate for you bot.
  • "fallbackToTextForSpeechIfEmpty": true: Determines whether text will be used if speech is empty, and the default is true. If you don't add SSML tags to your speech, there will be silence and instead the text will be displayed as a fallback message. To turn this off, set this to false.

Note

If you need to prevent the speak and voice SSML tags from being applied to all responses, remove the setSpeak element from your bot settings entirely. This disables the related middleware within the runtime.

Connect to channels

Speech is supported by the Telephony and Direct Line Speech channels within Azure Bot Service. For information about connecting a bot to these channels, see the Azure Bot Service documentation for the channel you want to use.

Test speech

To test speech capabilities in your bot, connect your bot to one of the aforementioned channels and use the channel's native communication method to test your bot. For example, for the Telephony channel you would call the bot's configured phone number.

To inspect the responses being sent by your bot, including speech responses that contain the automatically added SSML tags plus any you have added manually, do the following:

  1. Go to a bot project and create a few activities that have text and speech.
  2. Start your bot and test it in the Emulator. You will see the SSML elements appear in the inspected activity JSON.
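
An inspected message activity with both text and speech populated looks something like this sketch (the speak value is abbreviated for readability):

    {
      "type": "message",
      "text": "Welcome! How can I help you today?",
      "speak": "<speak version=\"1.0\" ...><voice name=\"en-US-AriaNeural\">Welcome!</voice></speak>",
      "inputHint": "acceptingInput"
    }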