About voice assistants

Building voice assistants with the Speech service empowers developers to create natural, human-like conversational interfaces for their applications and experiences.

The voice assistant service provides fast, reliable interaction between a device and an assistant implementation that uses either (1) the Bot Framework's Direct Line Speech channel or (2) the integrated Custom Commands (Preview) service for task completion.

Applications connect to the voice assistant service with the Speech Software Development Kit (SDK).
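The round trip described above can be sketched as a small, purely illustrative Python model. None of the names below (`VoiceAssistantService`, `AssistantImplementation`, `Turn`, `LightsAssistant`, `fake_transcribe`, `fake_synthesize`) belong to the Speech SDK. They are hypothetical stand-ins that show the order of operations: the client sends audio, the service transcribes it (partial text first, then a final result), forwards the final text to the assistant implementation (a Direct Line Speech bot or a Custom Commands application), and returns the synthesized reply as audio.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


class AssistantImplementation(Protocol):
    """Hypothetical stand-in for a Direct Line Speech bot or a Custom Commands app."""

    def handle(self, text: str) -> str: ...


@dataclass
class Turn:
    """Everything the client application receives for one conversational turn."""
    partial_texts: List[str]  # intermediate transcripts, delivered as audio arrives
    final_text: str           # final recognized text
    reply_text: str           # assistant's textual response
    reply_audio: bytes        # synthesized response (an audio stream in practice)


@dataclass
class VoiceAssistantService:
    """Hypothetical model of the orchestration flow; NOT the Speech SDK API."""
    transcribe: Callable[[List[bytes]], List[str]]  # speech to text; last item is final
    synthesize: Callable[[str], bytes]              # text to speech
    assistant: AssistantImplementation

    def process_turn(self, audio_chunks: List[bytes]) -> Turn:
        # Speech to text: the transcript grows as audio chunks arrive.
        hypotheses = self.transcribe(audio_chunks)
        if not hypotheses:
            raise ValueError("no speech recognized")
        final_text = hypotheses[-1]
        # The recognized text is forwarded to the assistant implementation.
        reply_text = self.assistant.handle(final_text)
        # Text to speech: the reply is synthesized for playback on the device.
        reply_audio = self.synthesize(reply_text)
        return Turn(hypotheses[:-1], final_text, reply_text, reply_audio)


# Fakes for illustration only: each "audio chunk" is just one encoded word.
class LightsAssistant:
    def handle(self, text: str) -> str:
        if "light" in text.lower():
            return "Turning on the light."
        return "Sorry, I can't do that yet."


def fake_transcribe(chunks: List[bytes]) -> List[str]:
    words = [chunk.decode("utf-8") for chunk in chunks]
    return [" ".join(words[: i + 1]) for i in range(len(words))]


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


service = VoiceAssistantService(fake_transcribe, fake_synthesize, LightsAssistant())
turn = service.process_turn([b"Turn", b"on", b"the", b"overhead", b"light"])
print(turn.final_text)  # Turn on the overhead light
print(turn.reply_text)  # Turning on the light.
```

In the real service, transcription, orchestration, and synthesis run on the service side, and the client connects through the Speech SDK; the sketch collapses them into one in-process call only to make the sequence of steps visible.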

[Figure: Conceptual diagram of the voice assistant orchestration service flow]

Choosing an assistant solution

The first step to creating a voice assistant is to decide what it should do. The Speech service provides multiple, complementary solutions for crafting your assistant interactions. Whether you need the flexibility and versatility of the Bot Framework's Direct Line Speech channel or the simplicity of Custom Commands (Preview) for straightforward scenarios, start by choosing the solution that matches what your assistant needs to handle.

If you want open-ended conversation with robust skills integration and full deployment control, consider the Bot Framework's Direct Line Speech channel. For example:
  • "I need to go to Seattle"
  • "What kind of pizza can I order?"

If you want command and control or task-oriented conversation with simplified authoring and hosting, consider Custom Commands (Preview). For example:
  • "Turn on the overhead light"
  • "Make it 5 degrees warmer"

We recommend Direct Line Speech as the best default choice if you aren't yet sure what you'd like your assistant to handle. It integrates with a rich set of tools and authoring aids, such as the Virtual Assistant Solution and Enterprise Template and the QnA Maker service, so you can build on common patterns and use your existing knowledge sources.

Custom Commands (Preview) provides a streamlined authoring and hosting experience specifically tailored for natural language command and control scenarios.
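The guidance in this section reduces to a simple rule: choose Custom Commands (Preview) for command and control or task-oriented scenarios, and otherwise, including when you're unsure, default to Direct Line Speech. The hypothetical helper below (`Solution`, `Scenario`, and `recommend_solution` are illustrative names, not part of any SDK) encodes that rule as a reading aid. When a scenario is both open-ended and command-oriented, the sketch prefers Direct Line Speech; that tie-break is an assumption based on its broader flexibility, not a documented rule.

```python
from dataclasses import dataclass
from enum import Enum


class Solution(Enum):
    DIRECT_LINE_SPEECH = "Bot Framework Direct Line Speech channel"
    CUSTOM_COMMANDS = "Custom Commands (Preview)"


@dataclass(frozen=True)
class Scenario:
    """Hypothetical description of what the assistant should handle."""
    command_and_control: bool = False            # e.g. "Turn on the overhead light"
    open_ended: bool = False                     # e.g. "I need to go to Seattle"
    needs_full_deployment_control: bool = False


def recommend_solution(s: Scenario) -> Solution:
    # Open-ended conversation or full deployment control points to Direct Line Speech.
    if s.open_ended or s.needs_full_deployment_control:
        return Solution.DIRECT_LINE_SPEECH
    # Command and control / task-oriented scenarios suit Custom Commands.
    if s.command_and_control:
        return Solution.CUSTOM_COMMANDS
    # Not sure yet: Direct Line Speech is the recommended default.
    return Solution.DIRECT_LINE_SPEECH


print(recommend_solution(Scenario(command_and_control=True)).value)  # Custom Commands (Preview)
print(recommend_solution(Scenario()).value)  # Bot Framework Direct Line Speech channel
```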

[Figure: Comparison of assistant solutions]

Core features

Whether you choose Direct Line Speech or Custom Commands (Preview) to create your assistant interactions, you can use a rich set of customization features to tailor your assistant to your brand, product, and personality.

Custom keyword: Users can start conversations with assistants by saying a custom keyword such as “Hey Contoso.” An app does this with a custom keyword engine in the Speech SDK, configured with a custom keyword model that you generate. Voice assistants can also use service-side keyword verification to improve activation accuracy compared with on-device detection alone.

Speech to text: Voice assistants convert real-time audio into recognized text using Speech-to-text from the Speech service. The text is available to both your assistant implementation and your client application as it's transcribed.

Text to speech: Textual responses from your assistant are synthesized using Text-to-speech from the Speech service, and the synthesized audio is made available to your client application as an audio stream. Microsoft also offers the ability to build your own custom, high-quality Neural TTS voice that gives a voice to your brand. To learn more, contact us.
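Service-side keyword verification works as a two-stage cascade: the on-device keyword engine listens continuously and fires on a likely match, and the service then checks the keyword audio again before the conversation proceeds, which filters out false activations the device alone would accept. The sketch below models that cascade with hypothetical confidence scores and thresholds; `KeywordCandidate`, `should_activate`, `DEVICE_THRESHOLD`, and `SERVICE_THRESHOLD` are illustrative names and values, not Speech SDK settings.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds: the device stage is permissive (few missed keywords),
# the service stage is strict (few false activations).
DEVICE_THRESHOLD = 0.5
SERVICE_THRESHOLD = 0.8


@dataclass
class KeywordCandidate:
    device_score: float                    # confidence from the on-device keyword engine
    service_score: Optional[float] = None  # confidence from service-side verification, if run


def should_activate(c: KeywordCandidate, use_service_verification: bool = True) -> bool:
    # Stage 1: the on-device engine must detect the keyword.
    if c.device_score < DEVICE_THRESHOLD:
        return False
    # Without service-side verification, the device decision is final.
    if not use_service_verification:
        return True
    # Stage 2: the service re-checks the keyword audio and rejects it if unconvinced.
    return c.service_score is not None and c.service_score >= SERVICE_THRESHOLD


# A sound-alike phrase that fools the device but not the service.
false_alarm = KeywordCandidate(device_score=0.6, service_score=0.3)
print(should_activate(false_alarm, use_service_verification=False))  # True
print(should_activate(false_alarm))  # False
```

The design choice the sketch illustrates is that the device can stay cheap and sensitive, because the more expensive check only runs after a local detection.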

Getting started with voice assistants

We offer quickstarts designed to have you running code in less than 10 minutes. The voice assistant quickstarts are listed below, organized by language.

  • C#, UWP: Windows
  • Java: Windows, macOS, Linux
  • Java: Android

Each quickstart has a corresponding API reference for its language.

Sample code

Sample code for creating a voice assistant is available on GitHub. These samples cover the client application for connecting to your assistant in several popular programming languages.

Tutorial

A tutorial shows how to voice-enable your assistant using the Speech SDK and the Direct Line Speech channel.

Customization

Voice assistants built using the Speech service can use the full range of customization options available for speech-to-text, text-to-speech, and custom keyword selection.

Note

Customization options vary by language/locale (see Supported languages).
