What is the Speech service?

The Speech service combines speech-to-text, text-to-speech, and speech translation in a single Azure subscription. You can speech-enable your applications, tools, and devices with the Speech CLI, the Speech SDK, Speech Studio, or the REST APIs.

Important

The Speech service has replaced the Bing Speech API and Translator Speech. For migration instructions, see the Migration section.

The following features are part of the Speech service. For each feature, SDK and REST indicate whether it's available through the Speech SDK and through the REST APIs. You can also browse the API reference.

Speech-to-text:
  • Real-time speech-to-text (SDK: yes; REST: yes). Speech-to-text transcribes or translates audio streams or local files to text in real time, which your applications, tools, or devices can consume or display. Use speech-to-text with Language Understanding (LUIS) to derive user intents from transcribed speech and act on voice commands.
  • Batch speech-to-text (SDK: no; REST: yes). Batch speech-to-text asynchronously transcribes large volumes of speech audio stored in Azure Blob Storage. In addition to converting speech audio to text, batch speech-to-text supports diarization and sentiment analysis.
  • Multidevice conversation (SDK: yes; REST: no). Connect multiple devices or clients in a conversation to send speech- or text-based messages, with built-in support for transcription and translation.
  • Conversation transcription (SDK: yes; REST: no). Provides real-time speech recognition, speaker identification, and diarization. It's well suited to transcribing in-person meetings where you need to distinguish speakers.
  • Create custom speech models (SDK: no; REST: yes). If you use speech-to-text for recognition and transcription in a unique environment, you can create and train custom acoustic, language, and pronunciation models to address ambient noise or industry-specific vocabulary.
  • Pronunciation assessment (SDK: yes; REST: yes). Pronunciation assessment evaluates speech pronunciation and gives speakers feedback on the accuracy and fluency of spoken audio. Language learners can practice, get instant feedback, and improve their pronunciation so that they can speak and present with confidence.

Text-to-speech:
  • Prebuilt neural voices (SDK: yes; REST: yes). Text-to-speech converts input text into humanlike synthesized speech by using the Speech Synthesis Markup Language (SSML). Use neural voices, which are humanlike voices powered by deep neural networks. See Language support.
  • Custom neural voices (SDK: no; REST: yes). Create custom neural voice fonts unique to your brand or product.

Speech translation:
  • Speech translation (SDK: yes; REST: no). Speech translation enables real-time, multilanguage translation of speech in your applications, tools, and devices. Use this feature for speech-to-speech and speech-to-text translation.

Voice assistants:
  • Voice assistants (SDK: yes; REST: no). Voice assistants built on the Speech service let developers create natural, humanlike conversational interfaces for their applications and experiences. The voice assistant feature provides fast, reliable interaction between a device and an assistant implementation that uses the Bot Framework's Direct Line Speech channel or the integrated custom commands service for task completion.

Speaker recognition:
  • Speaker verification and identification (SDK: yes; REST: yes). Speaker recognition provides algorithms that verify and identify speakers by their unique voice characteristics. It answers the question "Who is speaking?"
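Text-to-speech input is described above in terms of SSML. As a minimal sketch, assuming only Python's standard library and the regional text-to-speech REST endpoint (https://<region>.tts.speech.microsoft.com/cognitiveservices/v1), the following wraps plain text in an SSML document for one prebuilt neural voice and prepares, but doesn't send, the HTTP request. The voice name en-US-JennyNeural, the output format value, and the helper names are assumptions for illustration; check Language support and the REST API reference for current values.

```python
from urllib.request import Request
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document that selects one neural voice."""
    # escape() turns &, <, and > into XML entities so user text can't break the markup;
    # quoteattr() returns the attribute value already wrapped in quotes.
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


def build_tts_request(key: str, region: str, ssml: str) -> Request:
    """Build (but don't send) a POST request to the regional text-to-speech REST endpoint."""
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        # Assumed output format; choose one listed in the REST API reference.
        "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm",
        "User-Agent": "speech-sketch",  # hypothetical application name
    }
    return Request(url, data=ssml.encode("utf-8"), headers=headers, method="POST")
```

To synthesize for real, pass the request to urllib.request.urlopen and write the returned bytes to a .wav file. Escaping the text before embedding it keeps characters such as & and < from producing invalid SSML.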

Try the Speech service for free

For the following steps, you need a Microsoft account and an Azure account. If you don't have a Microsoft account, you can sign up for one free of charge at the Microsoft account portal. Select Sign in with Microsoft. When you're asked to sign in, select Create a Microsoft account. Follow the steps to create and verify your new Microsoft account.

After you have a Microsoft account, go to the Azure sign-up page, select Start free, and create a new Azure account by using your Microsoft account.

Note

When you sign up for a free Azure account, it comes with $200 in service credit that you can apply toward a paid Speech service subscription, valid for up to 30 days. Your Azure services are disabled when your credit runs out or expires at the end of the 30 days. To continue using Azure services, you must upgrade your account. For more information, see Upgrade your Azure free account.

The Speech service has two service tiers, free (F0) and standard (S0), which have different limitations and benefits. If you use the free, low-volume Speech service tier, you can keep this free subscription even after your free trial or service credit expires. For more information, see Cognitive Services pricing - Speech service.

Create the Azure resource

To add a Speech service resource to your Azure account by using the free or paid tier:

  1. Sign in to the Azure portal by using your Microsoft account.

  2. Select Create a resource at the top left of the portal. If you don't see Create a resource, you can always find it by selecting the collapsed menu in the upper-left corner of the screen.

  3. In the New window, enter speech in the search box and press Enter.

  4. In the search results, select Speech.


  5. Select Create and then:

    1. Enter a unique name for your new resource. The name helps you distinguish among multiple resources tied to the same service.
    2. Choose the Azure subscription that the new resource is associated with. The subscription determines how the fees are billed. If you don't have a subscription yet, you can create one in the Azure portal.
    3. Choose the region where the resource will be used. To get the best performance, select a region that's closest to you or where your application runs. Speech service availability varies by region, so make sure that you create your resource in a supported region. For more information, see region support for Speech services.
    4. Choose either a free (F0) or paid (S0) pricing tier. For complete information about pricing and usage quotas for each tier, select View full pricing details or see Speech services pricing. For limits on resources, see Azure Cognitive Services limits.
    5. Create a new resource group for this Speech resource or assign it to an existing resource group. Resource groups help you keep your Azure resources organized.
    6. Select Create. This action takes you to the deployment overview and displays deployment progress messages.

It takes a few moments to deploy your new Speech resource.
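If you prefer scripting to the portal, the Azure CLI command az cognitiveservices account create can create the same resource. A minimal sketch, assuming the Azure CLI is installed and you've signed in with az login: the Python helper below only assembles the arguments (name, resource group, SpeechServices kind, F0 or S0 tier, region) so you can review them before running anything. The resource and group names are hypothetical placeholders.

```python
import shlex
from typing import List


def speech_resource_create_cmd(name: str, resource_group: str, location: str,
                               sku: str = "F0") -> List[str]:
    """Return Azure CLI arguments that create a Speech resource, mirroring steps 5.1 to 5.5."""
    if sku not in ("F0", "S0"):
        raise ValueError("sku must be 'F0' (free) or 'S0' (standard)")
    return [
        "az", "cognitiveservices", "account", "create",
        "--name", name,                      # unique resource name
        "--resource-group", resource_group,  # existing resource group
        "--kind", "SpeechServices",          # resource type for the Speech service
        "--sku", sku,                        # pricing tier
        "--location", location,              # region ID, for example westus
        "--yes",                             # accept terms without an interactive prompt
    ]


if __name__ == "__main__":
    cmd = speech_resource_create_cmd("my-speech-resource", "my-resource-group", "westus")
    # Print a shell-safe version of the command for review.
    print(shlex.join(cmd))
```

The resource group must already exist (you can create one with az group create). Keeping the command as a list rather than one string avoids shell-quoting problems if you later run it with subprocess.run(cmd, check=True).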

Find keys and location/region

To find the keys and location/region of a completed deployment:

  1. Sign in to the Azure portal by using your Microsoft account.

  2. Select All resources, and select the name of your Cognitive Services resource.

  3. On the left pane, under RESOURCE MANAGEMENT, select Keys and Endpoint.

    1. Each resource has two keys. You can use either key in your application. To copy a key, select the copy button next to it, and then paste it into your code editor or wherever you need it.

    2. Copy the LOCATION value. This is your region ID, such as westus or westeurope, which you pass to SDK calls.
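Rather than pasting the key and region into source code, you can read them from the environment at startup. This sketch assumes two hypothetical environment variable names, SPEECH_KEY and SPEECH_REGION; the region check is only a rough sanity test, not an authoritative list of region IDs.

```python
import os
import re
from typing import Mapping, Tuple


def load_speech_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (key, region) read from the hypothetical SPEECH_KEY and SPEECH_REGION variables."""
    key = env.get("SPEECH_KEY", "").strip()
    region = env.get("SPEECH_REGION", "").strip().lower()
    if not key:
        raise RuntimeError("Set SPEECH_KEY to either key from Keys and Endpoint.")
    # Region IDs look like "westus" or "westeurope": lowercase letters and digits only.
    if not re.fullmatch(r"[a-z0-9]+", region):
        raise RuntimeError(f"SPEECH_REGION doesn't look like a region ID: {region!r}")
    return key, region
```

With the Python Speech SDK (azure-cognitiveservices-speech), these two values are what you pass to speechsdk.SpeechConfig(subscription=key, region=region).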

Important

These subscription keys are used to access your Cognitive Services API. Don't share your keys. Store them securely, for example in Azure Key Vault. We also recommend that you regenerate these keys regularly. Only one key is needed to make an API call, so while you regenerate the first key, you can use the second key for continued access to the service.
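Because either key works, an application can keep running while one key is regenerated by falling back to the other. The sketch below shows that pattern in isolation. send is a hypothetical callable that makes one API call with a given key and raises the hypothetical KeyRejected exception when the service rejects the key (for example, with HTTP 401); how you detect a rejected key depends on your HTTP client or SDK.

```python
from typing import Callable, Optional, Sequence


class KeyRejected(Exception):
    """Hypothetical error that `send` raises when the service rejects a key (e.g. HTTP 401)."""


def call_with_key_fallback(send: Callable[[str], bytes], keys: Sequence[str]) -> bytes:
    """Call `send` with each key in order and return the first successful result.

    Only a rejected key triggers the fallback; any other error propagates unchanged.
    """
    last_error: Optional[KeyRejected] = None
    for key in keys:
        try:
            return send(key)
        except KeyRejected as err:
            # For example, key1 was just regenerated; keep going with key2.
            last_error = err
    raise RuntimeError("every key was rejected") from last_error
```

Falling back only on a rejected key, rather than on any error, keeps network or quota failures from being masked as key problems.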

Complete a quickstart

We offer quickstarts in most popular programming languages. Each quickstart is designed to teach you basic design patterns and have you running code in less than 10 minutes. See the following list for the quickstart for each feature:

After you've had a chance to get started with the Speech service, try our tutorials that show you how to solve various scenarios:

Get sample code

Sample code is available on GitHub for the Speech service. These samples cover common scenarios like reading audio from a file or stream, continuous and single-shot recognition, and working with custom models. Use these links to view SDK and REST samples:
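The linked samples aren't reproduced here. As a rough illustration of the "reading audio from a file" scenario over REST, the sketch below checks that WAV data is 16 kHz, 16-bit mono PCM and prepares a request for the short-audio speech-to-text endpoint (https://<region>.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1). The format requirement and helper name are assumptions for this sketch; see the REST API reference for supported formats and length limits.

```python
import io
import wave
from urllib.parse import urlencode
from urllib.request import Request


def build_stt_request(key: str, region: str, wav_bytes: bytes,
                      language: str = "en-US") -> Request:
    """Build (not send) a short-audio speech-to-text REST request from WAV file contents."""
    # This sketch assumes 16 kHz, 16-bit (2-byte), mono PCM WAV input.
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if (wav.getframerate(), wav.getsampwidth(), wav.getnchannels()) != (16000, 2, 1):
            raise ValueError("expected 16 kHz, 16-bit, mono PCM WAV")
    url = (
        f"https://{region}.stt.speech.microsoft.com/speech/recognition/"
        f"conversation/cognitiveservices/v1?{urlencode({'language': language})}"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return Request(url, data=wav_bytes, headers=headers, method="POST")
```

Sending the request with urllib.request.urlopen should return a JSON recognition result. For long recordings or large volumes of audio, batch speech-to-text is the better fit.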

Customize your speech experience

The Speech service works well with built-in models. But you might want to further customize and tune the experience for your product or environment. Customization options range from acoustic model tuning to unique voice fonts for your brand.

Other products offer speech models tuned for specific purposes, such as healthcare or insurance, but those models are available to everyone equally. Customizations you make in Azure Speech are available only to you and become part of your competitive advantage. In other words, your models are private and tuned for your use case only.

  • Speech-to-text: Custom Speech. Customize speech recognition models to your needs and available data. Overcome speech recognition barriers such as speaking style, vocabulary, and background noise.
  • Text-to-speech: Custom Voice. Build a recognizable, one-of-a-kind neural voice for your text-to-speech apps from the speech data you have available. You can further fine-tune the neural voice output by adjusting a set of neural voice parameters.

Deploy on-premises by using Docker containers

Use Speech service containers to deploy API features on-premises. By using these Docker containers, you can bring the service closer to your data for compliance, security, or other operational reasons. The Speech service offers the following containers:

  • Standard Speech-to-Text
  • Custom Speech-to-Text
  • Prebuilt Neural Text-to-Speech
  • Custom Neural Text-to-Speech (preview)
  • Speech Language Identification (preview)
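Containers are typically started with docker run. The sketch below assembles that command for the standard speech-to-text container. It assumes the image path mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text and the Eula, Billing, and ApiKey arguments described in the container documentation; the memory and CPU values are illustrative, not sizing guidance. Billing still goes to your Azure Speech resource, so you pass its endpoint URI and a key.

```python
import shlex
from typing import List


def stt_container_run_cmd(endpoint_uri: str, api_key: str, port: int = 5000,
                          memory: str = "8g", cpus: int = 4) -> List[str]:
    """Assemble a `docker run` command for the standard speech-to-text container (assumed image)."""
    image = "mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text:latest"
    return [
        "docker", "run", "--rm", "-it",
        "-p", f"{port}:5000",        # map a host port to the container's port 5000
        "--memory", memory,          # illustrative resource limits
        "--cpus", str(cpus),
        image,
        "Eula=accept",               # license acceptance required to start
        f"Billing={endpoint_uri}",   # your Speech resource's endpoint URI
        f"ApiKey={api_key}",         # either key from Keys and Endpoint
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own endpoint and key.
    print(shlex.join(stt_container_run_cmd("https://<your-resource>.cognitiveservices.azure.com/", "<key>")))
```

Once the container is running, the Speech SDK can be pointed at it instead of the cloud, for example with SpeechConfig(host="ws://localhost:5000") in Python.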
