What is the Speech service?
The Speech service unifies speech-to-text, text-to-speech, and speech translation under a single Azure subscription. You can speech-enable your applications, tools, and devices with the Speech CLI, Speech SDK, Speech Devices SDK, Speech Studio, or REST APIs.
The Speech service has replaced Bing Speech API and Translator Speech. See the Migration section for migration instructions.
The following features are part of the Speech service. Use the links in this table to learn more about common use-cases for each feature, or browse the API reference.
|Service||Feature||Description||SDK||REST|
|Speech-to-Text||Real-time Speech-to-text||Speech-to-text transcribes or translates audio streams or local files to text in real time that your applications, tools, or devices can consume or display. Use speech-to-text with Language Understanding (LUIS) to derive user intents from transcribed speech and act on voice commands.||Yes||Yes|
|Speech-to-Text||Batch Speech-to-Text||Batch Speech-to-text enables asynchronous speech-to-text transcription of large volumes of speech audio data stored in Azure Blob Storage. In addition to converting speech audio to text, Batch Speech-to-text also allows for diarization and sentiment analysis.||No||Yes|
|Speech-to-Text||Multi-device Conversation||Connect multiple devices or clients in a conversation to send speech- or text-based messages, with easy support for transcription and translation.||Yes||No|
|Speech-to-Text||Conversation Transcription||Enables real-time speech recognition, speaker identification, and diarization. It's perfect for transcribing in-person meetings with the ability to distinguish speakers.||Yes||No|
|Speech-to-Text||Create Custom Speech Models||If you are using speech-to-text for recognition and transcription in a unique environment, you can create and train custom acoustic, language, and pronunciation models to address ambient noise or industry-specific vocabulary.||No||Yes|
|Speech-to-Text||Pronunciation Assessment||Pronunciation assessment evaluates speech pronunciation and gives speakers feedback on the accuracy and fluency of spoken audio. With pronunciation assessment, language learners can practice, get instant feedback, and improve their pronunciation so that they can speak and present with confidence.||Yes||Yes|
|Text-to-Speech||Text-to-speech||Text-to-speech converts input text into human-like synthesized speech using Speech Synthesis Markup Language (SSML). Use neural voices, which are human-like voices powered by deep neural networks. See Language support.||Yes||Yes|
|Text-to-Speech||Create Custom Voices||Create custom voice fonts unique to your brand or product.||No||Yes|
|Speech Translation||Speech translation||Speech translation enables real-time, multi-language translation of speech to your applications, tools, and devices. Use this service for speech-to-speech and speech-to-text translation.||Yes||No|
|Voice assistants||Voice assistants||Voice assistants using the Speech service empower developers to create natural, human-like conversational interfaces for their applications and experiences. The voice assistant service provides fast, reliable interaction between a device and an assistant implementation that uses the Bot Framework's Direct Line Speech channel or the integrated Custom Commands service for task completion.||Yes||No|
|Speaker Recognition||Speaker verification & identification||The Speaker Recognition service provides algorithms that verify and identify speakers by their unique voice characteristics. Speaker Recognition is used to answer the question “who is speaking?”.||Yes||Yes|
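The text-to-speech row above mentions SSML as the input format. As a rough illustration, the following Python sketch wraps plain text in a minimal SSML document. The helper name `build_ssml` is hypothetical, and the voice name `en-US-JennyNeural` is an assumed example; check Language support for the voices available to your resource.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in a minimal single-voice SSML document.

    The default voice name is an assumption for illustration only.
    """
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        # escape() protects characters such as & and < inside the spoken text.
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Tom & Jerry say hello."))
```

The resulting string is the kind of payload you would pass to a text-to-speech call in the SDK or REST API; the sketch itself makes no network request.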
Try the Speech service for free
For the following steps, you need both a Microsoft account and an Azure account. If you do not have a Microsoft account, you can sign up for one free of charge at the Microsoft account portal. Select Sign in with Microsoft and then, when asked to sign in, select Create a Microsoft account. Follow the steps to create and verify your new Microsoft account.
When you sign up for a free Azure account, it comes with $200 in service credit that you can apply toward a paid Speech service subscription, valid for up to 30 days. Your Azure services are disabled when your credit runs out or expires at the end of the 30 days. To continue using Azure services, you must upgrade your account. For more information, see How to upgrade your Azure free account.
The Speech service has two service tiers, free (F0) and subscription (S0), which have different limitations and benefits. If you use the free, low-volume Speech service tier, you can keep this free subscription even after your free trial or service credit expires. For more information, see Cognitive Services pricing - Speech service.
Create the Azure resource
To add a Speech service resource (free or paid tier) to your Azure account:
Sign in to the Azure portal using your Microsoft account.
Select Create a resource at the top left of the portal. If you do not see Create a resource, you can always find it by selecting the collapsed menu in the upper left corner of the screen.
In the New window, type "speech" in the search box and press ENTER.
In the search results, select Speech.
Select Create, then:
- Give a unique name for your new resource. The name helps you distinguish among multiple subscriptions tied to the same service.
- Choose the Azure subscription that the new resource is associated with to determine how the fees are billed. If you don't have one yet, see how to create an Azure subscription in the Azure portal.
- Choose the region where the resource will be used. Azure is a global cloud platform that is generally available in many regions worldwide. To get the best performance, select the region that's closest to you or to where your application runs. Speech service availability varies by region, so make sure that you create your resource in a supported region. See region support for Speech services.
- Choose either a free (F0) or paid (S0) pricing tier. For complete information about pricing and usage quotas for each tier, select View full pricing details or see Speech service pricing. For limits on resources, see Azure Cognitive Services Limits.
- Create a new resource group for this Speech subscription or assign the subscription to an existing resource group. Resource groups help you keep your various Azure subscriptions organized.
- Select Create. This will take you to the deployment overview and display deployment progress messages.
It takes a few moments to deploy your new Speech resource.
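The portal steps above can also be scripted. The sketch below is one possible approach, not the documented procedure: it assembles an Azure CLI command that creates a Speech resource with the same choices (name, resource group, region, and tier). It assumes the Azure CLI is installed and signed in and that the resource group already exists. The function name and the sample values are hypothetical.

```python
import shlex


def speech_resource_create_command(name: str, resource_group: str,
                                   region: str, sku: str = "F0") -> list:
    """Build the argument list for `az cognitiveservices account create`.

    Only the two tiers described in this article are accepted.
    """
    if sku not in ("F0", "S0"):
        raise ValueError("sku must be 'F0' (free) or 'S0' (paid)")
    return [
        "az", "cognitiveservices", "account", "create",
        "--name", name,
        "--resource-group", resource_group,
        "--kind", "SpeechServices",
        "--sku", sku,
        "--location", region,
        "--yes",  # skip the interactive confirmation prompt
    ]


if __name__ == "__main__":
    # Hypothetical names; printing the command lets you review it before running.
    cmd = speech_resource_create_command("my-speech", "my-resource-group", "westeurope")
    print(shlex.join(cmd))
```

To run the command rather than print it, you could pass the list to `subprocess.run(cmd, check=True)`.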
Find keys and location/region
To find the keys and location/region of a completed deployment, follow these steps:
Sign in to the Azure portal using your Microsoft account.
Select All resources, and select the name of your Cognitive Services resource.
On the left pane, under RESOURCE MANAGEMENT, select Keys and Endpoint.
Each subscription has two keys; you can use either key in your application. To copy a key to your code editor or another location, select the copy button next to the key, then switch windows and paste the clipboard contents where you need them.
Additionally, copy the LOCATION value, which is your region ID (for example, westeurope) for SDK calls.
These subscription keys are used to access your Cognitive Services API. Do not share your keys. Store them securely, for example by using Azure Key Vault. We also recommend regenerating these keys regularly. Only one key is necessary to make an API call, so while you regenerate the first key you can use the second key for continued access to the service.
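To keep keys out of source code, one common pattern is to read the key and region from environment variables. The sketch below assumes hypothetical variable names (`SPEECH_KEY`, `SPEECH_KEY_2`, `SPEECH_REGION`) and falls back to the second key when the first is unset, for example while it is being regenerated. It then prepares, but does not send, a request to the regional token endpoint.

```python
import os


def load_speech_settings(env=os.environ):
    """Return (key, region) from the environment.

    Variable names are assumptions for this sketch. The second key is
    used only when the first one is missing or empty.
    """
    key = env.get("SPEECH_KEY") or env.get("SPEECH_KEY_2")
    region = env.get("SPEECH_REGION")
    if not key or not region:
        raise RuntimeError("Set SPEECH_KEY (or SPEECH_KEY_2) and SPEECH_REGION")
    return key, region


def token_request(key: str, region: str):
    """Build the URL and headers for exchanging a key for an access token."""
    url = f"https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"
    headers = {"Ocp-Apim-Subscription-Key": key, "Content-Length": "0"}
    return url, headers


if __name__ == "__main__":
    # Fake values so the example runs without a real resource.
    demo_env = {"SPEECH_KEY_2": "second-key", "SPEECH_REGION": "westeurope"}
    url, headers = token_request(*load_speech_settings(demo_env))
    print(url)
```

To send the request, you could build `urllib.request.Request(url, data=b"", headers=headers, method="POST")` and pass it to `urllib.request.urlopen`.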
Complete a quickstart
We offer quickstarts in most popular programming languages, each designed to teach you basic design patterns, and have you running code in less than 10 minutes. See the following list for the quickstart for each feature.
- Speech-to-text quickstart
- Text-to-speech quickstart
- Speech translation quickstart
- Intent recognition quickstart
- Speaker recognition quickstart
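Before opening a quickstart, it can help to see roughly what a speech-to-text call over REST involves. The following sketch only assembles the URL and headers for a short-audio recognition request; it assumes 16 kHz PCM WAV input, and the helper name is hypothetical. Treat the quickstarts and the REST API reference as the authoritative source for parameters.

```python
from urllib.parse import urlencode


def short_audio_request(key: str, region: str, language: str = "en-US"):
    """Build the URL and headers for a short-audio speech-to-text request."""
    query = urlencode({"language": language})
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        # Assumes the audio is 16 kHz PCM in a WAV container.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return url, headers


if __name__ == "__main__":
    url, headers = short_audio_request("your-key", "westeurope")
    print(url)
```

The audio bytes would be sent as the POST body. The sketch stops short of sending anything so that it runs without a subscription.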
After you've had a chance to get started with the Speech service, try our tutorials that show you how to solve various scenarios.
- Tutorial: Recognize intents from speech with the Speech SDK and LUIS, C#
- Tutorial: Voice enable your bot with the Speech SDK, C#
- Tutorial: Build a Flask app to translate text, analyze sentiment, and synthesize translated text to speech, REST
Get sample code
Sample code is available on GitHub for the Speech service. These samples cover common scenarios like reading audio from a file or stream, continuous and single-shot recognition, and working with custom models. Use these links to view SDK and REST samples:
- Speech-to-text, text-to-speech, and speech translation samples (SDK)
- Batch transcription samples (REST)
- Text-to-speech samples (REST)
- Voice assistant samples (SDK)
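As a flavor of what the batch transcription samples do, the sketch below builds a JSON request body for creating a transcription job. The field names follow the v3.0 batch transcription REST API as an assumption; newer API versions may differ, so verify them against the API reference. The function name and the sample URL are hypothetical.

```python
import json


def batch_transcription_body(content_urls, locale="en-US",
                             display_name="Sample batch transcription",
                             diarization=False):
    """Build the JSON body for a batch transcription job (assumed v3.0 shape)."""
    urls = list(content_urls)
    if not urls:
        raise ValueError("at least one audio URL is required")
    return {
        "contentUrls": urls,  # for example, SAS URLs of audio in Azure Blob Storage
        "locale": locale,
        "displayName": display_name,
        "properties": {"diarizationEnabled": diarization},
    }


if __name__ == "__main__":
    body = batch_transcription_body(
        ["https://example.blob.core.windows.net/audio/meeting.wav"],  # placeholder URL
        diarization=True,
    )
    print(json.dumps(body, indent=2))
```

Under the same assumption, the body would be posted to `https://<region>.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions` with the `Ocp-Apim-Subscription-Key` header.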
Customize your speech experience
The Speech service works well with built-in models; however, you may want to further customize and tune the experience for your product or environment. Customization options range from acoustic model tuning to unique voice fonts for your brand.
Other products offer speech models tuned for specific purposes like healthcare or insurance, but those models are available to everyone equally. Customization in Azure Speech becomes part of your unique competitive advantage that is unavailable to any other user or customer. In other words, your models are private and custom-tuned for your use case only.
|Speech-to-Text||Custom Speech||Customize speech recognition models to your needs and available data. Overcome speech recognition barriers such as speaking style, vocabulary and background noise.|
|Text-to-Speech||Custom Voice||Build a recognizable, one-of-a-kind voice for your Text-to-Speech apps using your own speaking data. You can further fine-tune the voice outputs by adjusting a set of voice parameters.|
Deploy on premises using Docker containers
Use Speech service containers to deploy API features on-premises. These Docker containers enable you to bring the service closer to your data for compliance, security or other operational reasons. The Speech service offers the following containers:
- Standard Speech-to-text
- Custom Speech-to-text
- Standard Text-to-speech
- Neural Text-to-speech
- Custom Text-to-speech (preview)
- Speech Language Identification (preview)
Reference docs
- Speech SDK
- Speech Devices SDK
- REST API: Speech-to-text
- REST API: Text-to-speech
- REST API: Batch transcription and customization