What is the Speech service?
The Speech service unifies speech-to-text, text-to-speech, and speech translation in a single Azure subscription. You can speech-enable your applications, tools, and devices by using the Speech CLI, Speech SDK, Speech Studio, or REST APIs.
Important
The Speech service has replaced the Bing Speech API and Translator Speech. For migration instructions, see the Migration section.
The following features are part of the Speech service. Use the links in this table to learn more about common use-cases for each feature. You can also browse the API reference.
| Service | Feature | Description | SDK | REST |
|---|---|---|---|---|
| Speech-to-text | Real-time speech-to-text | Speech-to-text transcribes or translates audio streams or local files to text in real time that your applications, tools, or devices can consume or display. Use speech-to-text with Language Understanding (LUIS) to derive user intents from transcribed speech and act on voice commands. | Yes | Yes |
| | Batch speech-to-text | Batch speech-to-text enables asynchronous speech-to-text transcription of large volumes of speech audio data stored in Azure Blob Storage. In addition to converting speech audio to text, batch speech-to-text allows for diarization and sentiment analysis. | No | Yes |
| | Multidevice conversation | Connect multiple devices or clients in a conversation to send speech- or text-based messages, with easy support for transcription and translation. | Yes | No |
| | Conversation transcription | Enables real-time speech recognition, speaker identification, and diarization. It's perfect for transcribing in-person meetings with the ability to distinguish speakers. | Yes | No |
| | Create custom speech models | If you're using speech-to-text for recognition and transcription in a unique environment, you can create and train custom acoustic, language, and pronunciation models to address ambient noise or industry-specific vocabulary. | No | Yes |
| | Pronunciation assessment | Pronunciation assessment evaluates speech pronunciation and gives speakers feedback on the accuracy and fluency of spoken audio. With pronunciation assessment, language learners can practice, get instant feedback, and improve their pronunciation so that they can speak and present with confidence. | Yes | Yes |
| Text-to-speech | Prebuilt neural voices | Text-to-speech converts input text into humanlike synthesized speech by using the Speech Synthesis Markup Language (SSML). Use neural voices, which are humanlike voices powered by deep neural networks. See Language support. | Yes | Yes |
| | Custom neural voices | Create custom neural voice fonts unique to your brand or product. | No | Yes |
| Speech translation | Speech translation | Speech translation enables real-time, multilanguage translation of speech to your applications, tools, and devices. Use this feature for speech-to-speech and speech-to-text translation. | Yes | No |
| Voice assistants | Voice assistants | Voice assistants using the Speech service empower developers to create natural, humanlike conversational interfaces for their applications and experiences. The voice assistant feature provides fast, reliable interaction between a device and an assistant implementation that uses the Bot Framework's Direct Line Speech channel or the integrated custom commands service for task completion. | Yes | No |
| Speaker recognition | Speaker verification and identification | Speaker recognition provides algorithms that verify and identify speakers by their unique voice characteristics. Speaker recognition is used to answer the question, "Who is speaking?" | Yes | Yes |
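The text-to-speech row mentions the Speech Synthesis Markup Language (SSML). As a minimal sketch of what such input can look like, the following Python snippet builds a small SSML document by using only the standard library. The function name `build_ssml` is hypothetical, and the voice name `en-US-JennyNeural` is used here only as an example. Check Language support for the voices that are available to you.

```python
import xml.etree.ElementTree as ET

# Namespace URIs: the SSML namespace defined by the W3C, and the built-in XML namespace.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize the SSML namespace as the default namespace (no prefix).
ET.register_namespace("", SSML_NS)


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Return an SSML string that asks one voice to speak `text`.

    The voice name is an example value, not a recommendation.
    """
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    voice_element = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice})
    # ElementTree escapes characters such as & and < when it serializes the text.
    voice_element.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Hello & welcome to the Speech service."))
```

Building the document with an XML library rather than string concatenation keeps special characters in the input text from producing malformed SSML.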
Try the Speech service for free
For the following steps, you need a Microsoft account and an Azure account. If you don't have a Microsoft account, you can sign up for one free of charge at the Microsoft account portal. Select Sign in with Microsoft. When you're asked to sign in, select Create a Microsoft account. Follow the steps to create and verify your new Microsoft account.
After you have a Microsoft account, go to the Azure sign-up page and select Start free. Create a new Azure account by using a Microsoft account. Here's a video of how to sign up for an Azure free account.
Note
When you sign up for a free Azure account, it comes with $200 in service credit that you can apply toward a paid Speech service subscription, valid for up to 30 days. Your Azure services are disabled when your credit runs out or expires at the end of the 30 days. To continue using Azure services, you must upgrade your account. For more information, see Upgrade your Azure free account.
The Speech service has two service tiers, free (F0) and paid (S0), which have different limitations and benefits. If you use the free, low-volume Speech service tier, you can keep this free subscription even after your free trial or service credit expires. For more information, see Cognitive Services pricing - Speech service.
Create the Azure resource
To add a Speech service resource to your Azure account by using the free or paid tier:
Sign in to the Azure portal by using your Microsoft account.
Select Create a resource at the top left of the portal. If you don't see Create a resource, you can always find it by selecting the collapsed menu in the upper-left corner of the screen.
In the New window, enter speech in the search box and select Enter.
In the search results, select Speech.
Select Create and then:
- Give your new resource a unique name. The name helps you distinguish among multiple resources tied to the same service.
- Choose the Azure subscription that the new resource is associated with to determine how the fees are billed. For an introduction, see how to create an Azure subscription in the Azure portal.
- Choose the region where the resource will be used. Azure is a global cloud platform that's generally available in many regions worldwide. To get the best performance, select the region that's closest to you or to where your application runs. Speech service availability varies by region. Make sure that you create your resource in a supported region. For more information, see region support for Speech services.
- Choose either a free (F0) or paid (S0) pricing tier. For complete information about pricing and usage quotas for each tier, select View full pricing details or see Speech services pricing. For limits on resources, see Azure Cognitive Services limits.
- Create a new resource group for this Speech resource or assign the resource to an existing resource group. Resource groups help you keep your various Azure resources organized.
- Select Create. This action takes you to the deployment overview and displays deployment progress messages.
It takes a few moments to deploy your new Speech resource.
Find keys and location/region
To find the keys and location/region of a completed deployment:
Sign in to the Azure portal by using your Microsoft account.
Select All resources, and select the name of your Cognitive Services resource.
On the left pane, under RESOURCE MANAGEMENT, select Keys and Endpoint.
Each subscription has two keys. You can use either key in your application. To copy a key to your code editor or another location, select the copy button next to the key, and then switch windows to paste the clipboard contents.
Copy the LOCATION value, which is your region ID (for example, westus or westeurope), for SDK calls.
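The region ID also determines which hosts your REST calls go to. The following Python sketch shows one way to turn a region ID into endpoint URLs. The function names are hypothetical, and the host and path patterns are written from the REST API reference as we recall it, so treat them as assumptions and confirm them against the reference docs for your region before you rely on them.

```python
import re
from urllib.parse import urlencode


def _normalize_region(region: str) -> str:
    """Lowercase the region ID and reject values that aren't plain IDs like 'westus'."""
    normalized = region.strip().lower()
    if not re.fullmatch(r"[a-z0-9]+", normalized):
        raise ValueError(f"Unexpected region ID: {region!r}")
    return normalized


def stt_endpoint(region: str, language: str = "en-US") -> str:
    """Speech-to-text REST endpoint for short audio (assumed URL pattern)."""
    host = f"{_normalize_region(region)}.stt.speech.microsoft.com"
    query = urlencode({"language": language})
    return f"https://{host}/speech/recognition/conversation/cognitiveservices/v1?{query}"


def tts_endpoint(region: str) -> str:
    """Text-to-speech REST endpoint (assumed URL pattern)."""
    return f"https://{_normalize_region(region)}.tts.speech.microsoft.com/cognitiveservices/v1"


def token_endpoint(region: str) -> str:
    """Endpoint that exchanges a subscription key for an access token (assumed URL pattern)."""
    return f"https://{_normalize_region(region)}.api.cognitive.microsoft.com/sts/v1.0/issueToken"


if __name__ == "__main__":
    for build in (stt_endpoint, tts_endpoint, token_endpoint):
        print(build("westus"))
```

Validating the region up front means that a display name such as "West US" fails with a clear error instead of producing a URL that can't be resolved.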
Important
These subscription keys are used to access your Cognitive Services API. Don't share your keys. Store them securely. For example, use Azure Key Vault. We also recommend that you regenerate these keys regularly. Only one key is necessary to make an API call. When you regenerate the first key, you can use the second key for continued access to the service.
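As a minimal sketch of keeping keys out of source code, the following Python snippet reads both keys from environment variables and builds the request header for a REST call. The environment variable names `SPEECH_KEY` and `SPEECH_KEY_SECONDARY` and the function names are hypothetical. `Ocp-Apim-Subscription-Key` is the header that the Speech REST APIs use for key-based authentication. In production, you might load the values from Azure Key Vault instead.

```python
import os
from typing import Dict, List, Mapping, Optional

# Hypothetical environment variable names; use whatever your deployment defines.
PRIMARY_VAR = "SPEECH_KEY"
SECONDARY_VAR = "SPEECH_KEY_SECONDARY"


def load_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the configured keys, primary first, skipping any that are unset or blank."""
    source = os.environ if env is None else env
    keys = [source.get(name, "").strip() for name in (PRIMARY_VAR, SECONDARY_VAR)]
    keys = [key for key in keys if key]
    if not keys:
        raise RuntimeError(f"Set {PRIMARY_VAR} or {SECONDARY_VAR} before calling the service.")
    return keys


def auth_headers(key: str) -> Dict[str, str]:
    """Build the header that carries the subscription key on a REST request."""
    return {"Ocp-Apim-Subscription-Key": key}


def mask_key(key: str) -> str:
    """Hide all but the last four characters so that a key can appear in logs."""
    if len(key) <= 4:
        return "*" * len(key)
    return "*" * (len(key) - 4) + key[-4:]


if __name__ == "__main__":
    demo_env = {PRIMARY_VAR: "0123456789abcdef", SECONDARY_VAR: "fedcba9876543210"}
    for key in load_keys(demo_env):
        print(mask_key(key), list(auth_headers(key)))
```

Because either key works, an application that loads both can keep running on the second key while you regenerate the first.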
Complete a quickstart
We offer quickstarts in most popular programming languages. Each quickstart is designed to teach you basic design patterns and have you running code in less than 10 minutes. See the following list for the quickstart for each feature:
- Speech-to-text quickstart
- Text-to-speech quickstart
- Speech translation quickstart
- Intent recognition quickstart
- Speaker recognition quickstart
After you've had a chance to get started with the Speech service, try our tutorials that show you how to solve various scenarios:
- Tutorial: Recognize intents from speech with the Speech SDK and LUIS, C#
- Tutorial: Voice enable your bot with the Speech SDK, C#
- Tutorial: Build a Flask app to translate text, analyze sentiment, and synthesize translated text to speech, REST
Get sample code
Sample code is available on GitHub for the Speech service. These samples cover common scenarios like reading audio from a file or stream, continuous and at-start recognition, and working with custom models. Use these links to view SDK and REST samples:
- Speech-to-text, text-to-speech, and speech translation samples (SDK)
- Batch transcription samples (REST)
- Text-to-speech samples (REST)
- Voice assistant samples (SDK)
Customize your speech experience
The Speech service works well with built-in models. But you might want to further customize and tune the experience for your product or environment. Customization options range from acoustic model tuning to unique voice fonts for your brand.
Other products offer speech models tuned for specific purposes, like healthcare or insurance, but those models are available to everyone equally. Customization in Azure Speech becomes part of your unique competitive advantage that's unavailable to any other user or customer. In other words, your models are private and custom-tuned for your use case only.
| Speech service | Platform | Description |
|---|---|---|
| Speech-to-text | Custom Speech | Customize speech recognition models to your needs and available data. Overcome speech recognition barriers such as speaking style, vocabulary, and background noise. |
| Text-to-speech | Custom Voice | Build a recognizable, one-of-a-kind neural voice for your text-to-speech apps by using your available speaking data. You can further fine-tune the neural voice outputs by adjusting a set of neural voice parameters. |
Deploy on-premises by using Docker containers
Use Speech service containers to deploy API features on-premises. By using these Docker containers, you can bring the service closer to your data for compliance, security, or other operational reasons. The Speech service offers the following containers:
- Standard Speech-to-Text
- Custom Speech-to-Text
- Prebuilt Neural Text-to-Speech
- Custom Neural Text-to-Speech (preview)
- Speech Language Identification (preview)
Reference docs
- Speech SDK
- REST API: Speech-to-text
- REST API: Text-to-speech
- REST API: Batch transcription and customization