What is the Speech service?
The Speech service unifies speech-to-text, text-to-speech, and speech translation under a single Azure subscription. You can speech-enable your applications, tools, and devices with the Speech SDK, Speech Devices SDK, or REST APIs.
The Speech service has replaced Bing Speech API and Translator Speech. See How-to guides > Migration for migration instructions.
These features make up the Speech service. Use the links in this table to learn more about common use cases for each feature or browse the API reference.
|Service||Feature||Description||SDK||REST|
|Speech-to-Text||Real-time Speech-to-text||Speech-to-text transcribes or translates audio streams or local files to text in real time that your applications, tools, or devices can consume or display. Use speech-to-text with Language Understanding (LUIS) to derive user intents from transcribed speech and act on voice commands.||Yes||Yes|
|Speech-to-Text||Batch Speech-to-text||Batch Speech-to-text enables asynchronous transcription of large volumes of speech audio stored in Azure Blob Storage. In addition to converting speech audio to text, Batch Speech-to-text also supports diarization and sentiment analysis.||No||Yes|
|Speech-to-Text||Multi-device Conversation||Connect multiple devices or clients in a conversation to send speech- or text-based messages, with easy support for transcription and translation.||Yes||No|
|Speech-to-Text||Conversation Transcription||Enables real-time speech recognition, speaker identification, and diarization. It is well suited to transcribing in-person meetings where speakers need to be distinguished.||Yes||No|
|Speech-to-Text||Create Custom Speech Models||If you use speech-to-text for recognition and transcription in a unique environment, you can create and train custom acoustic, language, and pronunciation models to address ambient noise or industry-specific vocabulary.||No||Yes|
|Text-to-Speech||Text-to-speech||Text-to-speech converts input text into human-like synthesized speech using Speech Synthesis Markup Language (SSML). Choose from standard voices and neural voices (see Language support).||Yes||Yes|
|Text-to-Speech||Create Custom Voices||Create custom voice fonts unique to your brand or product.||No||Yes|
|Speech Translation||Speech translation||Speech translation enables real-time, multi-language translation of speech for your applications, tools, and devices. Use this service for speech-to-speech and speech-to-text translation.||Yes||No|
|Voice assistants||Voice assistants||Voice assistants using the Speech service let developers create natural, human-like conversational interfaces for their applications and experiences. The voice assistant service provides fast, reliable interaction between a device and an assistant implementation that uses the Bot Framework's Direct Line Speech channel or the integrated Custom Commands (Preview) service for task completion.||Yes||No|
|Speaker Recognition||Speaker verification & identification||The Speaker Recognition service provides algorithms that verify and identify speakers by their unique voice characteristics. Speaker Recognition answers the question “who is speaking?”.||Yes||Yes|
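Text-to-speech input is described above as Speech Synthesis Markup Language (SSML). As a rough illustration of what such a document looks like, the sketch below builds a minimal SSML string with Python's standard library. The helper name `build_ssml` and the voice name `example-voice-name` are placeholders chosen for this example, not values from the service; real voice names are listed under Language support.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML specification, plus the built-in XML namespace.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(text: str, voice: str, lang: str = "en-US") -> str:
    """Return a minimal SSML document that speaks `text` with a single voice.

    `build_ssml` is a hypothetical helper for illustration only.
    """
    # Serialize SSML elements without a prefix (as the default namespace).
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    voice_el = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice})
    # ElementTree escapes characters such as & and < when serializing.
    voice_el.text = text
    return ET.tostring(speak, encoding="unicode")


# "example-voice-name" is a placeholder, not a real voice.
ssml = build_ssml("Tom & Jerry say hello.", voice="example-voice-name")
print(ssml)
```

Building the document through an XML library rather than string formatting keeps user-supplied text from breaking the markup, because special characters are escaped on output.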
TLS 1.2 is now enforced for all HTTP requests to this service. For more information, see Azure Cognitive Services security.
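On the client side, one way to line up with this requirement is to refuse anything older than TLS 1.2 when opening connections. The following is a minimal sketch using Python's standard `ssl` module; the helper name `make_tls12_context` is hypothetical, and recent Python versions may already default to this minimum.

```python
import ssl


def make_tls12_context() -> ssl.SSLContext:
    """Create a client-side context that rejects protocol versions below TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


context = make_tls12_context()
print(context.minimum_version.name)
```

The resulting context can be passed to standard-library clients, for example `urllib.request.urlopen(url, context=context)` or `http.client.HTTPSConnection(host, context=context)`.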
Try the Speech service
We offer quickstarts in most popular programming languages, each designed to have you running code in less than 10 minutes. This table contains the most popular quickstarts for each feature. Use the left-hand navigation to explore additional languages and platforms.
Speech-to-text and text-to-speech also have REST endpoints and associated quickstarts.
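To give a feel for the REST route, the sketch below prepares (but does not send) a speech-to-text request with Python's standard library. The host pattern, path, and header names reflect the REST reference as best understood here and should be treated as assumptions to confirm against the REST API documentation. The function name `build_stt_request`, the region `westus`, and the key `YOUR_SUBSCRIPTION_KEY` are placeholders.

```python
import urllib.parse
import urllib.request


def build_stt_request(
    region: str, key: str, audio: bytes, language: str = "en-US"
) -> urllib.request.Request:
    """Prepare a POST request for short-audio speech-to-text.

    Assumption: the endpoint shape and headers below follow the REST
    reference; verify them before relying on this sketch.
    """
    query = urllib.parse.urlencode({"language": language})
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # subscription key header (assumed name)
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=audio, headers=headers, method="POST")


# Placeholders only: no request is sent and no real key or audio is used.
request = build_stt_request("westus", "YOUR_SUBSCRIPTION_KEY", b"")
print(request.get_method(), request.full_url)
```

Sending the request would be a call to `urllib.request.urlopen(request)` with a real key and WAV audio bytes; it is left out here so the example runs without credentials.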
After you've had a chance to use the Speech service, try our tutorials that teach you how to solve various scenarios.
- Tutorial: Recognize intents from speech with the Speech SDK and LUIS, C#
- Tutorial: Voice enable your bot with the Speech SDK, C#
- Tutorial: Build a Flask app to translate text, analyze sentiment, and synthesize translated text to speech, REST
Get sample code
Sample code is available on GitHub for the Speech service. These samples cover common scenarios like reading audio from a file or stream, continuous and single-shot recognition, and working with custom models. Use these links to view SDK and REST samples:
- Speech-to-text, text-to-speech, and speech translation samples (SDK)
- Batch transcription samples (REST)
- Text-to-speech samples (REST)
- Voice assistant samples (SDK)
Customize your speech experience
The Speech service works well with built-in models. However, you may want to further customize and tune the experience for your product or environment. Customization options range from acoustic model tuning to unique voice fonts for your brand.
|Speech-to-Text||Custom Speech||Customize speech recognition models to your needs and available data. Overcome speech recognition barriers such as speaking style, vocabulary, and background noise.|
|Text-to-Speech||Custom Voice||Build a recognizable, one-of-a-kind voice for your Text-to-Speech apps using your available speaking data. You can further fine-tune the voice output by adjusting a set of voice parameters.|
Reference docs
- Speech SDK
- Speech Devices SDK
- REST API: Speech-to-text
- REST API: Text-to-speech
- REST API: Batch transcription and customization