What are the Speech Services?
Azure Speech Services unify speech-to-text, text-to-speech, and speech translation under a single Azure subscription. You can speech-enable your applications, tools, and devices with the Speech SDK, Speech Devices SDK, or REST APIs.
Speech Services have replaced Bing Speech API, Translator Speech, and Custom Speech. See How-to guides > Migration for migration instructions.
These features make up the Azure Speech Services. Use the links in this table to learn more about common use cases for each feature or browse the API reference.
|Service||Feature||Description||SDK||REST|
|Speech-to-Text||Speech-to-text||Speech-to-text transcribes audio streams to text in real time that your applications, tools, or devices can consume or display. Use speech-to-text with Language Understanding (LUIS) to derive user intents from transcribed speech and act on voice commands.||Yes||Yes|
|Speech-to-Text||Batch Transcription||Batch transcription enables asynchronous speech-to-text transcription of large volumes of data. This is a REST-based service, which uses the same endpoint as customization and model management.||No||Yes|
|Speech-to-Text||Customization||If you are using speech-to-text for recognition and transcription in a unique environment, you can create and train custom acoustic, language, and pronunciation models to address ambient noise or industry-specific vocabulary.||No||Yes|
|Text-to-Speech||Text-to-speech||Text-to-speech converts input text into human-like synthesized speech. Choose from standard voices and neural voices (see Language support).||No||Yes|
|Text-to-Speech||Customization||Create custom voice fonts unique to your brand or product.||No||Yes|
|Speech Translation||Speech translation||Speech translation enables real-time, multi-language translation of speech to your applications, tools, and devices. Use this service for speech-to-speech and speech-to-text translation.||Yes||No|
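As an illustration of the REST route for text-to-speech, the sketch below builds (but does not send) a synthesis request using only the Python standard library. The helper name `build_tts_request`, the endpoint pattern, the header names, and the output format value are assumptions based on the commonly documented REST shape; check them against the REST API: Text-to-speech reference before relying on them. The voice name in the usage line is a placeholder, not a real voice.

```python
from xml.sax.saxutils import escape, quoteattr


def build_tts_request(text, token, region, voice, lang="en-US"):
    """Return (url, headers, body) for a text-to-speech REST call.

    Hypothetical helper: nothing is sent over the network. The endpoint
    pattern and header names are assumptions to verify against the
    text-to-speech REST reference.
    """
    # SSML body: the input text is XML-escaped, attributes are safely quoted.
    ssml = '<speak version="1.0" xml:lang={lang}><voice name={voice}>{text}</voice></speak>'.format(
        lang=quoteattr(lang), voice=quoteattr(voice), text=escape(text)
    )
    # Assumed regional endpoint pattern.
    url = "https://{0}.tts.speech.microsoft.com/cognitiveservices/v1".format(region)
    headers = {
        "Authorization": "Bearer " + token,                      # access token obtained separately
        "Content-Type": "application/ssml+xml",                  # body is SSML
        "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm", # assumed output format value
        "User-Agent": "speech-overview-sketch",                  # hypothetical client name
    }
    return url, headers, ssml.encode("utf-8")


# Usage: the token and voice values here are placeholders.
url, headers, body = build_tts_request(
    "Hello & welcome", token="<access-token>", region="westus", voice="<voice-name>"
)
```

The returned tuple can be handed to any HTTP client with a POST call; the response body would then contain the synthesized audio in the requested format.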
News and updates
Learn what's new with the Azure Speech Services.
- February 2019 - Released Speech SDK 1.3.0 with support for Unity (beta). Added support for the AudioInput class, which enables you to choose the streaming source for audio. For a complete list of enhancements and known issues, see Release notes.
- December 2018 - Released Speech SDK 1.2.0 with support for Python and Node.js, as well as Ubuntu 18.04 LTS. For more information, see Release notes.
- December 2018 - Text-to-speech quickstarts added for .NET Core, Python, Node.js. Additional samples are available on GitHub.
Try Speech Services
We offer quickstarts in most popular programming languages, each designed to have you running code in less than 10 minutes. This table contains the most popular quickstarts for each feature. Use the left-hand navigation to explore additional languages and platforms.
|Speech-to-text (SDK)||Translation (SDK)||Text-to-Speech (REST)|
|C#, .NET Core (Windows)||Java (Windows, Linux)||Python (Windows, Linux, macOS)|
|Python (Windows, Linux, macOS)||C#, .NET Framework (Windows)||Node.js (Windows, Linux, macOS)|
|Java (Windows, Linux)||C++ (Windows)|
After you've had a chance to use the Speech Services, try our tutorial that teaches you how to recognize intents from speech using the Speech SDK and LUIS.
Get sample code
Sample code is available on GitHub for each of the Azure Speech Services. These samples cover common scenarios like reading audio from a file or stream, continuous and single-shot recognition, and working with custom models. Use these links to view SDK and REST samples:
- Speech-to-text and speech translation samples (SDK)
- Batch transcription samples (REST)
- Text-to-speech samples (REST)
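The "reading audio from a file or stream" scenario generally comes down to handing fixed-size chunks of PCM audio to a recognizer. The following is a minimal sketch of that chunking step using only the Python standard library; it does not call the Speech SDK. The helper names `iter_pcm_chunks` and `make_silent_wav` are hypothetical, and the 16 kHz, 16-bit mono format is an assumption chosen for the demo.

```python
import io
import wave


def iter_pcm_chunks(wav_file, frames_per_chunk=1600):
    """Yield raw PCM byte chunks from a WAV path or file-like object."""
    with wave.open(wav_file, "rb") as wav:
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break  # end of audio
            yield chunk


def make_silent_wav(seconds=1, rate=16000):
    """Create an in-memory 16-bit mono WAV of silence (demo input only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    buf.seek(0)
    return buf


# Usage: one second at 16 kHz, split into 100 ms chunks.
chunks = list(iter_pcm_chunks(make_silent_wav()))
```

In a real application, each chunk would be written to whatever streaming audio input your recognizer accepts, and a file path can be passed in place of the in-memory buffer.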
Customize your speech experience
Azure Speech Services work well with built-in models. However, you may want to further customize and tune the experience for your product or environment. Customization options range from acoustic model tuning to unique voice fonts for your brand. After you've built a custom model, you can use it with any of the Azure Speech Services.
|Service||Model||Description|
|Speech-to-Text||Acoustic model||Create a custom acoustic model for applications, tools, or devices that are used in particular environments, such as in a car or on a factory floor, each with specific recording conditions. Examples include accented speech, specific background noises, or using a specific microphone for recording.|
|Speech-to-Text||Language model||Create a custom language model to improve transcription of field-specific vocabulary and grammar, such as medical terminology or IT jargon.|
|Speech-to-Text||Pronunciation model||With a custom pronunciation model, you can define the phonetic form and display of a word or term. It's useful for handling customized terms, such as product names or acronyms. All you need to get started is a pronunciation file, which is a simple .txt file.|
|Text-to-Speech||Voice font||Custom voice fonts allow you to create a recognizable, one-of-a-kind voice for your brand. It only takes a small amount of data to get started. The more data that you provide, the more natural and human-like your voice font will sound.|
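To make the pronunciation file concrete, here is a minimal sketch that generates one. It assumes a layout of one entry per line, with the display form and the spoken form separated by a tab. That layout, the helper name `build_pronunciation_file`, and the sample entries are assumptions for illustration; confirm the exact format in the customization documentation before uploading a file.

```python
def build_pronunciation_file(entries):
    """Return the text of a pronunciation file.

    Assumed layout (verify against the customization docs): one entry per
    line, written as <display form><TAB><spoken form>.
    """
    lines = []
    for display, spoken in entries:
        # A tab or newline inside a field would corrupt the assumed layout.
        for field in (display, spoken):
            if "\t" in field or "\n" in field:
                raise ValueError("fields must not contain tabs or newlines")
        lines.append(display + "\t" + spoken)
    return "\n".join(lines) + "\n"


# Usage: hypothetical entries for an acronym and a product-style name.
content = build_pronunciation_file([
    ("IEEE", "i triple e"),
    ("3CPO", "three c p o"),
])
```

Writing `content` to a UTF-8 encoded .txt file would produce the file to upload.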
- Speech SDK
- Speech Devices SDK
- REST API: Speech-to-text
- REST API: Text-to-speech
- REST API: Batch transcription and customization