What are the Speech Services?

The Speech Services unify speech-to-text, text-to-speech, and speech translation under a single Azure subscription. You can speech-enable your applications, tools, and devices with the Speech SDK, the Speech Devices SDK, or the REST APIs.


Speech Services have replaced Bing Speech API, Translator Speech, and Custom Speech. See How-to guides > Migration for migration instructions.

These features make up the Azure Speech Services. Use the links in this table to learn more about common use cases for each feature or browse the API reference.

| Service | Feature | Description | SDK | REST |
|---|---|---|---|---|
| Speech-to-Text | Speech-to-text | Speech-to-text transcribes audio streams to text in real time that your applications, tools, or devices can consume or display. Use speech-to-text with Language Understanding (LUIS) to derive user intents from transcribed speech and act on voice commands. | Yes | Yes |
| | Batch Transcription | Batch transcription enables asynchronous speech-to-text transcription of large volumes of data. This is a REST-based service, which uses the same endpoint as customization and model management. | No | Yes |
| | Conversation Transcription | Enables real-time speech recognition, speaker identification, and diarization. It's perfect for transcribing in-person meetings with the ability to distinguish speakers. | Yes | No |
| | Create Custom Speech Models | If you are using speech-to-text for recognition and transcription in a unique environment, you can create and train custom acoustic, language, and pronunciation models to address ambient noise or industry-specific vocabulary. | No | Yes |
| Text-to-Speech | Text-to-speech | Text-to-speech converts input text into human-like synthesized speech using Speech Synthesis Markup Language (SSML). Choose from standard voices and neural voices (see Language support). | Yes | Yes |
| | Create Custom Voices | Create custom voice fonts unique to your brand or product. | No | Yes |
| Speech Translation | Speech translation | Speech translation enables real-time, multi-language translation of speech to your applications, tools, and devices. Use this service for speech-to-speech and speech-to-text translation. | Yes | No |
| Voice-first Virtual Assistants | Voice-first virtual assistants | Custom virtual assistants using Azure Speech Services empower developers to create natural, human-like conversational interfaces for their applications and experiences. The Bot Framework's Direct Line Speech channel enhances these capabilities by providing a coordinated, orchestrated entry point to a compatible bot that enables voice-in, voice-out interaction with low latency and high reliability. | Yes | No |
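
The text-to-speech feature above takes its input as SSML. As a rough illustration of what such a document looks like, the following Python sketch wraps plain text in a minimal SSML envelope. The helper name `build_ssml` and the placeholder voice name are assumptions made for this example, not part of the Speech SDK; pick a real voice from the Language support page before sending the result to the service.

```python
from xml.sax.saxutils import escape, quoteattr

# W3C namespace for Speech Synthesis Markup Language documents.
SSML_NAMESPACE = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text, voice_name, lang="en-US"):
    """Return a minimal SSML document that speaks `text` with one voice.

    `voice_name` is whatever voice identifier your subscription supports;
    this helper does not validate it.
    """
    return (
        f'<speak version="1.0" xmlns="{SSML_NAMESPACE}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    # "YOUR-VOICE-NAME" is a placeholder, not a real voice identifier.
    print(build_ssml("Terms & conditions apply.", "YOUR-VOICE-NAME"))
```

Escaping the text and quoting the attributes keeps the document well-formed when the input contains characters such as `&` or `<`.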

News and updates

Learn what's new with the Azure Speech Services.

Try Speech Services

We offer quickstarts in most popular programming languages, each designed to have you running code in less than 10 minutes. This table contains the most popular quickstarts for each feature. Use the left-hand navigation to explore additional languages and platforms.

| Speech-to-text (SDK) | Text-to-Speech (SDK) | Translation (SDK) |
|---|---|---|
| C#, .NET Core (Windows) | C#, .NET Framework (Windows) | Java (Windows, Linux) |
| JavaScript (Browser) | C++ (Windows) | C#, .NET Core (Windows) |
| Python (Windows, Linux, macOS) | C++ (Linux) | C#, .NET Framework (Windows) |
| Java (Windows, Linux) | | C++ (Windows) |


Speech-to-text and text-to-speech also have REST endpoints and associated quickstarts.
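
To give a feel for the REST route, the following Python sketch assembles (but does not send) a speech-to-text request for a short WAV clip using only the standard library. The host pattern, path, and header names are assumptions based on the speech-to-text REST API as commonly documented; confirm them against the REST reference for your region before relying on them. `build_stt_request` is a hypothetical helper name.

```python
import urllib.parse
import urllib.request


def build_stt_request(region, subscription_key, audio_bytes, language="en-US"):
    """Build a POST request that would submit WAV audio for recognition.

    Assumed endpoint shape (verify against the REST reference):
    https://<region>.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1
    """
    query = urllib.parse.urlencode({"language": language})
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )
    headers = {
        # Assumed header carrying the subscription key for the Speech resource.
        "Ocp-Apim-Subscription-Key": subscription_key,
        # Assumes 16 kHz PCM audio in a WAV container.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=audio_bytes, headers=headers, method="POST")


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    request = build_stt_request("westus", "YOUR-SUBSCRIPTION-KEY", b"")
    print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request; the sketch stops short of that so it can be inspected without a subscription.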

After you've had a chance to use the Speech Services, try our tutorial that teaches you how to recognize intents from speech using the Speech SDK and LUIS.

Get sample code

Sample code is available on GitHub for each of the Azure Speech Services. These samples cover common scenarios like reading audio from a file or stream, continuous and single-shot recognition, and working with custom models. Use these links to view SDK and REST samples:

Customize your speech experience

Azure Speech Services work well with built-in models. However, you may want to further customize and tune the experience for your product or environment. Customization options range from acoustic model tuning to unique voice fonts for your brand. After you've built a custom model, you can use it with any of the Azure Speech Services.

| Speech Service | Platform | Description |
|---|---|---|
| Speech-to-Text | Custom Speech | Customize speech recognition models to fit your needs and available data. Overcome speech recognition barriers such as speaking style, vocabulary, and background noise. |
| Text-to-Speech | Custom Voice | Build a recognizable, one-of-a-kind voice for your text-to-speech apps using your available speaking data. You can further fine-tune the voice output by adjusting a set of voice parameters. |

Reference docs

Next steps