What is speech-to-text?
Speech-to-text from the Speech service, also known as speech recognition, enables real-time transcription of audio streams into text. Your applications, tools, or devices can consume, display, and take action on this text as command input. This service is powered by the same recognition technology that Microsoft uses for Cortana and Office products, and it works seamlessly with the translation and text-to-speech service offerings. For a full list of available speech-to-text languages, see supported languages.
The speech-to-text service defaults to using the Universal language model. This model was trained using Microsoft-owned data and is deployed in the cloud. It's optimal for conversational and dictation scenarios. When using speech-to-text for recognition and transcription in a unique environment, you can create and train custom acoustic, language, and pronunciation models. Customization is helpful for addressing ambient noise or industry-specific vocabulary.
Bing Speech was decommissioned on October 15, 2019. If your applications, tools, or products are using the Bing Speech APIs or Custom Speech, we've created guides to help you migrate to the Speech service.
Get started with speech-to-text
The speech-to-text service is available via the Speech SDK. Several common scenarios are available as quickstarts in a variety of languages and platforms:
- Quickstart: Recognize speech with microphone input
- Quickstart: Recognize speech from a file
- Quickstart: Recognize speech stored in blob storage
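The microphone quickstart above boils down to a few lines with the Speech SDK for Python. This is a minimal sketch assuming the `azure-cognitiveservices-speech` package is installed; the key and region arguments are placeholders you must supply from your own Speech resource:

```python
def recognize_once_from_mic(subscription_key, region):
    """Single-shot speech recognition from the default microphone.

    Assumes the azure-cognitiveservices-speech package is installed and
    that subscription_key/region identify a valid Speech service resource.
    """
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=subscription_key, region=region)
    # With no audio config supplied, the recognizer uses the default microphone.
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

    # recognize_once() listens until a single utterance ends (silence detected).
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    return None
```

Single-shot recognition is best suited to short command-style input; for longer audio, use continuous recognition instead.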
If you prefer to use the speech-to-text REST service, see REST APIs.
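As an illustration of the REST route, the following sketch builds a short-audio recognition request using only the standard library. The endpoint shape and headers follow the documented short-audio REST API; the region, key, and audio bytes are placeholders you supply yourself:

```python
import urllib.request

def build_stt_request(region, subscription_key, wav_bytes, language="en-US"):
    """Build (but do not send) a short-audio speech-to-text REST request.

    region and subscription_key are placeholders for your own Speech
    resource; wav_bytes should be a 16 kHz, 16-bit mono PCM WAV payload.
    """
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        "/speech/recognition/conversation/cognitiveservices/v1"
        f"?language={language}"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=wav_bytes, headers=headers, method="POST")

# To actually send it (requires a real key and WAV data):
# with urllib.request.urlopen(build_stt_request("westus", key, wav)) as resp:
#     body = resp.read()  # JSON containing the recognized text
```

The REST API accepts up to roughly a minute of audio per request, so it fits simple transcription tasks; streaming and continuous scenarios are better served by the Speech SDK.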
Tutorials and sample code
After you've had a chance to use the Speech service, try our tutorial that teaches you how to recognize intents from speech using the Speech SDK and LUIS.
Sample code for the Speech SDK is available on GitHub. These samples cover common scenarios like reading audio from a file or stream, continuous and single-shot recognition, and working with custom models.
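The continuous-recognition pattern from those samples follows an event-driven shape: connect callbacks, start recognition, and wait for the session to stop. This is a hedged sketch assuming the `azure-cognitiveservices-speech` package is installed; key, region, and file path are placeholders:

```python
import time

def transcribe_file(path, subscription_key, region):
    """Continuously recognize a WAV file, collecting the final phrases.

    Assumes the azure-cognitiveservices-speech package is installed and
    that subscription_key/region identify a valid Speech service resource.
    """
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=subscription_key, region=region)
    audio_config = speechsdk.AudioConfig(filename=path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    phrases = []
    done = False

    def on_recognized(evt):
        # Fires once per utterance with the final (not interim) result.
        phrases.append(evt.result.text)

    def on_stopped(evt):
        nonlocal done
        done = True

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(on_stopped)
    recognizer.canceled.connect(on_stopped)

    recognizer.start_continuous_recognition()
    while not done:
        time.sleep(0.5)
    recognizer.stop_continuous_recognition()
    return phrases
```

The `recognizing` event (not shown) delivers interim hypotheses while audio is still being processed, which is useful for displaying live captions.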
In addition to the standard Speech service model, you can create custom models. Customization helps you overcome speech recognition barriers such as speaking style, vocabulary, and background noise; see Custom Speech for details. Customization options vary by language and locale; see supported languages to verify support.
The Speech service provides two SDKs. The first is the primary Speech SDK, which provides most of the functionality needed to interact with the Speech service. The second is specific to devices, appropriately named the Speech Devices SDK. Both SDKs are available in many languages.
Speech SDK reference docs
Use the following list to find the appropriate Speech SDK reference docs:
The Speech SDK is actively maintained and updated. To track changes, updates, and feature additions, see the Speech SDK release notes.
Speech Devices SDK reference docs
REST API references
For reference documentation on the various Speech service REST APIs, see the list below: