What is the Speech service?
Like the other Azure speech services, the Speech service is powered by the speech technologies used in products like Cortana and Microsoft Office.
The Speech service unites the Azure speech features previously available via the Bing Speech API, Translator Speech, Custom Speech, and Custom Voice services. Now, one subscription provides access to all of these capabilities.
Main Speech service functions
The primary functions of the Speech service are Speech to Text (also called speech recognition or transcription), Text to Speech (speech synthesis), and Speech Translation.
|Speech to Text||
|Text to Speech||
* Intent recognition requires a LUIS subscription.
Customize speech features
You can use your own data to train the models that underlie the Speech service's Speech-to-Text and Text-to-Speech features.
|Speech service feature||Model||Purpose|
|Speech to Text||Acoustic model||Helps transcribe particular speakers and environments, such as cars or factories.|
|Speech to Text||Language model||Helps transcribe field-specific vocabulary and grammar, such as medical or IT jargon.|
|Speech to Text||Pronunciation model||Helps transcribe abbreviations and acronyms, such as "IOU" for "I owe you."|
|Text to Speech||Voice font||Gives your app a voice of its own by training the model on samples of human speech.|
You can use your custom models anywhere you use the standard models in your app's Speech-to-Text or Text-to-Speech functionality.
Use the Speech service
To simplify the development of speech-enabled applications, Microsoft provides the Speech SDK for use with the Speech service. The Speech SDK provides consistent native Speech-to-Text and Speech Translation APIs for C#, C++, and Java. If you develop with one of these languages, the Speech SDK makes development easier by handling the network details for you.
The Speech service also has a REST API that works with any programming language that can make HTTP requests. The REST interface does not offer the streaming, real-time functionality of the SDK.
|Method||Speech to Text||Text to Speech||Speech Translation||Description|
|Speech SDK||Yes||No||Yes||Native APIs for C#, C++, and Java to simplify development.|
|REST||Yes||Yes||No||A simple HTTP-based API that makes it easy to add speech to your applications.|
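As a rough illustration of the REST approach, the following Python sketch builds (but does not send) an HTTP request that submits a short WAV clip for transcription. The endpoint pattern, the header names, the region, and the key value are assumptions that are not stated in this article; check them against the current REST API reference before relying on them. `build_recognition_request` is a hypothetical helper name.

```python
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint pattern; verify it against the current REST API reference.
STT_URL = (
    "https://{region}.stt.speech.microsoft.com"
    "/speech/recognition/conversation/cognitiveservices/v1"
)


def build_recognition_request(region, subscription_key, wav_bytes, language="en-US"):
    """Build, without sending, a POST request that carries a short WAV clip."""
    url = STT_URL.format(region=region) + "?" + urlencode({"language": language})
    headers = {
        # Assumed header names; the key is your Speech service subscription key.
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=wav_bytes, headers=headers, method="POST")
```

To send the request, pass the returned object to `urllib.request.urlopen` and read the response body. Because this is a single request and response, it suits short recordings rather than real-time streaming.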
The Speech service also has WebSocket protocols for streaming Speech to Text and Speech Translation. The Speech SDKs use these protocols to communicate with the Speech service. Use the Speech SDK instead of trying to implement your own WebSocket communication with the Speech service.
If you already have code that uses Bing Speech or Translator Speech via WebSockets, you can update it to use the Speech service. The WebSocket protocols are compatible; only the endpoints are different.
Speech Devices SDK
The Speech Devices SDK is an integrated hardware and software platform for developers of speech-enabled devices. Our hardware partner provides reference designs and development units. Microsoft provides a device-optimized SDK that takes full advantage of the hardware's capabilities.
Why move to the Speech service?
The Speech service provides all the functionality of the Bing Speech API and three other Azure speech services (Custom Speech, Custom Voice, and Translator Speech), and more. We encourage users of these services to migrate to the Speech service.
The Speech service incorporates many upgrades to these other services, including:
- Higher speech recognition accuracy. We regularly improve the models used in the service.
- Greater scalability. The service handles multiple simultaneous requests better, which reduces latency.
The Speech service uses a time-based pricing model. See Speech service pricing for details.
A single Speech service subscription key grants access to all of the service's features. Each is metered separately, so you're charged only for the features you use.
The Speech service's Speech-to-Text function integrates with the Language Understanding service (LUIS) to recognize speaker intent. A LUIS endpoint key can also be used with the Speech service. See the intent recognition tutorial for details.
Speech-to-text no longer requires that you specify a recognition mode.
The Speech service supports 24-kHz voices for Text to Speech, improving audio quality. At this writing, two such voices are available (US English only).
The Speech service's batch transcription allows high volumes of recorded speech, such as call center recordings, to be transcribed to text efficiently, so they can be easily analyzed and searched.
When using the Speech SDK, there is no time limit on streaming speech-to-text transcription.
The Speech SDK provides a consistent API to the Speech service across several programming languages and execution environments (including Windows 10, UWP, and .NET Core), making development easier, especially on multiple platforms.
The Speech service is compatible with the REST APIs and WebSocket protocols used by other Azure speech services, making it easy to migrate existing client applications to the Speech service.
Use cases for the Speech service include:
- Create voice-triggered apps
- Transcribe call center recordings
- Implement voice bots
Voice user interface
Voice input is a great way to make your app flexible, hands-free, and quick to use. With a voice-enabled app, users can just ask for the information they want.
If your app is intended for use by the general public, you can use the default speech recognition models. They recognize a wide variety of speakers in common environments.
If your app is used in a specific domain, for example, medicine or IT, you can create a language model. You can use this model to teach the Speech service about the special terminology used by your app.
If your app is used in a noisy environment, such as a factory, you can create a custom acoustic model. This model helps the Speech service to distinguish speech from noise.
Call center transcription
Often, call center recordings are consulted only if an issue arises with a call. With the Speech service, it's easy to transcribe every recording to text. You can easily index the text for full-text search or apply Text Analytics to detect sentiment, language, and key phrases.
If your call center recordings involve specialized terminology, such as product names or IT jargon, you can create a language model to teach the Speech service the vocabulary. A custom acoustic model can help the Speech service understand less-than-optimal phone connections.
For more information about this scenario, see batch transcription with the Speech service.
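To make the full-text search idea concrete, here is a minimal Python sketch that indexes transcribed text and looks up recordings by keyword. It assumes the transcripts have already been produced and are available as plain strings; the recording IDs and the transcript text are made up for illustration, and a production system would more likely use a dedicated search service than this in-memory index.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(transcripts):
    """Map each word to the set of recording IDs whose transcript contains it."""
    index = defaultdict(set)
    for recording_id, text in transcripts.items():
        for word in tokenize(text):
            index[word].add(recording_id)
    return index


def search(index, query):
    """Return the IDs of recordings that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical transcripts, keyed by made-up recording IDs.
transcripts = {
    "call-001": "My router keeps dropping the connection.",
    "call-002": "I would like a refund for my router.",
    "call-003": "The connection is fine now, thank you.",
}
index = build_index(transcripts)
print(sorted(search(index, "router connection")))  # ['call-001']
```

The index is built once per batch of transcripts, so each query only intersects a few small sets instead of rescanning every transcript.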
Voice bots
Bots are a popular way to connect users with the information they want and customers with businesses they like. When you add a conversational user interface to your website or app, the functionality is easier to find and quicker to access. With the Speech service, your bot can respond to spoken queries in kind, which gives the conversation a new dimension of fluency.
To add a unique personality to your voice-enabled bot, you can give it a voice of its own. Creating a custom voice is a two-step process. First, make recordings of the voice you want to use. Then submit those recordings along with a text transcript to the Speech service's voice customization portal, which does the rest. After you create your custom voice, the steps to use it in your app are straightforward.
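As one possible illustration of selecting a voice in your app, the sketch below builds a Speech Synthesis Markup Language (SSML) document that names the voice to use. This article does not describe the request format, so treat the use of SSML here as an assumption; `build_ssml` is a hypothetical helper, and the voice name in the usage note is a placeholder rather than a real voice.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML specification.
SSML_NAMESPACE = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text, voice_name, language="en-US"):
    """Return an SSML string that asks for `text` to be spoken by `voice_name`."""
    speak = ET.Element(
        "speak",
        {"version": "1.0", "xml:lang": language, "xmlns": SSML_NAMESPACE},
    )
    voice = ET.SubElement(speak, "voice", {"name": voice_name})
    # ElementTree escapes characters such as "&" and "<" when serializing.
    voice.text = text
    return ET.tostring(speak, encoding="unicode")
```

For example, `build_ssml("Welcome back!", "my-custom-voice")` produces a document whose `voice` element carries the placeholder name `my-custom-voice`. Building the markup with an XML library rather than string concatenation keeps user-supplied text from breaking the document.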
Get a subscription key for the Speech service.