Events
May 19, 6 PM - May 23, 12 AM
Calling all developers, creators, and AI innovators to join us in Seattle @Microsoft Build May 19-22.
In this article, you learn about the benefits and capabilities of translation with Azure AI Speech. The Speech service supports real-time, multi-language speech to speech and speech to text translation of audio streams.
By using the Speech SDK or Speech CLI, you can give your applications, tools, and devices access to source transcriptions and translation outputs for the provided audio. Interim transcription and translation results are returned as speech is detected, and the final results can be converted into synthesized speech.
For a list of languages supported for speech translation, see Language and voice support.
Tip
Go to the Speech Studio to quickly test and translate speech into other languages of your choice with low latency.
The core features of speech translation include:
The standard feature of the Speech service takes an input audio stream in your specified source language and returns the translation as text in your specified target language.
In addition to text output, the Speech service can read the translated text aloud using our large database of pretrained voices, which gives a natural-sounding spoken rendering of the input speech.
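As an illustration of the speech to text translation feature described above, here is a minimal Python sketch using the Speech SDK (`azure-cognitiveservices-speech`). It assumes you supply your own resource key and region, that a default microphone is available, and that `en-US`, `de`, and `fr` are suitable language codes for your scenario. The helper names `translate_once` and `format_translations` are hypothetical and exist only for this example.

```python
from typing import Dict, List


def format_translations(translations: Dict[str, str]) -> List[str]:
    """Render a {language: text} mapping as sorted 'language: text' lines."""
    return [f"{lang}: {text}" for lang, text in sorted(translations.items())]


def translate_once(key: str, region: str, source: str,
                   targets: List[str]) -> Dict[str, str]:
    """Recognize one utterance from the microphone and translate it.

    `key` and `region` are placeholders for your own Speech resource values.
    """
    # Imported lazily so format_translations works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.translation.SpeechTranslationConfig(
        subscription=key, region=region)
    config.speech_recognition_language = source   # for example "en-US"
    for lang in targets:                          # for example ["de", "fr"]
        config.add_target_language(lang)

    audio = speechsdk.audio.AudioConfig(use_default_microphone=True)
    recognizer = speechsdk.translation.TranslationRecognizer(
        translation_config=config, audio_config=audio)

    # Single-shot recognition: returns after the first utterance ends.
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.TranslatedSpeech:
        return dict(result.translations)
    return {}  # no speech matched, or recognition was canceled
```

A call such as `format_translations(translate_once(key, region, "en-US", ["de", "fr"]))` would then give one line per target language. For long-running audio, continuous recognition with event handlers is likely a better fit than this single-shot call.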
Multi-lingual speech translation adds several capabilities on top of standard speech translation: you don't have to specify an input language, language switches within the same session are handled, and live streaming translation into English is supported. You can build these capabilities into your own products.
Some use cases for multi-lingual speech translation include:
For a list of the supported input (source) languages, see the speech to text languages documentation. For a list of the supported output (target) languages, see the Translate to text language table in the speech translation languages documentation.
For more information on multi-lingual speech translation, see the speech translation how to guide and speech translation samples on GitHub.
In scenarios where you want output in multiple languages, the Speech service can translate the input language into two target languages directly. You receive both outputs from a single API call and can share the translations with a wider audience.
If you need translation into more than two target languages, you must either create an Azure AI services resource (a multi-service resource) or use separate translation services for each language beyond the second. If you call the speech translation service with a multi-service resource, translation fees apply for each language beyond the second, based on the character count of the translation.
To calculate the applied translation fee, please refer to Azure AI Translator pricing.
The speech translation service operates in real time, and intermediate speech results are translated to produce intermediate translation results. As a result, the amount of text that is actually translated is greater than the final transcription of the input audio. You're charged for the speech to text transcription and for the text translation into each target language.
For example, let's say that you want text translations from a one-hour audio file to three target languages. If the initial speech to text transcription contains 10,000 characters, you might be charged $2.80.
Warning
The prices in this example are for illustrative purposes only. Please refer to the Azure AI Speech pricing and Azure AI Translator pricing for the most up-to-date pricing information.
The previous example price of $2.80 was calculated by combining the speech to text transcription and the text translation costs. Here's how the calculation was done:
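The arithmetic behind a figure like this can be sketched in Python. Every rate in the sketch is an assumption chosen for illustration so that the total comes to $2.80: an hourly speech translation rate that covers the first two target languages, a per-million-character text translation rate for each additional language, and a weighting factor that accounts for intermediate results being translated as well. The function name `estimate_cost` and its default values are hypothetical and are not taken from the pricing pages.

```python
def estimate_cost(audio_hours: float,
                  transcript_chars: int,
                  target_languages: int,
                  hourly_rate: float = 2.50,             # assumed, illustrative
                  included_languages: int = 2,           # first two targets included
                  rate_per_million_chars: float = 10.0,  # assumed, illustrative
                  interim_weight: float = 3.0) -> float: # assumed traffic factor
    """Estimate a speech translation bill in dollars under the assumed rates."""
    # Speech translation charge, covering the included target languages.
    base = audio_hours * hourly_rate

    # Each language beyond the included ones is billed by character count,
    # scaled up because intermediate results are translated too.
    extra_languages = max(0, target_languages - included_languages)
    extra = (extra_languages * transcript_chars / 1_000_000
             * rate_per_million_chars * interim_weight)

    return round(base + extra, 2)


# One hour of audio, 10,000 transcribed characters, three target languages.
print(estimate_cost(1, 10_000, 3))
```

Under these assumed rates, the third language adds $0.30 to a $2.50 base. The weighting factor in particular is a guess at how much intermediate traffic a session generates and is likely to vary in practice.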
As your first step, try the speech translation quickstart. The speech translation service is available via the Speech SDK and the Speech CLI.
You can find Speech SDK speech to text and translation samples on GitHub. These samples cover common scenarios, such as reading audio from a file or stream, continuous and single-shot recognition and translation, and working with custom models.