What is speech translation?

Speech translation from the Speech service enables real-time, multi-language speech-to-speech and speech-to-text translation of audio streams. With the Speech SDK, your applications, tools, and devices have access to source transcriptions and translation outputs for the audio you provide. Interim transcription and translation results are returned as speech is detected, and final results can be converted into synthesized speech.
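To make the difference between interim and final results concrete, here is a minimal, self-contained sketch of how a client might consume them. It does not call the Speech SDK: `TranslationEvent` and `Transcript` are hypothetical names invented for this illustration, and the event shape is an assumption, not the SDK's actual result type.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TranslationEvent:
    """Hypothetical event shape for illustration; not a Speech SDK type."""
    is_final: bool                 # False for interim results, True for final ones
    source_text: str               # transcription in the source language
    translations: Dict[str, str]   # target language code -> translated text


@dataclass
class Transcript:
    """Keeps committed final results plus the latest interim hypothesis."""
    finals: List[TranslationEvent] = field(default_factory=list)
    interim: Optional[TranslationEvent] = None

    def handle(self, event: TranslationEvent) -> None:
        if event.is_final:
            # A final result replaces whatever interim guess preceded it.
            self.finals.append(event)
            self.interim = None
        else:
            # Each interim result supersedes the previous interim result.
            self.interim = event

    def render(self, lang: str) -> str:
        """Return committed translations followed by the pending interim text."""
        parts = [e.translations[lang] for e in self.finals]
        if self.interim is not None:
            parts.append(self.interim.translations[lang] + "...")
        return " ".join(parts)


transcript = Transcript()
for event in [
    TranslationEvent(False, "hello", {"de": "hallo"}),
    TranslationEvent(False, "hello world", {"de": "hallo Welt"}),
    TranslationEvent(True, "Hello, world.", {"de": "Hallo, Welt."}),
    TranslationEvent(False, "how", {"de": "wie"}),
]:
    transcript.handle(event)

print(transcript.render("de"))
```

In this sketch only final results are kept permanently, so they are the natural input for a later speech synthesis step, while interim results are suited to live display that may still change.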

Microsoft's translation engine is powered by two different approaches: statistical machine translation (SMT) and neural machine translation (NMT). SMT uses advanced statistical analysis to estimate the best possible translations given the context of a few words. With NMT, neural networks are used to provide more accurate, natural-sounding translations by using the full context of sentences to translate words.

Today, Microsoft uses NMT for translation to most popular languages. All languages available for speech-to-speech translation are powered by NMT. Speech-to-text translation may use SMT or NMT depending on the language pair. When the target language is supported by NMT, the full translation is NMT-powered. When the target language isn't supported by NMT, the translation is a hybrid of NMT and SMT, using English as a "pivot" between the two languages.
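The pivot idea can be illustrated with a toy sketch. The code below is not Microsoft's implementation: the word-for-word lexicons stand in for real translation engines (for example, one hop could be NMT and the other SMT), and every name in it (`TO_ENGLISH`, `FROM_ENGLISH`, `translate_words`, `pivot_translate`) is hypothetical.

```python
from typing import Dict

Lexicon = Dict[str, str]

# Toy word-for-word lexicons standing in for real translation engines.
TO_ENGLISH: Dict[str, Lexicon] = {
    "de": {"hallo": "hello", "welt": "world"},
}
FROM_ENGLISH: Dict[str, Lexicon] = {
    "fr": {"hello": "bonjour", "world": "monde"},
}


def translate_words(text: str, lexicon: Lexicon) -> str:
    """Translate word by word; unknown words pass through unchanged."""
    return " ".join(lexicon.get(word, word) for word in text.lower().split())


def pivot_translate(text: str, source: str, target: str, pivot: str = "en") -> str:
    """Translate source -> target, routing through the pivot language if needed."""
    if source == target:
        return text
    if source == pivot:
        return translate_words(text, FROM_ENGLISH[target])
    if target == pivot:
        return translate_words(text, TO_ENGLISH[source])
    # Neither side is the pivot: hop into English, then out to the target.
    english = translate_words(text, TO_ENGLISH[source])
    return translate_words(english, FROM_ENGLISH[target])


print(pivot_translate("Hallo Welt", "de", "fr"))
```

Routing through one pivot language means a system needs models to and from English for each language rather than a dedicated model for every language pair, at the cost of compounding errors across the two hops.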

Core features

Here are the features available via the Speech SDK and REST APIs:

Use case | SDK | REST
Speech-to-text translation with recognition results | Yes | No
Speech-to-speech translation | Yes | No
Interim recognition and translation results | Yes | No

Get started with speech translation

We offer quickstarts designed to have you running code in less than 10 minutes. This table includes a list of speech translation quickstarts organized by language.

Quickstart | Platform | API reference
C#, .NET Core | Windows | Browse
C#, .NET Framework | Windows | Browse
C#, UWP | Windows | Browse
C++ | Windows | Browse
Java | Windows, Linux, macOS | Browse

Sample code

Sample code for the Speech SDK is available on GitHub. These samples cover common scenarios like reading audio from a file or stream, continuous and single-shot recognition/translation, and working with custom models.

Migration guides

If your applications, tools, or products are using the Translator Speech API, we've created guides to help you migrate to the Speech service.
