Quickstart: Translate speech-to-text

In this quickstart, you will use the Speech SDK to interactively translate speech from one language to text in another language. After satisfying a few prerequisites, translating speech-to-text takes only five steps:

  • Create a SpeechTranslationConfig object from your subscription key and region.
  • Update the SpeechTranslationConfig object to specify the source and target languages.
  • Create a TranslationRecognizer object using the SpeechTranslationConfig object from above.
  • Using the TranslationRecognizer object, start the recognition process for a single utterance.
  • Inspect the TranslationRecognitionResult returned.
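Step 5 comes down to branching on the result's reason, which every sample in this quickstart does in the same way. The following Python sketch mirrors that branching without calling the Speech SDK. The ResultReason enum here is a hypothetical stand-in for the SDK's own enum, and describe_result is an illustrative helper, not an SDK function.

```python
from enum import Enum, auto


class ResultReason(Enum):
    """Hypothetical stand-in for the SDK's ResultReason values handled in this quickstart."""
    TRANSLATED_SPEECH = auto()
    RECOGNIZED_SPEECH = auto()
    NO_MATCH = auto()
    CANCELED = auto()


def describe_result(reason, text, translations, from_lang, to_lang, cancel_reason=None):
    """Return the console lines the quickstart samples print for one result.

    translations maps a target language code to its translated text.
    """
    if reason is ResultReason.TRANSLATED_SPEECH:
        return [
            f"RECOGNIZED '{from_lang}': {text}",
            f"TRANSLATED into '{to_lang}': {translations[to_lang]}",
        ]
    if reason is ResultReason.RECOGNIZED_SPEECH:
        # Speech was transcribed, but no translation came back.
        return [f"RECOGNIZED '{from_lang}': {text} (text could not be translated)"]
    if reason is ResultReason.NO_MATCH:
        return ["NOMATCH: Speech could not be recognized."]
    # Remaining case: CANCELED. The real samples also print the error code
    # and details when the cancellation reason is an error.
    return [f"CANCELED: Reason={cancel_reason}"]


if __name__ == "__main__":
    for line in describe_result(
        ResultReason.TRANSLATED_SPEECH,
        "What's the weather in Seattle?",
        {"de": "Wie ist das Wetter in Seattle?"},
        "en-US",
        "de",
    ):
        print(line)
```

Fed the phrase from this quickstart's expected output, the sketch prints the same two lines (RECOGNIZED and TRANSLATED) that the samples display.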

If you prefer to jump right in, view or download all Speech SDK C# Samples on GitHub. Otherwise, let's get started.


Prerequisites

Before you get started, make sure to:

Add sample code

  1. Open Program.cs, and replace all the code in it with the following.

    using System;
    using System.Threading.Tasks;
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Translation;
    
    namespace helloworld
    {
        class Program
        {
            public static async Task TranslateSpeechToText()
            {
                // Creates an instance of a speech translation config with specified subscription key and service region.
                // Replace with your own subscription key and service region (e.g., "westus").
                var config = SpeechTranslationConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
    
                // Sets source and target languages.
                // Replace with the languages of your choice, from list found here: https://aka.ms/speech/sttt-languages
                string fromLanguage = "en-US";
                string toLanguage = "de";
                config.SpeechRecognitionLanguage = fromLanguage;
                config.AddTargetLanguage(toLanguage);
    
                // Creates a translation recognizer using the default microphone audio input device.
                using (var recognizer = new TranslationRecognizer(config))
                {
                    // Starts translation, and returns after a single utterance is recognized. The end of a
                    // single utterance is determined by listening for silence at the end or until a maximum of 15
                    // seconds of audio is processed. The task returns the recognized text as well as the translation.
                    // Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
                    // shot recognition like command or query.
                    // For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
                    Console.WriteLine("Say something...");
                    var result = await recognizer.RecognizeOnceAsync();
    
                    // Checks result.
                    if (result.Reason == ResultReason.TranslatedSpeech)
                    {
                        Console.WriteLine($"RECOGNIZED '{fromLanguage}': {result.Text}");
                        Console.WriteLine($"TRANSLATED into '{toLanguage}': {result.Translations[toLanguage]}");
                    }
                    else if (result.Reason == ResultReason.RecognizedSpeech)
                    {
                        Console.WriteLine($"RECOGNIZED '{fromLanguage}': {result.Text} (text could not be translated)");
                    }
                    else if (result.Reason == ResultReason.NoMatch)
                    {
                        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                    }
                    else if (result.Reason == ResultReason.Canceled)
                    {
                        var cancellation = CancellationDetails.FromResult(result);
                        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
    
                        if (cancellation.Reason == CancellationReason.Error)
                        {
                            Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                            Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                            Console.WriteLine($"CANCELED: Did you update the subscription info?");
                        }
                    }
                }
            }
    
            static void Main(string[] args)
            {
                TranslateSpeechToText().Wait();
            }
        }
    }
    
  2. In the same file, replace the string YourSubscriptionKey with your subscription key.

  3. Replace the string YourServiceRegion with the region associated with your subscription (for example, westus for the free trial subscription).

  4. From the menu bar, choose File > Save All.
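The code comments above describe how RecognizeOnceAsync() decides that a single utterance has ended: trailing silence after speech, or 15 seconds of processed audio, whichever comes first. The Python sketch below is a toy model of that rule only. It assumes audio arrives as fixed-size frames already labeled as speech or silence; the 100 ms frame size and the 500 ms silence threshold are illustrative assumptions, not values documented for the Speech Services, and the service's actual end-of-speech detection is not shown here.

```python
def utterance_end_ms(is_speech, frame_ms=100, trailing_silence_ms=500, max_audio_ms=15_000):
    """Toy model: return the time (ms) at which a single utterance is considered finished.

    is_speech: sequence of booleans, one per audio frame (True = speech detected).
    The utterance ends once `trailing_silence_ms` of silence follows speech, or once
    `max_audio_ms` of audio has been processed, whichever comes first.
    All default values are illustrative assumptions.
    """
    heard_speech = False
    silence_ms = 0
    elapsed_ms = 0
    for frame in is_speech:
        elapsed_ms += frame_ms
        if frame:
            heard_speech = True
            silence_ms = 0  # speech resets the trailing-silence counter
        elif heard_speech:
            silence_ms += frame_ms
            if silence_ms >= trailing_silence_ms:
                return elapsed_ms  # enough silence after speech: utterance done
        if elapsed_ms >= max_audio_ms:
            return elapsed_ms  # hard cap on audio processed for one utterance
    return elapsed_ms  # audio ran out before either condition was met
```

For example, 0.3 s of silence, 1 s of speech, then 0.5 s of silence ends at 1,800 ms, while uninterrupted speech is cut off at the 15,000 ms cap. This is why the comments recommend continuous recognition for long, multi-utterance audio.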

Build and run the application

  1. From the menu bar, select Build > Build Solution to build the application. The code should compile without errors now.

  2. Choose Debug > Start Debugging (or press F5) to start the helloworld application.

  3. Speak an English phrase or sentence. The application sends your speech to the Speech Services, which transcribe it and translate the recognized text into the target language (in this case, German). The Speech Services then return the text to the application for display.

    Say something...
    RECOGNIZED 'en-US': What's the weather in Seattle?
    TRANSLATED into 'de': Wie ist das Wetter in Seattle?

Next steps

In this quickstart, you will use the Speech SDK to interactively translate speech from one language to text in another language. After satisfying a few prerequisites, translating speech-to-text takes only five steps:

  • Create a SpeechTranslationConfig object from your subscription key and region.
  • Update the SpeechTranslationConfig object to specify the source and target languages.
  • Create a TranslationRecognizer object using the SpeechTranslationConfig object from above.
  • Using the TranslationRecognizer object, start the recognition process for a single utterance.
  • Inspect the TranslationRecognitionResult returned.

If you prefer to jump right in, view or download all Speech SDK C++ Samples on GitHub. Otherwise, let's get started.

Prerequisites

Before you get started, make sure to:

Add sample code

  1. Open the source file helloworld.cpp.

  2. Replace all the code with the following snippet:

    #include <iostream>
    #include <vector>
    #include <speechapi_cxx.h>
    
    using namespace std;
    using namespace Microsoft::CognitiveServices::Speech;
    using namespace Microsoft::CognitiveServices::Speech::Translation;
    
    void TranslateSpeechToText()
    {
        // Creates an instance of a speech translation config with specified subscription key and service region.
        // Replace with your own subscription key and service region (e.g., "westus").
        auto config = SpeechTranslationConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");
    
        // Sets source and target languages.
        // Replace with the languages of your choice, from list found here: https://aka.ms/speech/sttt-languages
        auto fromLanguage = "en-US";
        auto toLanguage = "de";
        config->SetSpeechRecognitionLanguage(fromLanguage);
        config->AddTargetLanguage(toLanguage);
    
        // Creates a translation recognizer using the default microphone audio input device.
        auto recognizer = TranslationRecognizer::FromConfig(config);
    
        // Starts translation, and returns after a single utterance is recognized. The end of a
        // single utterance is determined by listening for silence at the end or until a maximum of 15
        // seconds of audio is processed. The task returns the recognized text as well as the translation.
        // Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
        // shot recognition like command or query.
        // For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
        cout << "Say something...\n";
        auto result = recognizer->RecognizeOnceAsync().get();
    
        // Checks result.
        if (result->Reason == ResultReason::TranslatedSpeech)
        {
            cout << "RECOGNIZED '" << fromLanguage << "': " << result->Text << std::endl;
            cout << "TRANSLATED into '" << toLanguage << "': " << result->Translations.at(toLanguage) << std::endl;
        }
        else if (result->Reason == ResultReason::RecognizedSpeech)
        {
            cout << "RECOGNIZED '" << fromLanguage << "': " << result->Text << " (text could not be translated)" << std::endl;
        }
        else if (result->Reason == ResultReason::NoMatch)
        {
            cout << "NOMATCH: Speech could not be recognized." << std::endl;
        }
        else if (result->Reason == ResultReason::Canceled)
        {
            auto cancellation = CancellationDetails::FromResult(result);
            cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;
    
            if (cancellation->Reason == CancellationReason::Error)
            {
                cout << "CANCELED: ErrorCode=" << (int)cancellation->ErrorCode << std::endl;
                cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails << std::endl;
                cout << "CANCELED: Did you update the subscription info?" << std::endl;
            }
        }
    }
    
    int wmain()
    {
        TranslateSpeechToText();
        return 0;
    }
    
  3. In the same file, replace the string YourSubscriptionKey with your subscription key.

  4. Replace the string YourServiceRegion with the region associated with your subscription (for example, westus for the free trial subscription).

  5. From the menu bar, choose File > Save All.

Build and run the application

  1. From the menu bar, select Build > Build Solution to build the application. The code should compile without errors now.

  2. Choose Debug > Start Debugging (or press F5) to start the helloworld application.

  3. Speak an English phrase or sentence. The application sends your speech to the Speech Services, which transcribe it and translate the recognized text into the target language (in this case, German). The Speech Services then return the text to the application for display.

    Say something...
    RECOGNIZED 'en-US': What's the weather in Seattle?
    TRANSLATED into 'de': Wie ist das Wetter in Seattle?

Next steps


In this quickstart, you will use the Speech SDK to interactively translate speech from one language to text in another language. After satisfying a few prerequisites, translating speech-to-text takes only five steps:

  • Create a SpeechTranslationConfig object from your subscription key and region.
  • Update the SpeechTranslationConfig object to specify the source and target languages.
  • Create a TranslationRecognizer object using the SpeechTranslationConfig object from above.
  • Using the TranslationRecognizer object, start the recognition process for a single utterance.
  • Inspect the TranslationRecognitionResult returned.

If you prefer to jump right in, view or download all Speech SDK Java Samples on GitHub. Otherwise, let's get started.

Prerequisites

Before you get started, make sure to:

Add sample code

  1. To add a new empty class to your Java project, select File > New > Class.

  2. In the New Java Class window, enter speechsdk.quickstart into the Package field, and Main into the Name field.


  3. Replace all code in Main.java with the following snippet:

    package speechsdk.quickstart;
    
    import java.io.IOException;
    import java.util.concurrent.Future;
    import java.util.concurrent.ExecutionException;
    import com.microsoft.cognitiveservices.speech.*;
    import com.microsoft.cognitiveservices.speech.translation.*;
    
    public class Main {
    
        public static void translationWithMicrophoneAsync() throws InterruptedException, ExecutionException, IOException
        {
            // Exit code reported to the OS: 0 if speech was recognized (with or without
            // a translation), 1 otherwise.
            int exitCode = 1;
    
            // Creates an instance of a speech translation config with specified
            // subscription key and service region. Replace with your own subscription key
            // and service region (e.g., "westus").
            SpeechTranslationConfig config = SpeechTranslationConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
            assert(config != null);
    
            // Sets source and target languages.
            String fromLanguage = "en-US";
            String toLanguage = "de";
            config.setSpeechRecognitionLanguage(fromLanguage);
            config.addTargetLanguage(toLanguage);
    
            // Creates a translation recognizer using the default microphone audio input device.
            TranslationRecognizer recognizer = new TranslationRecognizer(config);
            assert(recognizer != null);
    
            System.out.println("Say something...");
    
            // Starts translation, and returns after a single utterance is recognized. The end of a
            // single utterance is determined by listening for silence at the end or until a maximum of 15
            // seconds of audio is processed. The task returns the recognized text as well as the translation.
            // Note: Since recognizeOnceAsync() returns only a single utterance, it is suitable only for single
            // shot recognition like command or query.
            // For long-running multi-utterance recognition, use startContinuousRecognitionAsync() instead.
            Future<TranslationRecognitionResult> task = recognizer.recognizeOnceAsync();
            assert(task != null);
    
            TranslationRecognitionResult result = task.get();
            assert(result != null);
    
            if (result.getReason() == ResultReason.TranslatedSpeech) {
                System.out.println("RECOGNIZED '" + fromLanguage + "': " + result.getText());
                System.out.println("TRANSLATED into '" + toLanguage + "': " + result.getTranslations().get(toLanguage));
                exitCode = 0;
            }
            else if (result.getReason() == ResultReason.RecognizedSpeech) {
                System.out.println("RECOGNIZED '" + fromLanguage + "': " + result.getText() + " (text could not be translated)");
                exitCode = 0;
            }
            else if (result.getReason() == ResultReason.NoMatch) {
                System.out.println("NOMATCH: Speech could not be recognized.");
            }
            else if (result.getReason() == ResultReason.Canceled) {
                CancellationDetails cancellation = CancellationDetails.fromResult(result);
                System.out.println("CANCELED: Reason=" + cancellation.getReason());
    
                if (cancellation.getReason() == CancellationReason.Error) {
                    System.out.println("CANCELED: ErrorCode=" + cancellation.getErrorCode());
                    System.out.println("CANCELED: ErrorDetails=" + cancellation.getErrorDetails());
                    System.out.println("CANCELED: Did you update the subscription info?");
                }
            }
    
            recognizer.close();
    
            System.exit(exitCode);
        }
    
        public static void main(String[] args) {
            try {
                translationWithMicrophoneAsync();
            } catch (Exception ex) {
                System.out.println("Unexpected exception: " + ex.getMessage());
                assert(false);
                System.exit(1);
            }
        }
    }
    
  4. Replace the string YourSubscriptionKey with your subscription key.

  5. Replace the string YourServiceRegion with the region associated with your subscription (for example, westus for the free trial subscription).

  6. Save changes to the project.

Build and run the app

  1. Press F11, or select Run > Debug.

  2. Speak an English phrase or sentence. The application sends your speech to the Speech Services, which transcribe it and translate the recognized text into the target language (in this case, German). The Speech Services then return the text to the application for display.

    Say something...
    RECOGNIZED 'en-US': What's the weather in Seattle?
    TRANSLATED into 'de': Wie ist das Wetter in Seattle?

Next steps


In this quickstart, you will use the Speech SDK to interactively translate speech from one language to text in another language. After satisfying a few prerequisites, translating speech-to-text takes only five steps:

  • Create a SpeechTranslationConfig object from your subscription key and region.
  • Update the SpeechTranslationConfig object to specify the source and target languages.
  • Create a TranslationRecognizer object using the SpeechTranslationConfig object from above.
  • Using the TranslationRecognizer object, start the recognition process for a single utterance.
  • Inspect the TranslationRecognitionResult returned.

If you prefer to jump right in, view or download all Speech SDK Python Samples on GitHub. Otherwise, let's get started.

Prerequisites

Before you get started, make sure to:

Add sample code

  1. Open quickstart.py, and replace all the code in it with the following.

    import azure.cognitiveservices.speech as speechsdk
    
    speech_key, service_region = "YourSubscriptionKey", "YourServiceRegion"
    
    def translate_speech_to_text():
    
        # Creates an instance of a speech translation config with specified subscription key and service region.
        # Replace with your own subscription key and service region (e.g., "westus").
        translation_config = speechsdk.translation.SpeechTranslationConfig(subscription=speech_key, region=service_region)
    
        # Sets source and target languages.
        # Replace with the languages of your choice, from list found here: https://aka.ms/speech/sttt-languages
        from_language = 'en-US'
        to_language = 'de'
        translation_config.speech_recognition_language = from_language
        translation_config.add_target_language(to_language)
    
        # Creates a translation recognizer using the default microphone as audio input.
        recognizer = speechsdk.translation.TranslationRecognizer(translation_config=translation_config)
    
        # Starts translation, and returns after a single utterance is recognized. The end of a
        # single utterance is determined by listening for silence at the end or until a maximum of 15
        # seconds of audio is processed. It returns the recognized text as well as the translation.
        # Note: Since recognize_once() returns only a single utterance, it is suitable only for single
        # shot recognition like command or query.
        # For long-running multi-utterance recognition, use start_continuous_recognition() instead.
        print("Say something...")
        result = recognizer.recognize_once()
    
        # Checks the result.
        if result.reason == speechsdk.ResultReason.TranslatedSpeech:
            print("RECOGNIZED '{}': {}".format(from_language, result.text))
            print("TRANSLATED into '{}': {}".format(to_language, result.translations[to_language]))
        elif result.reason == speechsdk.ResultReason.RecognizedSpeech:
            print("RECOGNIZED '{}': {} (text could not be translated)".format(from_language, result.text))
        elif result.reason == speechsdk.ResultReason.NoMatch:
            print("NOMATCH: Speech could not be recognized: {}".format(result.no_match_details))
        elif result.reason == speechsdk.ResultReason.Canceled:
            print("CANCELED: Reason={}".format(result.cancellation_details.reason))
            if result.cancellation_details.reason == speechsdk.CancellationReason.Error:
                print("CANCELED: ErrorDetails={}".format(result.cancellation_details.error_details))
                print("CANCELED: Did you update the subscription info?")
    
    translate_speech_to_text()
    
  2. In the same file, replace the string YourSubscriptionKey with your subscription key.

  3. Replace the string YourServiceRegion with the region associated with your subscription (for example, westus for the free trial subscription).

  4. Save the changes you've made to quickstart.py.
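If you would rather not keep the subscription key as a literal in quickstart.py, one option is to read the key and region from environment variables. This is a minimal sketch, assuming hypothetical variable names SPEECH_KEY and SPEECH_REGION; load_speech_settings is an illustrative helper, not part of the Speech SDK.

```python
import os


def load_speech_settings(key_var="SPEECH_KEY", region_var="SPEECH_REGION"):
    """Read the subscription key and service region from environment variables.

    The variable names are hypothetical defaults; use whatever names you prefer.
    Raises RuntimeError naming any missing variable, so an unset value fails
    locally instead of reaching the service as an empty credential.
    """
    key = os.environ.get(key_var)
    region = os.environ.get(region_var)
    missing = [name for name, value in ((key_var, key), (region_var, region)) if not value]
    if missing:
        raise RuntimeError("Set the environment variable(s): " + ", ".join(missing))
    return key, region
```

In quickstart.py, the hard-coded assignment could then become speech_key, service_region = load_speech_settings().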

Build and run your app

  1. Run the sample from the console or in your IDE:

    python quickstart.py
    
  2. Speak an English phrase or sentence. The application sends your speech to the Speech Services, which transcribe it and translate the recognized text into the target language (in this case, German). The Speech Services then return the text to the application for display.

    Say something...
    RECOGNIZED 'en-US': What's the weather in Seattle?
    TRANSLATED into 'de': Wie ist das Wetter in Seattle?
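In the Python sample, result.translations is a dictionary keyed by target language code, and add_target_language() can be called once for each language you want. The sketch below prints every requested translation for one result. format_translations is a hypothetical helper that takes plain values rather than an SDK result object, and its fallback for a missing entry is a defensive assumption, not documented SDK behavior.

```python
def format_translations(from_lang, text, translations, targets):
    """Build output lines for one recognized utterance and several target languages.

    translations: dict mapping target language code -> translated text,
    shaped like result.translations in the Python sample.
    targets: the language codes passed to add_target_language(), in order.
    """
    lines = [f"RECOGNIZED '{from_lang}': {text}"]
    for lang in targets:
        translated = translations.get(lang)
        if translated is None:
            # Defensive: report a missing entry instead of raising KeyError.
            lines.append(f"TRANSLATED into '{lang}': (no translation returned)")
        else:
            lines.append(f"TRANSLATED into '{lang}': {translated}")
    return lines
```

With the values from this quickstart's expected output and targets ["de"], the helper returns the same two lines the sample prints.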
    

Next steps

View or download all Speech SDK Samples on GitHub.

Additional language and platform support

If you don't see a quickstart for your preferred programming language, additional quickstart materials and code samples are available on GitHub. Use the following table to find the right sample for your programming language and platform/OS combination.

Language     Code samples
C++          Quickstarts, Samples
C#           .NET Framework, .NET Core, UWP, Unity, Xamarin
Java         Android, JRE
JavaScript   Browser
Node.js      Windows, Linux, macOS
Objective-C  iOS, macOS
Python       Windows, Linux, macOS
Swift        iOS, macOS