Quickstart: Translate speech-to-speech

In this quickstart, you'll use the Speech SDK to interactively translate speech from one language to speech in another language. After satisfying a few prerequisites, translating speech-to-speech takes only six steps:

  • Create a SpeechTranslationConfig object from your subscription key and region.
  • Update the SpeechTranslationConfig object to specify the source and target languages.
  • Update the SpeechTranslationConfig object to specify the speech output voice name.
  • Create a TranslationRecognizer object using the SpeechTranslationConfig object from above.
  • Using the TranslationRecognizer object, start the recognition process for a single utterance.
  • Inspect the TranslationRecognitionResult returned.

Choose your target environment

C#

You can view or download all Speech SDK C# Samples on GitHub.

Prerequisites

Before you get started, make sure you have a Speech service subscription key and that your development environment is set up with the Speech SDK installed.

Add sample code

  1. Open Program.cs, and replace all the code in it with the following.

    using System;
    using System.Threading.Tasks;
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Translation;
    
    namespace helloworld
    {
        class Program
        {
            public static async Task TranslateSpeechToSpeech()
            {
                // Creates an instance of a speech translation config with specified subscription key and service region.
                // Replace with your own subscription key and region identifier from here: https://aka.ms/speech/sdkregion
                var config = SpeechTranslationConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
    
                // Sets source and target languages.
                // Replace with the languages of your choice, from list found here: https://aka.ms/speech/sttt-languages
                string fromLanguage = "en-US";
                string toLanguage = "de";
                config.SpeechRecognitionLanguage = fromLanguage;
                config.AddTargetLanguage(toLanguage);
    
                // Sets the synthesis output voice name.
                // Replace with the languages of your choice, from list found here: https://aka.ms/speech/tts-languages
                config.VoiceName = "de-DE-Hedda";
    
                // Creates a translation recognizer using the default microphone audio input device.
                using (var recognizer = new TranslationRecognizer(config))
                {
                    // Prepare to handle the synthesized audio data.
                    recognizer.Synthesizing += (s, e) =>
                    {
                        var audio = e.Result.GetAudio();
                        Console.WriteLine(audio.Length != 0
                            ? $"AUDIO SYNTHESIZED: {audio.Length} byte(s)"
                            : $"AUDIO SYNTHESIZED: {audio.Length} byte(s) (COMPLETE)");
                    };
    
                    // Starts translation, and returns after a single utterance is recognized. The end of a
                    // single utterance is determined by listening for silence at the end or until a maximum of 15
                    // seconds of audio is processed. The task returns the recognized text as well as the translation.
                    // Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
                    // shot recognition like command or query.
                    // For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead
                    // (a sketch follows these steps).
                    Console.WriteLine("Say something...");
                    var result = await recognizer.RecognizeOnceAsync();
    
                    // Checks result.
                    if (result.Reason == ResultReason.TranslatedSpeech)
                    {
                        Console.WriteLine($"RECOGNIZED '{fromLanguage}': {result.Text}");
                        Console.WriteLine($"TRANSLATED into '{toLanguage}': {result.Translations[toLanguage]}");
                    }
                    else if (result.Reason == ResultReason.RecognizedSpeech)
                    {
                        Console.WriteLine($"RECOGNIZED '{fromLanguage}': {result.Text} (text could not be translated)");
                    }
                    else if (result.Reason == ResultReason.NoMatch)
                    {
                        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                    }
                    else if (result.Reason == ResultReason.Canceled)
                    {
                        var cancellation = CancellationDetails.FromResult(result);
                        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
    
                        if (cancellation.Reason == CancellationReason.Error)
                        {
                            Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                            Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                            Console.WriteLine($"CANCELED: Did you update the subscription info?");
                        }
                    }
                }
            }
    
            static void Main(string[] args)
            {
                TranslateSpeechToSpeech().Wait();
            }
        }
    }
    
  2. In the same file, replace the string YourSubscriptionKey with your subscription key.

  3. Replace the string YourServiceRegion with the region associated with your subscription (for example, westus for the free trial subscription).

  4. From the menu bar, choose File > Save All.
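
As the comment in the sample notes, RecognizeOnceAsync() returns after a single utterance. For long-running, multi-utterance translation you'd switch to StartContinuousRecognitionAsync(). The following is a minimal sketch of that pattern, assuming the same recognizer, languages, and Synthesizing handler as in the sample above:

    // Sketch: replaces the RecognizeOnceAsync() call in the sample above.
    // Prints each translated utterance as it is recognized.
    recognizer.Recognized += (s, e) =>
    {
        if (e.Result.Reason == ResultReason.TranslatedSpeech)
        {
            Console.WriteLine($"RECOGNIZED '{fromLanguage}': {e.Result.Text}");
            Console.WriteLine($"TRANSLATED into '{toLanguage}': {e.Result.Translations[toLanguage]}");
        }
    };

    Console.WriteLine("Say something... (press Enter to stop)");
    await recognizer.StartContinuousRecognitionAsync();
    Console.ReadLine();
    await recognizer.StopContinuousRecognitionAsync();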

Build and run the application

  1. From the menu bar, select Build > Build Solution to build the application. The code should compile without errors now.

  2. Choose Debug > Start Debugging (or press F5) to start the helloworld application.

  3. Speak an English phrase or sentence. The application sends your speech to the Speech service, which transcribes it and translates it into German. The Speech service then sends the synthesized audio and the translated text back to the application for display.

Say something...
AUDIO SYNTHESIZED: 76784 byte(s)
AUDIO SYNTHESIZED: 0 byte(s) (COMPLETE)
RECOGNIZED 'en-US': What's the weather in Seattle?
TRANSLATED into 'de': Wie ist das Wetter in Seattle?

C++

You can view or download all Speech SDK C++ Samples on GitHub.

Prerequisites

Before you get started, make sure you have a Speech service subscription key and that your development environment is set up with the Speech SDK installed.

Add sample code

  1. Open the source file helloworld.cpp.

  2. Replace all the code with the following snippet:

    #include <iostream>
    #include <vector>
    #include <speechapi_cxx.h>
    
    using namespace std;
    using namespace Microsoft::CognitiveServices::Speech;
    using namespace Microsoft::CognitiveServices::Speech::Translation;
    
    void TranslateSpeechToSpeech()
    {
        // Creates an instance of a speech translation config with specified subscription key and service region.
        // Replace with your own subscription key and region identifier from here: https://aka.ms/speech/sdkregion
        auto config = SpeechTranslationConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");
    
        // Sets source and target languages.
        // Replace with the languages of your choice, from list found here: https://aka.ms/speech/sttt-languages
        auto fromLanguage = "en-US";
        auto toLanguage = "de";
        config->SetSpeechRecognitionLanguage(fromLanguage);
        config->AddTargetLanguage(toLanguage);
    
        // Sets the synthesis output voice name.
        // Replace with the languages of your choice, from list found here: https://aka.ms/speech/tts-languages
        config->SetVoiceName("de-DE-Hedda");
    
        // Creates a translation recognizer using the default microphone audio input device.
        auto recognizer = TranslationRecognizer::FromConfig(config);
    
        // Prepare to handle the synthesized audio data.
        recognizer->Synthesizing.Connect([](const TranslationSynthesisEventArgs& e)
        {
            auto size = e.Result->Audio.size();
            cout << "AUDIO SYNTHESIZED: " << size << " byte(s)" << (size == 0 ? "(COMPLETE)" : "") << std::endl;
        });
    
        // Starts translation, and returns after a single utterance is recognized. The end of a
        // single utterance is determined by listening for silence at the end or until a maximum of 15
        // seconds of audio is processed. The task returns the recognized text as well as the translation.
        // Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
        // shot recognition like command or query.
    // For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead
    // (a sketch follows these steps).
        cout << "Say something...\n";
        auto result = recognizer->RecognizeOnceAsync().get();
    
        // Checks result.
        if (result->Reason == ResultReason::TranslatedSpeech)
        {
            cout << "RECOGNIZED '" << fromLanguage << "': " << result->Text << std::endl;
            cout << "TRANSLATED into '" << toLanguage << "': " << result->Translations.at(toLanguage) << std::endl;
        }
        else if (result->Reason == ResultReason::RecognizedSpeech)
        {
            cout << "RECOGNIZED '" << fromLanguage << "' " << result->Text << " (text could not be translated)" << std::endl;
        }
        else if (result->Reason == ResultReason::NoMatch)
        {
            cout << "NOMATCH: Speech could not be recognized." << std::endl;
        }
        else if (result->Reason == ResultReason::Canceled)
        {
            auto cancellation = CancellationDetails::FromResult(result);
            cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;
    
            if (cancellation->Reason == CancellationReason::Error)
            {
                cout << "CANCELED: ErrorCode=" << (int)cancellation->ErrorCode << std::endl;
                cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails << std::endl;
                cout << "CANCELED: Did you update the subscription info?" << std::endl;
            }
        }
    }
    
    int wmain()
    {
        TranslateSpeechToSpeech();
        return 0;
    }
    
  3. In the same file, replace the string YourSubscriptionKey with your subscription key.

  4. Replace the string YourServiceRegion with the region associated with your subscription (for example, westus for the free trial subscription).

  5. From the menu bar, choose File > Save All.
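
As the comment in the sample notes, RecognizeOnceAsync() returns after a single utterance. For long-running, multi-utterance translation you'd switch to StartContinuousRecognitionAsync(). The following is a minimal sketch of that pattern, assuming the same recognizer and headers as in the sample above:

    // Sketch: replaces the RecognizeOnceAsync() call in the sample above.
    // Prints each translated utterance as it is recognized.
    recognizer->Recognized.Connect([](const TranslationRecognitionEventArgs& e)
    {
        if (e.Result->Reason == ResultReason::TranslatedSpeech)
        {
            cout << "TRANSLATED: " << e.Result->Translations.at("de") << std::endl;
        }
    });

    cout << "Say something... (press Enter to stop)\n";
    recognizer->StartContinuousRecognitionAsync().get();
    cin.get();
    recognizer->StopContinuousRecognitionAsync().get();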

Build and run the application

  1. From the menu bar, select Build > Build Solution to build the application. The code should compile without errors now.

  2. Choose Debug > Start Debugging (or press F5) to start the helloworld application.

  3. Speak an English phrase or sentence. The application sends your speech to the Speech service, which transcribes it and translates it into German. The Speech service then sends the synthesized audio and the translated text back to the application for display.

Say something...
AUDIO SYNTHESIZED: 76784 byte(s)
AUDIO SYNTHESIZED: 0 byte(s) (COMPLETE)
RECOGNIZED 'en-US': What's the weather in Seattle?
TRANSLATED into 'de': Wie ist das Wetter in Seattle?

Java

You can view or download all Speech SDK Java Samples on GitHub.

Prerequisites

Before you get started, make sure you have a Speech service subscription key and that your development environment is set up with the Speech SDK installed.

Add sample code

  1. To add a new empty class to your Java project, select File > New > Class.

  2. In the New Java Class window, enter speechsdk.quickstart into the Package field, and Main into the Name field.

    Screenshot of New Java Class window

  3. Replace all code in Main.java with the following snippet:

    package speechsdk.quickstart;
    
    import java.io.IOException;
    import java.util.concurrent.Future;
    import java.util.concurrent.ExecutionException;
    import com.microsoft.cognitiveservices.speech.*;
    import com.microsoft.cognitiveservices.speech.translation.*;
    
    public class Main {
    
        public static void translateSpeechToSpeech() throws InterruptedException, ExecutionException, IOException
        {
            // Creates an instance of a speech translation config with specified
            // subscription key and service region. Replace with your own subscription key
            // and region identifier from here: https://aka.ms/speech/sdkregion
    
            int exitCode = 1;
            SpeechTranslationConfig config = SpeechTranslationConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
            assert(config != null);
    
            // Sets source and target languages.
            String fromLanguage = "en-US";
            String toLanguage = "de";
            config.setSpeechRecognitionLanguage(fromLanguage);
            config.addTargetLanguage(toLanguage);
    
            // Sets the synthesis output voice name.
            // Replace with the languages of your choice, from list found here: https://aka.ms/speech/tts-languages
            config.setVoiceName("de-DE-Hedda");
    
            // Creates a translation recognizer using the default microphone audio input device.
            TranslationRecognizer recognizer = new TranslationRecognizer(config);
            assert(recognizer != null);
    
            // Prepare to handle the synthesized audio data.
            recognizer.synthesizing.addEventListener((s, e) -> {
                int size = e.getResult().getAudio().length;
                System.out.println(size != 0
                    ? "AUDIO SYNTHESIZED: " + size + " byte(s)"
                    : "AUDIO SYNTHESIZED: " + size + " byte(s) (COMPLETE)");
            });
    
            System.out.println("Say something...");
    
            // Starts translation, and returns after a single utterance is recognized. The end of a
            // single utterance is determined by listening for silence at the end or until a maximum of 15
            // seconds of audio is processed. The task returns the recognized text as well as the translation.
            // Note: Since recognizeOnceAsync() returns only a single utterance, it is suitable only for single
            // shot recognition like command or query.
        // For long-running multi-utterance recognition, use startContinuousRecognitionAsync() instead
        // (a sketch follows these steps).
            Future<TranslationRecognitionResult> task = recognizer.recognizeOnceAsync();
            assert(task != null);
    
            TranslationRecognitionResult result = task.get();
            assert(result != null);
    
            if (result.getReason() == ResultReason.TranslatedSpeech) {
                System.out.println("RECOGNIZED '" + fromLanguage + "': " + result.getText());
                System.out.println("TRANSLATED into '" + toLanguage + "': " + result.getTranslations().get(toLanguage));
                exitCode = 0;
            }
            else if (result.getReason() == ResultReason.RecognizedSpeech) {
                System.out.println("RECOGNIZED '" + fromLanguage + "': " + result.getText() + "(text could not be  translated)");
                exitCode = 0;
            }
            else if (result.getReason() == ResultReason.NoMatch) {
                System.out.println("NOMATCH: Speech could not be recognized.");
            }
            else if (result.getReason() == ResultReason.Canceled) {
                CancellationDetails cancellation = CancellationDetails.fromResult(result);
                System.out.println("CANCELED: Reason=" + cancellation.getReason());
    
                if (cancellation.getReason() == CancellationReason.Error) {
                    System.out.println("CANCELED: ErrorCode=" + cancellation.getErrorCode());
                    System.out.println("CANCELED: ErrorDetails=" + cancellation.getErrorDetails());
                    System.out.println("CANCELED: Did you update the subscription info?");
                }
            }
    
            recognizer.close();
    
            System.exit(exitCode);
        }
    
        public static void main(String[] args) {
            try {
                translateSpeechToSpeech();
            } catch (Exception ex) {
                System.out.println("Unexpected exception: " + ex.getMessage());
                assert(false);
                System.exit(1);
            }
        }
    }
    
  4. Replace the string YourSubscriptionKey with your subscription key.

  5. Replace the string YourServiceRegion with the region associated with your subscription (for example, westus for the free trial subscription).

  6. Save changes to the project.
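
As the comment in the sample notes, recognizeOnceAsync() returns after a single utterance. For long-running, multi-utterance translation you'd switch to startContinuousRecognitionAsync(). The following is a minimal sketch of that pattern, assuming the recognizer and imports from the sample above (inside translateSpeechToSpeech(), which already declares the needed checked exceptions):

    // Sketch: replaces the recognizeOnceAsync() block in the sample above.
    // Prints each translated utterance as it is recognized.
    recognizer.recognized.addEventListener((s, e) -> {
        if (e.getResult().getReason() == ResultReason.TranslatedSpeech) {
            System.out.println("TRANSLATED: " + e.getResult().getTranslations().get("de"));
        }
    });

    recognizer.startContinuousRecognitionAsync().get();
    Thread.sleep(30000);  // translate for 30 seconds, then stop
    recognizer.stopContinuousRecognitionAsync().get();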

Build and run the app

  1. Press F11, or select Run > Debug.

  2. Speak an English phrase or sentence. The application sends your speech to the Speech service, which transcribes it and translates it into German. The Speech service then sends the synthesized audio and the translated text back to the application for display.

Say something...
AUDIO SYNTHESIZED: 76784 byte(s)
AUDIO SYNTHESIZED: 0 byte(s) (COMPLETE)
RECOGNIZED 'en-US': What's the weather in Seattle?
TRANSLATED into 'de': Wie ist das Wetter in Seattle?

Python

You can view or download all Speech SDK Python Samples on GitHub.

Prerequisites

Before you get started, make sure you have a Speech service subscription key and that your development environment is set up with the Speech SDK installed.

Add sample code

  1. Open quickstart.py, and replace all the code in it with the following.

    import azure.cognitiveservices.speech as speechsdk
    
    speech_key, service_region = "YourSubscriptionKey", "YourServiceRegion"
    
    def translate_speech_to_speech():
    
        # Creates an instance of a speech translation config with specified subscription key and service region.
        # Replace with your own subscription key and region identifier from here: https://aka.ms/speech/sdkregion
        translation_config = speechsdk.translation.SpeechTranslationConfig(subscription=speech_key, region=service_region)
    
        # Sets source and target languages.
        # Replace with the languages of your choice, from list found here: https://aka.ms/speech/sttt-languages
        fromLanguage = 'en-US'
        toLanguage = 'de'
        translation_config.speech_recognition_language = fromLanguage
        translation_config.add_target_language(toLanguage)
    
        # Sets the synthesis output voice name.
        # Replace with the languages of your choice, from list found here: https://aka.ms/speech/tts-languages
        translation_config.voice_name = "de-DE-Hedda"
    
        # Creates a translation recognizer using the default microphone audio
        # input device. (A file-based variant is sketched after these steps.)
        recognizer = speechsdk.translation.TranslationRecognizer(translation_config=translation_config)
    
        # Prepare to handle the synthesized audio data.
        def synthesis_callback(evt):
            size = len(evt.result.audio)
            print('AUDIO SYNTHESIZED: {} byte(s) {}'.format(size, '(COMPLETE)' if size == 0 else ''))
    
        recognizer.synthesizing.connect(synthesis_callback)
    
        # Starts translation, and returns after a single utterance is recognized. The end of a
        # single utterance is determined by listening for silence at the end or until a maximum of 15
        # seconds of audio is processed. It returns the recognized text as well as the translation.
        # Note: Since recognize_once() returns only a single utterance, it is suitable only for single
        # shot recognition like command or query.
        # For long-running multi-utterance recognition, use start_continuous_recognition() instead.
        print("Say something...")
        result = recognizer.recognize_once()
    
        # Check the result
        if result.reason == speechsdk.ResultReason.TranslatedSpeech:
            print("RECOGNIZED '{}': {}".format(fromLanguage, result.text))
            print("TRANSLATED into {}: {}".format(toLanguage, result.translations['de']))
        elif result.reason == speechsdk.ResultReason.RecognizedSpeech:
            print("RECOGNIZED: {} (text could not be translated)".format(result.text))
        elif result.reason == speechsdk.ResultReason.NoMatch:
            print("NOMATCH: Speech could not be recognized: {}".format(result.no_match_details))
        elif result.reason == speechsdk.ResultReason.Canceled:
            print("CANCELED: Reason={}".format(result.cancellation_details.reason))
            if result.cancellation_details.reason == speechsdk.CancellationReason.Error:
                print("CANCELED: ErrorDetails={}".format(result.cancellation_details.error_details))
    
    translate_speech_to_speech()
    
  2. In the same file, replace the string YourSubscriptionKey with your subscription key.

  3. Replace the string YourServiceRegion with the region associated with your subscription (for example, westus for the free trial subscription).

  4. Save the changes you've made to quickstart.py.
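
The sample captures audio from your default microphone. To translate speech from a WAV file instead, you can pass an audio configuration when creating the recognizer. The following is a minimal sketch; the key, region, and file name are placeholders for your own values:

    import azure.cognitiveservices.speech as speechsdk

    # Same config as the sample above (placeholder key and region).
    translation_config = speechsdk.translation.SpeechTranslationConfig(
        subscription="YourSubscriptionKey", region="YourServiceRegion")
    translation_config.speech_recognition_language = 'en-US'
    translation_config.add_target_language('de')

    # Placeholder file name; use your own WAV file (16 kHz, 16-bit, mono PCM).
    audio_config = speechsdk.audio.AudioConfig(filename="whatstheweatherlike.wav")

    # Passing the audio config makes the recognizer read from the file
    # instead of the default microphone.
    recognizer = speechsdk.translation.TranslationRecognizer(
        translation_config=translation_config, audio_config=audio_config)

    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.TranslatedSpeech:
        print(result.translations['de'])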

Build and run your app

  1. Run the sample from the console or in your IDE:

    python quickstart.py
    
  2. Speak an English phrase or sentence. The application sends your speech to the Speech service, which transcribes it and translates it into German. The Speech service then sends the synthesized audio and the translated text back to the application for display.

    Say something...
    AUDIO SYNTHESIZED: 76784 byte(s)
    AUDIO SYNTHESIZED: 0 byte(s) (COMPLETE)
    RECOGNIZED 'en-US': What's the weather in Seattle?
    TRANSLATED into 'de': Wie ist das Wetter in Seattle?
    

JavaScript

You can view or download all Speech SDK JavaScript Samples on GitHub.

Prerequisites

Before you get started, make sure you have a Speech service subscription key.

Create a new website folder

Create a new, empty folder. If you want to host the sample on a web server, make sure that the web server can access the folder.

Unpack the Speech SDK for JavaScript into that folder

Download the Speech SDK as a .zip package and unpack it into the newly created folder. This results in two files being unpacked, microsoft.cognitiveservices.speech.sdk.bundle.js and microsoft.cognitiveservices.speech.sdk.bundle.js.map. The latter file is optional, and is useful for debugging into the SDK code.

Create an index.html page

Create a new file named index.html in the folder, and open it with a text editor.

  1. Create the following HTML skeleton:
<!DOCTYPE html>
<html>
<head>
  <title>Microsoft Cognitive Services Speech SDK JavaScript Quickstart</title>
  <meta charset="utf-8" />
</head>
<body style="font-family:'Helvetica Neue',Helvetica,Arial,sans-serif; font-size:13px;">
  <!-- <uidiv> -->
  <div id="warning">
    <h1 style="font-weight:500;">Speech Recognition Speech SDK not found (microsoft.cognitiveservices.speech.sdk.bundle.js missing).</h1>
  </div>
  
  <div id="content" style="display:none">
    <table width="100%">
      <tr>
        <td></td>
        <td><h1 style="font-weight:500;">Microsoft Cognitive Services Speech SDK JavaScript Quickstart</h1></td>
      </tr>
      <tr>
        <td align="right"><a href="https://docs.microsoft.com/azure/cognitive-services/speech-service/get-started" target="_blank">Subscription</a>:</td>
        <td><input id="subscriptionKey" type="text" size="40" value="subscription"></td>
      </tr>
      <tr>
        <td align="right">Region</td>
        <td><input id="serviceRegion" type="text" size="40" value="YourServiceRegion"></td>
      </tr>
      <tr>
        <td align="right">Source Language</td>
        <td><select id="languageSourceOptions">
          <option value="ar-EG">Arabic - EG</option>
          <option selected="selected" value="de-DE">German - DE</option>
          <option value="en-US">English - US</option>
          <option value="es-ES">Spanish - ES</option>
          <option value="fi-FI">Finnish - FI</option>
          <option value="fr-FR">French - FR</option>
          <option value="hi-IN">Hindi - IN</option>
          <option value="it-IT">Italian - IT</option>
          <option value="ja-JP">Japanese - JP</option>
          <option value="ko-KR">Korean - KR</option>
          <option value="pl-PL">Polish - PL</option>
          <option value="pt-BR">Portuguese - BR</option>
          <option value="ru-RU">Russian - RU</option>
          <option value="sv-SE">Swedish - SE</option>
          <option value="zh-CN">Chinese - CN</option>
        </select></td>
      </tr>
      <tr>
        <td align="right">Target Language</td>
        <td><select id="languageTargetOptions">
          <option value="ar-EG">Arabic - EG</option>
          <option selected="selected" value="de-DE">German - DE</option>
          <option value="en-US">English - US</option>
          <option value="es-ES">Spanish - ES</option>
          <option value="fi-FI">Finnish - FI</option>
          <option value="fr-FR">French - FR</option>
          <option value="hi-IN">Hindi - IN</option>
          <option value="it-IT">Italian - IT</option>
          <option value="ja-JP">Japanese - JP</option>
          <option value="ko-KR">Korean - KR</option>
          <option value="pl-PL">Polish - PL</option>
          <option value="pt-BR">Portuguese - BR</option>
          <option value="ru-RU">Russian - RU</option>
          <option value="sv-SE">Swedish - SE</option>
          <option value="zh-CN">Chinese - CN</option>
        </select></td>
      </tr>
      <tr>
        <td></td>
        <td><button id="startRecognizeOnceAsyncButton">Start recognition</button></td>
      </tr>
      <tr>
        <td align="right" valign="top">Results</td>
        <td><textarea id="phraseDiv" style="display: inline-block;width:500px;height:200px"></textarea></td>
      </tr>
    </table>
  </div>
  <!-- </uidiv> -->

  <!-- <speechsdkdiv> -->
  <!-- Speech SDK reference sdk. -->
  <script src="microsoft.cognitiveservices.speech.sdk.bundle.js"></script>
  <!-- </speechsdkdiv> -->

  <!-- <authorizationfunction> -->
  <!-- Speech SDK Authorization token -->
  <script>
  // Note: Replace the URL with a valid endpoint to retrieve
  //       authorization tokens for your subscription.
  var authorizationEndpoint = "token.php";

  function RequestAuthorizationToken() {
    if (authorizationEndpoint) {
      var a = new XMLHttpRequest();
      a.open("GET", authorizationEndpoint);
      a.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
      a.send("");
      a.onload = function() {
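        // The response is a JWT; decode its Base64 payload (second segment)
        // to read the region the token was issued for.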
        var token = JSON.parse(atob(this.responseText.split(".")[1]));
        serviceRegion.value = token.region;
        authorizationToken = this.responseText;
        subscriptionKey.disabled = true;
        subscriptionKey.value = "using authorization token (hit F5 to refresh)";
        console.log("Got an authorization token: " + token);
      }
    }
  }
  </script>
  <!-- </authorizationfunction> -->

  <!-- <quickstartcode> -->
  <!-- Speech SDK USAGE -->
  <script>
    // status fields and start button in UI
    var phraseDiv;
    var startRecognizeOnceAsyncButton;

    // subscription key and region for speech services.
    var subscriptionKey, serviceRegion, languageTargetOptions, languageSourceOptions;
    var authorizationToken;
    var SpeechSDK;
    var recognizer;

    document.addEventListener("DOMContentLoaded", function () {
      startRecognizeOnceAsyncButton = document.getElementById("startRecognizeOnceAsyncButton");
      subscriptionKey = document.getElementById("subscriptionKey");
      serviceRegion = document.getElementById("serviceRegion");
      languageTargetOptions = document.getElementById("languageTargetOptions");
      languageSourceOptions = document.getElementById("languageSourceOptions");
      phraseDiv = document.getElementById("phraseDiv");

      startRecognizeOnceAsyncButton.addEventListener("click", function () {
        var soundContext = undefined;
        try {
          var AudioContext = window.AudioContext || window.webkitAudioContext || false;
          if (AudioContext) {
            soundContext = new AudioContext();
          } else {
            alert("AudioContext not supported");
          }
        } catch (e) {
          window.console.log("no sound context found, no audio output. " + e);
        }

        startRecognizeOnceAsyncButton.disabled = true;
        phraseDiv.innerHTML = "";

        // if we got an authorization token, use the token. Otherwise use the provided subscription key
        var speechConfig;
        if (authorizationToken) {
          speechConfig = SpeechSDK.SpeechTranslationConfig.fromAuthorizationToken(authorizationToken, serviceRegion.value);
        } else {
          if (subscriptionKey.value === "" || subscriptionKey.value === "subscription") {
            alert("Please enter your Microsoft Cognitive Services Speech subscription key!");
            startRecognizeOnceAsyncButton.disabled = false;
            return;
          }
          speechConfig = SpeechSDK.SpeechTranslationConfig.fromSubscription(subscriptionKey.value, serviceRegion.value);
        }

        speechConfig.speechRecognitionLanguage = languageSourceOptions.value;
        var language = languageTargetOptions.value;
        speechConfig.addTargetLanguage(language);

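        // Request synthesized audio of the translation. The property names the
        // synthesis voice; this sample passes the target language value and
        // lets the service pick a matching default voice.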
        speechConfig.setProperty(SpeechSDK.PropertyId.SpeechServiceConnection_TranslationVoice, languageTargetOptions.value);
        var audioConfig  = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();
        recognizer = new SpeechSDK.TranslationRecognizer(speechConfig, audioConfig);

        // Signals an audio payload of synthesized speech is ready for playback.
        // If the event result contains valid audio, its reason will be ResultReason.SynthesizingAudio
        // Once a complete phrase has been synthesized, the event will be called with ResultReason.SynthesizingAudioComplete and a 0 byte audio payload.
        recognizer.synthesizing = function (s, e) {
          window.console.log(e);

          var audioSize = e.result.audio === undefined ? 0 : e.result.audio.byteLength;

          phraseDiv.innerHTML += "(synthesizing) Reason: " + SpeechSDK.ResultReason[e.result.reason] + " " + audioSize + " bytes\r\n";

          if (e.result.audio && soundContext) {
            var source = soundContext.createBufferSource();
            soundContext.decodeAudioData(e.result.audio, function (newBuffer) {
              source.buffer = newBuffer;
              source.connect(soundContext.destination);
              source.start(0);
            });
          }
        };
        recognizer.recognizeOnceAsync(
          function (result) {
            startRecognizeOnceAsyncButton.disabled = false;
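            // The translations map is keyed by the bare language code
            // (for example "de" for "de-DE"), so trim the region suffix.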
            var languageKey = language.substring(0, 2);
            var translation = result.translations.get(languageKey);
            window.console.log(translation);
            phraseDiv.innerHTML += translation;

            recognizer.close();
            recognizer = undefined;
          },
          function (err) {
            startRecognizeOnceAsyncButton.disabled = false;
            phraseDiv.innerHTML += err;
            window.console.log(err);

            recognizer.close();
            recognizer = undefined;
          });
      });

      if (!!window.SpeechSDK) {
        SpeechSDK = window.SpeechSDK;
        startRecognizeOnceAsyncButton.disabled = false;

        document.getElementById('content').style.display = 'block';
        document.getElementById('warning').style.display = 'none';

        // in case we have a function for getting an authorization token, call it.
        if (typeof RequestAuthorizationToken === "function") {
          RequestAuthorizationToken();
        }
      }
    });
  </script>
  <!-- </quickstartcode> -->
</body>
</html>

Create the token source (optional)

If you want to host the web page on a web server, you can optionally provide a token source for your demo application. That way, your subscription key never leaves your server, while your users can use speech capabilities without entering any authorization code themselves.

Create a new file named token.php. This example assumes your web server supports PHP with curl enabled. Enter the following code:

<?php
header('Access-Control-Allow-Origin: ' . $_SERVER['SERVER_NAME']);

// Replace with your own subscription key and service region (e.g., "westus").
$subscriptionKey = 'YourSubscriptionKey';
$region = 'YourServiceRegion';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://' . $region . '.api.cognitive.microsoft.com/sts/v1.0/issueToken');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, '{}');
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json', 'Ocp-Apim-Subscription-Key: ' . $subscriptionKey));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
echo curl_exec($ch);
?>

Note

Authorization tokens have a limited lifetime. This simplified example doesn't show how to refresh authorization tokens automatically. As a user, you can manually reload the page or press F5 to refresh.
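
If you do want automatic refresh, one option is to request a new token periodically, since issued tokens expire after roughly ten minutes. The following is a minimal sketch, assuming the RequestAuthorizationToken function from the index.html sample above; add it inside a script tag on the page:

// Sketch: refresh the authorization token every nine minutes, shortly
// before the previous one expires.
if (typeof RequestAuthorizationToken === "function") {
  setInterval(RequestAuthorizationToken, 9 * 60 * 1000);
}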

Build and run the sample locally

To launch the app, double-click the index.html file or open index.html with your favorite web browser. It presents a simple GUI that lets you enter your subscription key and region, and then start translating speech from your microphone.

Build and run the sample via a web server

To launch your app, open your favorite web browser, point it to the public URL where you host the folder, enter your region, and start translating speech from your microphone. If configured, your app acquires a token from your token source.

Next steps

View or download all Speech SDK Samples on GitHub.

Additional language and platform support

If you've clicked this tab, you probably didn't see a quickstart in your favorite programming language. Don't worry, we have additional quickstart materials and code samples available on GitHub. Use the table to find the right sample for your programming language and platform/OS combination.

Language      Code samples
C++           Quickstarts, Samples
C#            .NET Framework, .NET Core, UWP, Unity, Xamarin
Java          Android, JRE
JavaScript    Browser
Node.js       Windows, Linux, macOS
Objective-C   iOS, macOS
Python        Windows, Linux, macOS
Swift         iOS, macOS