Quickstart: Recognize speech from an audio file

In this quickstart you will use the Speech SDK to recognize speech from an audio file. After satisfying a few prerequisites, recognizing speech from a file only takes a few steps:

  • Create a SpeechConfig object from your subscription key and region.
  • Create an AudioConfig object that specifies the .WAV file name.
  • Create a SpeechRecognizer object using the SpeechConfig and AudioConfig objects from above.
  • Using the SpeechRecognizer object, start the recognition process for a single utterance.
  • Inspect the SpeechRecognitionResult returned.

If you prefer to jump right in, view or download all Speech SDK C# Samples on GitHub. Otherwise, let's get started.

Prerequisites

Before you get started, make sure to:

Supported audio input format

The default audio streaming format is WAV (16 kHz or 8 kHz, 16-bit, mono PCM). Beyond WAV/PCM, the compressed input formats listed below are also supported, but they require additional configuration.

  • MP3
  • OPUS/OGG
  • FLAC
  • ALAW in wav container
  • MULAW in wav container

Open your project in Visual Studio

The first step is to make sure that you have your project open in Visual Studio.

  1. Launch Visual Studio 2019.
  2. Load your project and open Program.cs.
  3. Download whatstheweatherlike.wav and add it to your project:
    • Save the whatstheweatherlike.wav file next to the Program.cs file.
    • In Solution Explorer, right-click the project and select Add > Existing item.
    • Select the whatstheweatherlike.wav file, then select the Add button.
    • Right-click the newly added file and select Properties.
    • Set Copy to Output Directory to Copy always.

Start with some boilerplate code

Let's add some code that works as a skeleton for our project. Note that it declares an async method called RecognizeSpeechAsync(), which the following steps fill in.

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace HelloWorld
{
    class Program
    {
        static async Task Main()
        {
            await RecognizeSpeechAsync();
        }

        static async Task RecognizeSpeechAsync()
        {
        }
    }
}

Create a Speech configuration

Before you can initialize a SpeechRecognizer object, you need to create a configuration that uses your subscription key and subscription region. Insert this code in the RecognizeSpeechAsync() method.

Note

This sample uses the FromSubscription() method to build the SpeechConfig. For a full list of available methods, see SpeechConfig Class. The Speech SDK defaults to en-US as the recognition language. To choose a different source language, see Specify source language for speech to text.

// Replace with your own subscription key and region identifier from here: https://aka.ms/speech/sdkregion
var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

Create an Audio configuration

Now, you need to create an AudioConfig object that points to your audio file. This object is created inside of a using statement to ensure the proper release of unmanaged resources. Insert this code in the RecognizeSpeechAsync() method, right below your Speech configuration.

using (var audioInput = AudioConfig.FromWavFileInput("whatstheweatherlike.wav"))
{
}

Initialize a SpeechRecognizer

Now, let's create the SpeechRecognizer object using the SpeechConfig and AudioConfig objects created earlier. This object is also created inside of a using statement to ensure the proper release of unmanaged resources. Insert this code in the RecognizeSpeechAsync() method, inside the using statement that wraps your AudioConfig object.

using (var recognizer = new SpeechRecognizer(config, audioInput))
{
}

Recognize a phrase

From the SpeechRecognizer object, you're going to call the RecognizeOnceAsync() method. This method tells the Speech service that you're sending a single phrase for recognition, and that it should stop recognizing speech once that phrase is identified.

Inside the using statement, add this code:

Console.WriteLine("Recognizing first result...");
var result = await recognizer.RecognizeOnceAsync();

Display the recognition results (or errors)

When the recognition result is returned by the Speech service, you'll want to do something with it. We're going to keep it simple and print the result to the console.

Inside the using statement, below RecognizeOnceAsync(), add this code:

switch (result.Reason)
{
    case ResultReason.RecognizedSpeech:
        Console.WriteLine($"We recognized: {result.Text}");
        break;
    case ResultReason.NoMatch:
        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
        break;
    case ResultReason.Canceled:
        var cancellation = CancellationDetails.FromResult(result);
        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

        if (cancellation.Reason == CancellationReason.Error)
        {
            Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
            Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
            Console.WriteLine($"CANCELED: Did you update the subscription info?");
        }
        break;
}

Check your code

At this point, your code should look like this:

//
// Copyright (c) Microsoft. All rights reserved.
// Licensed under the MIT license. See LICENSE.md file in the project root for full license information.
//

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace HelloWorld
{
    class Program
    {
        static async Task Main()
        {
            await RecognizeSpeechAsync();
        }

        static async Task RecognizeSpeechAsync()
        {
            var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

            using (var audioInput = AudioConfig.FromWavFileInput("whatstheweatherlike.wav"))
            using (var recognizer = new SpeechRecognizer(config, audioInput))
            {
                Console.WriteLine("Recognizing first result...");
                var result = await recognizer.RecognizeOnceAsync();

                switch (result.Reason)
                {
                    case ResultReason.RecognizedSpeech:
                        Console.WriteLine($"We recognized: {result.Text}");
                        break;
                    case ResultReason.NoMatch:
                        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                        break;
                    case ResultReason.Canceled:
                        var cancellation = CancellationDetails.FromResult(result);
                        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
                
                        if (cancellation.Reason == CancellationReason.Error)
                        {
                            Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                            Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                            Console.WriteLine($"CANCELED: Did you update the subscription info?");
                        }
                        break;
                }
            }
        }
    }
}

Build and run your app

Now you're ready to build your app and test speech recognition using the Speech service.

  1. Compile the code: From the menu bar of Visual Studio, choose Build > Build Solution.

  2. Start your app: From the menu bar, choose Debug > Start Debugging or press F5.

  3. Start recognition: Your audio file is sent to the Speech service, transcribed as text, and rendered in the console.

    Recognizing first result...
    We recognized: What's the weather like?
    

Next steps

With this base knowledge of speech recognition, continue exploring the basics to learn about common functionality and tasks within the Speech SDK.


In this quickstart you will use the Speech SDK to recognize speech from an audio file. After satisfying a few prerequisites, recognizing speech from a file only takes a few steps:

  • Create a SpeechConfig object from your subscription key and region.
  • Create an AudioConfig object that specifies the .WAV file name.
  • Create a SpeechRecognizer object using the SpeechConfig and AudioConfig objects from above.
  • Using the SpeechRecognizer object, start the recognition process for a single utterance.
  • Inspect the SpeechRecognitionResult returned.

If you prefer to jump right in, view or download all Speech SDK C++ Samples on GitHub. Otherwise, let's get started.


Prerequisites

Before you get started, make sure to:

Supported audio input format

The default audio streaming format is WAV (16 kHz or 8 kHz, 16-bit, mono PCM). Beyond WAV/PCM, the compressed input formats listed below are also supported, but they require additional configuration.

  • MP3
  • OPUS/OGG
  • FLAC
  • ALAW in wav container
  • MULAW in wav container

Add sample code

  1. Create a C++ source file named helloworld.cpp, and paste the following code into it.

    #include <iostream>
    #include <speechapi_cxx.h>
    
    using namespace std;
    using namespace Microsoft::CognitiveServices::Speech;
    using namespace Microsoft::CognitiveServices::Speech::Audio;
    
    void recognizeSpeechFromWavFile()
    {
        // Creates an instance of a speech config with specified subscription key and service region.
        // Replace with your own subscription key and service region (e.g., "westus").
        auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");
    
        // Creates a speech recognizer using file as audio input.
        // Replace with your own audio file name.
        auto audioInput = AudioConfig::FromWavFileInput("whatstheweatherlike.wav");
        auto recognizer = SpeechRecognizer::FromConfig(config, audioInput);
    
        // Starts speech recognition, and returns after a single utterance is recognized. The end of a
        // single utterance is determined by listening for silence at the end or until a maximum of 15
        // seconds of audio is processed.  The task returns the recognition text as result. 
        // Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
        // shot recognition like command or query. 
        // For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
        auto result = recognizer->RecognizeOnceAsync().get();
    
        // Checks result.
        if (result->Reason == ResultReason::RecognizedSpeech)
        {
            cout << "We recognized: " << result->Text << std::endl;
        }
        else if (result->Reason == ResultReason::NoMatch)
        {
            cout << "NOMATCH: Speech could not be recognized." << std::endl;
        }
        else if (result->Reason == ResultReason::Canceled)
        {
            auto cancellation = CancellationDetails::FromResult(result);
            cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;
    
            if (cancellation->Reason == CancellationReason::Error) 
            {
                cout << "CANCELED: ErrorCode=" << (int)cancellation->ErrorCode << std::endl;
                cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails << std::endl;
                cout << "CANCELED: Did you update the subscription info?" << std::endl;
            }
        }
    }
    
    int main()
    {
        recognizeSpeechFromWavFile();
        cout << "Please press a key to continue.\n";
        cin.get();
        return 0;
    }
    
  2. In this new file, replace the string YourSubscriptionKey with your Speech service subscription key.

  3. Replace the string YourServiceRegion with the region identifier associated with your subscription (for example, westus for the free trial subscription).

  4. Replace the string whatstheweatherlike.wav with your own filename.

Note

The Speech SDK defaults to en-US as the recognition language. To choose a different source language, see Specify source language for speech to text.

Build the app

Note

Make sure to enter the commands below as a single command line. The easiest way to do that is to copy the command by using the Copy button next to each command, and then paste it at your shell prompt.

  • On an x64 (64-bit) system, run the following command to build the application.

    g++ helloworld.cpp -o helloworld -I "$SPEECHSDK_ROOT/include/cxx_api" -I "$SPEECHSDK_ROOT/include/c_api" --std=c++14 -lpthread -lMicrosoft.CognitiveServices.Speech.core -L "$SPEECHSDK_ROOT/lib/x64" -l:libasound.so.2
    
  • On an x86 (32-bit) system, run the following command to build the application.

    g++ helloworld.cpp -o helloworld -I "$SPEECHSDK_ROOT/include/cxx_api" -I "$SPEECHSDK_ROOT/include/c_api" --std=c++14 -lpthread -lMicrosoft.CognitiveServices.Speech.core -L "$SPEECHSDK_ROOT/lib/x86" -l:libasound.so.2
    
  • On an ARM64 (64-bit) system, run the following command to build the application.

    g++ helloworld.cpp -o helloworld -I "$SPEECHSDK_ROOT/include/cxx_api" -I "$SPEECHSDK_ROOT/include/c_api" --std=c++14 -lpthread -lMicrosoft.CognitiveServices.Speech.core -L "$SPEECHSDK_ROOT/lib/arm64" -l:libasound.so.2
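
The three commands above differ only in the library folder name (x64, x86, or arm64). The following is a minimal Python sketch of how a helper script might choose that folder from the machine type. The mapping table covers only a few common values of platform.machine() and is an assumption of this sketch, as is the example root path /opt/speechsdk; neither comes from the Speech SDK itself.

```python
import os
import platform

# Maps values of platform.machine() to the Speech SDK library folder names
# used in the build commands. The machine names are common examples only,
# not an exhaustive list.
ARCH_TO_LIB_DIR = {
    "x86_64": "x64",
    "amd64": "x64",
    "i386": "x86",
    "i686": "x86",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def speech_sdk_lib_dir(sdk_root, machine=None):
    """Return the lib/<arch> folder under sdk_root for the given machine type."""
    machine = (machine or platform.machine()).lower()
    if machine not in ARCH_TO_LIB_DIR:
        raise ValueError("No Speech SDK library folder known for: " + machine)
    return os.path.join(sdk_root, "lib", ARCH_TO_LIB_DIR[machine])


# "/opt/speechsdk" is a hypothetical stand-in for your $SPEECHSDK_ROOT.
print(speech_sdk_lib_dir("/opt/speechsdk", "x86_64"))
```

The returned path corresponds to the value passed to -L when building and appended to LD_LIBRARY_PATH when running.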
    

Run the app

  1. Configure the loader's library path to point to the Speech SDK library.

    • On an x64 (64-bit) system, enter the following command.

      export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$SPEECHSDK_ROOT/lib/x64"
      
    • On an x86 (32-bit) system, enter this command.

      export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$SPEECHSDK_ROOT/lib/x86"
      
    • On an ARM64 (64-bit) system, enter the following command.

      export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$SPEECHSDK_ROOT/lib/arm64"
      
  2. Run the application.

    ./helloworld
    
  3. Your audio file is transmitted to the Speech service and the first utterance in the file is transcribed to text, which appears in the same window.

    We recognized: What's the weather like?
    Please press a key to continue.
    

Next steps

With this base knowledge of speech recognition, continue exploring the basics to learn about common functionality and tasks within the Speech SDK.

In this quickstart you will use the Speech SDK to recognize speech from an audio file. After satisfying a few prerequisites, recognizing speech from a file only takes a few steps:

  • Create a SpeechConfig object from your subscription key and region.
  • Create an AudioConfig object that specifies the .WAV file name.
  • Create a SpeechRecognizer object using the SpeechConfig and AudioConfig objects from above.
  • Using the SpeechRecognizer object, start the recognition process for a single utterance.
  • Inspect the SpeechRecognitionResult returned.

If you prefer to jump right in, view or download all Speech SDK Java Samples on GitHub. Otherwise, let's get started.

Prerequisites

Supported audio input format

The default audio streaming format is WAV (16 kHz or 8 kHz, 16-bit, mono PCM). Beyond WAV/PCM, the compressed input formats listed below are also supported, but they require additional configuration.

  • MP3
  • OPUS/OGG
  • FLAC
  • ALAW in wav container
  • MULAW in wav container

Add sample code

  1. To add a new empty class to your Java project, select File > New > Class.

  2. In the New Java Class window, enter speechsdk.quickstart into the Package field, and Main into the Name field.


  3. Replace all code in Main.java with the following snippet:

    package speechsdk.quickstart;
    
    import java.util.concurrent.Future;
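    import com.microsoft.cognitiveservices.speech.*;
    // AudioConfig lives in the audio subpackage, which the wildcard import above does not cover.
    import com.microsoft.cognitiveservices.speech.audio.AudioConfig;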
    
    /**
     * Quickstart: recognize speech using the Speech SDK for Java.
     */
    public class Main {
    
        /**
         * @param args Arguments are ignored in this sample.
         */
        public static void main(String[] args) {
            try {
                // Replace below with your own subscription key
                String speechSubscriptionKey = "YourSubscriptionKey";
    
                // Replace with your own subscription key and region identifier from here: https://aka.ms/speech/sdkregion
                String serviceRegion = "YourServiceRegion";
    
                // Replace below with your own filename.
                String audioFileName = "whatstheweatherlike.wav";
    
                int exitCode = 1;
                SpeechConfig config = SpeechConfig.fromSubscription(speechSubscriptionKey, serviceRegion);
                assert(config != null);
    
                AudioConfig audioInput = AudioConfig.fromWavFileInput(audioFileName);
                assert(audioInput != null);
    
                SpeechRecognizer reco = new SpeechRecognizer(config, audioInput);
                assert(reco != null);
    
                System.out.println("Recognizing first result...");
    
                Future<SpeechRecognitionResult> task = reco.recognizeOnceAsync();
                assert(task != null);
    
                SpeechRecognitionResult result = task.get();
                assert(result != null);
    
                // Unqualified enum constants in case labels compile on all Java versions.
                switch (result.getReason()) {
                    case RecognizedSpeech: {
                            System.out.println("We recognized: " + result.getText());
                            exitCode = 0;
                        }
                        break;
                    case NoMatch:
                        System.out.println("NOMATCH: Speech could not be recognized.");
                        break;
                    case Canceled: {
                            CancellationDetails cancellation = CancellationDetails.fromResult(result);
                            System.out.println("CANCELED: Reason=" + cancellation.getReason());
    
                            if (cancellation.getReason() == CancellationReason.Error) {
                                System.out.println("CANCELED: ErrorCode=" + cancellation.getErrorCode());
                                System.out.println("CANCELED: ErrorDetails=" + cancellation.getErrorDetails());
                                System.out.println("CANCELED: Did you update the subscription info?");
                            }
                        }
                        break;
                }
    
                reco.close();
    
                System.exit(exitCode);
            } catch (Exception ex) {
                System.out.println("Unexpected exception: " + ex.getMessage());
    
                assert(false);
                System.exit(1);
            }
        }
    }
    
  4. Replace the string YourSubscriptionKey with your subscription key.

  5. Replace the string YourServiceRegion with the region associated with your subscription (for example, westus for the free trial subscription).

  6. Replace the string whatstheweatherlike.wav with your own filename.

  7. Save changes to the project.

Note

The Speech SDK defaults to en-US as the recognition language. To choose a different source language, see Specify source language for speech to text.

Build and run the app

Press F11, or select Run > Debug. The first utterance in your audio file (up to 15 seconds of audio) is recognized and logged in the console window.

Recognizing first result...
We recognized: What's the weather like?

Next steps

With this base knowledge of speech recognition, continue exploring the basics to learn about common functionality and tasks within the Speech SDK.


In this quickstart you will use the Speech SDK to recognize speech from an audio file. After satisfying a few prerequisites, recognizing speech from a file only takes a few steps:

  • Create a SpeechConfig object from your subscription key and region.
  • Create an AudioConfig object that specifies the .WAV file name.
  • Create a SpeechRecognizer object using the SpeechConfig and AudioConfig objects from above.
  • Using the SpeechRecognizer object, start the recognition process for a single utterance.
  • Inspect the SpeechRecognitionResult returned.

If you prefer to jump right in, view or download all Speech SDK Python Samples on GitHub. Otherwise, let's get started.

Prerequisites

Before you get started, make sure to:

Supported audio input format

The default audio streaming format is WAV (16 kHz or 8 kHz, 16-bit, mono PCM). Beyond WAV/PCM, the compressed input formats listed below are also supported, but they require additional configuration.

  • MP3
  • OPUS/OGG
  • FLAC
  • ALAW in wav container
  • MULAW in wav container
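
Before sending a file, it can help to confirm that it matches the default format described above. The following is a minimal sketch that uses only the Python standard library wave module, not the Speech SDK. The function name is_default_wav_format is a hypothetical helper introduced for illustration.

```python
import wave

# Sample rates the quickstart lists for the default WAV/PCM input format.
SUPPORTED_SAMPLE_RATES = (16000, 8000)


def is_default_wav_format(path):
    """Return True if the file is 16-bit mono PCM WAV at 16 kHz or 8 kHz."""
    try:
        with wave.open(path, "rb") as wav_file:
            return (
                wav_file.getnchannels() == 1          # mono
                and wav_file.getsampwidth() == 2      # 16-bit samples (2 bytes)
                and wav_file.getframerate() in SUPPORTED_SAMPLE_RATES
                and wav_file.getcomptype() == "NONE"  # uncompressed PCM
            )
    except (wave.Error, EOFError):
        # Not a PCM WAV file that the standard library can parse
        # (for example, an MP3 file or A-law data in a WAV container).
        return False
```

A file that fails this check is not necessarily unusable: it may be one of the compressed formats listed above, which need the additional configuration the quickstart mentions.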

Support and updates

Updates to the Speech SDK Python package are distributed via PyPI and announced in the Release notes. If a new version is available, you can update to it with the command pip install --upgrade azure-cognitiveservices-speech. Check which version is currently installed by inspecting the azure.cognitiveservices.speech.__version__ variable.

If you have a problem, or you're missing a feature, see Support and help options.
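
The version check mentioned above can also be done without importing the SDK. The following is a minimal sketch that reads the installed package metadata through importlib.metadata, which assumes Python 3.8 or later. The helper name installed_speech_sdk_version is hypothetical.

```python
from importlib import metadata

# Package name as used in the pip command in this quickstart.
PACKAGE_NAME = "azure-cognitiveservices-speech"


def installed_speech_sdk_version(package_name=PACKAGE_NAME):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_speech_sdk_version()
if version is None:
    print("Speech SDK not installed; run: pip install azure-cognitiveservices-speech")
else:
    print("Installed Speech SDK version:", version)
```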

Create a Python application that uses the Speech SDK

Run the sample

You can copy the sample code from this quickstart to a source file quickstart.py and run it in your IDE or in the console:

python quickstart.py

Or you can download this quickstart tutorial as a Jupyter notebook from the Speech SDK sample repository and run it as a notebook.

Sample code

Note

The Speech SDK defaults to en-US as the recognition language. To choose a different source language, see Specify source language for speech to text.

import azure.cognitiveservices.speech as speechsdk

# Creates an instance of a speech config with specified subscription key and service region.
# Replace with your own subscription key and region identifier from here: https://aka.ms/speech/sdkregion
speech_key, service_region = "YourSubscriptionKey", "YourServiceRegion"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

# Creates an audio configuration that points to an audio file.
# Replace with your own audio filename.
audio_filename = "whatstheweatherlike.wav"
audio_input = speechsdk.audio.AudioConfig(filename=audio_filename)

# Creates a recognizer with the given settings
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)

print("Recognizing first result...")

# Starts speech recognition, and returns after a single utterance is recognized. The end of a
# single utterance is determined by listening for silence at the end or until a maximum of 15
# seconds of audio is processed.  The task returns the recognition text as result. 
# Note: Since recognize_once() returns only a single utterance, it is suitable only for single
# shot recognition like command or query. 
# For long-running multi-utterance recognition, use start_continuous_recognition() instead.
result = speech_recognizer.recognize_once()

# Checks result.
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))
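
The sample above hard-codes the placeholder strings YourSubscriptionKey and YourServiceRegion. As one possible alternative, the following sketch reads both values from environment variables and rejects values that are empty or still set to the placeholders. The variable names SPEECH_KEY and SPEECH_REGION and the helper load_speech_settings are assumptions made for this illustration; they are not defined by the Speech SDK.

```python
import os

# Placeholder values used in the quickstart sample; treated as "not configured".
PLACEHOLDERS = {"YourSubscriptionKey", "YourServiceRegion"}


def load_speech_settings(environ=None):
    """Read key and region from the environment, rejecting empty or placeholder values."""
    environ = os.environ if environ is None else environ
    # SPEECH_KEY and SPEECH_REGION are hypothetical names chosen for this sketch.
    key = environ.get("SPEECH_KEY", "").strip()
    region = environ.get("SPEECH_REGION", "").strip()
    for name, value in (("SPEECH_KEY", key), ("SPEECH_REGION", region)):
        if not value or value in PLACEHOLDERS:
            raise ValueError("Set the {} environment variable first.".format(name))
    return key, region
```

The returned pair could then replace the hard-coded assignment, for example speech_key, service_region = load_speech_settings(), with the rest of the sample unchanged.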

Install and use the Speech SDK with Visual Studio Code

  1. Download and install a 64-bit version of Python, 3.5 to 3.8, on your computer.

  2. Download and install Visual Studio Code.

  3. Open Visual Studio Code and install the Python extension. Select File > Preferences > Extensions from the menu. Search for Python.


  4. Create a folder to store the project in, for example by using Windows Explorer.

  5. In Visual Studio Code, select the File icon. Then open the folder you created.


  6. Create a new Python source file, speechsdk.py, by selecting the new file icon.


  7. Copy, paste, and save the Python code to the newly created file.

  8. Insert your Speech service subscription information.

  9. If a Python interpreter is already selected, it is displayed on the left side of the status bar at the bottom of the window. Otherwise, open the command palette (Ctrl+Shift+P), enter Python: Select Interpreter, and choose an appropriate one.

  10. If the Speech SDK Python package isn't yet installed for the interpreter you selected, you can install it from within Visual Studio Code. Open the command palette again (Ctrl+Shift+P) and enter Terminal: Create New Integrated Terminal. In the terminal that opens, enter the command python -m pip install azure-cognitiveservices-speech or the appropriate command for your system.

  11. To run the sample code, right-click somewhere inside the editor and select Run Python File in Terminal. The first utterance in your audio file (up to 15 seconds of audio) is recognized and logged in the console window.

    Recognizing first result...
    Recognized: What's the weather like?
    

If you have issues following these instructions, refer to the more extensive Visual Studio Code Python tutorial.
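
This quickstart states that a single-shot recognition processes at most 15 seconds of audio. The following is a minimal sketch, using only the standard library wave module, that measures the duration of a PCM WAV file and reports whether it fits within that limit. The helper names are hypothetical, and the check looks at total file length only; it does not detect where an utterance ends.

```python
import wave

# Limit stated in this quickstart for single-shot recognition.
MAX_SINGLE_SHOT_SECONDS = 15


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / float(wav_file.getframerate())


def fits_single_shot(path, limit=MAX_SINGLE_SHOT_SECONDS):
    """Return True if the whole file is no longer than the single-shot limit."""
    return wav_duration_seconds(path) <= limit
```

For files longer than the limit, the comments in the sample code point to start_continuous_recognition() as the alternative.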

Next steps

With this base knowledge of speech recognition, continue exploring the basics to learn about common functionality and tasks within the Speech SDK.

In this quickstart you will use the Speech SDK to recognize speech from an audio file. After satisfying a few prerequisites, recognizing speech from a file only takes a few steps:

  • Create a SpeechConfig object from your subscription key and region.
  • Create an AudioConfig object that specifies the .WAV file name.
  • Create a SpeechRecognizer object using the SpeechConfig and AudioConfig objects from above.
  • Using the SpeechRecognizer object, start the recognition process for a single utterance.
  • Inspect the SpeechRecognitionResult returned.

If you prefer to jump right in, view or download all Speech SDK JavaScript Samples on GitHub. Otherwise, let's get started.


Prerequisites

Before you get started:

Start with some boilerplate code

Let's add some code that works as a skeleton for our project.

    <!DOCTYPE html>
    <html>
    <head>
    <title>Microsoft Cognitive Services Speech SDK JavaScript Quickstart</title>
    <meta charset="utf-8" />
    </head>
    <body style="font-family:'Helvetica Neue',Helvetica,Arial,sans-serif; font-size:13px;">
    </body>
    </html>

Add UI Elements

Now we'll add some basic UI for input boxes, reference the Speech SDK's JavaScript, and grab an authorization token if available.

  <div id="content" style="display:none">
    <table width="100%">
      <tr>
        <td></td>
        <td><h1 style="font-weight:500;">Microsoft Cognitive Services Speech SDK JavaScript Quickstart</h1></td>
      </tr>
      <tr>
        <td align="right"><a href="https://docs.microsoft.com/azure/cognitive-services/speech-service/get-started" target="_blank">Subscription</a>:</td>
        <td><input id="subscriptionKey" type="text" size="40" value="subscription"></td>
      </tr>
      <tr>
        <td align="right">Region</td>
        <td><input id="serviceRegion" type="text" size="40" value="YourServiceRegion"></td>
      </tr>
      <tr>
        <td align="right">File</td>
        <td><input type="file" id="filePicker" accept=".wav" style="display:none" /></td>
      </tr>
      <tr>
        <td></td>
        <td><button id="startRecognizeOnceAsyncButton">Start recognition</button></td>
      </tr>
      <tr>
        <td align="right" valign="top">Results</td>
        <td><textarea id="phraseDiv" style="display: inline-block;width:500px;height:200px"></textarea></td>
      </tr>
    </table>
  </div>

  <script src="microsoft.cognitiveservices.speech.sdk.bundle.js"></script>

  <script>
  // Note: Replace the URL with a valid endpoint to retrieve
  //       authorization tokens for your subscription.
  var authorizationEndpoint = "token.php";

  function RequestAuthorizationToken() {
    if (authorizationEndpoint) {
      var a = new XMLHttpRequest();
      a.open("GET", authorizationEndpoint);
      a.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
      a.send("");
      a.onload = function() {
          var token = JSON.parse(atob(this.responseText.split(".")[1]));
          serviceRegion.value = token.region;
          authorizationToken = this.responseText;
          subscriptionKey.disabled = true;
          subscriptionKey.value = "using authorization token (hit F5 to refresh)";
          console.log("Got an authorization token: " + token);
      }
    }
  }
  </script>
  
  <script>
    // status fields and start button in UI
    var phraseDiv;
    var startRecognizeOnceAsyncButton;

    // subscription key and region for speech services.
    var subscriptionKey, serviceRegion;
    var authorizationToken;
    var SpeechSDK;
    var recognizer;
    var filePicker;
    var audioFile;

    document.addEventListener("DOMContentLoaded", function () {
      startRecognizeOnceAsyncButton = document.getElementById("startRecognizeOnceAsyncButton");
      subscriptionKey = document.getElementById("subscriptionKey");
      serviceRegion = document.getElementById("serviceRegion");
      phraseDiv = document.getElementById("phraseDiv");
      filePicker = document.getElementById('filePicker');
      
      filePicker.addEventListener("change", function () {
                audioFile = filePicker.files[0];
            });

      startRecognizeOnceAsyncButton.addEventListener("click", function () {
        startRecognizeOnceAsyncButton.disabled = true;
        phraseDiv.innerHTML = "";

      });

      if (!!window.SpeechSDK) {
        SpeechSDK = window.SpeechSDK;
        startRecognizeOnceAsyncButton.disabled = false;

        document.getElementById('content').style.display = 'block';
        document.getElementById('warning').style.display = 'none';

        // in case we have a function for getting an authorization token, call it.
        if (typeof RequestAuthorizationToken === "function") {
            RequestAuthorizationToken();
        }
      }
    });
  </script>
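
The RequestAuthorizationToken() function above reads the region from the second dot-separated segment of the token, which suggests the token is a JWT. The following is a minimal Python sketch of the same decoding step, for example for inspecting a token on the server side. The helper name decode_token_claims is hypothetical, and the sketch assumes the payload is base64url-encoded JSON. It does not verify the signature, so treat the result as informational only.

```python
import base64
import json


def decode_token_claims(token):
    """Decode the (unverified) payload segment of a JWT-style token into a dict."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated segments")
    payload = parts[1]
    # Base64url segments usually omit '=' padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Unlike atob(), which expects the standard base64 alphabet, urlsafe_b64decode also accepts the '-' and '_' characters that base64url payloads can contain.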

Create a Speech configuration

Before you can initialize a SpeechRecognizer object, you need to create a configuration that uses your subscription key and subscription region. Insert this code inside the click handler registered with startRecognizeOnceAsyncButton.addEventListener().

Note

The Speech SDK defaults to en-US as the recognition language. To choose a different source language, see Specify source language for speech to text.

        // if we got an authorization token, use the token. Otherwise use the provided subscription key
        var speechConfig;
        if (authorizationToken) {
          speechConfig = SpeechSDK.SpeechConfig.fromAuthorizationToken(authorizationToken, serviceRegion.value);
        } else {
          if (subscriptionKey.value === "" || subscriptionKey.value === "subscription") {
            alert("Please enter your Microsoft Cognitive Services Speech subscription key!");
            return;
          }
          speechConfig = SpeechSDK.SpeechConfig.fromSubscription(subscriptionKey.value, serviceRegion.value);
        }

        speechConfig.speechRecognitionLanguage = "en-US";

Create an Audio configuration

Now, you need to create an AudioConfig object that points to your audio file. Insert this code in the startRecognizeOnceAsyncButton.addEventListener() method, right below your Speech configuration.

        var audioConfig  = SpeechSDK.AudioConfig.fromWavFileInput(audioFile);
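Because the default input format is WAV (16 kHz or 8 kHz, 16-bit, mono PCM), it can help to inspect the file header before handing the file to the SDK. The following is a rough sketch under stated assumptions: readWavFormat and isDefaultSpeechWav are hypothetical helper names, not SDK functions, and the parser only understands a plain RIFF/WAVE layout with a "fmt " chunk.

```javascript
// Reads the "fmt " chunk of a RIFF/WAVE header from an ArrayBuffer.
// Returns null if the buffer does not look like a WAV file.
function readWavFormat(arrayBuffer) {
  var view = new DataView(arrayBuffer);
  function tag(offset) {
    return String.fromCharCode(
      view.getUint8(offset), view.getUint8(offset + 1),
      view.getUint8(offset + 2), view.getUint8(offset + 3));
  }
  if (view.byteLength < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") {
    return null;
  }
  // Walk the chunks after the 12-byte RIFF header until "fmt " is found.
  var offset = 12;
  while (offset + 8 <= view.byteLength) {
    var size = view.getUint32(offset + 4, true); // sizes are little-endian
    if (tag(offset) === "fmt ") {
      if (offset + 8 + 16 > view.byteLength) {
        return null; // truncated format chunk
      }
      return {
        audioFormat: view.getUint16(offset + 8, true), // 1 means PCM
        channels: view.getUint16(offset + 10, true),
        sampleRate: view.getUint32(offset + 12, true),
        bitsPerSample: view.getUint16(offset + 22, true)
      };
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  return null;
}

// True for the default format named in this article:
// PCM, mono, 16-bit, 16 kHz or 8 kHz.
function isDefaultSpeechWav(format) {
  return !!format &&
    format.audioFormat === 1 &&
    format.channels === 1 &&
    format.bitsPerSample === 16 &&
    (format.sampleRate === 16000 || format.sampleRate === 8000);
}
```

In a browser that supports Blob.arrayBuffer(), the first few kilobytes of the selected file could be read with audioFile.slice(0, 4096).arrayBuffer() and passed to readWavFormat before the AudioConfig is created.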

Initialize a SpeechRecognizer

Now, let's create the SpeechRecognizer object using the SpeechConfig and AudioConfig objects created earlier. Insert this code in the startRecognizeOnceAsyncButton.addEventListener() method.

        recognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);

Recognize a phrase

From the SpeechRecognizer object, you're going to call the recognizeOnceAsync() method. This method tells the Speech service that you're sending a single phrase for recognition, and that it should stop recognizing speech once the phrase is identified.

        recognizer.recognizeOnceAsync(
          function (result) {
            startRecognizeOnceAsyncButton.disabled = false;
            phraseDiv.innerHTML += result.text;
            window.console.log(result);

            recognizer.close();
            recognizer = undefined;
          },
          function (err) {
            startRecognizeOnceAsyncButton.disabled = false;
            phraseDiv.innerHTML += err;
            window.console.log(err);

            recognizer.close();
            recognizer = undefined;
          });
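The success callback above prints result.text without checking why recognition ended. In the Speech SDK for JavaScript, the result also carries a reason value that can be compared against SpeechSDK.ResultReason. The following is a minimal sketch, assuming a hypothetical helper named describeResult; it takes the enum as a parameter so it can be exercised without loading the SDK, and it assumes errorDetails holds the explanation for a canceled request.

```javascript
// Hypothetical helper: turns a recognition result into display text.
// `reasons` is expected to be SpeechSDK.ResultReason when used in the page.
function describeResult(result, reasons) {
  switch (result.reason) {
    case reasons.RecognizedSpeech:
      return "RECOGNIZED: " + result.text;
    case reasons.NoMatch:
      return "NOMATCH: Speech could not be recognized.";
    case reasons.Canceled:
      // errorDetails is assumed to carry the service's explanation, if any.
      return "CANCELED: " + (result.errorDetails || "no details available");
    default:
      return "Unexpected result reason: " + result.reason;
  }
}
```

Inside the success callback, this could replace the direct use of result.text, for example phraseDiv.innerHTML += describeResult(result, SpeechSDK.ResultReason);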

Check your code

<!DOCTYPE html>
<html>
<head>
  <title>Microsoft Cognitive Services Speech SDK JavaScript Quickstart</title>
  <meta charset="utf-8" />
</head>
<body style="font-family:'Helvetica Neue',Helvetica,Arial,sans-serif; font-size:13px;">
  <!-- <uidiv> -->
  <div id="warning">
    <h1 style="font-weight:500;">Speech Recognition Speech SDK not found (microsoft.cognitiveservices.speech.sdk.bundle.js missing).</h1>
  </div>
  
  <div id="content" style="display:none">
    <table width="100%">
      <tr>
        <td></td>
        <td><h1 style="font-weight:500;">Microsoft Cognitive Services Speech SDK JavaScript Quickstart</h1></td>
      </tr>
      <tr>
        <td align="right"><a href="https://docs.microsoft.com/azure/cognitive-services/speech-service/get-started" target="_blank">Subscription</a>:</td>
        <td><input id="subscriptionKey" type="text" size="40" value="subscription"></td>
      </tr>
      <tr>
        <td align="right">Region</td>
        <td><input id="serviceRegion" type="text" size="40" value="YourServiceRegion"></td>
      </tr>
      <tr>
        <td align="right">File</td>
        <td><input type="file" id="filePicker" accept=".wav" /></td>
      </tr>
      <tr>
        <td></td>
        <td><button id="startRecognizeOnceAsyncButton">Start recognition</button></td>
      </tr>
      <tr>
        <td align="right" valign="top">Results</td>
        <td><textarea id="phraseDiv" style="display: inline-block;width:500px;height:200px"></textarea></td>
      </tr>
    </table>
  </div>
  <!-- </uidiv> -->

  <!-- <speechsdkref> -->
  <!-- Speech SDK reference sdk. -->
  <script src="microsoft.cognitiveservices.speech.sdk.bundle.js"></script>
  <!-- </speechsdkref> -->

  <!-- <authorizationfunction> -->
  <!-- Speech SDK Authorization token -->
  <script>
  // Note: Replace the URL with a valid endpoint to retrieve
  //       authorization tokens for your subscription.
  var authorizationEndpoint = "token.php";

  function RequestAuthorizationToken() {
    if (authorizationEndpoint) {
      var a = new XMLHttpRequest();
      a.open("GET", authorizationEndpoint);
      a.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
      a.send("");
      a.onload = function() {
          var token = JSON.parse(atob(this.responseText.split(".")[1]));
          serviceRegion.value = token.region;
          authorizationToken = this.responseText;
          subscriptionKey.disabled = true;
          subscriptionKey.value = "using authorization token (hit F5 to refresh)";
          console.log("Got an authorization token: " + JSON.stringify(token));
      }
    }
  }
  </script>
  <!-- </authorizationfunction> -->

  <!-- <quickstartcode> -->
  <!-- Speech SDK USAGE -->
  <script>
    // status fields and start button in UI
    var phraseDiv;
    var startRecognizeOnceAsyncButton;

    // subscription key and region for speech services.
    var subscriptionKey, serviceRegion;
    var authorizationToken;
    var SpeechSDK;
    var recognizer;
    var filePicker;
    var audioFile;

    document.addEventListener("DOMContentLoaded", function () {
      startRecognizeOnceAsyncButton = document.getElementById("startRecognizeOnceAsyncButton");
      subscriptionKey = document.getElementById("subscriptionKey");
      serviceRegion = document.getElementById("serviceRegion");
      phraseDiv = document.getElementById("phraseDiv");
      filePicker = document.getElementById('filePicker');
      
      filePicker.addEventListener("change", function () {
                audioFile = filePicker.files[0];
            });

      startRecognizeOnceAsyncButton.addEventListener("click", function () {
        startRecognizeOnceAsyncButton.disabled = true;
        phraseDiv.innerHTML = "";

        // if we got an authorization token, use the token. Otherwise use the provided subscription key
        var speechConfig;
        if (authorizationToken) {
          speechConfig = SpeechSDK.SpeechConfig.fromAuthorizationToken(authorizationToken, serviceRegion.value);
        } else {
          if (subscriptionKey.value === "" || subscriptionKey.value === "subscription") {
            alert("Please enter your Microsoft Cognitive Services Speech subscription key!");
            return;
          }
          speechConfig = SpeechSDK.SpeechConfig.fromSubscription(subscriptionKey.value, serviceRegion.value);
        }

        speechConfig.speechRecognitionLanguage = "en-US";
        var audioConfig  = SpeechSDK.AudioConfig.fromWavFileInput(audioFile);
        recognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);

        recognizer.recognizeOnceAsync(
          function (result) {
            startRecognizeOnceAsyncButton.disabled = false;
            phraseDiv.innerHTML += result.text;
            window.console.log(result);

            recognizer.close();
            recognizer = undefined;
          },
          function (err) {
            startRecognizeOnceAsyncButton.disabled = false;
            phraseDiv.innerHTML += err;
            window.console.log(err);

            recognizer.close();
            recognizer = undefined;
          });
      });

      if (!!window.SpeechSDK) {
        SpeechSDK = window.SpeechSDK;
        startRecognizeOnceAsyncButton.disabled = false;

        document.getElementById('content').style.display = 'block';
        document.getElementById('warning').style.display = 'none';

        // in case we have a function for getting an authorization token, call it.
        if (typeof RequestAuthorizationToken === "function") {
            RequestAuthorizationToken();
        }
      }
    });
  </script>
  <!-- </quickstartcode> -->
</body>
</html>

Create the token source (optional)

If you want to host the web page on a web server, you can optionally provide a token source for your demo application. That way, your subscription key never leaves your server, and users can use speech capabilities without entering any authorization code themselves.

Create a new file named token.php. In this example we assume your web server supports the PHP scripting language with curl enabled. Enter the following code:

<?php
header('Access-Control-Allow-Origin: ' . $_SERVER['SERVER_NAME']);

// Replace with your own subscription key and service region (e.g., "westus").
$subscriptionKey = 'YourSubscriptionKey';
$region = 'YourServiceRegion';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://' . $region . '.api.cognitive.microsoft.com/sts/v1.0/issueToken');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, '{}');
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json', 'Ocp-Apim-Subscription-Key: ' . $subscriptionKey));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
echo curl_exec($ch);
?>

Note

Authorization tokens only have a limited lifetime. This simplified example does not show how to refresh authorization tokens automatically. As a user, you can manually reload the page or hit F5 to refresh.
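One way to approach automatic refresh is to compute how long the current token remains valid. The page already decodes the middle segment of the token as JSON to read its region. The following is a minimal sketch that assumes the same payload also carries a standard JWT exp claim (seconds since the Unix epoch); this article does not confirm that claim, and msUntilRefresh and decodeTokenPayload are hypothetical helper names.

```javascript
// Decodes the payload (middle segment) of a JWT-style token.
function decodeTokenPayload(token) {
  var parts = token.split(".");
  if (parts.length < 2) {
    throw new Error("Token does not have a payload segment.");
  }
  // Convert base64url to standard base64 and restore padding.
  var b64 = parts[1].replace(/-/g, "+").replace(/_/g, "/");
  while (b64.length % 4 !== 0) {
    b64 += "=";
  }
  return JSON.parse(atob(b64));
}

// Hypothetical helper: milliseconds to wait before refreshing, leaving
// `marginMs` of safety before expiry. Assumes a numeric `exp` claim
// expressed in seconds since the Unix epoch.
function msUntilRefresh(token, nowMs, marginMs) {
  var payload = decodeTokenPayload(token);
  if (typeof payload.exp !== "number") {
    return 0; // unknown expiry: refresh immediately
  }
  return Math.max(0, payload.exp * 1000 - nowMs - marginMs);
}
```

In the page's onload handler, a call such as setTimeout(RequestAuthorizationToken, msUntilRefresh(authorizationToken, Date.now(), 60000)) could schedule the next request one minute before expiry.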

Build and run the sample locally

To launch the app, double-click the index.html file or open index.html in your favorite web browser. It presents a simple GUI that lets you enter your subscription key and region, select an audio file, and trigger a recognition.

Note

This method doesn't work on the Safari browser. On Safari, the sample web page needs to be hosted on a web server; Safari doesn't allow websites loaded from a local file to use the microphone.

Build and run the sample via a web server

To launch your app, open your favorite web browser and point it to the public URL where you host the folder, enter your region, select an audio file, and trigger a recognition. If configured, it will acquire a token from your token source.

Next steps

With this base knowledge of speech recognition, continue exploring the basics to learn about common functionality and tasks within the Speech SDK.

In this quickstart, you use the Speech CLI from the command line to recognize speech recorded in an audio file, and produce a text transcription. It's easy to use the Speech CLI to perform common recognition tasks, such as transcribing conversations. After a one-time configuration, the Speech CLI lets you transcribe audio into text interactively with a microphone or from files using a batch script.

Prerequisites

The only prerequisite is an Azure Speech subscription. See the guide on creating a new subscription if you don't already have one.

Download and install

Follow these steps to install the Speech CLI on Windows:

  1. Install either .NET Framework 4.7 or .NET Core 3.0
  2. Download the Speech CLI zip archive, then extract it.
  3. Go to the root directory spx-zips that you extracted from the download, and extract the subdirectory that you need (spx-net471 for .NET Framework 4.7, or spx-netcore-win-x64 for .NET Core 3.0 on an x64 CPU).

In the command prompt, change directory to this location, and then type spx to see help for the Speech CLI.

Note

PowerShell does not check the local directory when looking for a command. In PowerShell, change directory to the location of spx and call the tool by entering .\spx. If you add this directory to your path, PowerShell and the Windows command prompt will find spx from any directory without including the .\ prefix.

Create subscription config

To start using the Speech CLI, you first need to enter your Speech subscription key and region information. See the region support page to find your region identifier. Once you have your subscription key and region identifier (for example, eastus or westus), run the following commands.

spx config @key --set YOUR-SUBSCRIPTION-KEY
spx config @region --set YOUR-REGION-ID

Your subscription authentication is now stored for future SPX requests. If you need to remove either of these stored values, run spx config @region --clear or spx config @key --clear.

Find a file that contains speech

The Speech CLI can recognize speech in many file formats and natural languages. For this quickstart, you can use a WAV file (16kHz or 8kHz, 16-bit, and mono PCM) that contains English speech.

  1. Download the whatstheweatherlike.wav file.
  2. Copy the whatstheweatherlike.wav file to the same directory as the Speech CLI binary file.

Run the Speech CLI

Now you're ready to run the Speech CLI to recognize speech found in the sound file.

From the command line, change to the directory that contains the Speech CLI binary file, and type:

spx recognize --file whatstheweatherlike.wav

Note

The Speech CLI defaults to English. You can choose a different language from the Speech-to-text table. For example, add --source de-DE to recognize German speech.

The Speech CLI will show a text transcription of the speech on the screen. Then the Speech CLI will close.

Next steps

Continue exploring the basics to learn about other features of the Speech CLI.

View or download all Speech SDK Samples on GitHub.

Additional language and platform support

If you've clicked this tab, you probably didn't see a quickstart in your favorite programming language. Don't worry, we have additional quickstart materials and code samples available on GitHub. Use the table to find the right sample for your programming language and platform/OS combination.

Language      Additional Quickstarts                 Code samples
C#            From mic, From blob                    .NET Framework, .NET Core, UWP, Unity, Xamarin
C++           From mic, From blob                    Windows, Linux, macOS
Java          From mic, From blob                    Android, JRE
JavaScript    Browser from mic, Node.js from file    Windows, Linux, macOS
Objective-C   iOS from mic, macOS from mic           iOS, macOS
Python        From mic, From blob                    Windows, Linux, macOS
Swift         iOS from mic, macOS from mic           iOS, macOS