Quickstart: Recognize speech from a microphone

In this quickstart, you use the Speech SDK to interactively recognize speech from a microphone input, and get the text transcription from captured audio. It's easy to integrate this feature into your apps or devices for common recognition tasks, such as transcribing conversations. It can also be used for more complex integrations, like using the Bot Framework with the Speech SDK to build voice assistants.

After satisfying a few prerequisites, recognizing speech from a microphone only takes four steps:

  • Create a SpeechConfig object from your subscription key and region.
  • Create a SpeechRecognizer object using the SpeechConfig object from above.
  • Using the SpeechRecognizer object, start the recognition process for a single utterance.
  • Inspect the SpeechRecognitionResult returned.

You can view or download all Speech SDK C# Samples on GitHub.

Prerequisites

Before you get started:

Open your project in Visual Studio

The first step is to make sure that you have your project open in Visual Studio.

  1. Launch Visual Studio 2019.
  2. Load your project and open Program.cs.

Source code

Replace the contents of the Program.cs file with the following C# code.

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

namespace Speech.Recognition
{
    class Program
    {
        static async Task Main()
        {
            await RecognizeSpeechAsync();

            Console.WriteLine("Please press any key to continue...");
            Console.ReadLine();
        }

        static async Task RecognizeSpeechAsync()
        {
            var config =
                SpeechConfig.FromSubscription(
                    "YourSubscriptionKey",
                    "YourServiceRegion");

            using var recognizer = new SpeechRecognizer(config);
            
            var result = await recognizer.RecognizeOnceAsync();
            switch (result.Reason)
            {
                case ResultReason.RecognizedSpeech:
                    Console.WriteLine($"We recognized: {result.Text}");
                    break;
                case ResultReason.NoMatch:
                    Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                    break;
                case ResultReason.Canceled:
                    var cancellation = CancellationDetails.FromResult(result);
                    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
    
                    if (cancellation.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                        Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                        Console.WriteLine($"CANCELED: Did you update the subscription info?");
                    }
                    break;
            }
        }
    }
}

Replace the YourSubscriptionKey and YourServiceRegion values with actual values from the Speech resource.

  • Navigate to the Azure portal, and open the Speech resource
  • Under Keys on the left, there are two available subscription keys
    • Use either one as the YourSubscriptionKey value replacement
  • Under Overview on the left, note the region and map it to the region identifier
    • Use the region identifier as the YourServiceRegion value replacement, for example: "westus" for West US

Code explanation

The Speech resource subscription key and region are required to create a speech configuration object. The configuration object is needed to instantiate a speech recognizer object.

The recognizer instance exposes multiple ways to recognize speech. In this example, speech is recognized once. This tells the Speech service that you're sending a single phrase for recognition, and that recognition stops once the phrase is identified. After the result is yielded, the code writes the recognition reason to the console.
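
If you instead need to transcribe longer, multi-phrase audio, the recognizer also supports continuous recognition. The following is a minimal sketch of that pattern, reusing the recognizer from the sample above; stop conditions and error handling are simplified for brevity.

// Sketch: continuous recognition as an alternative to single-shot recognition.
// Print each recognized phrase as it arrives.
recognizer.Recognized += (s, e) =>
{
    if (e.Result.Reason == ResultReason.RecognizedSpeech)
    {
        Console.WriteLine($"We recognized: {e.Result.Text}");
    }
};

await recognizer.StartContinuousRecognitionAsync();
Console.WriteLine("Listening... press Enter to stop.");
Console.ReadLine();
await recognizer.StopContinuousRecognitionAsync();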

Tip

The Speech SDK defaults to recognizing speech in en-US. See Specify source language for speech to text for information on choosing the source language.
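
For example, to recognize German speech instead, you could set the language on the configuration object before the recognizer is created (a one-line sketch; de-DE is only an illustrative locale):

config.SpeechRecognitionLanguage = "de-DE";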

Build and run the app

Now you're ready to rebuild your app and test the speech recognition functionality using the Speech service.

  1. Compile the code - From the menu bar of Visual Studio, choose Build > Build Solution.
  2. Start your app - From the menu bar, choose Debug > Start Debugging or press F5.
  3. Start recognition - You're prompted to speak a phrase in English. Your speech is sent to the Speech service, transcribed as text, and rendered in the console.

Next steps

With this base knowledge of speech recognition, continue exploring the basics to learn about common functionality and tasks within the Speech SDK.

In this quickstart, you use the Speech SDK to interactively recognize speech from a microphone input, and get the text transcription from captured audio. It's easy to integrate this feature into your apps or devices for common recognition tasks, such as transcribing conversations. It can also be used for more complex integrations, like using the Bot Framework with the Speech SDK to build voice assistants.

After satisfying a few prerequisites, recognizing speech from a microphone only takes four steps:

  • Create a SpeechConfig object from your subscription key and region.
  • Create a SpeechRecognizer object using the SpeechConfig object from above.
  • Using the SpeechRecognizer object, start the recognition process for a single utterance.
  • Inspect the SpeechRecognitionResult returned.

You can view or download all Speech SDK C++ Samples on GitHub.

Prerequisites

Before you get started:

Source code

Create a C++ source file named helloworld.cpp, and paste the following code into it.

#include <iostream> // cin, cout
#include <speechapi_cxx.h>

using namespace std;
using namespace Microsoft::CognitiveServices::Speech;

void recognizeSpeech() {
    // Creates an instance of a speech config with specified subscription key and service region.
    // Replace with your own subscription key and service region (e.g., "westus").
    auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

    // Creates a speech recognizer
    auto recognizer = SpeechRecognizer::FromConfig(config);
    cout << "Say something...\n";

    // Starts speech recognition, and returns after a single utterance is recognized. The end of a
    // single utterance is determined by listening for silence at the end or until a maximum of 15
    // seconds of audio is processed.  The task returns the recognition text as result. 
    // Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
    // shot recognition like command or query. 
    // For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
    auto result = recognizer->RecognizeOnceAsync().get();

    // Checks result.
    if (result->Reason == ResultReason::RecognizedSpeech) {
        cout << "We recognized: " << result->Text << std::endl;
    }
    else if (result->Reason == ResultReason::NoMatch) {
        cout << "NOMATCH: Speech could not be recognized." << std::endl;
    }
    else if (result->Reason == ResultReason::Canceled) {
        auto cancellation = CancellationDetails::FromResult(result);
        cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;

        if (cancellation->Reason == CancellationReason::Error) {
            cout << "CANCELED: ErrorCode= " << (int)cancellation->ErrorCode << std::endl;
            cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails << std::endl;
            cout << "CANCELED: Did you update the subscription info?" << std::endl;
        }
    }
}

int main(int argc, char **argv) {
    setlocale(LC_ALL, "");
    recognizeSpeech();
    return 0;
}

Replace the YourSubscriptionKey and YourServiceRegion values with actual values from the Speech resource.

  • Navigate to the Azure portal, and open the Speech resource
  • Under Keys on the left, there are two available subscription keys
    • Use either one as the YourSubscriptionKey value replacement
  • Under Overview on the left, note the region and map it to the region identifier
    • Use the region identifier as the YourServiceRegion value replacement, for example: "westus" for West US

Code explanation

The Speech resource subscription key and region are required to create a speech configuration object. The configuration object is needed to instantiate a speech recognizer object.

The recognizer instance exposes multiple ways to recognize speech. In this example, speech is recognized once. This tells the Speech service that you're sending a single phrase for recognition, and that recognition stops once the phrase is identified. After the result is yielded, the code writes the recognition reason to the console.
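
If you need long-running, multi-utterance recognition instead (as the code comments above suggest), a minimal sketch using the same recognizer looks like the following; stop conditions are simplified for brevity.

// Sketch: continuous recognition as an alternative to single-shot recognition.
// Print each recognized phrase as it arrives.
recognizer->Recognized.Connect([](const SpeechRecognitionEventArgs& e)
{
    if (e.Result->Reason == ResultReason::RecognizedSpeech)
    {
        cout << "We recognized: " << e.Result->Text << endl;
    }
});

recognizer->StartContinuousRecognitionAsync().get();
cout << "Listening... press Enter to stop.\n";
cin.get();
recognizer->StopContinuousRecognitionAsync().get();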

Tip

The Speech SDK defaults to recognizing speech in en-US. See Specify source language for speech to text for information on choosing the source language.
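
For example, to recognize German speech instead, you could set the language on the configuration object before the recognizer is created (a one-line sketch; de-DE is only an illustrative locale):

config->SetSpeechRecognitionLanguage("de-DE");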

Build the app

Note

Make sure to enter the commands below as a single command line. The easiest way to do that is to copy the command by using the Copy button next to each command, and then paste it at your shell prompt.

  • On an x64 (64-bit) system, run the following command to build the application.

    g++ helloworld.cpp -o helloworld -I "$SPEECHSDK_ROOT/include/cxx_api" -I "$SPEECHSDK_ROOT/include/c_api" --std=c++14 -lpthread -lMicrosoft.CognitiveServices.Speech.core -L "$SPEECHSDK_ROOT/lib/x64" -l:libasound.so.2
    
  • On an x86 (32-bit) system, run the following command to build the application.

    g++ helloworld.cpp -o helloworld -I "$SPEECHSDK_ROOT/include/cxx_api" -I "$SPEECHSDK_ROOT/include/c_api" --std=c++14 -lpthread -lMicrosoft.CognitiveServices.Speech.core -L "$SPEECHSDK_ROOT/lib/x86" -l:libasound.so.2
    
  • On an ARM64 (64-bit) system, run the following command to build the application.

    g++ helloworld.cpp -o helloworld -I "$SPEECHSDK_ROOT/include/cxx_api" -I "$SPEECHSDK_ROOT/include/c_api" --std=c++14 -lpthread -lMicrosoft.CognitiveServices.Speech.core -L "$SPEECHSDK_ROOT/lib/arm64" -l:libasound.so.2
    

Run the app

  1. Configure the loader's library path to point to the Speech SDK library.

    • On an x64 (64-bit) system, enter the following command.

      export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$SPEECHSDK_ROOT/lib/x64"
      
    • On an x86 (32-bit) system, enter this command.

      export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$SPEECHSDK_ROOT/lib/x86"
      
    • On an ARM64 (64-bit) system, enter the following command.

      export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$SPEECHSDK_ROOT/lib/arm64"
      
  2. Run the application.

    ./helloworld
    
  3. In the console window, a prompt appears, requesting that you say something. Speak an English phrase or sentence. Your speech is transmitted to the Speech service and transcribed to text, which appears in the same window.

    Say something...
    We recognized: What's the weather like?
    

Next steps

With this base knowledge of speech recognition, continue exploring the basics to learn about common functionality and tasks within the Speech SDK.

In this quickstart, you use the Speech SDK to interactively recognize speech from a microphone input, and get the text transcription from captured audio. It's easy to integrate this feature into your apps or devices for common recognition tasks, such as transcribing conversations. It can also be used for more complex integrations, like using the Bot Framework with the Speech SDK to build voice assistants.

After satisfying a few prerequisites, recognizing speech from a microphone only takes four steps:

  • Create a SpeechConfig object from your subscription key and region.
  • Create a SpeechRecognizer object using the SpeechConfig object from above.
  • Using the SpeechRecognizer object, start the recognition process for a single utterance.
  • Inspect the SpeechRecognitionResult returned.

You can view or download all Speech SDK Java Samples on GitHub.

Prerequisites

Before you get started:

Source code

To add a new empty class to your Java project, select File > New > Class. In the New Java Class window, enter speechsdk.quickstart into the Package field, and Main into the Name field.

Screenshot of New Java Class window

Replace the contents of the Main.java file with the following snippet:

package speechsdk.quickstart;

import java.util.concurrent.Future;
import com.microsoft.cognitiveservices.speech.*;

/**
 * Quickstart: recognize speech using the Speech SDK for Java.
 */
public class Main {

    /**
     * @param args Arguments are ignored in this sample.
     */
    public static void main(String[] args) {
        try {
            // Replace below with your own subscription key
            String speechSubscriptionKey = "YourSubscriptionKey";
            // Replace below with your own service region (e.g., "westus").
            String serviceRegion = "YourServiceRegion";

            int exitCode = 1;
            SpeechConfig config = SpeechConfig.fromSubscription(speechSubscriptionKey, serviceRegion);
            assert(config != null);

            SpeechRecognizer reco = new SpeechRecognizer(config);
            assert(reco != null);

            System.out.println("Say something...");

            Future<SpeechRecognitionResult> task = reco.recognizeOnceAsync();
            assert(task != null);

            SpeechRecognitionResult result = task.get();
            assert(result != null);

            if (result.getReason() == ResultReason.RecognizedSpeech) {
                System.out.println("We recognized: " + result.getText());
                exitCode = 0;
            }
            else if (result.getReason() == ResultReason.NoMatch) {
                System.out.println("NOMATCH: Speech could not be recognized.");
            }
            else if (result.getReason() == ResultReason.Canceled) {
                CancellationDetails cancellation = CancellationDetails.fromResult(result);
                System.out.println("CANCELED: Reason=" + cancellation.getReason());

                if (cancellation.getReason() == CancellationReason.Error) {
                    System.out.println("CANCELED: ErrorCode=" + cancellation.getErrorCode());
                    System.out.println("CANCELED: ErrorDetails=" + cancellation.getErrorDetails());
                    System.out.println("CANCELED: Did you update the subscription info?");
                }
            }

            reco.close();

            System.exit(exitCode);
        } catch (Exception ex) {
            System.out.println("Unexpected exception: " + ex.getMessage());

            assert(false);
            System.exit(1);
        }
    }
}

Replace the YourSubscriptionKey and YourServiceRegion values with actual values from the Speech resource.

  • Navigate to the Azure portal, and open the Speech resource
  • Under Keys on the left, there are two available subscription keys
    • Use either one as the YourSubscriptionKey value replacement
  • Under Overview on the left, note the region and map it to the region identifier
    • Use the region identifier as the YourServiceRegion value replacement, for example: "westus" for West US

Code explanation

The Speech resource subscription key and region are required to create a speech configuration object. The configuration object is needed to instantiate a speech recognizer object.

The recognizer instance exposes multiple ways to recognize speech. In this example, speech is recognized once. This tells the Speech service that you're sending a single phrase for recognition, and that recognition stops once the phrase is identified. After the result is yielded, the code writes the recognition reason to the console.
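
If you instead need to transcribe longer, multi-phrase audio, the Java SDK also supports continuous recognition. A minimal sketch, reusing the reco instance from inside the try block above (so checked exceptions are caught); stop conditions are simplified for brevity.

// Sketch: continuous recognition as an alternative to single-shot recognition.
// Print each recognized phrase as it arrives.
reco.recognized.addEventListener((s, e) -> {
    if (e.getResult().getReason() == ResultReason.RecognizedSpeech) {
        System.out.println("We recognized: " + e.getResult().getText());
    }
});

reco.startContinuousRecognitionAsync().get();
System.out.println("Listening... press Enter to stop.");
System.in.read();
reco.stopContinuousRecognitionAsync().get();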

Tip

The Speech SDK defaults to recognizing speech in en-US. See Specify source language for speech to text for information on choosing the source language.
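
For example, to recognize German speech instead, you could set the language on the configuration object before the recognizer is created (a one-line sketch; de-DE is only an illustrative locale):

config.setSpeechRecognitionLanguage("de-DE");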

Build and run the app

Press F11, or select Run > Debug. The next 15 seconds of speech input from your microphone will be recognized and logged in the console window.

Screenshot of console output after successful recognition

Next steps

With this base knowledge of speech recognition, continue exploring the basics to learn about common functionality and tasks within the Speech SDK.

In this quickstart, you use the Speech SDK to interactively recognize speech from a microphone input, and get the text transcription from captured audio. It's easy to integrate this feature into your apps or devices for common recognition tasks, such as transcribing conversations. It can also be used for more complex integrations, like using the Bot Framework with the Speech SDK to build voice assistants.

After satisfying a few prerequisites, recognizing speech from a microphone only takes four steps:

  • Create a SpeechConfig object from your subscription key and region.
  • Create a SpeechRecognizer object using the SpeechConfig object from above.
  • Using the SpeechRecognizer object, start the recognition process for a single utterance.
  • Inspect the SpeechRecognitionResult returned.

You can view or download all Speech SDK Python Samples on GitHub.

Prerequisites

Before you get started:

Source code

Create a file named quickstart.py and paste the following Python code in it.

import azure.cognitiveservices.speech as speechsdk

# Creates an instance of a speech config with specified subscription key and service region.
# Replace with your own subscription key and service region (e.g., "westus").
speech_key, service_region = "YourSubscriptionKey", "YourServiceRegion"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

# Creates a recognizer with the given settings
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

print("Say something...")


# Starts speech recognition, and returns after a single utterance is recognized. The end of a
# single utterance is determined by listening for silence at the end or until a maximum of 15
# seconds of audio is processed.  The task returns the recognition text as result. 
# Note: Since recognize_once() returns only a single utterance, it is suitable only for single
# shot recognition like command or query. 
# For long-running multi-utterance recognition, use start_continuous_recognition() instead.
result = speech_recognizer.recognize_once()

# Checks result.
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))

Replace the YourSubscriptionKey and YourServiceRegion values with actual values from the Speech resource.

  • Navigate to the Azure portal, and open the Speech resource
  • Under Keys on the left, there are two available subscription keys
    • Use either one as the YourSubscriptionKey value replacement
  • Under Overview on the left, note the region and map it to the region identifier
    • Use the region identifier as the YourServiceRegion value replacement, for example: "westus" for West US

Code explanation

The Speech resource subscription key and region are required to create a speech configuration object. The configuration object is needed to instantiate a speech recognizer object.

The recognizer instance exposes multiple ways to recognize speech. In this example, speech is recognized once. This tells the Speech service that you're sending a single phrase for recognition, and that recognition stops once the phrase is identified. After the result is yielded, the code writes the recognition reason to the console.
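
If you instead want to transcribe longer, multi-phrase audio, the Python SDK also supports continuous recognition. A minimal sketch of that pattern, reusing the speech_recognizer from the sample above (the polling interval is an arbitrary choice):

import time

done = False

def stop_cb(evt):
    """Signal completion when the session stops or recognition is canceled."""
    global done
    done = True

# Print each recognized phrase as it arrives.
speech_recognizer.recognized.connect(lambda evt: print("Recognized: {}".format(evt.result.text)))
speech_recognizer.session_stopped.connect(stop_cb)
speech_recognizer.canceled.connect(stop_cb)

speech_recognizer.start_continuous_recognition()
while not done:
    time.sleep(0.5)
speech_recognizer.stop_continuous_recognition()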

Tip

The Speech SDK defaults to recognizing speech in en-US. See Specify source language for speech to text for information on choosing the source language.
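
For example, to recognize German speech instead, you could set the language on the configuration object before the recognizer is created (a one-line sketch; de-DE is only an illustrative locale):

speech_config.speech_recognition_language = "de-DE"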

Build and run the app

Now you're ready to test speech recognition using the Speech service.

If you're running this on macOS and it's the first Python app you've built that uses a microphone, you'll probably need to give Terminal access to the microphone. Open System Preferences and select Security & Privacy. Then select Privacy and locate Microphone in the list. Finally, select Terminal and save.

  1. Start your app - From the command line, type:
    python quickstart.py
    
  2. Start recognition - You're prompted to speak a phrase in English. Your speech is sent to the Speech service, transcribed as text, and rendered in the console.

Next steps

With this base knowledge of speech recognition, continue exploring the basics to learn about common functionality and tasks within the Speech SDK.

In this quickstart, you use the Speech CLI from the command line to interactively recognize speech from a microphone input, and get the text transcription from captured audio. It's easy to use the Speech CLI to perform common recognition tasks, such as transcribing conversations. After a one-time configuration, the Speech CLI lets you transcribe audio into text interactively with a microphone or from files using a batch script.

Prerequisites

The only prerequisite is an Azure Speech subscription. See the guide on creating a new subscription if you don't already have one.

Download and install

Follow these steps to install the Speech CLI on Windows:

  1. Install either .NET Framework 4.7 or .NET Core 3.0.
  2. Download the Speech CLI zip archive, then extract it.
  3. Go to the root directory spx-zips that you extracted from the download, and extract the subdirectory that you need (spx-net471 for .NET Framework 4.7, or spx-netcore-win-x64 for .NET Core 3.0 on an x64 CPU).

In the command prompt, change directory to this location, and then type spx to see help for the Speech CLI.

Note

On Windows, the Speech CLI can only show fonts available to the command prompt on the local computer. Windows Terminal supports all fonts produced interactively by the Speech CLI. If you output to a file, a text editor like Notepad or a web browser like Microsoft Edge can also show all fonts.

Note

PowerShell does not check the local directory when looking for a command. In PowerShell, change directory to the location of spx and call the tool by entering .\spx. If you add this directory to your path, PowerShell and the Windows command prompt will find spx from any directory without including the .\ prefix.

Create subscription config

To start using the Speech CLI, you first need to enter your Speech subscription key and region information. See the region support page to find your region identifier. Once you have your subscription key and region identifier (for example, eastus or westus), run the following commands.

spx config @key --set SUBSCRIPTION-KEY
spx config @region --set REGION

Your subscription authentication is now stored for future SPX requests. If you need to remove either of these stored values, run spx config @region --clear or spx config @key --clear.

Enable microphone

Plug in and turn on your PC microphone, and turn off any apps that might also use the microphone. Some computers have a built-in microphone, while others require configuration of a Bluetooth device.

Run the Speech CLI

Now you're ready to run the Speech CLI to recognize speech from your microphone.

  1. Start your app - From the command line, change to the directory that contains the Speech CLI binary file, and type:

    spx recognize --microphone
    

    Note

    The Speech CLI defaults to English. You can choose a different language from the speech-to-text table. For example, add --source de-DE to recognize German speech, as shown in the example after this list.

  2. Start recognition - Speak into the microphone. You see your words transcribed to text in real time. The Speech CLI stops after a period of silence, or when you press Ctrl+C.
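
For example, the full command to recognize German speech from the microphone, combining the options from the note above, would be:

    spx recognize --microphone --source de-DE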

Next steps

Continue exploring the basics to learn about other features of the Speech CLI.

In this quickstart, you use the Speech SDK to interactively recognize speech from a microphone input, and get the text transcription from captured audio. It's easy to integrate this feature into your apps or devices for common recognition tasks, such as transcribing conversations. It can also be used for more complex integrations, like using the Bot Framework with the Speech SDK to build voice assistants.

After satisfying a few prerequisites, recognizing speech from a microphone only takes four steps:

  • Create a SpeechConfig object from your subscription key and region.
  • Create a SpeechRecognizer object using the SpeechConfig object from above.
  • Using the SpeechRecognizer object, start the recognition process for a single utterance.
  • Inspect the SpeechRecognitionResult returned.

You can view or download all Speech SDK Go Samples on GitHub.

Prerequisites

Before you get started:

Set up your environment

Update the go.mod file with the latest SDK version by adding these lines:

require (
    github.com/Microsoft/cognitive-services-speech-sdk-go v1.13.0
)

Start with some boilerplate code

  1. Replace the contents of your source file (e.g. sr-quickstart.go) with the following code, which includes:
  • the "main" package definition
  • importing the necessary modules from the Speech SDK
  • variables for storing the subscription information, which you replace later in this quickstart
  • a simple implementation using the microphone for audio input
  • event handlers for various events that take place during speech recognition

package main

import (
    "bufio"
    "fmt"
    "os"

    "github.com/Microsoft/cognitive-services-speech-sdk-go/audio"
    "github.com/Microsoft/cognitive-services-speech-sdk-go/speech"
)

func recognizingHandler(event speech.SpeechRecognitionEventArgs) {
    defer event.Close()
    fmt.Println("Recognizing:", event.Result.Text)
}

func recognizedHandler(event speech.SpeechRecognitionEventArgs) {
    defer event.Close()
    fmt.Println("Recognized:", event.Result.Text)
}

func cancelledHandler(event speech.SpeechRecognitionCanceledEventArgs) {
    defer event.Close()
    fmt.Println("Received a cancellation: ", event.ErrorDetails)
}

func main() {
    subscription := "YOUR_SUBSCRIPTION_KEY"
    region := "YOUR_SUBSCRIPTIONKEY_REGION"

    audioConfig, err := audio.NewAudioConfigFromDefaultMicrophoneInput()
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer audioConfig.Close()
    config, err := speech.NewSpeechConfigFromSubscription(subscription, region)
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer config.Close()
    speechRecognizer, err := speech.NewSpeechRecognizerFromConfig(config, audioConfig)
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer speechRecognizer.Close()
    speechRecognizer.Recognizing(recognizingHandler)
    speechRecognizer.Recognized(recognizedHandler)
    speechRecognizer.Canceled(cancelledHandler)
    speechRecognizer.StartContinuousRecognitionAsync()
    defer speechRecognizer.StopContinuousRecognitionAsync()
    bufio.NewReader(os.Stdin).ReadBytes('\n')
}

Replace the YOUR_SUBSCRIPTION_KEY and YOUR_SUBSCRIPTIONKEY_REGION values with actual values from the Speech resource.

  • Navigate to the Azure portal, and open the Speech resource
  • Under Keys on the left, there are two available subscription keys
    • Use either one as the YOUR_SUBSCRIPTION_KEY value replacement
  • Under Overview on the left, note the region and map it to the region identifier
    • Use the region identifier as the YOUR_SUBSCRIPTIONKEY_REGION value replacement, for example: "westus" for West US

Code explanation

The Speech subscription key and region are required to create a speech configuration object. The configuration object is needed to instantiate a speech recognizer object.

The recognizer instance exposes multiple ways to recognize speech. In this example, speech is continuously recognized. This tells the Speech service that you're sending many phrases for recognition, and that recognition stops when the program terminates. As results are yielded, the code writes them to the console.
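
If you only need to capture a single utterance (for example, a command or query), the Go SDK also offers RecognizeOnceAsync, which delivers its outcome over a channel. A minimal sketch, reusing the speechRecognizer from the sample above; the timeout value is an arbitrary choice for illustration.

// Sketch: single-shot recognition as an alternative to continuous recognition.
// Assumes the speechRecognizer from the sample above; add "time" to the imports.
task := speechRecognizer.RecognizeOnceAsync()
var outcome speech.SpeechRecognitionOutcome
select {
case outcome = <-task:
case <-time.After(15 * time.Second):
    fmt.Println("Timed out waiting for a recognition.")
    return
}
defer outcome.Close()
if outcome.Error != nil {
    fmt.Println("Got an error: ", outcome.Error)
    return
}
fmt.Println("Recognized:", outcome.Result.Text)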

Build and run

You're now set up to build your project and test speech recognition using the Speech service.

  1. Build your project, for example by running go build
  2. Run the module and speak a phrase or sentence into your device's microphone. Your speech is transmitted to the Speech service and transcribed to text, which appears in the output.

Note

The Speech SDK defaults to recognizing speech in en-US. See Specify source language for speech to text for information on choosing the source language.
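
For example, to recognize German speech instead, you could set the language on the configuration object before the recognizer is created. A minimal sketch, assuming the SetSpeechRecognitionLanguage setter on the Go SDK's SpeechConfig (de-DE is only an illustrative locale):

// Sketch (assumption): set the recognition language on the config
// before creating the recognizer; the setter returns an error.
if err := config.SetSpeechRecognitionLanguage("de-DE"); err != nil {
    fmt.Println("Got an error: ", err)
}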

Next steps

With this base knowledge of speech recognition, continue exploring the basics to learn about common functionality and tasks within the Speech SDK.

In this quickstart, you use the Speech SDK to interactively recognize speech from a microphone input, and get the text transcription from captured audio. It's easy to integrate this feature into your apps or devices for common recognition tasks, such as transcribing conversations. It can also be used for more complex integrations, like using the Bot Framework with the Speech SDK to build voice assistants.

After satisfying a few prerequisites, recognizing speech from a microphone only takes four steps:

  • Create a SpeechConfig object from your subscription key and region.
  • Create a SpeechRecognizer object using the SpeechConfig object from above.
  • Using the SpeechRecognizer object, start the recognition process for a single utterance.
  • Inspect the SpeechRecognitionResult returned.

You can view or download all Speech SDK JavaScript Samples on GitHub.

Prerequisites

Before you get started:

Create a new Website folder

Create a new, empty folder. In case you want to host the sample on a web server, make sure that the web server can access the folder.

Unpack the Speech SDK for JavaScript into that folder

Download the Speech SDK as a .zip package and unpack it into the newly created folder. This results in two files being unpacked, microsoft.cognitiveservices.speech.sdk.bundle.js and microsoft.cognitiveservices.speech.sdk.bundle.js.map. The latter file is optional, and is useful for debugging into the SDK code.

Create an index.html page

Create a new file in the folder, named index.html and open this file with a text editor.

  1. Create the following HTML skeleton:
<!DOCTYPE html>
<html>
<head>
  <title>Microsoft Cognitive Services Speech SDK JavaScript Quickstart</title>
  <meta charset="utf-8" />
</head>
<body style="font-family:'Helvetica Neue',Helvetica,Arial,sans-serif; font-size:13px;">
  <!-- <uidiv> -->
  <div id="warning">
    <h1 style="font-weight:500;">Speech Recognition Speech SDK not found (microsoft.cognitiveservices.speech.sdk.bundle.js missing).</h1>
  </div>
  
  <div id="content" style="display:none">
    <table width="100%">
      <tr>
        <td></td>
        <td><h1 style="font-weight:500;">Microsoft Cognitive Services Speech SDK JavaScript Quickstart</h1></td>
      </tr>
      <tr>
        <td align="right"><a href="https://docs.microsoft.com/azure/cognitive-services/speech-service/get-started" target="_blank">Subscription</a>:</td>
        <td><input id="subscriptionKey" type="text" size="40" value="subscription"></td>
      </tr>
      <tr>
        <td align="right">Region</td>
        <td><input id="serviceRegion" type="text" size="40" value="YourServiceRegion"></td>
      </tr>
      <tr>
        <td></td>
        <td><button id="startRecognizeOnceAsyncButton">Start recognition</button></td>
      </tr>
      <tr>
        <td align="right" valign="top">Results</td>
        <td><textarea id="phraseDiv" style="display: inline-block;width:500px;height:200px"></textarea></td>
      </tr>
    </table>
  </div>
  <!-- </uidiv> -->

  <!-- <speechsdkref> -->
  <!-- Speech SDK reference. -->
  <script src="microsoft.cognitiveservices.speech.sdk.bundle.js"></script>
  <!-- </speechsdkref> -->

  <!-- <authorizationfunction> -->
  <!-- Speech SDK Authorization token -->
  <script>
  // Note: Replace the URL with a valid endpoint to retrieve
  //       authorization tokens for your subscription.
  var authorizationEndpoint = "token.php";

  function RequestAuthorizationToken() {
    if (authorizationEndpoint) {
      var a = new XMLHttpRequest();
      a.open("GET", authorizationEndpoint);
      a.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
      a.send("");
      a.onload = function() {
          var token = JSON.parse(atob(this.responseText.split(".")[1]));
          serviceRegion.value = token.region;
          authorizationToken = this.responseText;
          subscriptionKey.disabled = true;
          subscriptionKey.value = "using authorization token (hit F5 to refresh)";
          console.log("Got an authorization token: " + token);
      }
    }
  }
  </script>
  <!-- </authorizationfunction> -->

  <!-- <quickstartcode> -->
  <!-- Speech SDK USAGE -->
  <script>
    // status fields and start button in UI
    var phraseDiv;
    var startRecognizeOnceAsyncButton;

    // subscription key and region for speech services.
    var subscriptionKey, serviceRegion;
    var authorizationToken;
    var SpeechSDK;
    var recognizer;

    document.addEventListener("DOMContentLoaded", function () {
      startRecognizeOnceAsyncButton = document.getElementById("startRecognizeOnceAsyncButton");
      subscriptionKey = document.getElementById("subscriptionKey");
      serviceRegion = document.getElementById("serviceRegion");
      phraseDiv = document.getElementById("phraseDiv");

      startRecognizeOnceAsyncButton.addEventListener("click", function () {
        startRecognizeOnceAsyncButton.disabled = true;
        phraseDiv.innerHTML = "";

        // if we got an authorization token, use the token. Otherwise use the provided subscription key
        var speechConfig;
        if (authorizationToken) {
          speechConfig = SpeechSDK.SpeechConfig.fromAuthorizationToken(authorizationToken, serviceRegion.value);
        } else {
          if (subscriptionKey.value === "" || subscriptionKey.value === "subscription") {
            alert("Please enter your Microsoft Cognitive Services Speech subscription key!");
            return;
          }
          speechConfig = SpeechSDK.SpeechConfig.fromSubscription(subscriptionKey.value, serviceRegion.value);
        }

        speechConfig.speechRecognitionLanguage = "en-US";
        var audioConfig  = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();
        recognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);

        recognizer.recognizeOnceAsync(
          function (result) {
            startRecognizeOnceAsyncButton.disabled = false;
            phraseDiv.innerHTML += result.text;
            window.console.log(result);

            recognizer.close();
            recognizer = undefined;
          },
          function (err) {
            startRecognizeOnceAsyncButton.disabled = false;
            phraseDiv.innerHTML += err;
            window.console.log(err);

            recognizer.close();
            recognizer = undefined;
          });
      });

      if (!!window.SpeechSDK) {
        SpeechSDK = window.SpeechSDK;
        startRecognizeOnceAsyncButton.disabled = false;

        document.getElementById('content').style.display = 'block';
        document.getElementById('warning').style.display = 'none';

        // in case we have a function for getting an authorization token, call it.
        if (typeof RequestAuthorizationToken === "function") {
            RequestAuthorizationToken();
        }
      }
    });
  </script>
  <!-- </quickstartcode> -->
</body>
</html>

Create the token source (optional)

In case you want to host the web page on a web server, you can optionally provide a token source for your demo application. That way, your subscription key never leaves your server, while still allowing users to use speech capabilities without entering any authorization code themselves.

Create a new file named token.php. This example assumes that your web server supports the PHP scripting language with curl enabled. Enter the following code:

<?php
header('Access-Control-Allow-Origin: ' . $_SERVER['SERVER_NAME']);

// Replace with your own subscription key and service region (e.g., "westus").
$subscriptionKey = 'YourSubscriptionKey';
$region = 'YourServiceRegion';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://' . $region . '.api.cognitive.microsoft.com/sts/v1.0/issueToken');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, '{}');
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json', 'Ocp-Apim-Subscription-Key: ' . $subscriptionKey));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
echo curl_exec($ch);
?>

Note

Authorization tokens only have a limited lifetime. This simplified example does not show how to refresh authorization tokens automatically. As a user, you can manually reload the page or hit F5 to refresh.
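
If you host the page yourself, one minimal way to refresh tokens automatically (a sketch, not production code) is to re-run the token request on a timer shorter than the token lifetime of 10 minutes:

// Sketch: refresh the authorization token every nine minutes by
// reusing the RequestAuthorizationToken function defined in index.html.
setInterval(RequestAuthorizationToken, 9 * 60 * 1000);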

Build and run the sample locally

To launch the app, double-click the index.html file or open it with your favorite web browser. A simple GUI appears that lets you enter your subscription key and region, and trigger a recognition using the microphone.

Note

This method doesn't work on the Safari browser. On Safari, the sample web page needs to be hosted on a web server; Safari doesn't allow websites loaded from a local file to use the microphone.

Build and run the sample via a web server

To launch your app, open your favorite web browser and point it to the public URL that you host the folder on, enter your region, and trigger a recognition using the microphone. If configured, it will acquire a token from your token source.

Next steps

With this base knowledge of speech recognition, continue exploring the basics to learn about common functionality and tasks within the Speech SDK.

View or download all Speech SDK Samples on GitHub.

Additional language and platform support

If you've clicked this tab, you probably didn't see a quickstart in your favorite programming language. Don't worry; additional quickstart materials and code samples are available on GitHub. Use the table to find the right sample for your programming language and platform/OS combination.

Language    | Additional quickstarts               | Code samples
----------- | ------------------------------------ | -----------------------------------------------
C#          | From file, From blob                 | .NET Framework, .NET Core, UWP, Unity, Xamarin
C++         | From file, From blob                 | Windows, Linux, macOS
Java        | From file, From blob                 | Android, JRE
JavaScript  | Browser from mic, Node.js from file  | Windows, Linux, macOS
Objective-C | iOS, macOS                           | iOS, macOS
Python      | From file, From blob                 | Windows, Linux, macOS
Swift       | iOS, macOS                           | iOS, macOS