How to recognize intents from speech using the Speech SDK for C#

The Cognitive Services Speech SDK integrates with the Language Understanding service (LUIS) to provide intent recognition. An intent is something the user wants to do: book a flight, check the weather, or make a call. The user can use whatever terms feel natural. Using machine learning, LUIS maps user requests to the intents you've defined.

Note

A LUIS application defines the intents and entities you want to recognize. It's separate from the C# application that uses the Speech service. In this article, "app" means the LUIS app, while "application" means the C# code.

In this guide, you use the Speech SDK to develop a C# console application that derives intents from user utterances through your device's microphone. You'll learn how to:

  • Create a Visual Studio project referencing the Speech SDK NuGet package
  • Create a speech configuration and get an intent recognizer
  • Get the model for your LUIS app and add the intents you need
  • Specify the language for speech recognition
  • Recognize speech from a file
  • Use asynchronous, event-driven continuous recognition

Prerequisites

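Be sure you have the following items before you begin this guide:

  • A LUIS app and an endpoint key for it, as described in LUIS and speech below
  • Visual Studio 2019
  • A microphone connected to your PC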

LUIS and speech

LUIS integrates with the Speech Services to recognize intents from speech. You don't need a Speech Services subscription, just LUIS.

LUIS uses three kinds of keys:

Key type | Purpose
Authoring | Lets you create and modify LUIS apps programmatically
Starter | Lets you test your LUIS app using text only
Endpoint | Authorizes access to a particular LUIS app

For this guide, you need the endpoint key type. This guide uses the example Home Automation LUIS app, which you can create by following the Use prebuilt Home automation app quickstart. If you've created a LUIS app of your own, you can use it instead.

When you create a LUIS app, LUIS automatically generates a starter key so you can test the app using text queries. This key doesn't enable the Speech Services integration and won't work with this guide. Create a LUIS resource in the Azure dashboard and assign it to the LUIS app. You can use the free subscription tier for this guide.

After you create the LUIS resource in the Azure dashboard, log in to the LUIS portal, choose your app on the My Apps page, then switch to the app's Manage page. Finally, select Keys and Endpoints in the sidebar.

[Screenshot: LUIS portal keys and endpoint settings]

On the Keys and Endpoint settings page:

  1. Scroll down to the Resources and Keys section and select Assign resource.

  2. In the Assign a key to your app dialog box, make the following changes:

    • Under Tenant, choose Microsoft.
    • Under Subscription Name, choose the Azure subscription that contains the LUIS resource you want to use.
    • Under Key, choose the LUIS resource that you want to use with the app.

    In a moment, the new subscription appears in the table at the bottom of the page.

  3. Select the icon next to a key to copy it to the clipboard. (You may use either key.)

[Screenshot: LUIS app subscription keys]

Create a speech project in Visual Studio

To create a Visual Studio project for Windows development, you need to create the project, set up Visual Studio for .NET desktop development, install the Speech SDK, and choose the target architecture.

Create the project and add the workload

To start, create the project in Visual Studio, and make sure that Visual Studio is set up for .NET desktop development:

  1. Open Visual Studio 2019.

  2. In the Start window, select Create a new project.

  3. In the Create a new project window, choose Console App (.NET Framework), and then select Next.

  4. In the Configure your new project window, enter helloworld in Project name, choose or create the directory path in Location, and then select Create.

  5. From the Visual Studio menu bar, select Tools > Get Tools and Features, which opens Visual Studio Installer and displays the Modifying dialog box.

  6. Check whether the .NET desktop development workload is available. If the workload hasn't been installed, select the check box next to it, and then select Modify to start the installation. It may take a few minutes to download and install.

    If the check box next to .NET desktop development is already selected, select Close to exit the dialog box.

    [Screenshot: Enable .NET desktop development]

  7. Close Visual Studio Installer.

Install the Speech SDK

The next step is to install the Speech SDK NuGet package, so you can reference it in the code.

  1. In the Solution Explorer, right-click the helloworld project, and then select Manage NuGet Packages to show the NuGet Package Manager.

    [Screenshot: NuGet Package Manager]

  2. In the upper-right corner, find the Package Source drop-down box, and make sure that nuget.org is selected.

  3. In the upper-left corner, select Browse.

  4. In the search box, type Microsoft.CognitiveServices.Speech and press Enter.

  5. From the search results, select the Microsoft.CognitiveServices.Speech package, and then select Install to install the latest stable version.

    [Screenshot: Install Microsoft.CognitiveServices.Speech NuGet package]

  6. Accept all agreements and licenses to start the installation.

    After the package is installed, a confirmation appears in the Package Manager Console window.

Choose the target architecture

Now, to build and run the console application, create a platform configuration matching your computer's architecture.

  1. From the menu bar, select Build > Configuration Manager. The Configuration Manager dialog box appears.

    [Screenshot: Configuration Manager dialog box]

  2. In the Active solution platform drop-down box, select New. The New Solution Platform dialog box appears.

  3. In the Type or select the new platform drop-down box:

    • If you're running 64-bit Windows, select x64.
    • If you're running 32-bit Windows, select x86.

  4. Select OK and then Close.

Add the code

Next, you add code to the project.

  1. From Solution Explorer, open the file Program.cs.

  2. Replace the block of using statements at the beginning of the file with the following declarations:

    using System;
    using System.Threading.Tasks;
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;
    using Microsoft.CognitiveServices.Speech.Intent;
    
  3. Inside the provided Main() method, add the following code:

    RecognizeIntentAsync().Wait();
    Console.WriteLine("Please press Enter to continue.");
    Console.ReadLine();
    
  4. Create an empty asynchronous method RecognizeIntentAsync(), as shown here:

    static async Task RecognizeIntentAsync()
    {
    }
    
  5. In the body of this new method, add this code:

    // Creates an instance of a speech config with specified subscription key
    // and service region. Note that in contrast to other services supported by
    // the Cognitive Services Speech SDK, the Language Understanding service
    // requires a specific subscription key from https://www.luis.ai/.
    // The Language Understanding service calls the required key 'endpoint key'.
    // Once you've obtained it, replace the values below with your own Language Understanding
    // subscription key and service region (e.g., "westus").
    // The default language is "en-us".
    var config = SpeechConfig.FromSubscription("YourLanguageUnderstandingSubscriptionKey", "YourLanguageUnderstandingServiceRegion");
    
    // Creates an intent recognizer using microphone as audio input.
    using (var recognizer = new IntentRecognizer(config))
    {
        // Creates a Language Understanding model using the app id, and adds specific intents from your model
        var model = LanguageUnderstandingModel.FromAppId("YourLanguageUnderstandingAppId");
        recognizer.AddIntent(model, "YourLanguageUnderstandingIntentName1", "id1");
        recognizer.AddIntent(model, "YourLanguageUnderstandingIntentName2", "id2");
        recognizer.AddIntent(model, "YourLanguageUnderstandingIntentName3", "any-IntentId-here");
    
        // Starts recognizing.
        Console.WriteLine("Say something...");
    
        // Starts intent recognition, and returns after a single utterance is recognized. The end of a
        // single utterance is determined by listening for silence at the end or until a maximum of 15
        // seconds of audio is processed.  The task returns the recognition text as result. 
        // Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
        // shot recognition like command or query. 
        // For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
        var result = await recognizer.RecognizeOnceAsync().ConfigureAwait(false);
    
        // Checks result.
        if (result.Reason == ResultReason.RecognizedIntent)
        {
            Console.WriteLine($"RECOGNIZED: Text={result.Text}");
            Console.WriteLine($"    Intent Id: {result.IntentId}.");
            Console.WriteLine($"    Language Understanding JSON: {result.Properties.GetProperty(PropertyId.LanguageUnderstandingServiceResponse_JsonResult)}.");
        }
        else if (result.Reason == ResultReason.RecognizedSpeech)
        {
            Console.WriteLine($"RECOGNIZED: Text={result.Text}");
            Console.WriteLine($"    Intent not recognized.");
        }
        else if (result.Reason == ResultReason.NoMatch)
        {
            Console.WriteLine($"NOMATCH: Speech could not be recognized.");
        }
        else if (result.Reason == ResultReason.Canceled)
        {
            var cancellation = CancellationDetails.FromResult(result);
            Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
    
            if (cancellation.Reason == CancellationReason.Error)
            {
                Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                Console.WriteLine($"CANCELED: Did you update the subscription info?");
            }
        }
    }
    
  6. Replace the placeholders in this method with your LUIS subscription key, region, and app ID as follows.

    Placeholder | Replace with
    YourLanguageUnderstandingSubscriptionKey | Your LUIS endpoint key. Again, you must get this item from your Azure dashboard, not a "starter key." You can find it on your app's Keys and Endpoints page (under Manage) in the LUIS portal.
    YourLanguageUnderstandingServiceRegion | The short identifier for the region your LUIS subscription is in, such as westus for West US. See Regions.
    YourLanguageUnderstandingAppId | The LUIS app ID. You can find it on your app's Settings page in the LUIS portal.

With these changes made, you can build (Ctrl+Shift+B) and run (F5) the application. When you're prompted, try saying "Turn off the lights" into your PC's microphone. The application displays the result in the console window.
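If you'd rather not paste the endpoint key into Program.cs, one option is to read the values from environment variables. The following is a minimal sketch of that idea, not part of the original sample. The class name LuisSettings, the method Require, and the variable names LUIS_ENDPOINT_KEY and LUIS_REGION are all assumptions made for illustration; neither LUIS nor the Speech SDK defines them.

```csharp
using System;

// Hypothetical helper, not part of the Speech SDK.
static class LuisSettings
{
    // Returns the trimmed value of the named environment variable,
    // or throws if the variable is missing or blank.
    public static string Require(string variableName)
    {
        string value = Environment.GetEnvironmentVariable(variableName);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Set the {variableName} environment variable before running the application.");
        }
        return value.Trim();
    }
}
```

With this helper in the project, the configuration line could become SpeechConfig.FromSubscription(LuisSettings.Require("LUIS_ENDPOINT_KEY"), LuisSettings.Require("LUIS_REGION")). Failing early with a clear message is a design choice; it avoids a less obvious cancellation error from the service later.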

The following sections include a discussion of the code.

Create an intent recognizer

First, you need to create a speech configuration from your LUIS endpoint key and region. You can use speech configurations to create recognizers for the various capabilities of the Speech SDK. The speech configuration has multiple ways to specify the subscription you want to use; here, we use FromSubscription, which takes the subscription key and region.

Note

Use the key and region of your LUIS subscription, not of a Speech Services subscription.

Next, create an intent recognizer using new IntentRecognizer(config). Since the configuration already knows which subscription to use, you don't need to specify the subscription key and region again when creating the recognizer.

Import a LUIS model and add intents

Now import the model from the LUIS app using LanguageUnderstandingModel.FromAppId() and add the LUIS intents that you wish to recognize via the recognizer's AddIntent() method. These two steps improve the accuracy of speech recognition by indicating words that the user is likely to use in their requests. You don't have to add all the app's intents if you don't need to recognize them all in your application.

To add intents, you must provide three arguments: the LUIS model (which has been created and is named model), the intent name, and an intent ID. The difference between the ID and the name is as follows.

AddIntent() argument | Purpose
intentName | The name of the intent as defined in the LUIS app. This value must match the LUIS intent name exactly.
intentId | An ID assigned to a recognized intent by the Speech SDK. This value can be whatever you like; it doesn't need to correspond to the intent name as defined in the LUIS app. If multiple intents are handled by the same code, for instance, you could use the same ID for them.

The Home Automation LUIS app has two intents: one for turning on a device, and another for turning off a device. The lines below add these intents to the recognizer; replace the three AddIntent lines in the RecognizeIntentAsync() method with this code.

recognizer.AddIntent(model, "HomeAutomation.TurnOff", "off");
recognizer.AddIntent(model, "HomeAutomation.TurnOn", "on");

Instead of adding individual intents, you can also use the AddAllIntents method to add all the intents in a model to the recognizer.
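Because the intent ID is an arbitrary string that you choose, it can serve as a lookup key for the code that handles each intent. The following is a rough sketch of that idea, assuming the "on" and "off" IDs used with the Home Automation intents. IntentDispatcher and Dispatch are hypothetical names, and the handlers only build a message rather than controlling a real device.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the Speech SDK.
static class IntentDispatcher
{
    // Maps the intent IDs passed to AddIntent() to the code that handles them.
    private static readonly Dictionary<string, Func<string, string>> Handlers =
        new Dictionary<string, Func<string, string>>
        {
            ["on"] = text => $"Turning a device on (heard: \"{text}\")",
            ["off"] = text => $"Turning a device off (heard: \"{text}\")",
        };

    // Returns the handler's message for a recognized intent ID,
    // or a fallback message when no handler is registered for that ID.
    public static string Dispatch(string intentId, string recognizedText)
    {
        if (intentId != null && Handlers.TryGetValue(intentId, out var handler))
        {
            return handler(recognizedText);
        }
        return $"No handler for intent ID '{intentId}'.";
    }
}
```

In the branch that handles ResultReason.RecognizedIntent, a call such as Console.WriteLine(IntentDispatcher.Dispatch(result.IntentId, result.Text)); would route the result. Registering two LUIS intents under the same ID would send both to the same handler.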

Start recognition

With the recognizer created and the intents added, recognition can begin. The Speech SDK supports both single-shot and continuous recognition.

Recognition mode | Methods to call | Result
Single-shot | RecognizeOnceAsync() | Returns the recognized intent, if any, after one utterance.
Continuous | StartContinuousRecognitionAsync(), StopContinuousRecognitionAsync() | Recognizes multiple utterances; emits events (for example, Recognizing and Recognized) when results are available.

The application uses single-shot mode and so calls RecognizeOnceAsync() to begin recognition. The result is an IntentRecognitionResult object containing information about the intent recognized. You extract the LUIS JSON response by using the following expression:

result.Properties.GetProperty(PropertyId.LanguageUnderstandingServiceResponse_JsonResult)

The application doesn't parse the JSON result. It only displays the JSON text in the console window.

[Screenshot: Single LUIS recognition results]
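If you do want to read values out of the JSON text, a sketch along the following lines may be a starting point. It rests on several assumptions: that the response contains a topScoringIntent object with intent and score members, and that the System.Text.Json package is available to the project (on .NET Framework it would have to be added through NuGet). LuisJson and TryGetTopIntent are hypothetical names, not Speech SDK APIs.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper, not part of the Speech SDK.
static class LuisJson
{
    // Tries to read the top-scoring intent name and score from a LUIS JSON response.
    // Assumes a "topScoringIntent" object with "intent" (string) and "score" (number).
    // Returns false if the text is empty, isn't valid JSON, or lacks those members.
    public static bool TryGetTopIntent(string json, out string intent, out double score)
    {
        intent = null;
        score = 0.0;
        if (string.IsNullOrWhiteSpace(json)) return false;

        try
        {
            using (JsonDocument doc = JsonDocument.Parse(json))
            {
                JsonElement root = doc.RootElement;
                if (root.ValueKind != JsonValueKind.Object) return false;
                if (!root.TryGetProperty("topScoringIntent", out JsonElement top)
                    || top.ValueKind != JsonValueKind.Object) return false;
                if (!top.TryGetProperty("intent", out JsonElement name)
                    || name.ValueKind != JsonValueKind.String) return false;
                if (!top.TryGetProperty("score", out JsonElement value)
                    || value.ValueKind != JsonValueKind.Number) return false;

                intent = name.GetString();
                score = value.GetDouble();
                return true;
            }
        }
        catch (JsonException)
        {
            // The text was not valid JSON.
            return false;
        }
    }
}
```

The helper returns false instead of throwing so that an unexpected response shape doesn't stop recognition; the calling code can fall back to printing the raw JSON as the application does now.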

Specify recognition language

By default, LUIS recognizes intents in US English (en-us). By assigning a locale code to the SpeechRecognitionLanguage property of the speech configuration, you can recognize intents in other languages. For example, to recognize intents in German, add config.SpeechRecognitionLanguage = "de-de"; to the application before creating the recognizer. For more information, see Supported Languages.
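To show where the assignment fits, here is a minimal sketch that sets the language before the recognizer is constructed. It assumes the Speech SDK NuGet package installed earlier; the class name GermanRecognizer and its Create method are hypothetical wrappers added for illustration.

```csharp
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Intent;

// Hypothetical wrapper, shown only to illustrate the order of the calls.
static class GermanRecognizer
{
    // Builds an intent recognizer that listens for German speech on the default microphone.
    public static IntentRecognizer Create(string endpointKey, string region)
    {
        var config = SpeechConfig.FromSubscription(endpointKey, region);

        // Set the language on the configuration before creating the recognizer.
        config.SpeechRecognitionLanguage = "de-de";

        return new IntentRecognizer(config);
    }
}
```

The caller owns the returned recognizer and should dispose it, for example with a using block as in RecognizeIntentAsync().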

Continuous recognition from a file

The following code illustrates two additional capabilities of intent recognition using the Speech SDK. The first, previously mentioned, is continuous recognition, where the recognizer emits events when results are available. These events can then be processed by event handlers that you provide. With continuous recognition, you call the recognizer's StartContinuousRecognitionAsync() method to start recognition instead of RecognizeOnceAsync().

The other capability is reading the audio containing the speech to be processed from a WAV file. Implementation involves creating an audio configuration that can be used when creating the intent recognizer. The file must be single-channel (mono) with a sampling rate of 16 kHz.
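Before handing a file to the recognizer, you may want to confirm that it meets these format requirements. The following sketch reads the start of the file and checks the channel count and sampling rate. It assumes a canonical RIFF/WAVE header in which the "fmt " chunk directly follows the 12-byte preamble; WavFormatCheck and IsMono16kHz are hypothetical names, not Speech SDK APIs.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the Speech SDK.
static class WavFormatCheck
{
    // Returns true when the header describes single-channel audio sampled at 16 kHz.
    // Assumes the "fmt " chunk immediately follows the RIFF/WAVE preamble; files that
    // place other chunks first are reported as false rather than parsed further.
    public static bool IsMono16kHz(Stream wav)
    {
        using (var reader = new BinaryReader(wav, Encoding.ASCII, leaveOpen: true))
        {
            byte[] header = reader.ReadBytes(28);
            if (header.Length < 28) return false;

            string riff = Encoding.ASCII.GetString(header, 0, 4);
            string wave = Encoding.ASCII.GetString(header, 8, 4);
            string fmt = Encoding.ASCII.GetString(header, 12, 4);
            if (riff != "RIFF" || wave != "WAVE" || fmt != "fmt ") return false;

            // WAV fields are little-endian: channel count at offset 22, sample rate at offset 24.
            int channels = header[22] | (header[23] << 8);
            int sampleRate = header[24] | (header[25] << 8) | (header[26] << 16) | (header[27] << 24);

            return channels == 1 && sampleRate == 16000;
        }
    }
}
```

A possible use is to open the file with File.OpenRead, call WavFormatCheck.IsMono16kHz on the stream, and print a warning before creating the audio configuration if the check fails.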

To try out these features, delete or comment out the body of the RecognizeIntentAsync() method, and add the following code in its place.

// Creates an instance of a speech config with specified subscription key
// and service region. Note that in contrast to other services supported by
// the Cognitive Services Speech SDK, the Language Understanding service
// requires a specific subscription key from https://www.luis.ai/.
// The Language Understanding service calls the required key 'endpoint key'.
// Once you've obtained it, replace the values below with your own Language Understanding
// subscription key and service region (e.g., "westus").
var config = SpeechConfig.FromSubscription("YourLanguageUnderstandingSubscriptionKey", "YourLanguageUnderstandingServiceRegion");

// Creates an intent recognizer using file as audio input.
// Replace with your own audio file name.
using (var audioInput = AudioConfig.FromWavFileInput("whatstheweatherlike.wav"))
{
    using (var recognizer = new IntentRecognizer(config, audioInput))
    {
        // The TaskCompletionSource to stop recognition.
        var stopRecognition = new TaskCompletionSource<int>();

        // Creates a Language Understanding model using the app id, and adds specific intents from your model
        var model = LanguageUnderstandingModel.FromAppId("YourLanguageUnderstandingAppId");
        recognizer.AddIntent(model, "YourLanguageUnderstandingIntentName1", "id1");
        recognizer.AddIntent(model, "YourLanguageUnderstandingIntentName2", "id2");
        recognizer.AddIntent(model, "YourLanguageUnderstandingIntentName3", "any-IntentId-here");

        // Subscribes to events.
        recognizer.Recognizing += (s, e) => {
            Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");
        };

        recognizer.Recognized += (s, e) => {
            if (e.Result.Reason == ResultReason.RecognizedIntent)
            {
                Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
                Console.WriteLine($"    Intent Id: {e.Result.IntentId}.");
                Console.WriteLine($"    Language Understanding JSON: {e.Result.Properties.GetProperty(PropertyId.LanguageUnderstandingServiceResponse_JsonResult)}.");
            }
            else if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
                Console.WriteLine($"    Intent not recognized.");
            }
            else if (e.Result.Reason == ResultReason.NoMatch)
            {
                Console.WriteLine($"NOMATCH: Speech could not be recognized.");
            }
        };

        recognizer.Canceled += (s, e) => {
            Console.WriteLine($"CANCELED: Reason={e.Reason}");

            if (e.Reason == CancellationReason.Error)
            {
                Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
                Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                Console.WriteLine($"CANCELED: Did you update the subscription info?");
            }

            stopRecognition.TrySetResult(0);
        };

        recognizer.SessionStarted += (s, e) => {
            Console.WriteLine("\n    Session started event.");
        };

        recognizer.SessionStopped += (s, e) => {
            Console.WriteLine("\n    Session stopped event.");
            Console.WriteLine("\nStop recognition.");
            stopRecognition.TrySetResult(0);
        };


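        // Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
        // The audio comes from the WAV file, so there is no need to prompt the user to speak.
        Console.WriteLine("Recognizing speech from the audio file...");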
        await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

        // Waits for completion.
        // Use Task.WaitAny to keep the task rooted.
        Task.WaitAny(new[] { stopRecognition.Task });

        // Stops recognition.
        await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
    }
}

Revise the code to include your LUIS endpoint key, region, and app ID and to add the Home Automation intents, as before. Change whatstheweatherlike.wav to the name of your recorded audio file. Then build, copy the audio file to the build directory, and run the application.

For example, if you say "Turn off the lights", pause, and then say "Turn on the lights" in your recorded audio file, console output similar to the following may appear:

[Screenshot: Audio file LUIS recognition results]
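The continuous recognition code waits on stopRecognition.Task with no upper bound, so it relies on a Canceled or SessionStopped event arriving. If you want to cap the wait, one possible approach is sketched below. It uses only standard .NET task APIs; RecognitionWait and WaitForStopAsync are hypothetical names, and any timeout value you pass is your own choice rather than an SDK recommendation.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of the Speech SDK.
static class RecognitionWait
{
    // Waits until the stop signal completes or the timeout elapses, whichever comes first.
    // Returns true if the stop signal completed, false if the wait timed out.
    public static async Task<bool> WaitForStopAsync(Task stopSignal, TimeSpan timeout)
    {
        Task finished = await Task.WhenAny(stopSignal, Task.Delay(timeout)).ConfigureAwait(false);
        return finished == stopSignal;
    }
}
```

In the sample, the Task.WaitAny call could be replaced with await RecognitionWait.WaitForStopAsync(stopRecognition.Task, TimeSpan.FromMinutes(2)).ConfigureAwait(false); the call to StopContinuousRecognitionAsync() that follows would then run whether or not the timeout was reached.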

Get the samples

For the latest samples, see the Cognitive Services Speech SDK sample code repository on GitHub.

Look for the code from this article in the samples/csharp/sharedcontent/console folder.

Next steps