Quickstart: Speech recognition (HTML)

08/31/2015

Use speech recognition to provide input, specify an action or command, and accomplish tasks in your Universal Windows app.

Note Voice commands and speech recognition are not supported by Windows Store apps in Windows 8 and Windows 8.1.

Speech recognition is made up of a speech runtime, recognition APIs for programming the runtime, ready-to-use grammars for dictation and web search, and a default system UI that helps users discover and use speech recognition features.

Objective: To learn how to enable speech recognition.

Prerequisites

If you're new to developing apps using JavaScript:

To complete this tutorial, have a look through these links to get familiar with the technologies discussed here:

Install Microsoft Visual Studio.
Get a developer license. For instructions, see Develop using Visual Studio 2013.
Create your first app using JavaScript.
Roadmap for Windows Store apps using JavaScript
Learn about events with Quickstart: adding HTML controls and handling events

User experience guidelines:

See Speech design guidelines for helpful tips on designing a useful and engaging speech-enabled app.

Instructions

1. Set up the audio feed

Ensure that your device has a microphone or the equivalent.

Set the Microphone device capability (DeviceCapability) in the App package manifest (package.appxmanifest file) to get access to the microphone’s audio feed. This allows the app to record audio from connected microphones.

See App capability declarations.
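
As a rough sketch, the microphone declaration in package.appxmanifest might look like the fragment below. Only the relevant element is shown; the rest of the manifest is omitted, and the exact surrounding content depends on your project.

```xml
<!-- Fragment of package.appxmanifest: request access to the microphone. -->
<Capabilities>
  <DeviceCapability Name="microphone" />
</Capabilities>
```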

2. Recognize speech input

A constraint defines the words and phrases (vocabulary) that an app recognizes in speech input. Constraints are at the core of speech recognition and give your app great control over the accuracy of speech recognition.

You can use various types of constraints when performing speech recognition:

Predefined grammars (SpeechRecognitionTopicConstraint).

Predefined dictation and web-search grammars provide speech recognition for your app without requiring you to author a grammar. When using these grammars, speech recognition is performed by a remote web service and the results are returned to the device.

The default free-text dictation grammar can recognize most words and phrases that a user can say in a particular language, and is optimized to recognize short phrases. The predefined dictation grammar is used if you don't specify any constraints for your SpeechRecognizer object. Free-text dictation is useful when you don't want to limit the kinds of things a user can say. Typical uses include creating notes or dictating the content for a message.

The web-search grammar, like a dictation grammar, contains a large number of words and phrases that a user might say. However, it is optimized to recognize terms that people typically use when searching the web.

Note Because predefined dictation and web-search grammars can be large, and because they are online (not on the device), performance might not be as fast as with a custom grammar installed on the device.

These predefined grammars can be used to recognize up to 10 seconds of speech input and require no authoring effort on your part. However, they do require connection to a network.
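
To illustrate, a predefined web-search grammar could be added to a recognizer roughly as follows. This is a minimal sketch: the helper name addWebSearchConstraint and the "webSearch" topic hint are illustrative choices, not part of the original sample.

```javascript
// Sketch: add the predefined web-search grammar to an existing recognizer.
// addWebSearchConstraint and the "webSearch" topic hint are hypothetical
// names chosen for this example.
function addWebSearchConstraint(speechRecognizer) {
    var SR = Windows.Media.SpeechRecognition;

    // A topic constraint selects one of the predefined (online) grammars.
    var webSearchConstraint = new SR.SpeechRecognitionTopicConstraint(
        SR.SpeechRecognitionScenario.webSearch, "webSearch");

    // Constraints take effect only after they are added to the recognizer
    // and compiled with compileConstraintsAsync.
    speechRecognizer.constraints.append(webSearchConstraint);
    return webSearchConstraint;
}
```

After calling this helper, compile the constraints and start recognition in the same way as for the default dictation grammar.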

Important

To use web-service constraints, speech input and dictation support must be enabled in Settings by turning on the "Get to know me" option in the Settings -> Privacy -> Speech, inking, and typing page.

Open this settings page by calling Windows.System.Launcher.launchUriAsync(uri), where uri is defined as var uri = new Windows.Foundation.Uri("ms-settings:privacy-accounts");

Programmatic list constraints (SpeechRecognitionListConstraint).

Programmatic list constraints provide a lightweight approach to creating simple grammars using a list of words or phrases. A list constraint works well for recognizing short, distinct phrases. Explicitly specifying all words in a grammar also improves recognition accuracy, because the speech recognition engine only needs to process speech to confirm a match. The list can also be programmatically updated.

A list constraint consists of an array of strings that represents speech input that your app will accept for a recognition operation. You can create a list constraint in your app by creating a speech-recognition list-constraint object and passing an array of strings. Then add that object to the constraints collection of the recognizer. Recognition is successful when the speech recognizer recognizes any one of the strings in the array.
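
The steps in the previous paragraph can be sketched as follows. The helper name addYesNoConstraint, the phrases, and the "yesOrNo" tag are illustrative assumptions, not values from the original sample.

```javascript
// Sketch: build a list constraint from an array of strings and add it to
// the recognizer's constraints collection.
// addYesNoConstraint, the phrases, and the "yesOrNo" tag are hypothetical.
function addYesNoConstraint(speechRecognizer) {
    var listConstraint =
        new Windows.Media.SpeechRecognition.SpeechRecognitionListConstraint(
            ["yes", "no"], "yesOrNo");

    // Recognition succeeds when the user says any one of the listed strings.
    speechRecognizer.constraints.append(listConstraint);
    return listConstraint;
}
```

The tag is optional, but it lets the app check which constraint produced a result (for example, through the constraint.tag property of the recognition result) when several constraints are active.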

SRGS grammars (SpeechRecognitionGrammarFileConstraint).

A Speech Recognition Grammar Specification (SRGS) grammar is a static document that, unlike a programmatic list constraint, uses the XML format defined by SRGS Version 1.0. An SRGS grammar provides the greatest control over the speech recognition experience by letting you capture multiple semantic meanings in a single recognition.

Voice command constraints (SpeechRecognitionVoiceCommandDefinitionConstraint).

Use a Voice Command Definition (VCD) XML file to define the commands that the user can say to initiate actions when activating your app. For more detail, see Quickstart: Voice commands.

Note Which type of constraint you use depends on the complexity of the recognition experience you want to create. Any one of them could be the best choice for a specific recognition task, and you might find uses for all types of constraints in your app.

To get started with constraints, see How to define custom recognition constraints.

The predefined Universal Windows app dictation grammar recognizes most words and short phrases in a language. It is activated by default when a speech recognizer object is instantiated without custom constraints.

In this example, we show how to:

Create a speech recognizer.
Compile the default dictation constraints (no grammars have been added to the speech recognizer's grammar set).
Start listening for speech by using the basic recognition UI and TTS feedback provided by the RecognizeWithUIAsync method. Use the RecognizeAsync method if the default UI is not required.

function buttonSpeechRecognizerClick() {
    // Create an instance of SpeechRecognizer.
    var speechRecognizer =
      new Windows.Media.SpeechRecognition.SpeechRecognizer();

    // Compile the default dictation grammar.
    speechRecognizer.compileConstraintsAsync().done(
      // Success function.
      function (result) {
          // Start recognition.
          speechRecognizer.recognizeWithUIAsync().done(
            // Success function.
            function (speechRecognitionResult) {
                // Do something with the recognition result.
                var messageDialog =
                  new Windows.UI.Popups.MessageDialog(
                  speechRecognitionResult.text, "Text spoken");
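                messageDialog.showAsync();
                // Recognition has finished; release the recognizer.
                speechRecognizer.close();
            },
            // Error function.
            function (err) {
                WinJS.log && WinJS.log("Speech recognition failed.");
                speechRecognizer.close();
            });
      },
      // Error function.
      function (err) {
          WinJS.log && WinJS.log("Constraint compilation failed.");
          speechRecognizer.close();
      });
    // Note: don't close the recognizer here. The calls above are
    // asynchronous, so closing at this point would end recognition early.
}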
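
If the default UI is not required, the same flow can be written with RecognizeAsync. The following is a minimal sketch under that assumption; the function names describeResult and recognizeWithoutUI and the message strings are hypothetical. It also checks the status and confidence of the result before using the recognized text.

```javascript
// Sketch: turn a recognition result into a message for the user.
// describeResult and the message strings are hypothetical.
function describeResult(result) {
    var SR = Windows.Media.SpeechRecognition;
    if (result.status !== SR.SpeechRecognitionResultStatus.success) {
        return "Recognition did not succeed (status " + result.status + ").";
    }
    if (result.confidence === SR.SpeechRecognitionConfidence.rejected) {
        return "Sorry, I didn't catch that.";
    }
    return "Heard: " + result.text;
}

// Sketch: recognize speech with the default dictation grammar and no UI.
function recognizeWithoutUI() {
    var speechRecognizer =
        new Windows.Media.SpeechRecognition.SpeechRecognizer();

    return speechRecognizer.compileConstraintsAsync()
        .then(function () {
            // No system UI is shown; the app provides its own feedback.
            return speechRecognizer.recognizeAsync();
        })
        .then(
            function (speechRecognitionResult) {
                speechRecognizer.close();
                return describeResult(speechRecognitionResult);
            },
            function (err) {
                speechRecognizer.close();
                WinJS.log && WinJS.log("Speech recognition failed.");
                return null;
            });
}
```

Keeping the result handling in a separate function makes it easy to reuse with RecognizeWithUIAsync, because both methods produce the same kind of result object.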

3. Customize the recognition UI

When your app attempts speech recognition by calling SpeechRecognizer.RecognizeWithUIAsync, several screens are shown in the following order.

If you're using a constraint based on a predefined grammar (dictation or web search):

The Listening screen.
The Thinking screen.
The Heard you say screen or the error screen.

If you're using a constraint based on a list of words or phrases, or a constraint based on an SRGS grammar file:

The Listening screen.
The Did you say screen, if what the user said could be interpreted as more than one potential result.
The Heard you say screen or the error screen.

The following image shows an example of the flow between screens for a speech recognizer that uses a constraint based on an SRGS grammar file. In this example, speech recognition was successful.

Screens for a constraint based on an SRGS grammar file

The Listening screen can provide examples of words or phrases that the app can recognize. Here we show how to use the properties of the SpeechRecognizerUIOptions class (obtained by calling the SpeechRecognizer.UIOptions property) to customize content on the Listening screen.

function buttonSpeechRecognizerSRGSConstraintClick() {
    // Create an instance of SpeechRecognizer.
    var speechRecognizer =
      new Windows.Media.SpeechRecognition.SpeechRecognizer();

    speechRecognizer.uiOptions.audiblePrompt = "Say what you want to search for...";
    speechRecognizer.uiOptions.exampleText = "Ex. 'yes', 'no'";

    // Add a grammar file constraint to the recognizer.
    var uri = new Windows.Foundation.Uri("ms-appx:///data/srgs.grxml");
    Windows.Storage.StorageFile.getFileFromApplicationUriAsync(uri).then(
        // Success function.
        function (srgs) {
            var grammarfileConstraint =
                new Windows.Media.SpeechRecognition.SpeechRecognitionGrammarFileConstraint(srgs, "yesorno");
            speechRecognizer.constraints.append(grammarfileConstraint);
            // Compile the grammar file constraint.
            speechRecognizer.compileConstraintsAsync().then(
              // Success function.
              function (result) {
                  // Start recognition.
                  speechRecognizer.recognizeWithUIAsync().done(
                    // Success function.
                    function (speechRecognitionResult) {
                        // Do something with the recognition result.
                        var messageDialog =
                          new Windows.UI.Popups.MessageDialog(
                          speechRecognitionResult.text, "Text spoken");
                        messageDialog.showAsync();
                        // Recognition has finished; release the recognizer.
                        speechRecognizer.close();
                    },
                    // Error function.
                    function (err) {
                        WinJS.log && WinJS.log("Speech recognition failed.");
                        speechRecognizer.close();
                    });
              },
              // Error function.
              function (err) {
                  WinJS.log && WinJS.log("Constraint compilation failed.");
                  speechRecognizer.close();
              });
        },
        // Error function.
        function (err) {
            WinJS.log && WinJS.log("File retrieval failed.");
            speechRecognizer.close();
        });
    // Note: don't close the recognizer here. The calls above are
    // asynchronous, so closing at this point would end recognition early.
}
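
The SpeechRecognizerUIOptions class also exposes properties that affect the later screens. As a hedged sketch, the following helper (the name configureQuietUI and the prompt text are hypothetical) sets the Listening screen text and turns off the spoken read-back and the Heard you say confirmation.

```javascript
// Sketch: customize the recognition UI before calling recognizeWithUIAsync.
// configureQuietUI and the prompt strings are hypothetical.
function configureQuietUI(speechRecognizer) {
    var options = speechRecognizer.uiOptions;

    // Text shown on the Listening screen.
    options.audiblePrompt = "Say yes or no";
    options.exampleText = "Ex. 'yes', 'no'";

    // Don't read the recognized text back to the user.
    options.isReadBackEnabled = false;
    // Skip the "Heard you say" confirmation screen.
    options.showConfirmation = false;

    return speechRecognizer;
}
```

Skipping the confirmation shortens the interaction, which may suit short commands; keeping it is usually safer when a misrecognition would be costly.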

Summary and next steps

Here, you learned how to implement basic speech recognition by using the predefined grammars and speech-recognition UI provided with Universal Windows apps.

Next, you might want to know how to define custom recognition constraints and how to enable continuous dictation.

Responding to speech interactions

Designers

Speech design guidelines