Speech Recognition - Exploring Grammar Based Recognition

01/17/2012

In my two previous posts, I covered how to create speech recognition engines and use them to parse through WAV files containing a sample “Hello World” recording.

This post focuses on two things: first, simple real-time recognition with a hardcoded grammar, and second, a way to let users dynamically improve the system by updating the grammar.

Configuring Windows to Enable Speech Recognition

The core Windows operating system has had built-in speech recognition since Vista. You must enable the OS speech recognition for this tutorial to work.

On Windows 7:

Follow the instructions here: https://windows.microsoft.com/en-US/windows7/Set-up-your-microphone-for-Speech-Recognition
Important: Remember which method you chose for turning speech recognition on!
Once the wizard is complete, you can enable and disable speech recognition using the method you chose or with the UI that shows up:
Try it out by following the User Experience tutorial.

Listen Up: Writing a Basic Real-Time Speech Recognition WPF App

This is a simple modification of the great MSDN tutorial for the System.Speech namespace. It is an application that listens for colors that are predefined in a Choices object and writes each correctly recognized color to a text box. Inside this app I will show how to:

Initialize a simple grammar based on a string array
Initialize a Speech Recognizer object, which enables basic real-time speech recognition
Wire up a way to update the current grammar

And as always, the Windows SDK is a requirement. Please install it to follow along with this tutorial.

Step 1: Initialize the Simple Grammar and Speech Recognizer

First, launch Visual Studio and start with a blank WPF app. In the designer, add two text boxes, three labels and one button like so:

Set the larger text box to read-only to get the best results from this tutorial. Once the UI objects have been created and laid out, open up the window’s code-behind file.

Next, add System.Speech to the project references:

Now, add the following fields and constructor code to the window class:

// Requires these using directives at the top of the file:
// using System.Collections.Generic;
// using System.Speech.Recognition;
SpeechRecognizer sr;
List<string> colorList;
Choices colors;

public MainWindow()
{
    InitializeComponent();

    sr = new SpeechRecognizer();
    colorList = new List<string>() { "red", "yellow", "green", "blue" };
    InitializeSpeechRecognition();
}

Once that’s done, it is time to create the InitializeSpeechRecognition and LoadGrammar helper methods:

private void InitializeSpeechRecognition()
{
    // First load the grammar, then wire up the speech recognition event
    LoadGrammar();
    sr.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(sr_SpeechRecognized);
}

private void LoadGrammar()
{
    // Rebuild the Choices object from the current color list so that repeated calls
    // don't add duplicate entries, then wrap it in a GrammarBuilder and a Grammar
    colors = new Choices(colorList.ToArray());
    GrammarBuilder grammarBuilder = new GrammarBuilder(colors);
    Grammar testGrammar = new Grammar(grammarBuilder);

    // Drop any previously loaded grammar before loading the new one
    sr.UnloadAllGrammars();
    sr.LoadGrammar(testGrammar);
}

There are three objects of interest: the SpeechRecognizer, the Choices, and the Grammar. SpeechRecognizer hooks into the operating system’s shared recognizer, the same one that Windows Speech Recognition uses, and raises recognition events in your application. The benefit of using SpeechRecognizer is that the developer doesn’t have to worry about the audio input; the trade-off is that it doesn’t allow for more advanced recognition scenarios.
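For comparison, System.Speech also provides the in-process SpeechRecognitionEngine class, where the application manages the audio input itself. The following is only a minimal console sketch of that alternative, not part of the WPF app in this post; the class name InProcessColorListener is made up for illustration, and it assumes a recognizer and a default microphone are available on the machine.

```csharp
using System;
using System.Speech.Recognition;

// Hypothetical console program illustrating the in-process engine.
class InProcessColorListener
{
    static void Main()
    {
        // The engine runs inside this process instead of using the shared recognizer.
        using (var engine = new SpeechRecognitionEngine())
        {
            var colors = new Choices("red", "yellow", "green", "blue");
            engine.LoadGrammar(new Grammar(new GrammarBuilder(colors)));

            engine.SpeechRecognized += (sender, e) =>
                Console.WriteLine("Heard: " + e.Result.Text);

            // Unlike SpeechRecognizer, the audio source has to be chosen explicitly.
            engine.SetInputToDefaultAudioDevice();

            // Keep recognizing phrases until the user presses Enter.
            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.WriteLine("Say a color, or press Enter to quit.");
            Console.ReadLine();
            engine.RecognizeAsyncStop();
        }
    }
}
```

The extra setup is the price of the added control: the engine does not depend on Windows Speech Recognition being switched on, but the app has to manage the audio input and the recognition loop itself.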

Choices is a basic way to create simple grammars without having to define a detailed external grammar file. For this tutorial and for basic scenarios requiring only simple recognition, this is a great way to get started.

The Grammar object is instantiated from the GrammarBuilder, which will create a root rule for the string list that is contained within the Choices object. This is *essential* for the Speech Recognition engine. Without a grammar, there is nothing that the recognition engine can use to determine what was said.
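GrammarBuilder can also combine fixed phrases with a Choices object, which is one way to grow a grammar beyond single words. The sketch below is an illustration only: the ColorGrammarFactory helper, the "select" prefix, and the grammar name are assumptions, not part of the original app.

```csharp
using System.Speech.Recognition;

// Hypothetical helper: builds a grammar that matches "select <color>".
static class ColorGrammarFactory
{
    public static Grammar CreateSelectColorGrammar(string[] colorNames)
    {
        var builder = new GrammarBuilder();
        builder.Append("select");                // fixed phrase spoken first
        builder.Append(new Choices(colorNames)); // followed by exactly one color

        var grammar = new Grammar(builder);
        grammar.Name = "SelectColor";            // a name makes the grammar easier to identify later
        return grammar;
    }
}
```

Loading the result with sr.LoadGrammar(ColorGrammarFactory.CreateSelectColorGrammar(colorList.ToArray())) would make the recognizer listen for phrases such as "select red" rather than the bare color name.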

Now that all the Speech Recognition objects have been wired up, set the Speech Recognized handler to write the output to the output text box:

void sr_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    textBox1.Text = textBox1.Text + e.Result.Text + "\r\n";
}

Once all of this is completed, the app will listen for whatever colors you defined in the colorList. In this example those colors are red, yellow, green and blue.

Step 2: Getting the App to Listen and Listen Better

Launch the application and enable Windows Speech Recognition. Make sure that if you have to click outside of your application to enable recognition, you click back into the test app.

Now try it out! Speak into your microphone and say one of the words that exist in the list. One of a few things will happen:

The word that was said wasn’t recognized. Either it was the correct word and the engine couldn’t figure it out or a word that wasn’t in the list was said.
The word was recognized and showed up in the text box. Awesome!
A different word was said, but a word in the list was recognized instead.

This is an example of ‘bad’ recognition happening to me:

In this case of bad recognition, the word I said and the way I said it were apparently close enough for the engine to match a word that was in the grammar.
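One possible way to reduce this kind of false match is to look at the Confidence value that each recognition result carries and ignore low-scoring results. The sketch below assumes a cut-off of 0.7, which is an arbitrary value chosen for illustration, and RecognitionFilter is a hypothetical helper; a suitable threshold would have to be found by experimenting with your own voice and microphone.

```csharp
using System.Speech.Recognition;

// Hypothetical helper that decides whether a result is trustworthy enough to show.
static class RecognitionFilter
{
    // Assumed threshold; Confidence ranges from 0.0 (low) to 1.0 (high).
    public const float MinimumConfidence = 0.7f;

    public static bool IsConfident(float confidence)
    {
        return confidence >= MinimumConfidence;
    }

    // RecognitionResult (the type of e.Result) derives from RecognizedPhrase.
    public static bool IsConfident(RecognizedPhrase phrase)
    {
        return phrase != null && IsConfident(phrase.Confidence);
    }
}
```

Inside sr_SpeechRecognized, the text box update could then be wrapped in a check such as if (RecognitionFilter.IsConfident(e.Result)), so that weak matches are simply dropped.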

Using the text box and button that were also added to the app, I am going to enable the user to improve the recognition experience by adding words to the list and reloading the grammar.

All you need to do is wire up the button’s click event and add the following code:

private void button1_Click(object sender, RoutedEventArgs e)
{
    if (!String.IsNullOrEmpty(textBox2.Text))
    {
        colorList.Add(textBox2.Text);
        LoadGrammar();
        textBlock1.Text = "Added: " + textBox2.Text;
        textBox2.Text = string.Empty;
    }
}

Now, when users add a word and click the Add button, the Speech Recognizer’s grammar will be reloaded with the new word.

The grammar is updated and new words are recognized based on user input.
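Because users can type anything into the text box, it may be worth guarding the list against blank entries and repeats before reloading the grammar. This is a small sketch of one way to do that; WordListHelper and TryAddWord are hypothetical names introduced here, and the case-insensitive comparison is an assumption about what counts as a duplicate.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for keeping the word list clean.
static class WordListHelper
{
    // Adds the trimmed word and returns true, or returns false
    // when the input is blank or already in the list (ignoring case).
    public static bool TryAddWord(List<string> words, string candidate)
    {
        if (string.IsNullOrWhiteSpace(candidate))
        {
            return false;
        }

        string trimmed = candidate.Trim();
        foreach (string existing in words)
        {
            if (string.Equals(existing, trimmed, StringComparison.OrdinalIgnoreCase))
            {
                return false;
            }
        }

        words.Add(trimmed);
        return true;
    }
}
```

In the click handler, LoadGrammar() would then only need to run when WordListHelper.TryAddWord(colorList, textBox2.Text) returns true.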

Summary

Try out other grammar scenarios. My previous two posts use a grammar file that can be modified and loaded up externally.
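As one way to experiment with an external grammar file from C#, the color list can be written out as an SRGS XML document and loaded back from disk. This is a sketch under assumptions: ColorGrammarFile is a hypothetical helper, the rule id "color" is arbitrary, and the file path is whatever you pass in. It uses the SrgsDocument types from the System.Speech.Recognition.SrgsGrammar namespace.

```csharp
using System.Speech.Recognition;
using System.Speech.Recognition.SrgsGrammar;
using System.Xml;

// Hypothetical helper for saving and loading a color grammar file.
static class ColorGrammarFile
{
    // Writes an SRGS grammar that accepts any one of the given colors.
    public static void Save(string path, params string[] colorNames)
    {
        // Root rule: exactly one of the color names.
        SrgsRule colorRule = new SrgsRule("color", new SrgsOneOf(colorNames));

        SrgsDocument document = new SrgsDocument();
        document.Rules.Add(colorRule);
        document.Root = colorRule;

        using (XmlWriter writer = XmlWriter.Create(path))
        {
            document.WriteSrgs(writer);
        }
    }

    // Loads the grammar back; Grammar accepts a path to an SRGS XML file.
    public static Grammar Load(string path)
    {
        return new Grammar(path);
    }
}
```

After ColorGrammarFile.Save("colors.grxml", "red", "yellow", "green", "blue"), the file can be edited by hand and reloaded with sr.LoadGrammar(ColorGrammarFile.Load("colors.grxml")), where colors.grxml is just an example file name.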

There are some useful tutorials and code examples over on MSDN. You can check them out here:

Getting Started With Speech Recognition: https://msdn.microsoft.com/en-us/library/hh361683.aspx
Exploring Speech Recognition from MSDN magazine (note - this was written in the Vista timeframe, so there are updates that apply only to Win7): https://msdn.microsoft.com/en-us/magazine/cc163663.aspx

Check out my other blog posts on the Speech Recognition Engine:

Speech 101 - Using C++: https://blogs.msdn.com/b/rlucero/archive/2011/12/12/speech-101-getting-the-computer-to-recognize-hello-world.aspx
Speech 101 - Using C#: https://blogs.msdn.com/b/rlucero/archive/2012/01/10/speech-101-part-2-using-c-to-recognize-hello-world.aspx

Special thanks to Steve Meyer for reviewing this post and suggesting the blog title!