Getting started with Microsoft speech recognition in C# for .NET on Windows
This page shows how to develop a basic Windows application that uses the Microsoft speech recognition API to convert spoken audio to text. Using the client library allows for real-time streaming: while your client application sends audio to the service, it simultaneously and asynchronously receives partial recognition results back.
The C# desktop library can be used by developers who want to use the Microsoft Speech Service from apps running on any device. To use the library, install the NuGet package Microsoft.ProjectOxford.SpeechRecognition-x86 for 32-bit platforms or the NuGet package Microsoft.ProjectOxford.SpeechRecognition-x64 for 64-bit platforms. For the client library API reference, see Microsoft Speech C# Desktop Library.
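For example, you can install the matching package from the Package Manager Console in Visual Studio (shown here for the 64-bit platform; substitute -x86 for 32-bit):

```powershell
Install-Package Microsoft.ProjectOxford.SpeechRecognition-x64
```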
The following sections describe how to install, build, and run the C# sample application using the C# desktop library.
The example was developed for Windows 8+ and .NET Framework 4.5+ using Visual Studio 2015 Community Edition.
Get the sample application
You may clone the sample from the Speech C# Desktop Library Sample repository.
Subscribe to speech recognition API and get a free trial subscription key
You must have a subscription key before using speech client libraries.
The Microsoft Speech API is part of Microsoft Cognitive Services on Azure (previously Project Oxford). You can get free trial subscription keys from the Cognitive Services subscription page. After you select the Speech API, click Get API Key to get the key. It returns a primary and a secondary key. Both keys are tied to the same quota, so you may use either key.
If you want to use recognition with intent, you also need to sign up for the Language Understanding Intelligent Service (LUIS).
Step 1: Install the example application
- Start Microsoft Visual Studio 2015.
- Browse to the folder where you saved the downloaded speech recognition API files.
- Double-click the Visual Studio 2015 solution (.sln) file named SpeechToText-WPF-Samples.sln to open the solution in Visual Studio.
Step 2: Build the example application
- Press Ctrl+Shift+B, or click Build on the ribbon menu, then select Build Solution.
Step 3: Run the example application
- After the build is complete, press F5 or click Start on the ribbon menu to run the example.
- The Project Oxford Speech to Text window opens, with the text edit box reading "Paste your subscription key here to start". Paste your subscription key into the text box as shown in the screenshot below. You may choose to persist your subscription key on your PC or laptop by clicking the Save Key button. When you want to delete the subscription key from the system, click Delete Key to remove it from your PC or laptop.
- Under Speech Recognition Source, choose one of the six speech sources, which fall into two main input categories:
- Using your computer's microphone, or an attached microphone, to capture speech.
- Playing an audio file.
Each category has three recognition modes:
- ShortPhrase Mode: an utterance up to 15 seconds long. As data is sent to the server, the client receives multiple partial results and one final result with multiple N-best choices.
- LongDictation Mode: an utterance up to 2 minutes long. As data is sent to the server, the client receives multiple partial results and multiple final results, based on where the server indicates sentence pauses.
- Intent Detection: the server returns additional structured information about the speech input. To use Intent, you need to first train a model using LUIS.
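These input sources and modes map onto different factory methods in the desktop library. As a rough sketch (type and method names as exposed by the Microsoft.ProjectOxford.SpeechRecognition NuGet package; `subscriptionKey`, `luisAppId`, and `luisSubscriptionId` are placeholder values you supply):

```csharp
using Microsoft.ProjectOxford.SpeechRecognition;

// Microphone input, ShortPhrase mode (single utterance up to 15 seconds).
MicrophoneRecognitionClient micClient =
    SpeechRecognitionServiceFactory.CreateMicrophoneClient(
        SpeechRecognitionMode.ShortPhrase, "en-US", subscriptionKey);

// Audio-file input, LongDictation mode (utterance up to 2 minutes).
DataRecognitionClient dataClient =
    SpeechRecognitionServiceFactory.CreateDataClient(
        SpeechRecognitionMode.LongDictation, "en-US", subscriptionKey);

// Microphone input with intent detection (requires a trained LUIS model).
MicrophoneRecognitionClientWithIntent intentClient =
    SpeechRecognitionServiceFactory.CreateMicrophoneClientWithIntent(
        "en-US", subscriptionKey, luisAppId, luisSubscriptionId);
```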
Sample audio files are provided with this example application. You can find them in the repository you downloaded, under the samples/SpeechRecognitionServiceExample folder. These example audio files are used automatically if no other files are chosen when you select Use wav file for Shortphrase mode or Use wav file for Longdictation mode as your speech input. Currently only the WAV audio format is supported.
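When a wav file is used as the speech source, its bytes are streamed to the service through a DataRecognitionClient. A minimal sketch, assuming `dataClient` was created with `SpeechRecognitionServiceFactory.CreateDataClient` and `audioFilePath` is a placeholder pointing at one of the sample wav files:

```csharp
using System.IO;

// Stream the wav file to the service in small chunks; partial results
// arrive asynchronously while audio is still being sent.
using (FileStream fileStream = new FileStream(audioFilePath, FileMode.Open, FileAccess.Read))
{
    byte[] buffer = new byte[1024];
    int bytesRead;
    while ((bytesRead = fileStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        dataClient.SendAudio(buffer, bytesRead);
    }
}

// Signal that no more audio is coming so the service can produce final results.
dataClient.EndAudio();
```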
- Partial Results Events: this event is called every time the Speech Service predicts what you might be saying, even before you finish speaking (if you are using MicrophoneRecognitionClient) or before you have finished sending data (if you are using DataRecognitionClient).
- Error Events: called when the service detects an error.
- Intent Events: called on "WithIntent" clients (only in ShortPhrase mode) after the final recognition result has been parsed into a structured JSON intent.
- Result Events:
  - In ShortPhrase mode, this event is called and returns n-best results after you finish speaking.
  - In LongDictation mode, the event handler is called multiple times, based on where the service identifies sentence pauses.
  - For each of the n-best choices, a confidence value and a few different forms of the recognized text are returned. For more information, see the output format page.
The event handlers are already pointed out in the code in the form of code comments.
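As an illustration of what those comments point at, the handlers can be wired up roughly like this (event and property names as exposed by the Microsoft.ProjectOxford.SpeechRecognition library; the handler bodies are illustrative, and `micClient`/`intentClient` stand for clients created with the factory methods):

```csharp
using System;
using Microsoft.ProjectOxford.SpeechRecognition;

// Partial results arrive while you are still speaking or sending audio.
micClient.OnPartialResponseReceivedEvent += (sender, e) =>
    Console.WriteLine("Partial: {0}", e.PartialResult);

// Final results: once in ShortPhrase mode, once per sentence pause in LongDictation mode.
micClient.OnResponseReceivedEvent += (sender, e) =>
{
    foreach (RecognizedPhrase result in e.PhraseResponse.Results)
    {
        Console.WriteLine("{0} (confidence: {1})", result.DisplayText, result.Confidence);
    }
};

// Errors reported by the service.
micClient.OnConversationErrorEvent += (sender, e) =>
    Console.WriteLine("Error {0}: {1}", e.SpeechErrorCode, e.SpeechErrorText);

// Intent results, only available on "WithIntent" clients.
intentClient.OnIntent += (sender, e) =>
    Console.WriteLine("Intent payload: {0}", e.Payload);
```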
- Microsoft Speech Desktop Library Reference
- Get started with Microsoft speech recognition API in Java on Android
- Get started with Microsoft speech recognition API in Objective C on iOS
- Get started with Microsoft speech recognition API via REST