Get started with the Speech Recognition API in JavaScript

You can develop applications that convert spoken audio to text by using the Speech Recognition API. The JavaScript client library uses the Speech Service WebSocket protocol, which allows you to talk and receive transcribed text simultaneously. This article helps you to get started with the Speech Recognition API in JavaScript.

Prerequisites

Subscribe to the Speech Recognition API, and get a free trial subscription key

The Speech API is part of Cognitive Services. You can get free trial subscription keys from the Cognitive Services subscription page. After you select the Speech API, select Get API Key to get the key. The page returns a primary key and a secondary key. Both keys are tied to the same quota, so you can use either one.

Important

You must have a subscription key before you can use the Speech client libraries.

Get started

This section walks you through the steps needed to load a sample HTML page. The sample is located in our GitHub repository. You can open the sample directly from the repository, or open it from a local copy of the repository.

Note

Some browsers block microphone access on insecure origins. We recommend that you host the sample, or your own app, over HTTPS so that it works on all supported browsers.
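
One way to surface this problem early is to test for a secure context before asking for the microphone. The following is a minimal sketch and is not part of the sample: the function name checkMicrophoneSupport and its return strings are made up for illustration, while isSecureContext and navigator.mediaDevices.getUserMedia are standard browser properties.

```javascript
// Returns a short diagnosis of whether microphone capture is likely to work.
// `env` is normally `window`; it is a parameter so the check can also run
// outside a browser. The function name and return strings are illustrative.
function checkMicrophoneSupport(env) {
  // Browsers expose `isSecureContext` as true on HTTPS pages.
  if (!env.isSecureContext) {
    return "insecure-origin";
  }
  // `navigator.mediaDevices` can be missing on old or restricted browsers.
  const mediaDevices = env.navigator && env.navigator.mediaDevices;
  if (!mediaDevices || typeof mediaDevices.getUserMedia !== "function") {
    return "no-capture-api";
  }
  return "ok";
}
```

In a page, you would call checkMicrophoneSupport(window) and, when it returns "insecure-origin", tell the user to load the page over HTTPS.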

Open the sample directly

Acquire a subscription key as described above, and then open the link to the sample. The page loads in your default browser, rendered by using htmlPreview.

Open the sample from a local copy

To try the sample locally, clone this repository:

git clone https://github.com/Azure-Samples/SpeechToText-WebSockets-Javascript

Compile the TypeScript sources and bundle (browserify) them into a single JavaScript file. This step requires npm to be installed on your machine. Change into the root of the cloned repository and run the bundle command:

cd SpeechToText-WebSockets-Javascript && npm run bundle

Open samples\browser\Sample.html in your favorite browser.
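
If your browser refuses microphone access for a page opened straight from disk, serving the repository from a local address may help, because browsers generally treat 127.0.0.1 as a secure origin. The following Node.js script is a sketch for local testing only and is not part of the repository. The names resolveRequestPath and startSampleServer, the port, and the content-type table are all assumptions.

```javascript
const http = require("http");
const fs = require("fs");
const path = require("path");

// Content types for the file kinds the sample is likely to need (assumed list).
const CONTENT_TYPES = {
  ".html": "text/html",
  ".js": "application/javascript",
  ".css": "text/css",
};

// Map a request URL to a file under rootDir.
// Returns null for malformed URLs or paths that escape rootDir.
function resolveRequestPath(rootDir, urlPath) {
  const root = path.resolve(rootDir);
  let decoded;
  try {
    decoded = decodeURIComponent(urlPath.split("?")[0]);
  } catch (err) {
    return null; // malformed percent-encoding
  }
  const target = path.join(root, decoded); // join also collapses ".." segments
  if (target !== root && !target.startsWith(root + path.sep)) {
    return null;
  }
  return target;
}

// Serve files from rootDir on 127.0.0.1:port and return the server object.
function startSampleServer(rootDir, port) {
  const server = http.createServer((req, res) => {
    const filePath = resolveRequestPath(rootDir, req.url);
    if (filePath === null) {
      res.writeHead(403);
      res.end("Forbidden");
      return;
    }
    fs.readFile(filePath, (err, data) => {
      if (err) {
        res.writeHead(404);
        res.end("Not found");
        return;
      }
      const ext = path.extname(filePath).toLowerCase();
      res.writeHead(200, { "Content-Type": CONTENT_TYPES[ext] || "application/octet-stream" });
      res.end(data);
    });
  });
  return server.listen(port, "127.0.0.1");
}

// Example (run from the repository root):
//   startSampleServer(".", 8080);
// then browse to http://127.0.0.1:8080/samples/browser/Sample.html
```

The script only defines the functions. Uncomment the startSampleServer call, or call it from another script, to start serving.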

Next steps

More information on how to include the SDK in your own webpage is available here.

Remarks

  • The Speech Recognition API supports three recognition modes. You can switch the mode by updating the Setup() function found in the Sample.html file. The sample sets the mode to Interactive by default. To change the mode, update the parameter SR.RecognitionMode.Interactive to another mode. For example, change the parameter to SR.RecognitionMode.Conversation.
  • For a complete list of supported languages, see Supported languages.
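
The mode switch described in the first remark can be sketched as a small lookup that fails loudly on a misspelled mode name. This is an illustration, not code from Sample.html. The object standInSR is a hypothetical stand-in for the SR object that the sample uses, its Dictation member and all numeric values are assumptions, and pickRecognitionMode is a made-up helper name.

```javascript
// Hypothetical stand-in for the object the sample reaches as `SR`.
// Assumption: the real enum values are defined by the SDK, not by this sketch.
const standInSR = {
  RecognitionMode: Object.freeze({ Interactive: 0, Conversation: 1, Dictation: 2 }),
};

// Look up a recognition mode by name and reject names the enum does not define.
function pickRecognitionMode(sr, name) {
  const modes = sr.RecognitionMode;
  if (!Object.prototype.hasOwnProperty.call(modes, name)) {
    const known = Object.keys(modes).join(", ");
    throw new RangeError(`Unknown recognition mode "${name}"; expected one of: ${known}`);
  }
  return modes[name];
}

// Equivalent to writing SR.RecognitionMode.Conversation in Setup().
const mode = pickRecognitionMode(standInSR, "Conversation");
```

With the real SDK loaded, you would pass the sample's own SR object instead of standInSR and hand the result to the recognizer configuration in Setup().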