Quickstart: Recognize speech with the Speech SDK for Unity (Beta)

Quickstarts are also available for text-to-speech.

Use this guide to create a speech-to-text application using Unity and the Speech SDK for Unity (Beta). When finished, you can talk into your device to transcribe speech to text in real time. If you're new to Unity, we suggest you study the Unity User Manual before developing your application.

Note

The Speech SDK for Unity is currently in beta. It supports Windows Desktop (x86 and x64), Universal Windows Platform (x86, x64, ARM/ARM64), and Android (x86, ARM32/ARM64).

Prerequisites

To complete this project, you'll need:

  * A Speech Services subscription key. (You'll replace a placeholder in the sample code with it later.)
  * Unity 2018.3 or later.
  * A PC with a working microphone.

Create a Unity project

  1. Open Unity. If you're using Unity for the first time, the Unity Hub window appears. (You can also open Unity Hub directly to get to this window.)

    Unity Hub window

  2. Select New. The Create a new project with Unity window appears.

    Create a new project in Unity Hub

  3. In Project Name, enter csharp-unity.

  4. In Templates, if 3D isn't already selected, select it.

  5. In Location, select or create a folder to save the project in.

  6. Select Create.

After a bit of time, the Unity Editor window appears.

Install the Speech SDK

To install the Speech SDK for Unity, follow these steps:

Important

By downloading any of the Speech SDK for Azure Cognitive Services components on this page, you acknowledge its license. See the Microsoft Software License Terms for the Speech SDK.

  1. Download and open the Speech SDK for Unity (Beta), which is packaged as a Unity asset package (.unitypackage). When the asset package is opened, the Import Unity Package dialog box appears.

    Import Unity Package dialog box in the Unity Editor

  2. Ensure that all files are selected, and select Import. After a few moments, the Unity asset package is imported into your project.

For more information about importing asset packages into Unity, see the Unity documentation.

Add UI

Now let's add a minimal UI to our scene. This UI consists of a button to trigger speech recognition and a text field to display the result. The Hierarchy window shows the sample scene that Unity created with the new project.

  1. At the top of the Hierarchy window, select Create > UI > Button.

    This action creates three game objects that you can see in the Hierarchy window: a Button object, a Canvas object containing the button, and an EventSystem object.

    Unity Editor environment

  2. Navigate the Scene view so you have a good view of the canvas and the button.

  3. In the Inspector window (by default on the right), set the Pos X and Pos Y properties to 0, so the button is centered in the middle of the canvas.

  4. In the Hierarchy window, select Create > UI > Text to create a Text object.

  5. In the Inspector window, set the Pos X and Pos Y properties to 0 and 120, and set the Width and Height properties to 240 and 120. These values ensure that the text field and the button don't overlap.

When you're done, the Scene view should look similar to this screenshot:

Scene view in the Unity Editor
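
If you prefer to set these layout values from code rather than in the Inspector, the following minimal sketch applies the same values through Unity's RectTransform API. The UiLayout class and its field names are hypothetical, not part of this quickstart; attach it to any object in the scene and assign the two references in the Inspector.

    using UnityEngine;
    
    // Hypothetical helper (not part of this quickstart): applies the same layout
    // values as steps 3 and 5 above from code instead of the Inspector.
    public class UiLayout : MonoBehaviour
    {
        // Assign the Button's and the Text's RectTransforms in the Inspector.
        public RectTransform buttonTransform;
        public RectTransform textTransform;
    
        void Start()
        {
            buttonTransform.anchoredPosition = Vector2.zero;      // Pos X = 0, Pos Y = 0
            textTransform.anchoredPosition = new Vector2(0, 120); // Pos X = 0, Pos Y = 120
            textTransform.sizeDelta = new Vector2(240, 120);      // Width = 240, Height = 120
        }
    }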

Add the sample code

To add the sample script code for the Unity project, follow these steps:

  1. In the Project window, select Create > C# Script to add a new C# script.

    Project window in the Unity Editor

  2. Name the script HelloWorld.

  3. Double-click HelloWorld to edit the newly created script.

    Note

    To configure the code editor to be used by Unity for editing, select Edit > Preferences, and then go to the External Tools preferences. For more information, see the Unity User Manual.

  4. Replace the existing script with the following code:

    using UnityEngine;
    using UnityEngine.UI;
    using Microsoft.CognitiveServices.Speech;
    #if PLATFORM_ANDROID
    using UnityEngine.Android;
    #endif
    
    public class HelloWorld : MonoBehaviour
    {
        // Hook up the two properties below with a Text and Button object in your UI.
        public Text outputText;
        public Button startRecoButton;
    
        private object threadLocker = new object();
        private bool waitingForReco;
        private string message;
    
        private bool micPermissionGranted = false;
    
    #if PLATFORM_ANDROID
        // Required to manifest microphone permission, cf.
        // https://docs.unity3d.com/Manual/android-manifest.html
        private Microphone mic;
    #endif
    
        public async void ButtonClick()
        {
            // Creates an instance of a speech config with specified subscription key and service region.
            // Replace with your own subscription key and service region (e.g., "westus").
            var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
    
            // Make sure to dispose the recognizer after use!
            using (var recognizer = new SpeechRecognizer(config))
            {
                lock (threadLocker)
                {
                    waitingForReco = true;
                }
    
            // Starts speech recognition, and returns after a single utterance is recognized. The end of a
            // single utterance is determined by listening for silence at the end or until a maximum of 15
            // seconds of audio is processed. The task returns the recognition text as result.
            // Note: Since RecognizeOnceAsync() returns only a single utterance, it is suited only for
            // single-shot recognition, like a command or query.
            // For long-running, multi-utterance recognition, use StartContinuousRecognitionAsync()
            // instead (a sketch follows these steps).
                var result = await recognizer.RecognizeOnceAsync().ConfigureAwait(false);
    
                // Checks result.
                string newMessage = string.Empty;
                if (result.Reason == ResultReason.RecognizedSpeech)
                {
                    newMessage = result.Text;
                }
                else if (result.Reason == ResultReason.NoMatch)
                {
                    newMessage = "NOMATCH: Speech could not be recognized.";
                }
                else if (result.Reason == ResultReason.Canceled)
                {
                    var cancellation = CancellationDetails.FromResult(result);
                    newMessage = $"CANCELED: Reason={cancellation.Reason} ErrorDetails={cancellation.ErrorDetails}";
                }
    
                lock (threadLocker)
                {
                    message = newMessage;
                    waitingForReco = false;
                }
            }
        }
    
        void Start()
        {
            if (outputText == null)
            {
                UnityEngine.Debug.LogError("outputText property is null! Assign a UI Text element to it.");
            }
            else if (startRecoButton == null)
            {
                message = "startRecoButton property is null! Assign a UI Button to it.";
                UnityEngine.Debug.LogError(message);
            }
            else
            {
                // Continue with normal initialization, Text and Button objects are present.
    
    #if PLATFORM_ANDROID
                // Request to use the microphone, cf.
                // https://docs.unity3d.com/Manual/android-RequestingPermissions.html
                message = "Waiting for mic permission";
                if (!Permission.HasUserAuthorizedPermission(Permission.Microphone))
                {
                    Permission.RequestUserPermission(Permission.Microphone);
                }
    #else
                micPermissionGranted = true;
                message = "Click button to recognize speech";
    #endif
                startRecoButton.onClick.AddListener(ButtonClick);
            }
        }
    
        void Update()
        {
    #if PLATFORM_ANDROID
            if (!micPermissionGranted && Permission.HasUserAuthorizedPermission(Permission.Microphone))
            {
                micPermissionGranted = true;
                message = "Click button to recognize speech";
            }
    #endif
    
            lock (threadLocker)
            {
                if (startRecoButton != null)
                {
                    startRecoButton.interactable = !waitingForReco && micPermissionGranted;
                }
                if (outputText != null)
                {
                    outputText.text = message;
                }
            }
        }
    }
    
  5. Find and replace the string YourSubscriptionKey with your Speech Services subscription key.

  6. Find and replace the string YourServiceRegion with the region associated with your subscription. For example, if you're using the free trial, the region is westus.

  7. Save the changes to the script.
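
The comment in step 4 mentions StartContinuousRecognitionAsync() for long-running recognition; here's the promised sketch. It shows a hypothetical variant of ButtonClick, added to the same HelloWorld class, that keeps recognizing for a fixed time instead of returning after one utterance. The ContinuousButtonClick name and the 30-second duration are illustrative only, and the method needs using System.Threading.Tasks; at the top of the file.

    // Hypothetical variant of ButtonClick (not part of this quickstart's script):
    // uses continuous recognition instead of single-shot recognition.
    public async void ContinuousButtonClick()
    {
        var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
        using (var recognizer = new SpeechRecognizer(config))
        {
            // Recognized fires once per final result, on a background thread,
            // so hand the text to Update() through the same lock the sample uses.
            recognizer.Recognized += (s, e) =>
            {
                if (e.Result.Reason == ResultReason.RecognizedSpeech)
                {
                    lock (threadLocker) { message = e.Result.Text; }
                }
            };

            await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

            // Listen for 30 seconds, then stop. A real app would stop on a second
            // button press or a UI toggle instead of a fixed delay.
            await Task.Delay(30000).ConfigureAwait(false);
            await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
        }
    }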

Now return to the Unity Editor and add the script as a component to one of your game objects:

  1. In the Hierarchy window, select the Canvas object.

  2. In the Inspector window, select the Add Component button.

    Inspector window in the Unity Editor

  3. In the drop-down list, search for the HelloWorld script we created above and add it. A Hello World (Script) section appears in the Inspector window, listing two uninitialized properties, Output Text and Start Reco Button. These Unity component properties match the public fields of the HelloWorld class.

  4. Select the Start Reco Button property's object picker (the small circle icon to the right of the property), and choose the Button object you created earlier.

  5. Select the Output Text property's object picker, and choose the Text object you created earlier.

    Note

    The button also has a nested text object. Make sure you do not accidentally pick it for text output (or rename one of the text objects using the Name field in the Inspector window to avoid confusion).

Run the application in the Unity Editor

Now you're ready to run the application within the Unity Editor.

  1. In the Unity Editor toolbar (below the menu bar), select the Play button (a right-pointing triangle).

  2. Go to Game view, and wait for the Text object to display Click button to recognize speech. (It displays New Text when the application hasn't started or isn't ready to respond.)

  3. Select the button and speak an English phrase or sentence into your computer's microphone. Your speech is transmitted to the Speech Services and transcribed to text, which appears in the Game view.

    Game view in the Unity Editor

  4. Check the Console window for debug messages. If the Console window isn't showing, go to the menu bar and select Window > General > Console to display it.

  5. When you're done recognizing speech, select the Play button in the Unity Editor toolbar to stop the application.

Additional options to run this application

This application can also be deployed as an Android app, a Windows stand-alone app, or a UWP application. For more information, see our sample repository. The quickstart/csharp-unity folder there describes the configuration for these additional targets.

Next steps

Explore the C# samples in the sample repository on GitHub.