Quickstart: Recognize speech with the Speech SDK for Unity (Beta)

Use this guide to create a speech-to-text application with Unity and the Speech SDK for Unity (Beta). When you're finished, you can use your computer's microphone to transcribe speech to text in real time. If you're new to Unity, study the Unity User Manual before starting application development.

Note

The Speech SDK for Unity is currently in beta. It supports Windows Desktop (x86 and x64), Universal Windows Platform (x86, x64, ARM/ARM64), and Android (x86, ARM32/ARM64).

Prerequisites

To complete this project, you'll need:

  • A Speech Services subscription key and the region associated with it. These replace the YourSubscriptionKey and YourServiceRegion placeholders in the sample code.
  • A Unity installation that supports one of the platforms listed in the note above.

Create a Unity project

  • Start Unity and under the Projects tab select New.
  • Specify Project name as csharp-unity, Template as 3D, and pick a location. Then select Create project.
  • After a moment, the Unity Editor window opens.

Install the Speech SDK

Important

By downloading any of the Speech SDK for Azure Cognitive Services components on this page, you acknowledge its license. See the Microsoft Software License Terms for the Speech SDK.

  • The Speech SDK for Unity (Beta) is packaged as a Unity asset package (.unitypackage). Download it from here.

  • Import the Speech SDK by selecting Assets > Import Package > Custom Package. Check out the Unity documentation for details.

  • In the file picker, select the Speech SDK .unitypackage file that you downloaded above.

  • Ensure that all files are selected and click Import:

    Screenshot of the Unity Editor when importing the Speech SDK Unity asset package

Add UI

We add a minimal UI to our scene, consisting of a button to trigger speech recognition and a text field to display the result.

  • In the Hierarchy Window (by default on the left), a sample scene is shown that Unity created with the new project.
  • Click the Create button at the top of the Hierarchy Window, and select UI > Button.
  • This creates three game objects that you can see in the Hierarchy Window: a Button object nested within a Canvas object, and an EventSystem object.
  • Navigate the Scene View so you have a good view of the canvas and the button in the Scene View.
  • Click the Button object in the Hierarchy Window to display its settings in the Inspector Window (by default on the right).
  • Set the Pos X and Pos Y properties to 0, so the button is centered in the middle of the canvas.
  • Click the Create button at the top of the Hierarchy Window again, and select UI > Text to create a text field.
  • Click the Text object in the Hierarchy Window to display its settings in the Inspector Window (by default on the right).
  • Set the Pos X and Pos Y properties to 0 and 120, and set the Width and Height properties to 240 and 120 to ensure that the text field and the button do not overlap.

When you're done, the UI should look similar to this screenshot:

Screenshot of the quickstart user interface in the Unity Editor
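
If you prefer to set up the layout from code rather than in the Inspector, the same positioning can be sketched with Unity's RectTransform API. This is a minimal sketch, assuming you already hold references to the Button and Text objects created above; UiLayout is a hypothetical helper name, not part of the quickstart:

```csharp
using UnityEngine;
using UnityEngine.UI;

// Hypothetical helper that applies the same layout as the manual Inspector steps.
public class UiLayout : MonoBehaviour
{
    public Button startRecoButton;
    public Text outputText;

    void Start()
    {
        // Center the button on the canvas (Pos X = 0, Pos Y = 0).
        var buttonRect = startRecoButton.GetComponent<RectTransform>();
        buttonRect.anchoredPosition = Vector2.zero;

        // Place the text field above the button (Pos X = 0, Pos Y = 120)
        // and size it (Width = 240, Height = 120) so the two do not overlap.
        var textRect = outputText.GetComponent<RectTransform>();
        textRect.anchoredPosition = new Vector2(0, 120);
        textRect.sizeDelta = new Vector2(240, 120);
    }
}
```

The manual Inspector steps above achieve exactly the same result; use whichever approach fits your workflow.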

Add the sample code

  1. In the Project Window (by default at the bottom left), click the Create button and then select C# Script. Name the script HelloWorld.

  2. Edit the script by double-clicking it.

    Note

    You can configure which code editor is launched under Edit > Preferences; see the Unity User Manual for details.

  3. Replace all code with the following:

    using UnityEngine;
    using UnityEngine.UI;
    using Microsoft.CognitiveServices.Speech;
    #if PLATFORM_ANDROID
    using UnityEngine.Android;
    #endif
    
    public class HelloWorld : MonoBehaviour
    {
        // Hook up the two properties below with a Text and Button object in your UI.
        public Text outputText;
        public Button startRecoButton;
    
        private object threadLocker = new object();
        private bool waitingForReco;
        private string message;
    
        private bool micPermissionGranted = false;
    
    #if PLATFORM_ANDROID
        // Required to manifest microphone permission, cf.
        // https://docs.unity3d.com/Manual/android-manifest.html
        private Microphone mic;
    #endif
    
        public async void ButtonClick()
        {
            // Creates an instance of a speech config with specified subscription key and service region.
            // Replace with your own subscription key and service region (e.g., "westus").
            var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
    
            // Make sure to dispose the recognizer after use!
            using (var recognizer = new SpeechRecognizer(config))
            {
                lock (threadLocker)
                {
                    waitingForReco = true;
                }
    
                // Starts speech recognition, and returns after a single utterance is recognized. The end of a
                // single utterance is determined by listening for silence at the end or until a maximum of 15
                // seconds of audio is processed.  The task returns the recognition text as result.
                // Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
                // shot recognition like command or query.
                // For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
                var result = await recognizer.RecognizeOnceAsync().ConfigureAwait(false);
    
                // Checks result.
                string newMessage = string.Empty;
                if (result.Reason == ResultReason.RecognizedSpeech)
                {
                    newMessage = result.Text;
                }
                else if (result.Reason == ResultReason.NoMatch)
                {
                    newMessage = "NOMATCH: Speech could not be recognized.";
                }
                else if (result.Reason == ResultReason.Canceled)
                {
                    var cancellation = CancellationDetails.FromResult(result);
                    newMessage = $"CANCELED: Reason={cancellation.Reason} ErrorDetails={cancellation.ErrorDetails}";
                }
    
                lock (threadLocker)
                {
                    message = newMessage;
                    waitingForReco = false;
                }
            }
        }
    
        void Start()
        {
            if (outputText == null)
            {
                UnityEngine.Debug.LogError("outputText property is null! Assign a UI Text element to it.");
            }
            else if (startRecoButton == null)
            {
                message = "startRecoButton property is null! Assign a UI Button to it.";
                UnityEngine.Debug.LogError(message);
            }
            else
            {
                // Continue with normal initialization, Text and Button objects are present.
    
    #if PLATFORM_ANDROID
                // Request to use the microphone, cf.
                // https://docs.unity3d.com/Manual/android-RequestingPermissions.html
                message = "Waiting for mic permission";
                if (!Permission.HasUserAuthorizedPermission(Permission.Microphone))
                {
                    Permission.RequestUserPermission(Permission.Microphone);
                }
    #else
                micPermissionGranted = true;
                message = "Click button to recognize speech";
    #endif
                startRecoButton.onClick.AddListener(ButtonClick);
            }
        }
    
        void Update()
        {
    #if PLATFORM_ANDROID
            if (!micPermissionGranted && Permission.HasUserAuthorizedPermission(Permission.Microphone))
            {
                micPermissionGranted = true;
                message = "Click button to recognize speech";
            }
    #endif
    
            lock (threadLocker)
            {
                if (startRecoButton != null)
                {
                    startRecoButton.interactable = !waitingForReco && micPermissionGranted;
                }
                if (outputText != null)
                {
                    outputText.text = message;
                }
            }
        }
    }
    
  4. Locate and replace the string YourSubscriptionKey with your Speech Services subscription key.

  5. Locate and replace the string YourServiceRegion with the region associated with your subscription. For example, if you're using the free trial, the region is westus.

  6. Save the changes to the script.

  7. Back in the Unity Editor, the script needs to be added as a component to one of your game objects.

    • Click the Canvas object in the Hierarchy Window. This opens its settings in the Inspector Window (by default on the right).

    • Click the Add Component button in the Inspector Window, then search for the HelloWorld script created above and add it.

    • Note that the Hello World component has two uninitialized properties, Output Text and Start Reco Button, that match public properties of the HelloWorld class. To wire them up, click the Object Picker (the small circle icon to the right of the property), and choose the text and button objects you created earlier.

      Note

      The button also has a nested text object. Make sure you do not accidentally pick it for text output (or rename one of the text objects using the Name field in the Inspector Window to avoid that confusion).
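
The sample's ButtonClick uses RecognizeOnceAsync, which returns after a single utterance. As the code comment notes, long-running dictation should use StartContinuousRecognitionAsync with the SDK's result events instead. The following is a minimal sketch of that pattern, not a drop-in replacement for ButtonClick; error handling and UI updates are elided:

```csharp
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

public class ContinuousRecognitionSketch
{
    public static async Task RecognizeContinuouslyAsync(SpeechConfig config)
    {
        using (var recognizer = new SpeechRecognizer(config))
        {
            // Fires once per recognized utterance (final results).
            recognizer.Recognized += (s, e) =>
            {
                if (e.Result.Reason == ResultReason.RecognizedSpeech)
                {
                    UnityEngine.Debug.Log($"RECOGNIZED: {e.Result.Text}");
                }
            };

            // Fires if recognition is canceled, e.g., on network or auth errors.
            recognizer.Canceled += (s, e) =>
            {
                UnityEngine.Debug.Log($"CANCELED: Reason={e.Reason}");
            };

            await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

            // ... keep recognizing until the app decides to stop, then:
            await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
        }
    }
}
```

As in the sample above, these events arrive on a background thread, so hand results to the UI through a lock-protected field polled in Update rather than touching Unity objects directly in the handlers.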

Run the application in the Unity Editor

  • Press the Play button in the Unity Editor toolbar (below the menu bar).

  • After the app launches, click the button and speak an English phrase or sentence into your computer's microphone. Your speech is transmitted to Speech Services and transcribed to text, which appears in the window.

    Screenshot of the running quickstart in the Unity Game Window

  • Check the Console Window for debug messages.

  • When you're done recognizing speech, click the Play button in the Unity Editor toolbar to stop the app.
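
The service recognizes US English ("en-US") by default. To recognize a different language, set SpeechRecognitionLanguage on the config before creating the recognizer; the German locale below is just an example:

```csharp
var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

// Recognize German instead of the default US English.
config.SpeechRecognitionLanguage = "de-DE";
```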

Additional options to run this application

This application can also be deployed to Android, built as a Windows standalone app, or built as a UWP application. Refer to the quickstart/csharp-unity folder of our sample repository, which describes the configuration for these additional targets.

Next steps

See also