Voice input in Unity


Instead of the information below, consider using the Unity plug-in for the Cognitive Speech Services SDK, which has much better speech accuracy results and provides easy access to speech-to-text decode and advanced speech features like dialog, intent-based interaction, translation, text-to-speech synthesis, and natural language speech recognition. Find the sample and documentation here: https://docs.microsoft.com/azure/cognitive-services/speech-service/quickstart-csharp-unity

Unity exposes three ways to add Voice input to your Unity application.

With the KeywordRecognizer (one of two types of PhraseRecognizers), your app can be given an array of string commands to listen for. With the GrammarRecognizer (the other type of PhraseRecognizer), your app can be given an SRGS file defining a specific grammar to listen for. With the DictationRecognizer, your app can listen for any word and provide the user with a note or other display of their speech.


Only dictation or phrase recognition can be handled at once. That means if a GrammarRecognizer or KeywordRecognizer is active, a DictationRecognizer cannot be active, and vice versa.

Enabling the capability for Voice

The Microphone capability must be declared for an app to use Voice input.

  1. In the Unity Editor, go to the player settings by navigating to "Edit > Project Settings > Player"
  2. Select the "Windows Store" tab
  3. In the "Publishing Settings > Capabilities" section, check the Microphone capability

Phrase Recognition

To enable your app to listen for specific phrases spoken by the user and then take some action, you need to:

  1. Specify which phrases to listen for using a KeywordRecognizer or GrammarRecognizer
  2. Handle the OnPhraseRecognized event and take action corresponding to the phrase recognized


KeywordRecognizer

Namespace: UnityEngine.Windows.Speech
Types: KeywordRecognizer, PhraseRecognizedEventArgs, SpeechError, SpeechSystemStatus

We'll need a few using statements to save some keystrokes:

using UnityEngine.Windows.Speech;
using System.Collections.Generic;
using System.Linq;

Then let's add a few fields to your class to store the recognizer and keyword->action dictionary:

KeywordRecognizer keywordRecognizer;
Dictionary<string, System.Action> keywords = new Dictionary<string, System.Action>();

Now add a keyword to the dictionary, for example in a Start() method. We're adding the "activate" keyword in this example:

//Create keywords for keyword recognizer
keywords.Add("activate", () =>
{
    // action to be performed when this keyword is spoken
});

Create the keyword recognizer and tell it what we want to recognize:

keywordRecognizer = new KeywordRecognizer(keywords.Keys.ToArray());

Now register for the OnPhraseRecognized event:

keywordRecognizer.OnPhraseRecognized += KeywordRecognizer_OnPhraseRecognized;

An example handler is:

private void KeywordRecognizer_OnPhraseRecognized(PhraseRecognizedEventArgs args)
{
    // If the keyword recognized is in our dictionary, call that Action.
    System.Action keywordAction;
    if (keywords.TryGetValue(args.text, out keywordAction))
        keywordAction.Invoke();
}

Finally, start recognizing!
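The snippet for this step appears to have been dropped from the page; starting the KeywordRecognizer is a single call:

```csharp
keywordRecognizer.Start();
```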



GrammarRecognizer

Namespace: UnityEngine.Windows.Speech
Types: GrammarRecognizer, PhraseRecognizedEventArgs, SpeechError, SpeechSystemStatus

The GrammarRecognizer is used if you're specifying your recognition grammar using SRGS. This can be useful if your app has more than just a few keywords, if you want to recognize more complex phrases, or if you want to easily turn on and off sets of commands. See: Create Grammars Using SRGS XML for file format information.

Once you have your SRGS grammar, and it is in your project in a StreamingAssets folder:


Create a GrammarRecognizer and pass it the path to your SRGS file:

private GrammarRecognizer grammarRecognizer;
grammarRecognizer = new GrammarRecognizer(Application.streamingAssetsPath + "/SRGS/myGrammar.xml");

Now register for the OnPhraseRecognized event:

grammarRecognizer.OnPhraseRecognized += grammarRecognizer_OnPhraseRecognized;

You'll get a callback containing information specified in your SRGS grammar, which you can handle appropriately. Most of the important information will be provided in the semanticMeanings array.

private void Grammar_OnPhraseRecognized(PhraseRecognizedEventArgs args)
{
    SemanticMeaning[] meanings = args.semanticMeanings;
    // do something
}

Finally, start recognizing!
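As with the KeywordRecognizer, the start call seems to be missing here; it is a single method on the recognizer:

```csharp
grammarRecognizer.Start();
```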



DictationRecognizer

Namespace: UnityEngine.Windows.Speech
Types: DictationRecognizer, SpeechError, SpeechSystemStatus

Use the DictationRecognizer to convert the user's speech to text. The DictationRecognizer exposes dictation functionality and supports registering and listening for hypothesis and phrase completed events, so you can give feedback to your user both while they speak and afterwards. Start() and Stop() methods respectively enable and disable dictation recognition. Once done with the recognizer, it should be disposed of using the Dispose() method to release the resources it uses. It will release these resources automatically during garbage collection at an additional performance cost if they aren't released before that.

There are only a few steps needed to get started with dictation:

  1. Create a new DictationRecognizer
  2. Handle Dictation events
  3. Start the DictationRecognizer

Enabling the capability for dictation

The "Internet Client" capability, along with the "Microphone" capability mentioned above, must be declared for an app to use dictation.

  1. In the Unity Editor, go to the player settings by navigating to "Edit > Project Settings > Player"
  2. Select the "Windows Store" tab
  3. In the "Publishing Settings > Capabilities" section, check the InternetClient capability


Create a DictationRecognizer like so:

dictationRecognizer = new DictationRecognizer();

There are four dictation events that can be subscribed to and handled to implement dictation behavior.

  1. DictationResult
  2. DictationComplete
  3. DictationHypothesis
  4. DictationError


DictationResult

This event is fired after the user pauses, typically at the end of a sentence. The full recognized string is returned here.

First, subscribe to the DictationResult event:

dictationRecognizer.DictationResult += DictationRecognizer_DictationResult;

Then handle the DictationResult callback:

private void DictationRecognizer_DictationResult(string text, ConfidenceLevel confidence)
{
    // do something
}


DictationHypothesis

This event is fired continuously while the user is talking. As the recognizer listens, it provides text of what it's heard so far.

First, subscribe to the DictationHypothesis event:

dictationRecognizer.DictationHypothesis += DictationRecognizer_DictationHypothesis;

Then handle the DictationHypothesis callback:

private void DictationRecognizer_DictationHypothesis(string text)
{
    // do something
}


DictationComplete

This event is fired when the recognizer stops, whether from Stop() being called, a timeout occurring, or some other error.

First, subscribe to the DictationComplete event:

dictationRecognizer.DictationComplete += DictationRecognizer_DictationComplete;

Then handle the DictationComplete callback:

private void DictationRecognizer_DictationComplete(DictationCompletionCause cause)
{
    // do something
}


DictationError

This event is fired when an error occurs.

First, subscribe to the DictationError event:

dictationRecognizer.DictationError += DictationRecognizer_DictationError;

Then handle the DictationError callback:

private void DictationRecognizer_DictationError(string error, int hresult)
{
    // do something
}

Once you've subscribed to and handled the dictation events that you care about, start the dictation recognizer to begin receiving events.
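The code for this step is not shown on the page; starting the recognizer is one call:

```csharp
dictationRecognizer.Start();
```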


If you no longer want to keep the DictationRecognizer around, you need to unsubscribe from the events and Dispose the DictationRecognizer.

dictationRecognizer.DictationResult -= DictationRecognizer_DictationResult;
dictationRecognizer.DictationComplete -= DictationRecognizer_DictationComplete;
dictationRecognizer.DictationHypothesis -= DictationRecognizer_DictationHypothesis;
dictationRecognizer.DictationError -= DictationRecognizer_DictationError;
dictationRecognizer.Dispose();


  • Start() and Stop() methods respectively enable and disable dictation recognition.
  • Once done with the recognizer, it must be disposed of using the Dispose() method to release the resources it uses. It will release these resources automatically during garbage collection at an additional performance cost if they aren't released before that.
  • Timeouts occur after a set period of time. You can check for these timeouts in the DictationComplete event. There are two timeouts to be aware of:
    1. If the recognizer starts and doesn't hear any audio for the first five seconds, it will time out.
    2. If the recognizer has given a result, but then hears silence for 20 seconds, it will time out.

Using both Phrase Recognition and Dictation

If you want to use both phrase recognition and dictation in your app, you'll need to fully shut one down before you can start the other. If you have multiple KeywordRecognizers running, you can shut them all down at once with:
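The snippet appears to be missing here; the static shutdown call in UnityEngine.Windows.Speech is:

```csharp
PhraseRecognitionSystem.Shutdown();
```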


To restore all recognizers to their previous state, after the DictationRecognizer has stopped, you can call:
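The corresponding restart call, also missing from the extracted text, is:

```csharp
PhraseRecognitionSystem.Restart();
```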


You could also just start a KeywordRecognizer, which will restart the PhraseRecognitionSystem as well.

Using the microphone helper

The Mixed Reality Toolkit on GitHub contains a microphone helper class that hints to developers whether there's a usable microphone on the system. One use for it is checking whether there's a microphone on the system before showing any speech interaction hints in the application.

The microphone helper script can be found in the Input/Scripts/Utilities folder. The GitHub repo also contains a small sample demonstrating how to use the helper.

Voice input in Mixed Reality Toolkit

You can find examples of voice input in this scene.

Next Development Checkpoint

If you're following the Unity development checkpoint journey we've laid out, your next task is exploring the Mixed Reality platform capabilities and APIs:

You can always go back to the Unity development checkpoints at any time.