Continuous dictation

Learn how to capture and recognize long-form, continuous dictation speech input.

Important APIs: SpeechContinuousRecognitionSession, ContinuousRecognitionSession

In Speech recognition, you learned how to capture and recognize relatively short speech input, such as when composing a Short Message Service (SMS) message or asking a question, by using the RecognizeAsync or RecognizeWithUIAsync methods of a SpeechRecognizer object.

For longer, continuous speech recognition sessions, such as dictation or email, use the ContinuousRecognitionSession property of a SpeechRecognizer to obtain a SpeechContinuousRecognitionSession object.

Note

Dictation language support depends on the device where your app is running. For PCs and laptops, only en-US is recognized, while Xbox and phones can recognize all languages supported by speech recognition. For more info, see Specify the speech recognizer language.

Set up

Your app needs a few objects to manage a continuous dictation session:

  • An instance of a SpeechRecognizer object.
  • A reference to a UI dispatcher to update the UI during dictation.
  • A way to track the accumulated words spoken by the user.

Here, we declare a SpeechRecognizer instance as a private field of the code-behind class. If you want continuous dictation to persist beyond a single Extensible Application Markup Language (XAML) page, your app needs to store this reference elsewhere.

private SpeechRecognizer speechRecognizer;

During dictation, the recognizer raises events from a background thread. Because a background thread cannot directly update the UI in XAML, your app must use a dispatcher to update the UI in response to recognition events.

Here, we declare a private field that is initialized later with the UI thread's dispatcher.

// Speech events may originate from a thread other than the UI thread.
// Keep track of the UI thread dispatcher so that we can update the
// UI in a thread-safe manner.
private CoreDispatcher dispatcher;

To track what the user is saying, you need to handle the recognition events raised by the speech recognizer. These events provide the recognition results for chunks of the user's utterances.

Here, we use a StringBuilder object to hold all the recognition results obtained during the session. New results are appended to the StringBuilder as they are processed.

private StringBuilder dictatedTextBuilder;
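In this topic, dictatedTextBuilder is appended to from the ResultGenerated handler (a background thread) and read inside dispatcher callbacks (the UI thread). StringBuilder instance members are not guaranteed to be thread-safe, so one option, an assumption of this sketch rather than something the original sample does, is to wrap the builder in a small class that takes a lock. DictationTextAccumulator and its methods are hypothetical names; the sketch uses only System.Text and compiles outside a UWP project.

```csharp
using System.Text;

// Hypothetical helper (not part of the Speech and TTS sample): wraps the
// StringBuilder that accumulates dictated text so a background recognition
// thread can append while the UI thread reads a snapshot.
public sealed class DictationTextAccumulator
{
    private readonly StringBuilder builder = new StringBuilder();
    private readonly object gate = new object();

    // Call from the ResultGenerated handler (background thread).
    public void AppendChunk(string chunk)
    {
        if (string.IsNullOrWhiteSpace(chunk))
        {
            return; // ignore empty recognition results
        }

        lock (gate)
        {
            if (builder.Length > 0)
            {
                builder.Append(' '); // separate chunks with a single space
            }
            builder.Append(chunk.Trim());
        }
    }

    // Call from the UI thread, for example inside a dispatcher callback.
    public string Snapshot()
    {
        lock (gate)
        {
            return builder.ToString();
        }
    }

    // Call when the user clears the dictated text.
    public void Clear()
    {
        lock (gate)
        {
            builder.Clear();
        }
    }
}
```

In a page, the ResultGenerated handler would call AppendChunk(args.Result.Text), and the dispatcher callback would assign Snapshot() to dictationTextBox.Text.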

Initialization

During the initialization of continuous speech recognition, you must:

  • Fetch the dispatcher for the UI thread if you update the UI of your app in the continuous recognition event handlers.
  • Initialize the speech recognizer.
  • Compile the built-in dictation grammar. Note   Speech recognition requires at least one constraint to define a recognizable vocabulary. If no constraint is specified, a predefined dictation grammar is used. See Speech recognition.
  • Set up the event listeners for recognition events.

In this example, we initialize speech recognition in the OnNavigatedTo page event.

  1. Because events raised by the speech recognizer occur on a background thread, create a reference to the dispatcher for updates to the UI thread. OnNavigatedTo is always invoked on the UI thread.
this.dispatcher = CoreWindow.GetForCurrentThread().Dispatcher;
  2. We then initialize the SpeechRecognizer instance.
this.speechRecognizer = new SpeechRecognizer();
  3. We then add and compile the grammar that defines all of the words and phrases that the SpeechRecognizer can recognize.

    If you don't specify a grammar explicitly, a predefined dictation grammar is used by default. Typically, the default grammar is best for general dictation.

    Here, we call CompileConstraintsAsync immediately without adding a grammar. Before starting a session, you can check the Status property of the returned SpeechRecognitionCompilationResult to confirm that compilation succeeded.

SpeechRecognitionCompilationResult result =
      await speechRecognizer.CompileConstraintsAsync();

Handle recognition events

You can capture a single, brief utterance or phrase by calling RecognizeAsync or RecognizeWithUIAsync.

However, to capture a longer, continuous recognition session, we specify event listeners that run in the background as the user speaks, and we define handlers that build the dictation string.

We then use the ContinuousRecognitionSession property of our recognizer to obtain a SpeechContinuousRecognitionSession object that provides methods and events for managing a continuous recognition session.

Two events in particular are critical:

  • ResultGenerated, which occurs when the recognizer has generated some results.
  • Completed, which occurs when the continuous recognition session has ended.

The ResultGenerated event is raised as the user speaks. The recognizer continuously listens to the user and periodically raises an event that passes a chunk of speech input. Examine the speech input through the Result property of the event argument, and take appropriate action in the event handler, such as appending the text to a StringBuilder object.

As an instance of SpeechRecognitionResult, the Result property helps you decide whether to accept the speech input. A SpeechRecognitionResult provides two properties for this:

  • Status indicates whether the recognition was successful. Recognition can fail for a variety of reasons, such as poor audio quality or an unavailable microphone.
  • Confidence indicates the relative confidence that the recognizer understood the correct words.
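The acceptance rule used later in this topic (keep results whose Confidence is Medium or High) can be isolated as a small, testable function. In this minimal sketch, ConfidenceLevel and RecognitionOutcome are local stand-ins for SpeechRecognitionConfidence and SpeechRecognitionResultStatus so that the code compiles without the Windows Runtime; only a few real status values are mirrored, and the extra Status check is an assumption that goes beyond the sample's handler.

```csharp
// Stand-in for Windows.Media.SpeechRecognition.SpeechRecognitionConfidence
// (declared locally so this sketch compiles outside a UWP project).
public enum ConfidenceLevel
{
    High,
    Medium,
    Low,
    Rejected
}

// Stand-in for a few SpeechRecognitionResultStatus values (not the full list).
public enum RecognitionOutcome
{
    Success,
    AudioQualityFailure,
    Unknown
}

public static class DictationFilter
{
    // Accept a chunk only if recognition succeeded and the recognizer is
    // at least moderately confident, mirroring the ResultGenerated handler.
    public static bool ShouldAccept(RecognitionOutcome status, ConfidenceLevel confidence)
    {
        if (status != RecognitionOutcome.Success)
        {
            return false;
        }

        return confidence == ConfidenceLevel.High ||
               confidence == ConfidenceLevel.Medium;
    }
}
```

With the real types, the same check reads args.Result.Status and args.Result.Confidence inside the handler.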

Here are the basic steps for supporting continuous recognition:

  1. Here, we register the handler for the ResultGenerated continuous recognition event in the OnNavigatedTo page event.
speechRecognizer.ContinuousRecognitionSession.ResultGenerated +=
        ContinuousRecognitionSession_ResultGenerated;
  2. We then check the Confidence property. If the value of Confidence is Medium or better, we append the text to the StringBuilder. We also update the UI as we collect input.

    Note   The ResultGenerated event is raised on a background thread that cannot update the UI directly. If a handler needs to update the UI (as the Speech and TTS sample does), you must dispatch the updates to the UI thread through the RunAsync method of the dispatcher.

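private async void ContinuousRecognitionSession_ResultGenerated(
      SpeechContinuousRecognitionSession sender,
      SpeechContinuousRecognitionResultGeneratedEventArgs args)
{
  // Keep only results that the recognizer is reasonably sure about.
  if (args.Result.Confidence == SpeechRecognitionConfidence.Medium ||
      args.Result.Confidence == SpeechRecognitionConfidence.High)
  {
    dictatedTextBuilder.Append(args.Result.Text + " ");

    // This handler runs on a background thread, so marshal UI updates
    // to the UI thread through the dispatcher.
    await dispatcher.RunAsync(CoreDispatcherPriority.Normal, () =>
    {
      dictationTextBox.Text = dictatedTextBuilder.ToString();
      btnClearText.IsEnabled = true;
    });
  }
  else
  {
    // Low-confidence or rejected result: discard it and redisplay only the
    // accepted text, which also removes any interim hypothesis shown earlier.
    await dispatcher.RunAsync(CoreDispatcherPriority.Normal, () =>
    {
      dictationTextBox.Text = dictatedTextBuilder.ToString();
    });
  }
}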
  3. We then handle the Completed event, which indicates the end of continuous dictation.

    The session ends when you call the StopAsync or CancelAsync method (described in the next section). The session can also end when an error occurs or when the user has stopped speaking. Check the Status property of the event argument, a SpeechRecognitionResultStatus value, to determine why the session ended.

    Here, we register the handler for the Completed continuous recognition event in the OnNavigatedTo page event.

speechRecognizer.ContinuousRecognitionSession.Completed +=
      ContinuousRecognitionSession_Completed;
  4. The event handler checks the Status property to determine whether the recognition was successful. It also handles the case where the user has stopped speaking. A TimeoutExceeded status is often treated as a successful recognition, because it means the user has finished speaking. Handle this case in your code to provide a good experience.

    Note   Like ResultGenerated, the Completed event can be raised on a background thread that cannot update the UI directly. If the handler needs to update the UI (as the Speech and TTS sample does), you must dispatch the updates to the UI thread through the RunAsync method of the dispatcher.

private async void ContinuousRecognitionSession_Completed(
      SpeechContinuousRecognitionSession sender,
      SpeechContinuousRecognitionCompletedEventArgs args)
      {
        if (args.Status != SpeechRecognitionResultStatus.Success)
        {
          if (args.Status == SpeechRecognitionResultStatus.TimeoutExceeded)
          {
            await dispatcher.RunAsync(CoreDispatcherPriority.Normal, () =>
            {
              rootPage.NotifyUser(
                "Automatic Time Out of Dictation",
                NotifyType.StatusMessage);

              DictationButtonText.Text = " Continuous Recognition";
              dictationTextBox.Text = dictatedTextBuilder.ToString();
            });
          }
          else
          {
            await dispatcher.RunAsync(CoreDispatcherPriority.Normal, () =>
            {
              rootPage.NotifyUser(
                "Continuous Recognition Completed: " + args.Status.ToString(),
                NotifyType.StatusMessage);

              DictationButtonText.Text = " Continuous Recognition";
            });
          }
        }
      }
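The branching in the Completed handler above amounts to a mapping from the completion status to a user-facing message. The following is a hypothetical helper, not part of the sample: CompletionStatus is a local stand-in that mirrors only some SpeechRecognitionResultStatus members, and the message strings are copied from the handler.

```csharp
// Stand-in for a subset of Windows.Media.SpeechRecognition.SpeechRecognitionResultStatus.
public enum CompletionStatus
{
    Success,
    TimeoutExceeded,
    UserCanceled,
    MicrophoneUnavailable,
    NetworkFailure,
    Unknown
}

public static class CompletionMessages
{
    // Returns the status text to show the user, or null when the session
    // ended normally and nothing needs to be reported (as in the sample).
    public static string Describe(CompletionStatus status)
    {
        switch (status)
        {
            case CompletionStatus.Success:
                return null;
            case CompletionStatus.TimeoutExceeded:
                // The user stopped speaking; report it as a normal time-out.
                return "Automatic Time Out of Dictation";
            default:
                return "Continuous Recognition Completed: " + status.ToString();
        }
    }

    // True for the two statuses this topic treats as a successful end of dictation.
    public static bool IsExpectedEnd(CompletionStatus status) =>
        status == CompletionStatus.Success ||
        status == CompletionStatus.TimeoutExceeded;
}
```

Keeping this mapping separate from the dispatcher call leaves the handler itself responsible only for marshaling the message to the UI thread.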

Provide ongoing recognition feedback

When people converse, they often rely on context to fully understand what is being said. Similarly, the speech recognizer often needs context to provide high-confidence recognition results. For example, by themselves, the words "weight" and "wait" are indistinguishable until more context can be gleaned from the surrounding words. Until the recognizer has some confidence that a word, or words, have been recognized correctly, it does not raise the ResultGenerated event.

This can make for a less-than-ideal experience: the user keeps speaking, but no results appear until the recognizer is confident enough to raise the ResultGenerated event.

Handle the HypothesisGenerated event to improve this apparent lack of responsiveness. This event, which is raised by the SpeechRecognizer itself rather than by its ContinuousRecognitionSession, occurs whenever the recognizer generates a new set of potential matches for the word being processed. The event argument provides a Hypothesis property that contains the current matches. Show these matches to the user as they continue speaking to reassure them that processing is still active. Once confidence is high and a recognition result has been determined, replace the interim Hypothesis results with the final Result provided in the ResultGenerated event.

Here, we append the hypothesis text and an ellipsis ("...") to the accumulated dictation text and show the result in the output TextBox. The contents of the text box are updated each time a new hypothesis is generated, until the final results are obtained from the ResultGenerated event.

private async void SpeechRecognizer_HypothesisGenerated(
  SpeechRecognizer sender,
  SpeechRecognitionHypothesisGeneratedEventArgs args)
  {

    string hypothesis = args.Hypothesis.Text;
    string textboxContent = dictatedTextBuilder.ToString() + " " + hypothesis + " ...";

    await dispatcher.RunAsync(CoreDispatcherPriority.Normal, () =>
    {
      dictationTextBox.Text = textboxContent;
      btnClearText.IsEnabled = true;
    });
  }
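The replace-then-commit behavior described above can be modeled without any speech APIs: each new hypothesis overwrites the previous one, and a final result is committed and clears the interim text. DictationDisplay and its methods are hypothetical names for illustration, and the " ..." suffix matches the handler above. Unlike the sample, which always inserts a space before the hypothesis, this sketch omits the leading space when nothing has been committed yet.

```csharp
using System.Text;

// Hypothetical model of what the text box shows during dictation:
// committed (final) results plus the latest interim hypothesis.
public sealed class DictationDisplay
{
    private readonly StringBuilder committed = new StringBuilder();
    private string currentHypothesis = string.Empty;

    // HypothesisGenerated: replace (do not append) the previous hypothesis.
    public void OnHypothesis(string hypothesisText)
    {
        currentHypothesis = hypothesisText ?? string.Empty;
    }

    // ResultGenerated: commit the final text and drop the interim hypothesis.
    public void OnResult(string resultText)
    {
        if (!string.IsNullOrWhiteSpace(resultText))
        {
            if (committed.Length > 0)
            {
                committed.Append(' ');
            }
            committed.Append(resultText.Trim());
        }
        currentHypothesis = string.Empty;
    }

    // Text for the TextBox; the ellipsis marks words that are not yet final.
    public string Render()
    {
        if (currentHypothesis.Length == 0)
        {
            return committed.ToString();
        }

        string prefix = committed.Length > 0 ? committed.ToString() + " " : string.Empty;
        return prefix + currentHypothesis + " ...";
    }
}
```

For example, after OnHypothesis("the weight") and then OnHypothesis("the wait is"), Render() shows only the newer guess; a following OnResult("The wait is over") replaces it with the final text.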

Start and stop recognition

Before starting a recognition session, check the value of the speech recognizer's State property. The speech recognizer must be in the Idle state.

在检查语音识别器的状态之后,我们通过调用语音识别器的 ContinuousRecognitionSession 属性的 StartAsync 方法启动会话。After checking the state of the speech recognizer, we start the session by calling the StartAsync method of the speech recognizer's ContinuousRecognitionSession property.

if (speechRecognizer.State == SpeechRecognizerState.Idle)
{
  await speechRecognizer.ContinuousRecognitionSession.StartAsync();
}

Recognition can be stopped in two ways:

  • StopAsync lets any pending recognition events complete (ResultGenerated continues to be raised until all pending recognition operations are complete).
  • CancelAsync terminates the recognition session immediately and discards any pending results.

After checking the state of the speech recognizer, we stop the session by calling the CancelAsync method of the speech recognizer's ContinuousRecognitionSession property.

if (speechRecognizer.State != SpeechRecognizerState.Idle)
{
  await speechRecognizer.ContinuousRecognitionSession.CancelAsync();
}
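The two State checks above combine into a simple rule for a single dictation toggle button. This is a hypothetical sketch: RecognizerState is a local stand-in mirroring the SpeechRecognizerState values, SessionAction is an invented enum, and the choice between StopAsync and CancelAsync is exposed as a parameter so that you can decide whether pending results should still be delivered.

```csharp
// Stand-in for Windows.Media.SpeechRecognition.SpeechRecognizerState.
public enum RecognizerState
{
    Idle,
    Capturing,
    Processing,
    SoundStarted,
    SoundEnded,
    SpeechDetected,
    Paused
}

// Hypothetical: which session method the button handler should call.
public enum SessionAction
{
    Start,  // ContinuousRecognitionSession.StartAsync()
    Stop,   // ContinuousRecognitionSession.StopAsync()
    Cancel  // ContinuousRecognitionSession.CancelAsync()
}

public static class DictationToggle
{
    public static SessionAction ChooseAction(RecognizerState state, bool keepPendingResults)
    {
        // A session may only be started when the recognizer is idle.
        if (state == RecognizerState.Idle)
        {
            return SessionAction.Start;
        }

        // StopAsync lets pending ResultGenerated events finish;
        // CancelAsync ends the session at once and discards them.
        return keepPendingResults ? SessionAction.Stop : SessionAction.Cancel;
    }
}
```

The sample in this topic corresponds to keepPendingResults set to false, because it stops the session with CancelAsync.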

Note

A ResultGenerated event can occur after a call to CancelAsync.
Because of multithreading, a ResultGenerated event might already be queued when CancelAsync is called. If so, the ResultGenerated event still fires.
If you set any private fields when canceling the recognition session, always confirm their values in the ResultGenerated handler. For example, if you set a field to null when you cancel the session, don't assume that it is still initialized in your handler.
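A minimal sketch of that defensive check, assuming a design in which canceling sets the builder field to null. CancelSafeDictation is a hypothetical class, and its OnResultGenerated method stands in for the body of a real ResultGenerated handler. Copying the field into a local variable before testing it means the null check and the Append use the same reference, even if Cancel runs on another thread in between.

```csharp
using System.Text;

// Hypothetical page state: the builder is set to null when the session is
// canceled, so a late ResultGenerated callback must check it before use.
public sealed class CancelSafeDictation
{
    private StringBuilder dictatedTextBuilder = new StringBuilder();

    // Cleanup performed after CancelAsync in this sketch.
    public void Cancel()
    {
        dictatedTextBuilder = null;
    }

    // Body of a ResultGenerated handler; returns false if the result was dropped.
    public bool OnResultGenerated(string text)
    {
        // Read the field once so the check and the Append see the same object.
        StringBuilder builder = dictatedTextBuilder;
        if (builder == null)
        {
            return false; // session already canceled: ignore the late result
        }

        builder.Append(text).Append(' ');
        return true;
    }

    public string Text => dictatedTextBuilder?.ToString() ?? string.Empty;
}
```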

 

Samples