Speech recognition

Use speech recognition to provide input, specify an action or command, and accomplish tasks.

Important APIs: Windows.Media.SpeechRecognition

Speech recognition is made up of a speech runtime, recognition APIs for programming the runtime, ready-to-use grammars for dictation and web search, and a default system UI that helps users discover and use speech recognition features.

Configure speech recognition

To support speech recognition with your app, the user must connect and enable a microphone on their device, and accept the Microsoft Privacy Policy granting permission for your app to use it.

To automatically prompt the user with a system dialog requesting permission to access and use the microphone's audio feed (example from the Speech recognition and speech synthesis sample shown below), just set the Microphone device capability in the app package manifest. For more detail, see App capability declarations.
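For example, the capability declaration in the Package.appxmanifest file looks like the following (a minimal fragment; the rest of the manifest is omitted):

<Capabilities>
  <DeviceCapability Name="microphone" />
</Capabilities>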

Microphone access privacy policy dialog

If the user clicks Yes to grant access to the microphone, your app is added to the list of approved applications on the Settings -> Privacy -> Microphone page. However, as the user can choose to turn this setting off at any time, you should confirm that your app has access to the microphone before attempting to use it.

If you also want to support dictation, Cortana, or other speech recognition services (such as a predefined grammar defined in a topic constraint), you must also confirm that Online speech recognition (Settings -> Privacy -> Speech) is enabled.

This snippet shows how your app can check if a microphone is present and whether it has permission to use it.

C#

public class AudioCapturePermissions
{
    // If no microphone is present, an exception is thrown with the following HResult value.
    private static int NoCaptureDevicesHResult = -1072845856;

    /// <summary>
    /// Note that this method only checks the Settings->Privacy->Microphone setting, it does not handle
    /// the Cortana/Dictation privacy check.
    ///
    /// You should perform this check every time the app gets focus, in case the user has changed
    /// the setting while the app was suspended or not in focus.
    /// </summary>
    /// <returns>True, if the microphone is available.</returns>
    public async static Task<bool> RequestMicrophonePermission()
    {
        try
        {
            // Request access to the audio capture device.
            MediaCaptureInitializationSettings settings = new MediaCaptureInitializationSettings();
            settings.StreamingCaptureMode = StreamingCaptureMode.Audio;
            settings.MediaCategory = MediaCategory.Speech;
            MediaCapture capture = new MediaCapture();

            await capture.InitializeAsync(settings);
        }
        catch (TypeLoadException)
        {
            // Thrown when a media player is not available.
            var messageDialog = new Windows.UI.Popups.MessageDialog("Media player components are unavailable.");
            await messageDialog.ShowAsync();
            return false;
        }
        catch (UnauthorizedAccessException)
        {
            // Thrown when permission to use the audio capture device is denied.
            // If this occurs, show an error or disable recognition functionality.
            return false;
        }
        catch (Exception exception)
        {
            // Thrown when an audio capture device is not present.
            if (exception.HResult == NoCaptureDevicesHResult)
            {
                var messageDialog = new Windows.UI.Popups.MessageDialog("No Audio Capture devices are present on this system.");
                await messageDialog.ShowAsync();
                return false;
            }
            else
            {
                throw;
            }
        }
        return true;
    }
}

C++
/// <summary>
/// Note that this method only checks the Settings->Privacy->Microphone setting, it does not handle
/// the Cortana/Dictation privacy check.
///
/// You should perform this check every time the app gets focus, in case the user has changed
/// the setting while the app was suspended or not in focus.
/// </summary>
/// <returns>True, if the microphone is available.</returns>
IAsyncOperation<bool>^  AudioCapturePermissions::RequestMicrophonePermissionAsync()
{
    return create_async([]() 
    {
        try
        {
            // Request access to the audio capture device.
            MediaCaptureInitializationSettings^ settings = ref new MediaCaptureInitializationSettings();
            settings->StreamingCaptureMode = StreamingCaptureMode::Audio;
            settings->MediaCategory = MediaCategory::Speech;
            MediaCapture^ capture = ref new MediaCapture();

            return create_task(capture->InitializeAsync(settings))
                .then([](task<void> previousTask) -> bool
            {
                try
                {
                    previousTask.get();
                }
                catch (AccessDeniedException^)
                {
                    // Thrown when permission to use the audio capture device is denied.
                    // If this occurs, show an error or disable recognition functionality.
                    return false;
                }
                catch (Exception^ exception)
                {
                    // Thrown when an audio capture device is not present.
                    if (exception->HResult == AudioCapturePermissions::NoCaptureDevicesHResult)
                    {
                        auto messageDialog = ref new Windows::UI::Popups::MessageDialog("No Audio Capture devices are present on this system.");
                        create_task(messageDialog->ShowAsync());
                        return false;
                    }

                    throw;
                }
                return true;
            });
        }
        catch (Platform::ClassNotRegisteredException^ ex)
        {
            // Thrown when a media player is not available. 
            auto messageDialog = ref new Windows::UI::Popups::MessageDialog("Media Player Components unavailable.");
            create_task(messageDialog->ShowAsync());
            return create_task([] {return false; });
        }
    });
}

JavaScript
var AudioCapturePermissions = WinJS.Class.define(
    function () { }, {},
    {
        requestMicrophonePermission: function () {
            /// <summary>
            /// Note that this method only checks the Settings->Privacy->Microphone setting, it does not handle
            /// the Cortana/Dictation privacy check.
            ///
            /// You should perform this check every time the app gets focus, in case the user has changed
            /// the setting while the app was suspended or not in focus.
            /// </summary>
            /// <returns>True, if the microphone is available.</returns>
            return new WinJS.Promise(function (completed, error) {

                try {
                    // Request access to the audio capture device.
                    var captureSettings = new Windows.Media.Capture.MediaCaptureInitializationSettings();
                    captureSettings.streamingCaptureMode = Windows.Media.Capture.StreamingCaptureMode.audio;
                    captureSettings.mediaCategory = Windows.Media.Capture.MediaCategory.speech;

                    var capture = new Windows.Media.Capture.MediaCapture();
                    capture.initializeAsync(captureSettings).then(function () {
                        completed(true);
                    },
                    function (initError) {
                        // Audio capture can fail to initialize if there are no audio devices on the system, or if
                        // the user has disabled permission to access the microphone in the Privacy settings.
                        // Note: the parameter is named initError to avoid shadowing the promise's error callback.
                        if (initError.number == -2147024891) { // Access denied (microphone disabled in settings).
                            completed(false);
                        } else if (initError.number == -1072845856) { // No recording device present.
                            var messageDialog = new Windows.UI.Popups.MessageDialog("No Audio Capture devices are present on this system.");
                            messageDialog.showAsync();
                            completed(false);
                        } else {
                            error(initError);
                        }
                    });
                } catch (exception) {
                    if (exception.number == -2147221164) { // REGDB_E_CLASSNOTREG
                        var messageDialog = new Windows.UI.Popups.MessageDialog("Media Player components not available on this system.");
                        messageDialog.showAsync();
                        // Resolve the promise with false; returning from the executor has no effect.
                        completed(false);
                    } else {
                        error(exception);
                    }
                }
            });
        }
});

Recognize speech input

A constraint defines the words and phrases (vocabulary) that an app recognizes in speech input. Constraints are at the core of speech recognition and give your app greater control over the accuracy of speech recognition.

You can use the following types of constraints for recognizing speech input.

Predefined grammars

Predefined dictation and web-search grammars provide speech recognition for your app without requiring you to author a grammar. When using these grammars, speech recognition is performed by a remote web service and the results are returned to the device.

The default free-text dictation grammar can recognize most words and phrases that a user can say in a particular language, and is optimized to recognize short phrases. The predefined dictation grammar is used if you don't specify any constraints for your SpeechRecognizer object. Free-text dictation is useful when you don't want to limit the kinds of things a user can say. Typical uses include creating notes or dictating the content for a message.

The web-search grammar, like the dictation grammar, contains a large number of words and phrases that a user might say. However, it is optimized to recognize terms that people typically use when searching the web.

Note

Because predefined dictation and web-search grammars can be large, and because they are online (not on the device), performance might not be as fast as with a custom grammar installed on the device.

These predefined grammars can be used to recognize up to 10 seconds of speech input and require no authoring effort on your part. However, they do require a connection to a network.

To use web-service constraints, speech input and dictation support must be enabled in Settings by turning on the "Get to know me" option in Settings -> Privacy -> Speech, inking, and typing.

Here, we show how to test whether speech input is enabled, and how to open the Settings -> Privacy -> Speech, inking, and typing page if it is not.

First, we initialize a global variable (HResultPrivacyStatementDeclined) to the HResult value of 0x80045509. See Exception handling for C# or Visual Basic.

private static uint HResultPrivacyStatementDeclined = 0x80045509;

We then catch any standard exceptions during recognition and test whether the HResult value is equal to the value of the HResultPrivacyStatementDeclined variable. If so, we display a warning and call await Windows.System.Launcher.LaunchUriAsync(new Uri("ms-settings:privacy-accounts")); to open the Settings page.

catch (Exception exception)
{
  // Handle the speech privacy policy error.
  if ((uint)exception.HResult == HResultPrivacyStatementDeclined)
  {
    resultTextBlock.Visibility = Visibility.Visible;
    resultTextBlock.Text = "The privacy statement was declined. " + 
      "Go to Settings -> Privacy -> Speech, inking and typing, and ensure you " +
      "have viewed the privacy policy, and 'Get To Know You' is enabled.";
    // Open the privacy/speech, inking, and typing settings page.
    await Windows.System.Launcher.LaunchUriAsync(new Uri("ms-settings:privacy-accounts")); 
  }
  else
  {
    var messageDialog = new Windows.UI.Popups.MessageDialog(exception.Message, "Exception");
    await messageDialog.ShowAsync();
  }
}

See SpeechRecognitionTopicConstraint.

Programmatic list constraints

Programmatic list constraints provide a lightweight approach to creating simple grammars using a list of words or phrases. A list constraint works well for recognizing short, distinct phrases. Explicitly specifying all words in a grammar also improves recognition accuracy, as the speech recognition engine must only process speech to confirm a match. The list can also be programmatically updated.

A list constraint consists of an array of strings that represents speech input that your app will accept for a recognition operation. You can create a list constraint in your app by creating a speech-recognition list-constraint object and passing an array of strings. Then, add that object to the constraints collection of the recognizer. Recognition is successful when the speech recognizer recognizes any one of the strings in the array.
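For example, a minimal sketch of a list constraint (the click handler, the phrase list, and the "yesOrNo" tag here are hypothetical, not part of the sample):

private async void YesOrNo_Click(object sender, RoutedEventArgs e)
{
    // Create an instance of SpeechRecognizer.
    var speechRecognizer = new Windows.Media.SpeechRecognition.SpeechRecognizer();

    // A short list of distinct phrases the app accepts.
    string[] responses = { "Yes", "No" };

    // Create the list constraint from the array of strings; the tag identifies it in results.
    var listConstraint = new Windows.Media.SpeechRecognition.SpeechRecognitionListConstraint(responses, "yesOrNo");

    // Add the constraint to the recognizer's constraints collection and compile it.
    speechRecognizer.Constraints.Add(listConstraint);
    await speechRecognizer.CompileConstraintsAsync();

    // Recognition succeeds when the user says any one of the strings in the array.
    Windows.Media.SpeechRecognition.SpeechRecognitionResult speechRecognitionResult = await speechRecognizer.RecognizeWithUIAsync();
}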

See SpeechRecognitionListConstraint.

SRGS grammars

A Speech Recognition Grammar Specification (SRGS) grammar is a static document that, unlike a programmatic list constraint, uses the XML format defined by SRGS Version 1.0. An SRGS grammar provides the greatest control over the speech recognition experience by letting you capture multiple semantic meanings in a single recognition.
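A minimal sketch of loading an SRGS grammar follows; the file name "SRGSGrammar.xml" and the "srgsGrammar" tag are hypothetical, and the file is assumed to be packaged in the app root:

private async void SRGSConstraint_Click(object sender, RoutedEventArgs e)
{
    // Create an instance of SpeechRecognizer.
    var speechRecognizer = new Windows.Media.SpeechRecognition.SpeechRecognizer();

    // Load the SRGS grammar file packaged with the app.
    Windows.Storage.StorageFile grammarFile =
        await Windows.Storage.StorageFile.GetFileFromApplicationUriAsync(new Uri("ms-appx:///SRGSGrammar.xml"));

    // Wrap the file in a grammar-file constraint and add it to the recognizer.
    var grammarConstraint = new Windows.Media.SpeechRecognition.SpeechRecognitionGrammarFileConstraint(grammarFile, "srgsGrammar");
    speechRecognizer.Constraints.Add(grammarConstraint);

    // Compile the constraint before starting recognition.
    await speechRecognizer.CompileConstraintsAsync();

    // Start recognition with the default UI.
    Windows.Media.SpeechRecognition.SpeechRecognitionResult speechRecognitionResult = await speechRecognizer.RecognizeWithUIAsync();
}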

See SpeechRecognitionGrammarFileConstraint.

Voice command constraints

Use a Voice Command Definition (VCD) XML file to define the commands that the user can say to initiate actions when activating your app. For more detail, see Activate a foreground app with voice commands through Cortana.
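A VCD file is typically registered when the app launches. A minimal sketch, assuming the file is packaged in the app root as "VoiceCommands.xml" (the file name is hypothetical):

protected override async void OnLaunched(LaunchActivatedEventArgs e)
{
    // ... standard launch logic omitted ...

    // Load the VCD file packaged with the app and register its command sets with Cortana.
    Windows.Storage.StorageFile vcdFile =
        await Windows.Storage.StorageFile.GetFileFromApplicationUriAsync(new Uri("ms-appx:///VoiceCommands.xml"));
    await Windows.ApplicationModel.VoiceCommands.VoiceCommandDefinitionManager.InstallCommandDefinitionsFromStorageFileAsync(vcdFile);
}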

See SpeechRecognitionVoiceCommandDefinitionConstraint.

Note  The type of constraint you use depends on the complexity of the recognition experience you want to create. Any could be the best choice for a specific recognition task, and you might find uses for all types of constraints in your app. To get started with constraints, see Define custom recognition constraints.

The predefined Universal Windows app dictation grammar recognizes most words and short phrases in a language. It is activated by default when a speech recognizer object is instantiated without custom constraints.

In this example, we show how to:

  • Create a speech recognizer.
  • Compile the default Universal Windows app constraints (no grammars have been added to the speech recognizer's grammar set).
  • Start listening for speech by using the basic recognition UI and TTS feedback provided by the RecognizeWithUIAsync method. Use the RecognizeAsync method if the default UI is not required.
private async void StartRecognizing_Click(object sender, RoutedEventArgs e)
{
    // Create an instance of SpeechRecognizer.
    var speechRecognizer = new Windows.Media.SpeechRecognition.SpeechRecognizer();

    // Compile the dictation grammar by default.
    await speechRecognizer.CompileConstraintsAsync();

    // Start recognition.
    Windows.Media.SpeechRecognition.SpeechRecognitionResult speechRecognitionResult = await speechRecognizer.RecognizeWithUIAsync();

    // Do something with the recognition result.
    var messageDialog = new Windows.UI.Popups.MessageDialog(speechRecognitionResult.Text, "Text spoken");
    await messageDialog.ShowAsync();
}

Customize the recognition UI

When your app attempts speech recognition by calling SpeechRecognizer.RecognizeWithUIAsync, several screens are shown in the following order.

If you're using a constraint based on a predefined grammar (dictation or web search):

  • The Listening screen.
  • The Thinking screen.
  • The Heard you say screen or the error screen.

If you're using a constraint based on a list of words or phrases, or a constraint based on an SRGS grammar file:

  • The Listening screen.
  • The Did you say screen, if what the user said could be interpreted as more than one potential result.
  • The Heard you say screen or the error screen.

The following images show an example of the flow between screens for a speech recognizer that uses a constraint based on an SRGS grammar file. In this example, speech recognition was successful.

Initial recognition screen for a constraint based on an SRGS grammar file

Intermediate recognition screen for a constraint based on an SRGS grammar file

Final recognition screen for a constraint based on an SRGS grammar file

The Listening screen can provide examples of words or phrases that the app can recognize. Here, we show how to use the properties of the SpeechRecognizerUIOptions class (obtained by calling the SpeechRecognizer.UIOptions property) to customize content on the Listening screen.

private async void WeatherSearch_Click(object sender, RoutedEventArgs e)
{
    // Create an instance of SpeechRecognizer.
    var speechRecognizer = new Windows.Media.SpeechRecognition.SpeechRecognizer();

    // Listen for audio input issues.
    speechRecognizer.RecognitionQualityDegrading += speechRecognizer_RecognitionQualityDegrading;

    // Add a web search grammar to the recognizer.
    var webSearchGrammar = new Windows.Media.SpeechRecognition.SpeechRecognitionTopicConstraint(Windows.Media.SpeechRecognition.SpeechRecognitionScenario.WebSearch, "webSearch");


    speechRecognizer.UIOptions.AudiblePrompt = "Say what you want to search for...";
    speechRecognizer.UIOptions.ExampleText = @"Ex. 'weather for London'";
    speechRecognizer.Constraints.Add(webSearchGrammar);

    // Compile the constraint.
    await speechRecognizer.CompileConstraintsAsync();

    // Start recognition.
    Windows.Media.SpeechRecognition.SpeechRecognitionResult speechRecognitionResult = await speechRecognizer.RecognizeWithUIAsync();

    // Do something with the recognition result.
    var messageDialog = new Windows.UI.Popups.MessageDialog(speechRecognitionResult.Text, "Text spoken");
    await messageDialog.ShowAsync();
}
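The speechRecognizer_RecognitionQualityDegrading handler subscribed above is not shown in the sample; a minimal sketch might look like the following (the body, which simply surfaces the reported audio problem in the resultTextBlock used earlier, is an assumption):

private async void speechRecognizer_RecognitionQualityDegrading(
    Windows.Media.SpeechRecognition.SpeechRecognizer sender,
    Windows.Media.SpeechRecognition.SpeechRecognitionQualityDegradingEventArgs args)
{
    // The event is raised off the UI thread, so marshal the update back to the dispatcher.
    await Dispatcher.RunAsync(Windows.UI.Core.CoreDispatcherPriority.Normal, () =>
    {
        resultTextBlock.Text = "Audio problem: " + args.Problem.ToString();
    });
}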

Samples