Voice input


Voice is one of the key forms of input on HoloLens. It allows you to directly command a hologram without having to use hand gestures. Voice input can be a natural way to communicate your intent. Voice is especially good at traversing complex interfaces, because it lets users cut through nested menus with one command.

Voice input is powered by the same engine that supports speech in all Universal Windows Apps. On HoloLens, speech recognition will always function in the Windows display language configured in your device Settings.

Voice and gaze

When you're using voice commands, head or eye gaze is the typical targeting mechanism, whether with a cursor to "select" or to channel your command to an application you're looking at. You may not even need to show a gaze cursor ("see it, say it"). Some voice commands don't require a target at all, such as "go to start" or "Hey Cortana."

Device support

Feature | HoloLens (1st gen) | HoloLens 2 | Immersive headsets
Voice input | ✔️ | ✔️ | ✔️ (with microphone)

The "select" command

HoloLens (1st gen)

Even without specifically adding voice support to your app, your users can activate holograms simply by saying the system voice command "select". This behaves the same as an air tap on HoloLens, pressing the select button on the HoloLens clicker, or pressing the trigger on a Windows Mixed Reality motion controller. You'll hear a sound and see a tooltip with "select" appear as confirmation. "Select" is enabled by a low-power keyword detection algorithm, which means you can say it anytime with minimal battery life impact. You can even say "select" with your hands at your side.

HoloLens 2

To use the "select" voice command in HoloLens 2, you first need to bring up the gaze cursor to use as a pointer. The command to bring it up is easy to remember: just say "select".

To exit the mode, use your hands again by air tapping, approaching a button with your fingers, or using the system gesture.

Image: Say "select" to use the voice command for selection
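The following minimal Python sketch models this HoloLens 2 mode switch, only to make the described behavior concrete. It assumes (this is not stated above) that once the gaze cursor is up, saying "select" again activates the gazed target. The class, method, and event names are hypothetical and are not part of any HoloLens or Windows API.

```python
from enum import Enum, auto


class PointerMode(Enum):
    HANDS = auto()       # default: hand rays and direct touch
    GAZE_VOICE = auto()  # gaze cursor shown; "select" acts as a click


class PointerModeTracker:
    """Hypothetical model of the HoloLens 2 "select" voice mode."""

    # Hand interactions that, per the text above, exit the gaze/voice mode.
    EXIT_EVENTS = {"air_tap", "finger_near_button", "system_gesture"}

    def __init__(self) -> None:
        self.mode = PointerMode.HANDS

    def on_voice(self, phrase: str) -> str:
        if phrase.strip().lower() != "select":
            return "ignored"
        if self.mode is PointerMode.HANDS:
            # The first "select" only brings up the gaze cursor.
            self.mode = PointerMode.GAZE_VOICE
            return "show_gaze_cursor"
        # Assumption: with the cursor up, "select" activates the gazed target.
        return "select_gazed_target"

    def on_hand_event(self, event: str) -> str:
        # Using your hands again returns to the default hand-input mode.
        if self.mode is PointerMode.GAZE_VOICE and event in self.EXIT_EVENTS:
            self.mode = PointerMode.HANDS
            return "hide_gaze_cursor"
        return "ignored"
```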

Hey Cortana

You can say "Hey Cortana" to bring up Cortana at any time. You don't have to wait for her to appear before asking your question or giving her an instruction. For example, try saying "Hey Cortana, what's the weather?" as a single sentence. For more information about Cortana and what you can do, ask her! Say "Hey Cortana, what can I say?" and she'll pull up a list of working and suggested commands. If you're already in the Cortana app, select the ? icon on the sidebar to pull up the same menu.

HoloLens-specific commands

  • "What can I say?"
  • "Go to Start" - use instead of bloom to get to the Start menu
  • "Launch <app>"
  • "Move here"
  • "Take a picture"
  • "Start recording"
  • "Stop recording"
  • "Show hand ray"
  • "Hide hand ray"
  • "Increase the brightness"
  • "Decrease the brightness"
  • "Increase the volume"
  • "Decrease the volume"
  • "Mute" or "Unmute"
  • "Shut down the device"
  • "Restart the device"
  • "Go to sleep"
  • "What time is it?"
  • "How much battery do I have left?"

See It, Say It

HoloLens has a "see it, say it" model for voice input, where labels on buttons also tell users which voice commands they can say. For example, when looking at an app window in HoloLens (1st gen), a user can say the "Adjust" command to adjust the position of the app in the world.

Image: A user can say the "Adjust" command, which they see in the App bar, to adjust the position of the app in the world

When apps follow this rule, users can easily understand what to say to control the system. While you gaze at a voice-enabled button in HoloLens (1st gen), a "voice dwell" tooltip appears after about a second, showing the command to say to "press" it. To reveal voice tooltips in HoloLens 2, show the voice cursor by saying "select" or "What can I say?" (see image).

Image: "See it, say it" commands appear below the buttons
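A minimal sketch of the "see it, say it" rule, assuming a hypothetical `Button` type: each voice-enabled button is registered under its own visible label, and a dwell tooltip appears after roughly one second of gaze. The one-second threshold comes from the text above; all names here are illustrative and are not a platform API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

DWELL_TOOLTIP_SECONDS = 1.0  # "after a second", per the text above


@dataclass
class Button:
    label: str                   # the visible label doubles as the voice command
    on_press: Callable[[], str]
    voice_enabled: bool = True


def build_voice_map(buttons: list[Button]) -> Dict[str, Button]:
    """Register each voice-enabled button under its own visible label."""
    return {b.label.lower(): b for b in buttons if b.voice_enabled}


def dwell_tooltip(button: Button, gaze_seconds: float) -> Optional[str]:
    """Tooltip text once gaze has dwelt long enough on a voice-enabled button."""
    if button.voice_enabled and gaze_seconds >= DWELL_TOOLTIP_SECONDS:
        return f'Say "{button.label}"'
    return None


def handle_phrase(voice_map: Dict[str, Button], phrase: str) -> Optional[str]:
    """Press the button whose label matches the spoken phrase, if any."""
    button = voice_map.get(phrase.strip().lower())
    return button.on_press() if button else None
```

Because the command is derived from the label, renaming a button automatically renames its voice command, which keeps what users see and what they can say in sync.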

Voice commands for fast hologram manipulation

There are many voice commands you can say while gazing at a hologram to perform manipulation tasks quickly. These voice commands work on app windows and 3D objects you've placed in the world.

Hologram manipulation commands

  • Face me
  • Bigger | Enhance
  • Smaller
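As an illustration only, the sketch below applies these commands to a hypothetical `Hologram` record. It assumes a y-up coordinate system with +z as the hologram's forward axis, treats "Face me" as a yaw-only rotation toward the user, and uses an arbitrary 1.2× scale step; none of these values come from HoloLens itself.

```python
import math
from dataclasses import dataclass

SCALE_STEP = 1.2  # hypothetical per-command scale factor; tune for your app


@dataclass
class Hologram:
    position: tuple[float, float, float]  # meters, world space (y is up)
    scale: float = 1.0
    yaw: float = 0.0                      # radians around the vertical y axis


def apply_manipulation(holo: Hologram, command: str,
                       user_pos: tuple[float, float, float]) -> Hologram:
    cmd = command.strip().lower()
    if cmd == "face me":
        # Rotate about y so the forward (+z) axis points horizontally at the user:
        # rotating (0, 0, 1) by yaw gives (sin yaw, 0, cos yaw).
        dx = user_pos[0] - holo.position[0]
        dz = user_pos[2] - holo.position[2]
        holo.yaw = math.atan2(dx, dz)
    elif cmd in ("bigger", "enhance"):
        holo.scale *= SCALE_STEP
    elif cmd == "smaller":
        holo.scale /= SCALE_STEP
    # Unrecognized commands leave the hologram unchanged.
    return holo
```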

On HoloLens 2, you can also create more natural interactions in combination with eye gaze, which implicitly provides contextual information about what you're referring to. For example, you could look at a hologram and say "put this", then look where you want to place it and say "over here". Or you could look at a holographic part on a complex machine and say "give me more information about this".
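One way to implement this kind of gaze-plus-voice reference resolution is to keep a time-ordered log of eye-gaze targets and look up what the user was looking at when each deictic word ("this", "here") was spoken. The sketch assumes your recognizer provides per-word timestamps on the same clock as the gaze samples; `GazeSample`, `target_at`, and `resolve_deictics` are hypothetical names.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass
class GazeSample:
    t: float               # seconds
    target: Optional[str]  # id of the hologram or location under eye gaze


def target_at(samples: list[GazeSample], t: float) -> Optional[str]:
    """Most recent gaze target at or before time t (samples sorted by t)."""
    times = [s.t for s in samples]
    i = bisect.bisect_right(times, t) - 1
    return samples[i].target if i >= 0 else None


DEICTIC_WORDS = {"this", "that", "here", "there"}


def resolve_deictics(words: list[tuple[str, float]],
                     samples: list[GazeSample]) -> dict[str, Optional[str]]:
    """Map each deictic word to what the user looked at while saying it."""
    return {w.lower(): target_at(samples, t)
            for w, t in words if w.lower() in DEICTIC_WORDS}
```

For "put this ... over here", "this" resolves to the hologram gazed at early in the utterance and "here" to the location gazed at later, so the two words can refer to different things even though the user never points.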

Discovering voice commands

Some commands, like the commands for fast manipulation above, can be hidden. To learn which commands you can use, gaze at an object and say "What can I say?". A list of possible commands pops up. You can also use the head-gaze cursor to look around and reveal the voice tooltips for each button in front of you.

If you want a complete list, just say "Show all commands" at any time.
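The sketch below models the three discovery paths described in this section: voice tooltips on visible buttons, "What can I say?" for the gazed object (including hidden commands), and "Show all commands". It is a hypothetical data structure for illustration, and it assumes that "What can I say?" lists only the gazed object's commands, which may differ from how the system actually builds that list.

```python
from dataclasses import dataclass, field


@dataclass
class CommandRegistry:
    """Hypothetical registry: global commands plus per-object commands."""
    global_commands: set[str] = field(default_factory=set)
    # object id -> {command: hidden?}
    object_commands: dict[str, dict[str, bool]] = field(default_factory=dict)

    def register(self, obj: str, command: str, hidden: bool = False) -> None:
        self.object_commands.setdefault(obj, {})[command] = hidden

    def visible_tooltips(self, obj: str) -> list[str]:
        """What gaze tooltips reveal: only commands that have a visible button."""
        cmds = self.object_commands.get(obj, {})
        return sorted(c for c, hidden in cmds.items() if not hidden)

    def what_can_i_say(self, gazed_obj: str) -> list[str]:
        """Every command for the gazed object, hidden ones included."""
        return sorted(self.object_commands.get(gazed_obj, {}))

    def show_all_commands(self) -> list[str]:
        """The complete list, regardless of gaze."""
        everything = set(self.global_commands)
        for cmds in self.object_commands.values():
            everything.update(cmds)
        return sorted(everything)
```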


Rather than typing with air taps, you can use voice dictation to enter text into an app more efficiently. This can greatly accelerate input with less effort for the user.

Image: Voice dictation starts by selecting the microphone button on the keyboard

Anytime the holographic keyboard is active, you can switch to dictation mode instead of typing. Select the microphone on the side of the text input box to get started.

Adding voice commands to your app

Consider adding voice commands to any experience that you build. Voice is a powerful way to control the system and apps. Because users speak with many different dialects and accents, choosing speech keywords carefully makes sure your users' commands are interpreted unambiguously.

Best practices

Below are some practices that will aid in smooth speech recognition.

  • Use concise commands - When possible, choose keywords of two or more syllables. One-syllable words tend to be pronounced with different vowel sounds by people with different accents. Example: "Play video" is better than "Play the currently selected video"
  • Use simple vocabulary - Example: "Show note" is better than "Show placard"
  • Make sure commands are non-destructive - Make sure any speech command actions are non-destructive and can easily be undone in case another person speaking near the user accidentally triggers a command.
  • Avoid similar sounding commands - Avoid registering multiple speech commands that sound similar. Example: "Show more" and "Show store" can sound similar.
  • Unregister commands when not in use - When your app isn't in a state in which a particular speech command is valid, consider unregistering it so that other commands aren't mistaken for it.
  • Test with different accents - Test your app with users of different accents.
  • Maintain voice command consistency - If "Go back" goes to the previous page, maintain this behavior across your applications.
  • Avoid using system commands - The following voice commands are reserved for the system, so avoid using them in your applications:
    • "Hey Cortana"
    • "Select"
    • "Go to start"
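Several of these practices can be checked automatically before you ship a keyword list. The sketch below flags reserved system commands, likely one-syllable keywords, and pairs that may sound alike. It is deliberately rough: the syllable count is an English vowel-group heuristic, and `difflib` spelling similarity with a hypothetical 0.75 cut-off is only a stand-in for acoustic similarity. Treat its output as prompts for testing with real speakers, not as verdicts.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# System commands listed above.
RESERVED = {"hey cortana", "select", "go to start"}
# Hypothetical cut-off; spelling similarity only approximates how words sound.
SIMILARITY_THRESHOLD = 0.75


def rough_syllables(phrase: str) -> int:
    """Crude estimate: count groups of vowel letters (English, often off by one)."""
    return len(re.findall(r"[aeiouy]+", phrase.lower()))


def lint_keywords(keywords: list[str]) -> list[str]:
    warnings = []
    for kw in keywords:
        norm = " ".join(kw.lower().split())
        if norm in RESERVED:
            warnings.append(f'"{kw}" is reserved for the system')
        if rough_syllables(norm) < 2:
            warnings.append(f'"{kw}" may be a single syllable')
    for a, b in combinations(keywords, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= SIMILARITY_THRESHOLD:
            warnings.append(f'"{a}" and "{b}" may sound alike ({ratio:.2f})')
    return warnings
```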

Advantages of voice input

Voice input is a natural way to communicate our intents. Voice is especially good at interface traversals because it can help users cut through multiple steps of an interface. A user might say "go back" while looking at a webpage, instead of having to reach up and hit the back button in the app. This small time saving has a powerful emotional effect on the user's perception of the experience and gives them a small superpower. Using voice is also a convenient input method when our arms are full or we're multitasking. On devices where typing on a keyboard is difficult, voice dictation can be an efficient alternative way to input text. Lastly, in some cases when the accuracy of gaze and gesture is limited, voice can help to disambiguate the user's intent.

How using voice can benefit the user

  • Reduces time - it should make reaching the end goal more efficient.
  • Minimizes effort - it should make tasks more fluid and effortless.
  • Reduces cognitive load - it should be intuitive, easy to learn, and easy to remember.
  • It's socially acceptable - it should fit in with societal norms of behavior.
  • It's routine - voice can readily become a habitual behavior.

Challenges for voice input

While voice input is great for many different applications, it also faces several challenges. Understanding both the advantages and challenges for voice input enables app developers to make smarter choices for how and when to use voice input and to create a great experience for their users.

Voice input for continuous input control: Fine-grained control is one of these challenges. For example, a user might want to change the volume in their music app. She can say "louder", but it's not clear how much louder the system is supposed to make it. The user could say "Make it a little louder", but "a little" is difficult to quantify. Moving or scaling holograms with voice is similarly difficult.
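One way to make such commands predictable is for the app, not the user, to define what each relative phrase means: every recognized phrase moves the value by a fixed step, and the result is clamped to the valid range. The phrases and step sizes below are hypothetical examples, not platform defaults.

```python
# Hypothetical step sizes: the app decides what "louder" means.
RELATIVE_STEPS = {
    "louder": 10,
    "a little louder": 5,
    "much louder": 20,
    "quieter": -10,
    "a little quieter": -5,
    "much quieter": -20,
}


def apply_volume_phrase(volume: int, phrase: str) -> int:
    """Apply a fixed, predictable step for a relative phrase; clamp to 0..100."""
    step = RELATIVE_STEPS.get(" ".join(phrase.lower().split()))
    if step is None:
        return volume  # unknown phrase: leave the volume unchanged
    return max(0, min(100, volume + step))
```

Fixed steps don't remove the ambiguity of "a little", but they make it consistent, so users learn how many times to repeat a command to reach the level they want.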

Reliability of voice input detection: While voice input systems keep getting better, they may sometimes mishear or misinterpret a voice command. The key is to address this challenge in your application. Providing feedback to your users about when the system is listening and what it understood helps clarify potential problems with understanding their speech.

Voice input in shared spaces: Voice may not be socially acceptable in spaces that you share with others. Here are a few examples:

  • The user may not want to disturb others (for example, in a quiet library or shared office).
  • Users may feel awkward being seen talking to themselves in public.
  • A user may feel uncomfortable dictating a personal or confidential message (including passwords) while others are listening.

Voice input of unique or unknown words: Difficulties for voice input also arise when users dictate words that may be unknown to the system, such as nicknames, certain slang words, or abbreviations.

Learning voice commands: While the ultimate goal is to converse naturally with your system, apps often still rely on specific predefined voice commands. A challenge with a significant set of voice commands is how to teach them without overloading the user, and how to help the user remember them.

Voice feedback states

When voice is applied properly, the user understands what they can say and gets clear feedback that the system heard them correctly. These two signals make the user feel confident in using voice as a primary input. Below is a diagram showing what happens to the cursor when voice input is recognized and how it communicates that to the user.

1. Regular cursor state

2. Communicates voice feedback and then disappears

3. Returns to regular cursor state
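The three states above can be modeled as a small timed state machine. This is a sketch under assumptions: the one-second feedback duration and the class names are hypothetical, and this is not how the system cursor is actually implemented.

```python
from enum import Enum, auto

FEEDBACK_SECONDS = 1.0  # hypothetical display time for the voice-feedback cue


class CursorState(Enum):
    REGULAR = auto()
    VOICE_FEEDBACK = auto()


class VoiceCursor:
    def __init__(self) -> None:
        # State 1: regular cursor.
        self.state = CursorState.REGULAR
        self._feedback_until = 0.0

    def on_voice_recognized(self, now: float) -> None:
        # State 2: show voice feedback so the user knows they were heard.
        self.state = CursorState.VOICE_FEEDBACK
        self._feedback_until = now + FEEDBACK_SECONDS

    def update(self, now: float) -> CursorState:
        # State 3: once the cue has been shown, return to the regular cursor.
        if self.state is CursorState.VOICE_FEEDBACK and now >= self._feedback_until:
            self.state = CursorState.REGULAR
        return self.state
```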

Top things users should know about "speech" in mixed reality

  • Say "Select" while targeting a button (you can use this anywhere to select a button).
  • You can say the label name of an app bar button in some apps to take an action. For example, while looking at an app, a user can say the command "Remove" to remove the app from the world (this saves the time of selecting it with your hand).
  • You can start Cortana listening by saying "Hey Cortana." You can ask her questions ("Hey Cortana, how tall is the Eiffel tower"), tell her to open an app ("Hey Cortana, open Netflix"), or tell her to bring up the Start Menu ("Hey Cortana, take me home") and more.

Common questions and concerns users have about voice

  • What can I say?
  • How do I know the system heard me correctly?
    • The system keeps getting my voice commands wrong.
    • It doesn't react when I give it a voice command.
  • It reacts the wrong way when I give it a voice command.
  • How do I target my voice to a specific app or app command?
  • Can I use voice to command things outside the holographic frame on HoloLens?


For applications that want to take advantage of the customized audio input processing options provided by HoloLens, it's important to understand the various audio stream categories your app can consume. Windows 10 supports several different stream categories, and HoloLens uses three of them to enable custom processing that optimizes microphone audio quality for speech, for communication, and for other uses such as ambient environment audio capture (that is, "camcorder") scenarios.

  • The AudioCategory_Communications stream category is customized for call quality and narration scenarios and provides the client with a 16-kHz 24-bit mono audio stream of the user's voice.
  • The AudioCategory_Speech stream category is customized for the HoloLens (Windows) speech engine and provides it with a 16-kHz 24-bit mono stream of the user's voice. This category can be used by third-party speech engines if needed.
  • The AudioCategory_Other stream category is customized for ambient environment audio recording and provides the client with a 48-kHz 24-bit stereo audio stream.
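The raw data rates of these streams follow directly from the formats listed above (sample rate × bytes per sample × channels), which can help when sizing capture buffers. The sketch assumes 24-bit samples are packed into 3 bytes; if your capture pipeline delivers them in 32-bit containers, use 4 bytes instead. The dictionary is a plain Python mirror of the list, not a Windows API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamFormat:
    sample_rate_hz: int
    bits_per_sample: int
    channels: int

    @property
    def bytes_per_second(self) -> int:
        # Assumes tightly packed samples (24 bits -> 3 bytes).
        return self.sample_rate_hz * (self.bits_per_sample // 8) * self.channels


# Formats as described in the list above.
HOLOLENS_CAPTURE_FORMATS = {
    "AudioCategory_Communications": StreamFormat(16_000, 24, 1),
    "AudioCategory_Speech": StreamFormat(16_000, 24, 1),
    "AudioCategory_Other": StreamFormat(48_000, 24, 2),
}

for name, fmt in HOLOLENS_CAPTURE_FORMATS.items():
    print(f"{name}: {fmt.bytes_per_second:,} bytes/s")
# Prints 48,000 bytes/s for the two 16-kHz mono streams
# and 288,000 bytes/s for the 48-kHz stereo stream.
```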

All this audio processing is hardware accelerated, which means the features drain much less power than if the same processing were done on the HoloLens CPU. Avoid running other audio input processing on the CPU, so that you maximize system battery life and take advantage of the built-in, offloaded audio input processing.


HoloLens 2 supports multiple languages. Keep in mind that speech commands will always run in the system's display language, even if multiple keyboards are installed or if apps attempt to create a speech recognizer in a different language.


If you're having any issues using "select" and "Hey Cortana", try moving to a quieter space, turning away from the source of noise, or speaking louder. At this time, all speech recognition on HoloLens is tuned and optimized specifically for native speakers of United States English.

For the Windows Mixed Reality Developer Edition release 2017, the audio endpoint management logic works correctly (permanently) once you sign out and back in to the PC desktop after the initial HMD connection. Before that first sign-out/sign-in after going through the WMR out-of-box experience (OOBE), users could experience various audio functionality issues, ranging from no audio to no audio switching, depending on how the system was set up before the HMD was connected for the first time.

Voice input in MRTK (Mixed Reality Toolkit) for Unity

With MRTK, you can easily assign voice commands to any object. Use MRTK's Speech Input Profile to define your keywords. By assigning the SpeechInputHandler script, you can make any object respond to the keywords defined in the Speech Input Profile. SpeechInputHandler also provides a speech confirmation label to improve the user's confidence.
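MRTK itself is configured in Unity with C#. Purely to illustrate the pattern described here (a shared keyword profile, a per-object handler that responds only to keywords from that profile, and a confirmation label that echoes what was heard), here is a language-neutral sketch in Python. `SpeechProfile` and `KeywordResponder` are hypothetical stand-ins, not MRTK types.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class SpeechProfile:
    """Stand-in for a project-wide keyword list (hypothetical, not the MRTK type)."""
    keywords: set[str]


@dataclass
class KeywordResponder:
    """Stand-in for a per-object handler mapping profile keywords to actions."""
    profile: SpeechProfile
    responses: Dict[str, Callable[[], None]] = field(default_factory=dict)
    confirmation_label: Optional[str] = None

    def add_response(self, keyword: str, action: Callable[[], None]) -> None:
        # Only keywords defined in the shared profile can be handled.
        if keyword not in self.profile.keywords:
            raise ValueError(f'"{keyword}" is not defined in the speech profile')
        self.responses[keyword] = action

    def on_keyword(self, keyword: str) -> bool:
        action = self.responses.get(keyword)
        if action is None:
            return False  # keyword exists but this object doesn't respond to it
        action()
        # Echo what was heard so the user knows the command was understood.
        self.confirmation_label = keyword
        return True
```

Keeping keywords in one shared profile makes it easier to apply the best practices above (for example, checking for reserved or similar-sounding commands) in a single place.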

See also