Using Speech to Invoke UI Elements

Voice Enabled Shell (VES) is an extension to the Windows Speech Platform that enables a first-class speech experience inside apps, allowing users to invoke on-screen controls by voice and to insert text via dictation. VES strives to provide a common end-to-end see-it-say-it experience on all Windows Shells and devices, with minimal effort required from app developers. To achieve this, it leverages the Microsoft Speech Platform and the UI Automation (UIA) framework.

User experience walkthrough

The following is an overview of what a user would experience when using VES on Xbox. It should help set the context before diving into the details of how VES works.

  • The user turns on the Xbox console and wants to browse through their apps to find something of interest:

    User: "Hey Cortana, open My Games and Apps"

  • The user is left in Active Listening Mode (ALM), meaning the console is now listening for the user to invoke a control that is visible on the screen, without needing to say "Hey Cortana" each time. The user can now switch to view apps and scroll through the app list:

    User: "applications"

  • To scroll the view, the user can simply say:

    User: "scroll down"

  • The user sees the box art for the app they are interested in but has forgotten its name. The user asks for voice tip labels to be displayed:

    User: "show labels"

  • Now that it is clear what to say, the app can be launched:

    User: "movies and TV"

  • To exit Active Listening Mode, the user tells Xbox to stop listening:

    User: "stop listening"

  • Later on, a new active listening session can be started with:

    User: "Hey Cortana, make a selection" or "Hey Cortana, select"

UI automation dependency

VES is a UI Automation client and relies on information exposed by the app through its UI Automation providers. This is the same infrastructure already used by the Narrator feature on Windows platforms. UI Automation enables programmatic access to user interface elements, including the name of a control, its type, and the control patterns it implements. As the UI changes in the app, VES reacts to UIA update events and re-parses the updated UI Automation tree to find all the actionable items, using this information to build a speech recognition grammar.

All UWP apps have access to the UI Automation framework and can expose information about their UI regardless of which graphics framework they are built upon (XAML, DirectX/Direct3D, Xamarin, etc.). In some cases, such as XAML, most of the heavy lifting is done by the framework, greatly reducing the work required to support Narrator and VES.

For more info on UI Automation, see UI Automation Fundamentals.

Control invocation name

VES employs the following heuristic to determine which phrase to register with the speech recognizer as the control's name (i.e., what the user needs to say to invoke the control). This is also the phrase that appears in the voice tip label.

Source of the name, in order of priority:

  1. If the element has a LabeledBy attached property, VES uses the AutomationProperties.Name of that text label.
  2. The AutomationProperties.Name of the element. In XAML, the text content of the control is used as the default value for AutomationProperties.Name.
  3. If the control is a ListItem or Button, VES looks for the first child element with a valid AutomationProperties.Name.
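
The priority order above can be sketched as a small lookup function. This is only an illustrative model, not VES code: the Element class is a hypothetical stand-in for a UI Automation element, a "valid" name is assumed to mean a non-empty string, and "first child" is assumed to mean the first direct child in the automation tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    """Hypothetical stand-in for a UI Automation element (not a real UIA type)."""
    control_type: str
    name: str = ""                           # models AutomationProperties.Name
    labeled_by: Optional["Element"] = None   # models AutomationProperties.LabeledBy
    children: List["Element"] = field(default_factory=list)


def invocation_name(element: Element) -> Optional[str]:
    """Return the phrase to register for an element, or None if there is none."""
    # 1. A LabeledBy target wins: use the Name of the referenced label.
    if element.labeled_by is not None and element.labeled_by.name:
        return element.labeled_by.name
    # 2. Otherwise use the element's own Name.
    if element.name:
        return element.name
    # 3. For a ListItem or Button, fall back to the first child with a Name.
    if element.control_type in ("ListItem", "Button"):
        for child in element.children:
            if child.name:
                return child.name
    return None


# Elements loosely modelled on a label, a combo box, and two buttons.
label = Element("Text", name="Day of Week")
combo_box = Element("ComboBox", labeled_by=label)
named_button = Element("Button", name="Launch Game")
nested_button = Element("Button", children=[
    Element("Text", name="Accept"),
    Element("Text", name="Exclusive offer just for you"),
])
```

With these elements, the combo box resolves through its label, the named button uses its own Name, and the nested button falls back to its first named child.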

Actionable controls

VES considers a control actionable if it implements one of the following Automation control patterns:

  • InvokePattern (e.g., Button) - Represents controls that initiate or perform a single, unambiguous action and do not maintain state when activated.

  • TogglePattern (e.g., Check Box) - Represents a control that can cycle through a set of states and maintain a state once set.

  • SelectionItemPattern (e.g., Combo Box) - Represents a control that acts as a container for a collection of selectable child items.

  • ExpandCollapsePattern (e.g., Combo Box) - Represents controls that visually expand to display content and collapse to hide content.

  • ScrollPattern (e.g., List) - Represents controls that act as scrollable containers for a collection of child elements.
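
As a rough illustration of this rule, the check below treats a control as actionable when its set of supported patterns intersects the five listed above. The pattern names come from the list; representing a control's patterns as a collection of strings is an assumption made for this sketch and is not how UI Automation exposes them.

```python
# Pattern names taken from the list above.
ACTIONABLE_PATTERNS = frozenset({
    "InvokePattern",
    "TogglePattern",
    "SelectionItemPattern",
    "ExpandCollapsePattern",
    "ScrollPattern",
})


def is_actionable(supported_patterns):
    """Return True if at least one supported pattern is one VES acts on."""
    # isdisjoint accepts any iterable, so lists and sets both work here.
    return not ACTIONABLE_PATTERNS.isdisjoint(supported_patterns)
```

A control that exposes only other patterns (or none at all) would not be registered in the speech grammar under this model.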

Scrollable containers

For scrollable containers that support the ScrollPattern, VES listens for voice commands such as "scroll left" and "scroll right", and invokes Scroll with the appropriate parameters when the user triggers one of these commands. Scroll commands are injected based on the values of the HorizontalScrollPercent and VerticalScrollPercent properties. For instance, if HorizontalScrollPercent is greater than 0, "scroll left" is added; if it is less than 100, "scroll right" is added; and so on.
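
A minimal sketch of the injection rule described above. The horizontal thresholds come from the text; the vertical branch assumes the same rule applies to "scroll up" and "scroll down", and the sketch relies on UI Automation reporting -1 (NoScroll) for an axis that cannot be scrolled. The function name is hypothetical.

```python
NO_SCROLL = -1  # value UI Automation reports when an axis cannot be scrolled


def scroll_commands(horizontal_percent, vertical_percent):
    """Return the scroll phrases to register for one scrollable container."""
    commands = []
    if horizontal_percent != NO_SCROLL:
        if horizontal_percent > 0:
            commands.append("scroll left")    # content exists to the left
        if horizontal_percent < 100:
            commands.append("scroll right")   # content exists to the right
    # Assumption: the vertical axis mirrors the horizontal rule ("and so on").
    if vertical_percent != NO_SCROLL:
        if vertical_percent > 0:
            commands.append("scroll up")
        if vertical_percent < 100:
            commands.append("scroll down")
    return commands
```

For a list scrolled to its far left edge and halfway down, this model registers "scroll right", "scroll up", and "scroll down", but not "scroll left".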

Narrator overlap

The Narrator application is also a UI Automation client and uses the AutomationProperties.Name property as one of the sources for the text it reads for the currently selected UI element. To provide a better accessibility experience, many app developers have resorted to overloading the Name property with long descriptive text, with the goal of providing more information and context when it is read by Narrator. However, this causes a conflict between the two features: VES needs short phrases that match or closely match the visible text of the control, while Narrator benefits from longer, more descriptive phrases that give better context.

To resolve this, starting with Windows 10 Creators Update, Narrator was updated to also look at the AutomationProperties.HelpText property. If this property is not empty, Narrator speaks its contents in addition to AutomationProperties.Name. If HelpText is empty, Narrator reads only the contents of Name. This allows longer descriptive strings to be used where needed while keeping a shorter, speech-recognition-friendly phrase in the Name property.

Diagram showing the code behind a button, including AutomationProperties.Name and AutomationProperties.HelpText, indicating that the Voice Enabled Shell listens for the Name configuration.

For more info, see Automation Properties for Accessibility Support in UI.

Active Listening Mode (ALM)

Entering ALM

On Xbox, VES is not constantly listening for speech input. The user needs to enter Active Listening Mode explicitly by saying:

  • "Hey Cortana, select", or
  • "Hey Cortana, make a selection"

Several other Cortana commands also leave the user in active listening upon completion, for example "Hey Cortana, sign in" or "Hey Cortana, go home".

Entering ALM has the following effects:

  • The Cortana overlay is shown in the top-right corner, telling the user they can say what they see. While the user is speaking, phrase fragments recognized by the speech recognizer are also shown in this location.

  • VES parses the UIA tree, finds all actionable controls, registers their text in the speech recognition grammar, and starts a continuous listening session.

    Screenshot with the "to see labels, say show labels" option highlighted.

Exiting ALM

The system remains in ALM while the user is interacting with the UI using voice. There are two ways to exit ALM:

  • The user explicitly says "stop listening", or
  • A timeout occurs if there is no positive recognition within 17 seconds of entering ALM or of the last positive recognition
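
The two exit conditions can be modelled as a small state holder. This is a sketch under stated assumptions, not VES's implementation: the class and method names are hypothetical, timestamps are passed in explicitly (in seconds) so the behaviour is easy to follow, and reaching exactly 17 seconds is assumed to count as a timeout.

```python
ALM_TIMEOUT_SECONDS = 17.0  # timeout value quoted in the text above


class ActiveListeningSession:
    """Hypothetical model of one Active Listening Mode session."""

    def __init__(self, entered_at):
        self.active = True
        self._last_activity = entered_at  # entry time or last positive recognition

    def tick(self, now):
        """Expire the session if the timeout has elapsed; return whether it is active."""
        if self.active and now - self._last_activity >= ALM_TIMEOUT_SECONDS:
            self.active = False
        return self.active

    def on_recognition(self, phrase, now):
        """Handle a positive recognition; return whether the session is still active."""
        if not self.tick(now):
            return False  # the session had already timed out
        if phrase.strip().lower() == "stop listening":
            self.active = False  # explicit exit
        else:
            self._last_activity = now  # any other recognition restarts the timer
        return self.active
```

Because each positive recognition restarts the timer, a user who keeps issuing commands stays in ALM indefinitely; only silence or "stop listening" ends the session in this model.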

Invoking controls

When in ALM, the user can interact with the UI using voice. If the UI is configured correctly (with Name properties matching the visible text), using voice to perform actions should be a seamless, natural experience. The user should be able to just say what they see on the screen.

Overlay UI on Xbox

The name VES derives for a control may differ from the actual visible text in the UI. This can happen because the Name property of the control, or of the attached LabeledBy element, is explicitly set to a different string. Alternatively, the control may have no GUI text at all, only an icon or image element.

In these cases, users need a way to see what must be said to invoke such a control. Therefore, once in active listening, voice tips can be displayed by saying "show labels". This causes voice tip labels to appear on top of every actionable control.

There is a limit of 100 labels, so if the app's UI has more than 100 actionable controls, some of them will not have voice tip labels shown. Which labels are chosen in this case is not deterministic, as it depends on the structure and composition of the current UI as first enumerated in the UIA tree.
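
One plausible reading of this limit is that labels go to the first 100 actionable controls in whatever order the UIA tree enumerates them. The sketch below assumes exactly that; the function name is hypothetical, and the real selection may differ.

```python
from itertools import islice

MAX_VOICE_TIP_LABELS = 100  # limit quoted in the text above


def controls_to_label(actionable_controls):
    """Keep at most the first 100 controls, in enumeration order."""
    # Assumption: enumeration order alone decides which controls get labels.
    return list(islice(actionable_controls, MAX_VOICE_TIP_LABELS))
```

Under this assumption, restructuring the UI changes the enumeration order and therefore which controls lose their labels, which is why dense screens are worth checking with "show labels".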

Once voice tip labels are shown, there is no command to hide them; they remain visible until one of the following events occurs:

  • The user invokes a control
  • The user navigates away from the current scene
  • The user says "stop listening"
  • Active Listening Mode times out

Location of voice tip labels

Voice tip labels are horizontally and vertically centered within the control's BoundingRectangle. When controls are small and tightly grouped, labels can overlap or be obscured by other labels, and VES tries to push these labels apart so that they remain visible. However, this is not guaranteed to work 100% of the time. A very crowded UI will likely result in some labels being obscured by others. Please review your UI with "show labels" to ensure there is adequate room for the voice tips to be visible.

Screenshot of voice tip labels horizontally and vertically centered within the control's bounding rectangle.

Combo boxes

When a combo box is expanded, each individual item in the combo box gets its own voice tip label, and these will often sit on top of existing controls behind the drop-down list. To avoid presenting a cluttered and confusing muddle of labels (where combo box item labels are intermixed with the labels of controls behind the combo box), only the labels for the combo box's child items are shown while it is expanded; all other voice tip labels are hidden. The user can then either select one of the drop-down items or "close" the combo box.

  • Labels on collapsed combo boxes:

    Screenshot of the Display and sound Video output window with labels on collapsed combo boxes.

  • Labels on an expanded combo box:

    Screenshot of the Display and sound Video output window with labels on an expanded combo box.

Scrollable controls

For scrollable controls, the voice tips for the scroll commands are centered on each edge of the control. Voice tips are shown only for the scroll directions that are actionable; for example, if vertical scrolling is not available, "scroll up" and "scroll down" are not shown. When multiple scrollable regions are present, VES uses ordinals to differentiate between them (e.g., "Scroll right 1", "Scroll right 2", etc.).

Screenshot of the "scroll left" and "scroll right" voice tips on a horizontally scrolling UI.
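
The ordinal scheme can be illustrated with a short sketch. The text only states that ordinals are used when several scrollable regions are present; this model further assumes that ordinals are counted per command phrase in enumeration order, and that a phrase offered by a single region keeps its plain form. The function name is hypothetical.

```python
from collections import Counter


def scroll_tip_labels(regions):
    """Label scroll commands for several regions, adding ordinals where phrases repeat.

    `regions` is a list with one list of command phrases per scrollable region.
    """
    totals = Counter(cmd for cmds in regions for cmd in cmds)
    seen = Counter()
    labelled = []
    for cmds in regions:
        row = []
        for cmd in cmds:
            if totals[cmd] > 1:
                seen[cmd] += 1                 # next ordinal for this phrase
                row.append(f"{cmd} {seen[cmd]}")
            else:
                row.append(cmd)                # unique phrase: no ordinal needed
        labelled.append(row)
    return labelled
```

With two regions that can both scroll right, the phrases become "scroll right 1" and "scroll right 2", so the user can address each region unambiguously.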

Disambiguation

When multiple UI elements have the same Name, or the speech recognizer matches multiple candidates, VES enters disambiguation mode. In this mode, voice tip labels are shown for the elements involved so that the user can select the right one. The user can cancel out of disambiguation mode by saying "cancel".

For example:

  • In Active Listening Mode, before disambiguation; the user says, "Am I Ambiguous":

    Screenshot of Active Listening Mode with the "now you can say what you see" option displayed and no labels on the buttons.

  • Both buttons matched; disambiguation started:

    Screenshot of Active Listening Mode prompting for the desired option, with Item 1 and Item 2 labels on the buttons.

  • Showing the click action when "Select 2" was chosen:

    Screenshot of Active Listening Mode with the "now you can say what you see" option displayed and the "Am I Ambiguous" label on the first button.
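
The flow in this example can be sketched as two small functions: one that matches a spoken phrase against control names, and one that handles the follow-up "select N" or "cancel". This is an illustrative model only. The function names, the tuple-based return values, and the case-insensitive exact matching are assumptions; a real recognizer may also return several candidates for near matches.

```python
def resolve(spoken, controls):
    """Match a phrase against (control_id, name) pairs.

    Returns ("invoke", control_id), ("disambiguate", {ordinal: control_id}),
    or ("no_match", None).
    """
    wanted = spoken.strip().casefold()
    matches = [cid for cid, name in controls if name.strip().casefold() == wanted]
    if not matches:
        return ("no_match", None)
    if len(matches) == 1:
        return ("invoke", matches[0])
    # Several candidates: number them so the user can pick one.
    return ("disambiguate", {i: cid for i, cid in enumerate(matches, start=1)})


def choose(options, spoken):
    """Handle the follow-up phrase; return a control_id, or None on cancel or no match."""
    words = spoken.strip().casefold().split()
    if words == ["cancel"]:
        return None
    if len(words) == 2 and words[0] == "select" and words[1].isdigit():
        return options.get(int(words[1]))
    return None


controls = [
    ("button_a", "Am I Ambiguous"),
    ("button_b", "Am I Ambiguous"),
    ("button_c", "Launch Game"),
]
kind, options = resolve("am i ambiguous", controls)
```

Here the phrase matches two buttons, so the model returns numbered options; saying "select 2" then picks the second button, while "cancel" leaves disambiguation without invoking anything.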

Sample UI

Here is an example of a XAML-based UI that sets AutomationProperties.Name in various ways:

<Page
    x:Class="VESSampleCSharp.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:local="using:VESSampleCSharp"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="d">
    <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
        <Button x:Name="button1" Content="Hello World" HorizontalAlignment="Left" Margin="44,56,0,0" VerticalAlignment="Top"/>
        <Button x:Name="button2" AutomationProperties.Name="Launch Game" Content="Launch" HorizontalAlignment="Left" Margin="44,106,0,0" VerticalAlignment="Top" Width="99"/>
        <TextBlock AutomationProperties.Name="Day of Week" x:Name="label1" HorizontalAlignment="Left" Height="22" Margin="168,62,0,0" TextWrapping="Wrap" Text="Select Day of Week:" VerticalAlignment="Top" Width="137"/>
        <ComboBox AutomationProperties.LabeledBy="{Binding ElementName=label1}" x:Name="comboBox" HorizontalAlignment="Left" Margin="310,57,0,0" VerticalAlignment="Top" Width="120">
            <ComboBoxItem Content="Monday" IsSelected="True"/>
            <ComboBoxItem Content="Tuesday"/>
            <ComboBoxItem Content="Wednesday"/>
            <ComboBoxItem Content="Thursday"/>
            <ComboBoxItem Content="Friday"/>
            <ComboBoxItem Content="Saturday"/>
            <ComboBoxItem Content="Sunday"/>
        </ComboBox>
        <Button x:Name="button3" HorizontalAlignment="Left" Margin="44,156,0,0" VerticalAlignment="Top" Width="213">
            <Grid>
                <TextBlock AutomationProperties.Name="Accept">Accept Offer</TextBlock>
                <TextBlock Margin="0,25,0,0" Foreground="#FF5A5A5A">Exclusive offer just for you</TextBlock>
            </Grid>
        </Button>
    </Grid>
</Page>

Using the sample above, here is what the UI looks like with and without voice tip labels.

  • In Active Listening Mode, without labels shown:

    Screenshot of Active Listening Mode with the "to see labels, say show labels" option displayed and no labels shown.

  • In Active Listening Mode, after the user says "show labels":

    Screenshot of Active Listening Mode with the "if you're done, say stop listening" option displayed and labels shown on the UI controls.

In the case of button1, XAML auto-populates the AutomationProperties.Name property using the control's visible text content. This is why there is a voice tip label even though AutomationProperties.Name is not explicitly set.

With button2, we explicitly set AutomationProperties.Name to something other than the text of the control.

With comboBox, we use the LabeledBy property to reference label1 as the source of the automation Name, and in label1 we set AutomationProperties.Name to a more natural phrase than what is rendered on screen ("Day of Week" rather than "Select Day of Week").

Finally, with button3, VES takes the Name from the first child element, since button3 itself does not have AutomationProperties.Name set.

See also