Using Speech to Invoke UI Elements

Voice Enabled Shell (VES) is an extension to the Windows Speech Platform that enables a first-class speech experience inside apps, allowing users to invoke on-screen controls by voice and to insert text via dictation. VES strives to provide a common end-to-end see-it-say-it experience on all Windows Shells and devices, with minimum effort required from app developers. To achieve this, it leverages the Microsoft Speech Platform and the UI Automation (UIA) framework.

User experience walkthrough

The following overview describes what a user experiences when using VES on Xbox. It should help set the context before diving into the details of how VES works.

  • The user turns on the Xbox console and wants to browse through their apps to find something of interest:

    User: "Hey Cortana, open My Games and Apps"

  • The user is left in Active Listening Mode (ALM), meaning the console is now listening for the user to invoke a control that’s visible on the screen, without needing to say “Hey Cortana” each time. The user can now switch to view apps and scroll through the app list:

    User: "applications"

  • To scroll the view, the user can simply say:

    User: "scroll down"

  • The user sees the box art for the app they are interested in but has forgotten its name. The user asks for voice tip labels to be displayed:

    User: "show labels"

  • Now that it's clear what to say, the app can be launched:

    User: "movies and TV"

  • To exit Active Listening Mode, the user tells Xbox to stop listening:

    User: "stop listening"

  • Later on, a new active listening session can be started with:

    User: "Hey Cortana, make a selection" or "Hey Cortana, select"

UI automation dependency

VES is a UI Automation client and relies on information exposed by the app through its UI Automation providers. This is the same infrastructure already used by the Narrator feature on Windows platforms. UI Automation enables programmatic access to user interface elements, including the name of the control, its type, and the control patterns it implements. As the UI changes in the app, VES reacts to UIA update events and re-parses the updated UI Automation tree to find all the actionable items, using this information to build a speech recognition grammar.

All UWP apps have access to the UI Automation framework and can expose information about the UI independent of which graphics framework they are built upon (XAML, DirectX/Direct3D, Xamarin, etc.). In some cases, like XAML, most of the heavy lifting is done by the framework, greatly reducing the work required to support Narrator and VES.

For more info on UI Automation, see UI Automation Fundamentals.

Control invocation name

VES employs the following heuristic to determine which phrase to register with the speech recognizer as the control’s name (i.e. what the user needs to say to invoke the control). This is also the phrase that shows up in the voice tip label.

Source of Name in order of priority:

  1. If the element has a LabeledBy attached property, VES will use the AutomationProperties.Name of this text label.
  2. AutomationProperties.Name of the element. In XAML, the text content of the control will be used as the default value for AutomationProperties.Name.
  3. If the control is a ListItem or Button, VES will look for the first child element with a valid AutomationProperties.Name.
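
The priority order above can be sketched in a few lines of Python. This is only an illustrative model, not VES code: the `Element` class and `invocation_name` function are hypothetical names, "valid" is assumed to mean a non-empty string, and only direct children are searched in step 3.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    """Hypothetical stand-in for a UI Automation element (not a real UIA type)."""
    control_type: str
    name: str = ""                          # models AutomationProperties.Name
    labeled_by: Optional["Element"] = None  # models AutomationProperties.LabeledBy
    children: List["Element"] = field(default_factory=list)


def invocation_name(element: Element) -> Optional[str]:
    """Return the phrase a user would speak, following the priority order above."""
    # 1. A LabeledBy target wins: use the Name of the label it points to.
    #    Falling through when that label has no Name is an assumption.
    if element.labeled_by is not None and element.labeled_by.name.strip():
        return element.labeled_by.name
    # 2. Otherwise use the element's own Name.
    if element.name.strip():
        return element.name
    # 3. For a ListItem or Button, fall back to the first child with a Name.
    if element.control_type in ("ListItem", "Button"):
        for child in element.children:
            if child.name.strip():
                return child.name
    return None


# Usage: a combo box labeled by a text block, and a button named by its first child.
label = Element("Text", name="Day of Week")
combo = Element("ComboBox", labeled_by=label)
offer = Element("Button", children=[Element("Text", name="Accept"),
                                    Element("Text", name="Exclusive offer just for you")])
print(invocation_name(combo))
print(invocation_name(offer))
```

Modelling the lookup as an ordered chain of fallbacks makes it easy to see why an explicit LabeledBy reference overrides whatever Name the control itself carries.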

Actionable controls

VES considers a control actionable if it implements one of the following Automation control patterns:

  • InvokePattern (e.g. Button) - Represents controls that initiate or perform a single, unambiguous action and do not maintain state when activated.

  • TogglePattern (e.g. Check Box) - Represents a control that can cycle through a set of states and maintain a state once set.

  • SelectionItemPattern (e.g. Combo Box) - Represents a control that acts as a container for a collection of selectable child items.

  • ExpandCollapsePattern (e.g. Combo Box) - Represents controls that visually expand to display content and collapse to hide content.

  • ScrollPattern (e.g. List) - Represents controls that act as scrollable containers for a collection of child elements.
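
As a rough illustration of the rule above, the check amounts to a set intersection. The pattern names come from the list; representing a control's supported patterns as a collection of strings, and the `is_actionable` helper itself, are assumptions made for this sketch.

```python
# Pattern names are taken from the list above; modelling them as strings is an assumption.
ACTIONABLE_PATTERNS = frozenset({
    "InvokePattern",
    "TogglePattern",
    "SelectionItemPattern",
    "ExpandCollapsePattern",
    "ScrollPattern",
})


def is_actionable(supported_patterns) -> bool:
    """True if the control implements at least one of the patterns listed above."""
    return not ACTIONABLE_PATTERNS.isdisjoint(supported_patterns)


# Usage: a button is actionable, a plain text block is not.
print(is_actionable(["InvokePattern"]))
print(is_actionable(["TextPattern"]))
```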

Scrollable containers

For scrollable containers that support the ScrollPattern, VES will listen for voice commands like “scroll left”, “scroll right”, etc. and will invoke Scroll with the appropriate parameters when the user triggers one of these commands. Scroll commands are injected based on the value of the HorizontalScrollPercent and VerticalScrollPercent properties. For instance, if HorizontalScrollPercent is greater than 0, “scroll left” will be added; if it’s less than 100, “scroll right” will be added; and so on.
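
The rule for injecting scroll commands can be sketched as follows. The horizontal mapping comes from the text; the vertical mapping ("scroll up" above 0, "scroll down" below 100) is inferred by analogy from "and so on", and the function name is hypothetical. The sketch also assumes the UI Automation convention that a scroll percentage of -1 means the container cannot scroll along that axis.

```python
from typing import List

NO_SCROLL = -1.0  # UI Automation reports -1 when an axis cannot be scrolled


def scroll_commands(horizontal_percent: float, vertical_percent: float) -> List[str]:
    """Return the voice commands to register for one scrollable container."""
    commands = []
    if horizontal_percent != NO_SCROLL:
        if horizontal_percent > 0:      # content is hidden to the left
            commands.append("scroll left")
        if horizontal_percent < 100:    # content is hidden to the right
            commands.append("scroll right")
    if vertical_percent != NO_SCROLL:   # assumed to mirror the horizontal rule
        if vertical_percent > 0:
            commands.append("scroll up")
        if vertical_percent < 100:
            commands.append("scroll down")
    return commands


# Usage: a horizontal list at its left edge that cannot scroll vertically.
print(scroll_commands(0, NO_SCROLL))
```

Because the commands depend on the current scroll position, they have to be recomputed whenever the container scrolls, which fits the way VES re-parses the UIA tree on update events.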

Narrator overlap

The Narrator application is also a UI Automation client and uses the AutomationProperties.Name property as one of the sources for the text it reads for the currently selected UI element. To provide a better accessibility experience, many app developers have resorted to overloading the Name property with long descriptive text, with the goal of providing more information and context when read by Narrator. However, this causes a conflict between the two features: VES needs short phrases that match or closely match the visible text of the control, while Narrator benefits from longer, more descriptive phrases to give better context.

To resolve this, starting with Windows 10 Creators Update, Narrator was updated to also look at the AutomationProperties.HelpText property. If this property is not empty, Narrator will speak its contents in addition to AutomationProperties.Name. If HelpText is empty, Narrator will only read the contents of Name. This enables longer descriptive strings to be used where needed, while maintaining a shorter, speech-recognition-friendly phrase in the Name property.

Diagram showing the code behind a button that sets AutomationProperties.Name and AutomationProperties.HelpText, illustrating that the Voice Enabled Shell listens for the Name setting.

For more info, see Automation Properties for Accessibility Support in UI.

Active Listening Mode (ALM)

Entering ALM

On Xbox, VES is not constantly listening for speech input. The user needs to enter Active Listening Mode explicitly by saying:

  • “Hey Cortana, select”, or
  • “Hey Cortana, make a selection”

There are several other Cortana commands that also leave the user in active listening upon completion, for example “Hey Cortana, sign in” or “Hey Cortana, go home”.

Entering ALM will have the following effect:

  • The Cortana overlay will be shown in the top right corner, telling the user they can say what they see. While the user is speaking, phrase fragments that are recognized by the speech recognizer will also be shown in this location.

  • VES parses the UIA tree, finds all actionable controls, registers their text in the speech recognition grammar and starts a continuous listening session.

    Screenshot with the “show labels” option highlighted.

Exiting ALM

The system will remain in ALM while the user is interacting with the UI using voice. There are two ways to exit ALM:

  • The user explicitly says “stop listening”, or
  • A timeout occurs if there isn’t a positive recognition within 17 seconds of entering ALM or of the last positive recognition.
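
The two exit conditions can be modelled as a small state machine. This is a minimal sketch rather than the actual VES implementation: the `ActiveListeningSession` class is hypothetical, timestamps are passed in explicitly (in seconds) so the behaviour is easy to follow, and treating exactly 17 seconds as already timed out is an assumption.

```python
class ActiveListeningSession:
    """Hypothetical model of the ALM exit rules described above."""

    TIMEOUT_SECONDS = 17.0  # value stated in the text

    def __init__(self, entered_at: float) -> None:
        self.active = True
        self._deadline = entered_at + self.TIMEOUT_SECONDS

    def tick(self, now: float) -> bool:
        """Apply the timeout at time `now`; return whether ALM is still active."""
        if self.active and now >= self._deadline:
            self.active = False
        return self.active

    def on_recognition(self, phrase: str, now: float) -> bool:
        """Handle a positive recognition; return whether ALM is still active."""
        if not self.tick(now):
            return False  # the session had already timed out
        if phrase.strip().lower() == "stop listening":
            self.active = False  # explicit exit
        else:
            # Any other positive recognition restarts the 17-second window.
            self._deadline = now + self.TIMEOUT_SECONDS
        return self.active


# Usage: a recognition at t=10 keeps the session alive past the original deadline.
session = ActiveListeningSession(entered_at=0.0)
session.on_recognition("movies and TV", now=10.0)
print(session.tick(now=20.0))
print(session.tick(now=30.0))
```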

Invoking controls

When in ALM, the user can interact with the UI using voice. If the UI is configured correctly (with Name properties matching the visible text), using voice to perform actions should be a seamless, natural experience. The user should be able to just say what they see on the screen.

Overlay UI on Xbox

The name VES derives for a control may be different than the actual visible text in the UI. This can be due to the Name property of the control or the attached LabeledBy element being explicitly set to a different string. Or, the control does not have GUI text but only an icon or image element.

In these cases, users need a way to see what needs to be said in order to invoke such a control. Therefore, once in active listening, voice tips can be displayed by saying “show labels”. This causes voice tip labels to appear on top of every actionable control.

There is a limit of 100 labels, so if the app’s UI has more than 100 actionable controls, some of them will not have voice tip labels shown. Which labels are chosen in this case is not deterministic, as it depends on the structure and composition of the current UI as first enumerated in the UIA tree.

Once voice tip labels are shown, there is no command to hide them; they remain visible until one of the following events occurs:

  • The user invokes a control
  • The user navigates away from the current scene
  • The user says “stop listening”
  • Active Listening Mode times out

Location of voice tip labels

Voice tip labels are horizontally and vertically centered within the control’s BoundingRectangle. When controls are small and tightly grouped, the labels can overlap or become obscured by others, and VES will try to push these labels apart to separate them and ensure they are visible. However, this is not guaranteed to work 100% of the time. A very crowded UI will likely result in some labels being obscured by others. Please review your UI with “show labels” to ensure there is adequate room for voice tip visibility.


Combo boxes

When a combo box is expanded, each individual item in the combo box gets its own voice tip label, and often these will be on top of existing controls behind the drop-down list. To avoid presenting a cluttered and confusing muddle of labels (where combo box item labels are intermixed with the labels of controls behind the combo box), when a combo box is expanded only the labels for its child items will be shown; all other voice tip labels will be hidden. The user can then either select one of the drop-down items or “close” the combo box.

  • Labels on collapsed combo boxes:


  • Labels on expanded combo box:


Scrollable controls

For scrollable controls, the voice tips for the scroll commands will be centered on each of the edges of the control. Voice tips will only be shown for the scroll directions that are actionable, so for example if vertical scrolling is not available, “scroll up” and “scroll down” will not be shown. When multiple scrollable regions are present, VES will use ordinals to differentiate between them (e.g. “Scroll right 1”, “Scroll right 2”, etc.).

Screenshot of the “scroll left” and “scroll right” voice tips on a horizontally scrolling UI.


Disambiguation

When multiple UI elements have the same Name, or the speech recognizer matches multiple candidates, VES will enter disambiguation mode. In this mode, voice tip labels will be shown for the elements involved so that the user can select the right one. The user can cancel out of disambiguation mode by saying "cancel".

For example:

  • In Active Listening Mode, before disambiguation; user says, "Am I Ambiguous":


  • Both buttons matched; disambiguation started:


  • Showing click action when "Select 2" was chosen:

    Screenshot of Active Listening Mode showing the available options, with the ambiguous label on the first button.
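
The matching step behind disambiguation can be sketched as below. The "Select N" phrasing is taken from the example above; everything else is an assumption made for illustration, including the hypothetical `disambiguation_labels` function, the case-insensitive exact comparison, and numbering candidates in enumeration order.

```python
from typing import Dict, List


def disambiguation_labels(recognized: str, control_names: List[str]) -> Dict[str, int]:
    """Map "Select N" phrases to the indices of controls matching `recognized`.

    Returns an empty dict when fewer than two controls match, because no
    disambiguation is needed in that case.
    """
    wanted = recognized.strip().lower()
    matches = [index for index, name in enumerate(control_names)
               if name.strip().lower() == wanted]
    if len(matches) < 2:
        return {}
    return {f"Select {n}": index for n, index in enumerate(matches, start=1)}


# Usage: two buttons share a name, so each gets a numbered label.
names = ["Am I Ambiguous", "OK", "Am I Ambiguous"]
print(disambiguation_labels("am i ambiguous", names))
```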

Sample UI

Here’s an example of a XAML-based UI that sets AutomationProperties.Name in various ways:

    <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
        <!-- button1: Name defaults to the visible text content -->
        <Button x:Name="button1" Content="Hello World" HorizontalAlignment="Left" Margin="44,56,0,0" VerticalAlignment="Top"/>
        <!-- button2: Name is set explicitly and differs from the visible text -->
        <Button x:Name="button2" AutomationProperties.Name="Launch Game" Content="Launch" HorizontalAlignment="Left" Margin="44,106,0,0" VerticalAlignment="Top" Width="99"/>
        <TextBlock AutomationProperties.Name="Day of Week" x:Name="label1" HorizontalAlignment="Left" Height="22" Margin="168,62,0,0" TextWrapping="Wrap" Text="Select Day of Week:" VerticalAlignment="Top" Width="137"/>
        <!-- comboBox: Name comes from label1 through LabeledBy -->
        <ComboBox AutomationProperties.LabeledBy="{Binding ElementName=label1}" x:Name="comboBox" HorizontalAlignment="Left" Margin="310,57,0,0" VerticalAlignment="Top" Width="120">
            <ComboBoxItem Content="Monday" IsSelected="True"/>
            <ComboBoxItem Content="Tuesday"/>
            <ComboBoxItem Content="Wednesday"/>
            <ComboBoxItem Content="Thursday"/>
            <ComboBoxItem Content="Friday"/>
            <ComboBoxItem Content="Saturday"/>
            <ComboBoxItem Content="Sunday"/>
        </ComboBox>
        <!-- button3: no Name of its own; a Button holds one content element, so the TextBlocks sit in a panel -->
        <Button x:Name="button3" HorizontalAlignment="Left" Margin="44,156,0,0" VerticalAlignment="Top" Width="213">
            <Grid>
                <TextBlock AutomationProperties.Name="Accept">Accept Offer</TextBlock>
                <TextBlock Margin="0,25,0,0" Foreground="#FF5A5A5A">Exclusive offer just for you</TextBlock>
            </Grid>
        </Button>
    </Grid>

Using the above sample, here is what the UI will look like with and without voice tip labels.

  • In Active Listening Mode, without labels shown:

    Screenshot of Active Listening Mode with the “show labels” option displayed and no labels shown.

  • In Active Listening Mode, after user says "show labels":

    Screenshot of Active Listening Mode with the “stop listening” option displayed and labels shown on the UI controls.

In the case of button1, XAML auto-populates the AutomationProperties.Name property using text from the control’s visible text content. This is why there is a voice tip label even though there isn't an explicit AutomationProperties.Name set.

With button2, we explicitly set the AutomationProperties.Name to something other than the text of the control.

With comboBox, we used the LabeledBy property to reference label1 as the source of the automation Name, and in label1 we set the AutomationProperties.Name to a more natural phrase than what is rendered on screen (“Day of Week” rather than “Select Day of Week”).

Finally, with button3, VES grabs the Name from the first child element since button3 itself does not have an AutomationProperties.Name set.

See also