Grammar Format

The grammar is an XML file that specifies the weighted set of natural language queries that the service can interpret, and how those queries are translated into semantic query expressions. The grammar syntax is based on SRGS, a W3C standard for speech recognition grammars, with extensions to support data index integration and semantic functions.

The following sections describe each of the syntactic elements that can be used in a grammar. See the Example section at the end for a complete grammar that demonstrates the use of these elements in context.

grammar Element

The grammar element is the top-level element in the grammar specification XML. The required root attribute specifies the name of the root rule, which defines the starting point of the grammar.

<grammar root="GetPapers">

import Element

The import element imports a schema definition from an external file to enable attribute references. The element must be a child of the top-level grammar element and must appear before any attrref elements. The required schema attribute specifies the name of a schema file located in the same directory as the grammar XML file. The required name attribute specifies the schema alias that subsequent attrref elements use when referencing attributes defined within this schema.

  <import schema="academic.schema" name="academic"/>

rule Element

The rule element defines a grammar rule, a structural unit that specifies a set of query expressions that the system can interpret. The element must be a child of the top-level grammar element. The required id attribute specifies the name of the rule, which is referenced by the root attribute of the grammar element or by ruleref elements.

A rule element defines a set of legal expansions. Text tokens match against the input query directly. item elements specify repeats and alter interpretation probabilities. one-of elements indicate alternative choices. ruleref elements enable construction of more complex expansions from simpler ones. attrref elements allow matches against attribute values from the index. tag elements specify the semantics of the interpretation and can alter the interpretation probability.

<rule id="GetPapers">...</rule>

example Element

The optional example element specifies example phrases that may be accepted by the containing rule definition. These phrases can be used for documentation and for automated testing.

<example>papers about machine learning by michael jordan</example>

item Element

The item element groups a sequence of grammar constructs. It can be used to indicate repetitions of the expansion sequence, or to specify alternatives in conjunction with the one-of element.

When an item element is not a child of a one-of element, it can specify repetition of the enclosed sequence by setting the repeat attribute to a count value. A count value of "n" (where n is an integer) indicates that the sequence must occur exactly n times. A count value of "m-n" allows the sequence to appear between m and n times, inclusive. A count value of "m-" specifies that the sequence must appear at least m times. The optional repeat-logprob attribute can be used to alter the interpretation probability for each additional repetition beyond the minimum.

<item repeat="1-" repeat-logprob="-10">...</item>
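The three count forms can be made concrete with a small parser. This is only an illustrative sketch, not the service's implementation: the helper names parse_repeat and repeat_logprob_total are hypothetical, and the second function assumes that repeat-logprob is added once per repetition beyond the minimum, which is one reading of the description above.

```python
import re

def parse_repeat(value):
    """Parse a repeat count into (minimum, maximum); maximum is None when unbounded."""
    match = re.fullmatch(r"(\d+)(-(\d*))?", value.strip())
    if match is None:
        raise ValueError("invalid repeat value: %r" % value)
    minimum = int(match.group(1))
    if match.group(2) is None:       # "n": exactly n times
        return minimum, minimum
    if match.group(3) == "":         # "m-": at least m times
        return minimum, None
    maximum = int(match.group(3))    # "m-n": between m and n times, inclusive
    if maximum < minimum:
        raise ValueError("maximum is below minimum: %r" % value)
    return minimum, maximum

def repeat_logprob_total(count, minimum, repeat_logprob):
    """Assumed offset: repeat_logprob once per repetition beyond the minimum."""
    return max(0, count - minimum) * repeat_logprob

print(parse_repeat("3"))      # exactly three times
print(parse_repeat("0-1"))    # optional sequence
print(parse_repeat("1-"))     # one or more times
print(repeat_logprob_total(3, 1, -10))
```

Under that assumption, three repetitions of an item declared with repeat="1-" and repeat-logprob="-10" would carry an offset of -20, because two repetitions exceed the minimum of one.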

When item elements appear as children of a one-of element, they define the set of expansion alternatives. In this usage, the optional logprob attribute specifies the relative log probability among the different choices. Given a probability p between 0 and 1, the corresponding log probability can be computed as log(p), where log() is the natural log function. If not specified, logprob defaults to 0, which does not alter the interpretation probability. Note that a log probability is always a negative floating-point value or 0.

<one-of>
  <item>by</item>
  <item logprob="-0.5">written by</item>
  <item logprob="-1">authored by</item>
</one-of>
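To relate logprob values to ordinary probabilities, the conversion in both directions can be sketched as follows. The helper names to_logprob and to_prob are hypothetical, and the resulting weights are relative: nothing here assumes the service normalizes them to sum to 1.

```python
import math

def to_logprob(p):
    """Convert a probability in (0, 1] to a natural-log probability."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return math.log(p)

def to_prob(logprob):
    """Convert a log probability (0 or negative) back to a relative weight."""
    return math.exp(logprob)

# The logprob values from the one-of sample above; "by" uses the default of 0.
choices = {"by": 0.0, "written by": -0.5, "authored by": -1.0}
for phrase, logprob in choices.items():
    print(phrase, round(to_prob(logprob), 3))
```

With these values, "written by" has a relative weight of about 0.607 and "authored by" about 0.368, compared with 1.0 for "by".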

one-of Element

The one-of element specifies a set of alternative expansions, each given by one of its child item elements. Only item elements may appear inside a one-of element. Relative probabilities among the different choices may be specified via the logprob attribute in each child item.

<one-of>
  <item>by</item>
  <item logprob="-0.5">written by</item>
  <item logprob="-1">authored by</item>
</one-of>

ruleref Element

The ruleref element specifies valid expansions via a reference to another rule element. Through the use of ruleref elements, more complex expressions can be built from simpler rules. The required uri attribute indicates the name of the referenced rule using the syntax "#rulename". To capture the semantic output of the referenced rule, use the optional name attribute to specify the name of a variable to which the semantic output is assigned.

<ruleref uri="#GetPaperYear" name="year"/>
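Because the root attribute and every ruleref uri must name a rule defined in the grammar, a simple consistency check can catch dangling references before a grammar is used. The sketch below relies only on Python's standard xml.etree.ElementTree module; the check_rule_references helper and the shortened grammar are hypothetical and are not part of the service tooling.

```python
import xml.etree.ElementTree as ET

# A shortened, hypothetical grammar used only for this check.
GRAMMAR = """
<grammar root="GetPapers">
  <rule id="GetPapers">
    papers <ruleref uri="#GetPaperYear" name="year"/>
    <tag>out = year;</tag>
  </rule>
  <rule id="GetPaperYear">
    written in 2000
    <tag>out = All();</tag>
  </rule>
</grammar>
"""

def check_rule_references(grammar_xml):
    """Return a list of problems with the root attribute and ruleref targets."""
    root = ET.fromstring(grammar_xml)
    rule_ids = {rule.get("id") for rule in root.findall("rule")}
    problems = []
    if root.get("root") not in rule_ids:
        problems.append("root rule not defined: %r" % root.get("root"))
    for ref in root.iter("ruleref"):
        target = ref.get("uri", "")
        # A ruleref uri is expected to have the form "#rulename".
        if not target.startswith("#") or target[1:] not in rule_ids:
            problems.append("unresolved ruleref: %r" % target)
    return problems

print(check_rule_references(GRAMMAR))
```

An empty list means that the root rule and all referenced rules are defined.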

attrref Element

The attrref element references an index attribute, allowing matching against attribute values observed in the index. The required uri attribute specifies the index schema name and attribute name using the syntax "schemaName#attrName". There must be a preceding import element that imports the schema named schemaName. The attribute name is the name of an attribute defined in the corresponding schema.

In addition to matching user input, the attrref element also returns a structured query object as output that selects the subset of objects in the index matching the input value. Use the optional name attribute to specify the name of the variable where the query object output should be stored. The query object can be composed with other query objects to form more complex expressions. See Semantic Interpretation for details.

<attrref uri="academic#Keyword" name="keyword"/>

Query Completion

To support query completions when interpreting partial user queries, each referenced attribute must include "starts_with" as an operation in the schema definition. Given a user query prefix, attrref matches all values in the index that complete the prefix, and yields each complete value as a separate interpretation of the grammar.

Examples:

  • Matching <attrref uri="academic#Keyword" name="keyword"/> against the query prefix "dat" generates one interpretation for papers about "database", one interpretation for papers about "data mining", etc.
  • Matching <attrref uri="academic#Year" name="year"/> against the query prefix "200" generates one interpretation for papers in "2000", one interpretation for papers in "2001", etc.
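The completion behavior in these examples can be approximated with a prefix scan over indexed values. This is a toy sketch under stated assumptions: the complete function and the in-memory lists are hypothetical stand-ins for the service's index, and ranking of the resulting interpretations is not modeled.

```python
def complete(prefix, indexed_values):
    """Return every indexed value that completes the prefix, one interpretation each."""
    return sorted(value for value in indexed_values if value.startswith(prefix))

# Hypothetical index contents, for illustration only.
KEYWORDS = ["database", "data mining", "machine learning"]
YEARS = ["1999", "2000", "2001", "2010"]

for value in complete("dat", KEYWORDS):
    print("interpretation: papers about", value)
for value in complete("200", YEARS):
    print("interpretation: papers in", value)
```

Each returned value corresponds to a separate interpretation, in contrast to op="starts_with", which groups all matching values into a single interpretation.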

Matching Operations

In addition to exact match, certain attribute types also support prefix and inequality matches via the optional op attribute. If no object in the index has a value that matches, the grammar path is blocked and the service does not generate any interpretations that traverse this grammar path. The op attribute defaults to "eq".

in <attrref uri="academic#Year" name="year"/>
before <attrref uri="academic#Year" op="lt" name="year"/>

The following table lists the supported op values for each attribute type. Using an op value requires the corresponding index operation to be included in the schema attribute definition.

| Attribute Type | Op Value | Description | Index Operation |
| String | eq | String exact match | equals |
| String | starts_with | String prefix match | starts_with |
| Int32, Int64, Double | eq | Numeric equality match | equals |
| Int32, Int64, Double | lt, le, gt, ge | Numeric inequality match (<, <=, >, >=) | is_between |
| Int32, Int64, Double | starts_with | Prefix match of value in decimal notation | starts_with |

Examples:

  • <attrref uri="academic#Year" op="lt" name="year"/> matches the input string "2000" and returns all papers published strictly before the year 2000 (papers from 2000 itself are excluded).
  • <attrref uri="academic#Year" op="lt" name="year"/> does not match the input string "20" because there are no papers in the index published before the year 20.
  • <attrref uri="academic#Keyword" op="starts_with" name="keyword"/> matches the input string "dat" and returns in a single interpretation papers about "database", "data mining", etc. This is a rare use case.
  • <attrref uri="academic#Year" op="starts_with" name="year"/> matches the input string "20" and returns in a single interpretation papers published in 200-299, 2000-2999, etc. This is a rare use case.
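The op semantics for a numeric attribute can be mimicked against a toy list of indexed years. The match_year function and the INDEXED_YEARS data are hypothetical, and the real service returns a structured query object rather than a list; the sketch only illustrates when a path matches and when it is blocked.

```python
import operator

# Comparison used for each numeric op value; "eq" is the default op.
NUMERIC_OPS = {
    "eq": operator.eq,
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
}

# Hypothetical index contents, for illustration only.
INDEXED_YEARS = [1995, 1999, 2000, 2005]

def match_year(op, text, indexed_years=INDEXED_YEARS):
    """Return the indexed years selected by op, or None if the path is blocked."""
    if op == "starts_with":
        # Prefix match on the decimal notation of each value.
        selected = [y for y in indexed_years if str(y).startswith(text)]
    else:
        value = int(text)
        selected = [y for y in indexed_years if NUMERIC_OPS[op](y, value)]
    return selected or None

print(match_year("lt", "2000"))          # years strictly before 2000
print(match_year("lt", "20"))            # nothing before year 20: blocked
print(match_year("starts_with", "20"))   # all years beginning with "20"
```

Returning None for an empty selection mirrors the rule that a grammar path is blocked when no object in the index has a matching value.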

tag Element

The tag element specifies how a path through the grammar is to be interpreted. It contains a sequence of semicolon-terminated statements. A statement may assign a literal or a variable to another variable. It may also assign the output of a function with zero or more parameters to a variable. Each function parameter may be specified using a literal or a variable. If the function does not return any output, the assignment is omitted. Variable scope is local to the containing rule.

<tag>x = 1; y = x;</tag>
<tag>q = All(); q = And(q, q2);</tag>
<tag>AssertEquals(x, 1);</tag>
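The statement structure of a tag body can be illustrated with a rough splitter. This is a sketch, not the service's parser: the function names are hypothetical, and it assumes that string literals never contain ";" or "=".

```python
def split_statements(tag_body):
    """Split a tag body into trimmed statements, dropping the terminating semicolons."""
    return [part.strip() for part in tag_body.split(";") if part.strip()]

def parse_statement(statement):
    """Return (target, expression); target is None when no assignment is made."""
    if "=" in statement:
        target, expression = statement.split("=", 1)
        return target.strip(), expression.strip()
    return None, statement

for body in ["x = 1; y = x;", "q = All(); q = And(q, q2);", "AssertEquals(x, 1);"]:
    print([parse_statement(s) for s in split_statements(body)])
```

A statement such as AssertEquals(x, 1) has no target because the function returns no output, so the assignment is omitted.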

Each rule in the grammar has a predefined variable named "out", representing the semantic output of the rule. Its value is computed by evaluating each of the semantic statements traversed by the path through the rule that matches the user query input. The value assigned to the "out" variable at the end of the evaluation is the semantic output of the rule. The semantic output of interpreting a user query against the grammar is the semantic output of the root rule.

Some statements may alter the probability of an interpretation path by introducing an additive log probability offset. Some statements may reject the interpretation altogether if specified conditions are not satisfied.

For a list of supported semantic functions, see Semantic Functions.

Interpretation Probability

The probability of an interpretation path through the grammar is the cumulative log probability of all the <item> elements and semantic functions encountered along the way. It describes the relative likelihood of matching a particular input sequence.

Given a probability p between 0 and 1, the corresponding log probability can be computed as log(p), where log() is the natural log function. Using log probabilities allows the system to accumulate the joint probability of an interpretation path through simple addition. It also avoids the floating-point underflow common to such joint probability calculations. Note that by design, the log probability is always a negative floating-point value or 0, where larger values indicate higher likelihood.
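The benefit of adding log probabilities instead of multiplying probabilities can be seen in a short sketch. The offsets are made-up values for one hypothetical path, and path_logprob is an illustrative helper rather than a service API.

```python
import math

def path_logprob(offsets):
    """Accumulate the log probability of a path by simple addition."""
    return sum(offsets)

# Hypothetical offsets met along one path: two item logprob values
# and one repeat-logprob penalty.
offsets = [-0.5, -1.0, -10.0]
total = path_logprob(offsets)
print(total, math.exp(total))

# Multiplying many small probabilities underflows to 0.0 in floating point,
# while the equivalent sum of logs stays finite.
product = 1.0
for _ in range(1000):
    product *= 1e-5
log_total = 1000 * math.log(1e-5)
print(product, log_total)
```

The product collapses to 0.0 and can no longer be compared with other paths, whereas the summed log probability remains a usable (large negative) number.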

Example

The following example XML from the academic publications domain demonstrates the various elements of a grammar:

<grammar root="GetPapers">

  <!-- Import academic data schema-->
  <import schema="academic.schema" name="academic"/>
  
  <!-- Define root rule-->
  <rule id="GetPapers">
    <example>papers about machine learning by michael jordan</example>
    
    papers
    <tag>
      yearOnce = false;
      isBeyondEndOfQuery = false;
      query = All();
    </tag>
  
    <item repeat="1-" repeat-logprob="-10">
      <!-- Do not complete additional attributes beyond end of query -->
      <tag>AssertEquals(isBeyondEndOfQuery, false);</tag>
        
      <one-of>
        <!-- about <keyword> -->
        <item logprob="-0.5">
          about <attrref uri="academic#Keyword" name="keyword"/>
          <tag>query = And(query, keyword);</tag>
        </item>
        
        <!-- by <authorName> [while at <authorAffiliation>] -->
        <item logprob="-1">
          by <attrref uri="academic#Author.Name" name="authorName"/>
          <tag>authorQuery = authorName;</tag>
          <item repeat="0-1" repeat-logprob="-1.5">
            while at <attrref uri="academic#Author.Affiliation" name="authorAffiliation"/>
            <tag>authorQuery = And(authorQuery, authorAffiliation);</tag>
          </item>
          <tag>
            authorQuery = Composite(authorQuery);
            query = And(query, authorQuery);
          </tag>
        </item>
        
        <!-- written (in|before|after) <year> -->
        <item logprob="-1.5">
          <!-- Allow this grammar path to be traversed only once -->
          <tag>
            AssertEquals(yearOnce, false);
            yearOnce = true;
          </tag>
          <ruleref uri="#GetPaperYear" name="year"/>
          <tag>query = And(query, year);</tag>
        </item>
      </one-of>

      <!-- Determine if current parse position is beyond end of query -->
      <tag>isBeyondEndOfQuery = GetVariable("IsBeyondEndOfQuery", "system");</tag>
    </item>
    <tag>out = query;</tag>
  </rule>
  
  <rule id="GetPaperYear">
    <tag>year = All();</tag>
    written
    <one-of>
      <item>
        in <attrref uri="academic#Year" name="year"/>
      </item>
      <item>
        before
        <one-of>
          <item>[year]</item>
          <item><attrref uri="academic#Year" op="lt" name="year"/></item>
        </one-of>
      </item>
      <item>
        after
        <one-of>
          <item>[year]</item>
          <item><attrref uri="academic#Year" op="gt" name="year"/></item>
        </one-of>
      </item>
    </one-of>
    <tag>out = year;</tag>
  </rule>
</grammar>