Lesson 10 -- Mixed Initiative Forms

In the Contoso Travel Company application that we have developed, we have been using "directed forms," a type of dialog in which each of the fields in a form must be filled before proceeding to the next field in sequence.

There is another type of form, a "mixed initiative form," in which more than one field can be filled with a single utterance and fields do not have to be filled in order.

While we will not alter the Contoso application to include mixed initiative forms, the purpose of this lesson is to make sure that you know about them. Refer to the VoiceXML 2.0 specification, http://www.w3.org/TR/2004/REC-voicexml20-20040316/, for more information.

In this lesson, we will create a short application with a mixed initiative form that asks for two things: a city and a state. The caller can say both the city and the state in the same utterance (but doesn't have to).

Note

While the sample application in this lesson only asks for two inputs (city and state), mixed initiative forms can ask for more than two inputs in one utterance.

What's in a mixed initiative form?

A mixed initiative form differs from a directed form in these respects:

  • Mixed initiative grammars must have form scope. They are placed as children of the <form> element rather than as children of <field> elements, and the fields typically have no grammars of their own.

  • Mixed initiative grammars must be able to return more than one match simultaneously. They must be able to return a match for each input the mixed initiative form asks for.

  • Mixed initiative forms contain an <initial> element that asks for multiple inputs.

  • The fields in a mixed initiative form do not include <filled> elements.

  • There is usually just one <filled> element (a child of the mixed initiative form) that handles the filled task for the whole form at once.

So, are there fields?

Yes, there must be one field for each input the mixed initiative form asks for. However, these fields are not like the ones we have been using in directed forms—they usually have no <grammar> or <filled> elements. Typically a mixed initiative field only contains <prompt> and <catch> elements.

Fields are still input items, however. The <initial> element is a control item, so grammar matches do not fill it (that is, they are not assigned to the <initial> element's form item variable). Instead, grammar matches fill the fields. For example, if there is a match to the city in the <initial> element, it will be assigned to the input item variable of a field named "city." The mechanisms for doing this are described later in this lesson.

Basic outline of a mixed initiative form

A mixed initiative form can contain:

  • One or more <grammar> elements.

  • An <initial> element (just one).

  • <field> elements, one for each input item the caller is prompted for.

  • A <filled> element (usually just one).

Here is an outline of the structure for a mixed initiative form for city and state. Briefly, the form interpretation algorithm (FIA) attempts to get both of the inputs in the <initial> element. If it fails to get both inputs in <initial>, it goes to the individual <field> elements to get whatever is missing.

<form>

   <grammar>
      <!-- form scope grammar -->
   </grammar>
   <initial>
      <!-- asks for both city and state. Uses the form scope 
           grammar. FIA will exit this <initial> when at least one 
           of city and state is matched. If no matches, FIA will 
           keep trying. Will try forever unless we use a <catch> to 
           get out of it. 

           Matches found will be put in the input item variables of 
           the fields! -->
   </initial>

   <field name="city">
      <!-- Will be filled by match from <initial>. If no such match,
           will prompt for input with its own prompt when FIA visits
           this field in the FIA loop. Uses the form scope grammar. 
           When a match occurs, fills the input item variable named city.
           If no match, the FIA keeps trying until <catch> says to 
           leave the field. The field has no <filled> section. -->
   </field>

   <field name="state">
      <!-- Will be filled by match from <initial>. If no such match,
           will prompt for input with its own prompt when FIA visits
           this field in the FIA loop. Uses the form scope grammar. 
           When a match occurs, fills the input item variable named state.
           If no match, the FIA keeps trying until <catch> says to 
           leave the field. The field has no <filled> section. -->
   </field>

   <filled> 
      <!-- Acts as <filled> for whole form. Depending on its attributes,
            may execute if both inputs are matched, or only one of them. -->
   </filled>

</form>

The elements used in a mixed initiative form

The <initial> element is new to us. It is used only in mixed initiative forms, so we have not encountered it before.

Form-level grammars are constructed differently than we have seen before in this tutorial. They must be able to return more than one match. To do this, they must return an object with two or more properties, one property for each input requested.

Mixed initiative forms also use the familiar <field> and <filled> elements, but not in the same way that directed forms use them.

The <grammar> element

The grammars used in mixed initiative forms must have a different structure than those we have been using in this tutorial to date.

The point of a mixed initiative form is to allow the caller to provide two or more inputs in one utterance, so the mixed initiative grammar needs to be able to return matches to two or more different phrases.

The grammar is constructed using a separate subrule (subgrammar) to match each input that is looked for.

Returning two or more matches requires the use of semantic interpretation tags. The section titled "The grammar", below, shows how to do this.

The <initial> element

The <initial> element is placed at the beginning of a mixed initiative form as an alternative to a <field> element. It is used to prompt the caller for multiple inputs. Like a <field> element, it has prompts, catches, and event counters. However, unlike the <field> element in a directed form, the <initial> element has no grammars and no <filled> element. Here is a basic example:

   <initial name="weather_where">
     <prompt>
       You want the weather for what city and state?
     </prompt>
     <catch event="noinput nomatch" count="1">
        Please say something like "Los Angeles, California".
     </catch>
     <catch event="noinput nomatch" count="2">
        I'm sorry, I still don't understand.
        I'll ask you for information one piece at a time.
        <assign name="weather_where" expr="true"/>
     </catch>
   </initial>

When the FIA visits this <initial> element, there are two potential outcomes:

  • If the caller's utterance (on either a first or second attempt) matches either the city or state or both, the FIA will set the <initial> element's form item variable (i.e. weather_where) to true and will continue its loop with the next form item. It will not revisit the <initial> element unless its form item variable is cleared with a statement like <clear namelist="weather_where"/>.

  • If the caller's utterance fails to match either city or state in two tries, the <catch> element sets weather_where to true. Then the FIA continues its loop without revisiting the <initial> element.
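The <clear> statement mentioned above is the way to send the FIA back to the <initial> element. The fragment below is a minimal sketch rather than part of this lesson's application: it assumes the city and state form built in this lesson, and the check on Portland, California is a hypothetical example. Clearing weather_where makes the <initial> element eligible for selection again, and clearing city and state keeps the FIA from skipping those fields afterward. The <filled> element would sit inside the <form>, after the fields.

```xml
<filled mode="all" namelist="city state">
   <!-- Hypothetical sanity check; not part of the lesson's application. -->
   <if cond="city == 'Portland' &amp;&amp; state == 'California'">
      <prompt>Portland is in Oregon. Let's start over.</prompt>
      <!-- Clearing weather_where lets the FIA select <initial> again;
           clearing city and state means their fields are no longer skipped. -->
      <clear namelist="weather_where city state"/>
   </if>
</filled>
```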

The <field> element

When used in a mixed initiative form, the <field> element has different behavior than we are familiar with:

  • <field> elements in mixed initiative forms do not generally include grammars.

  • <field> elements in mixed initiative forms do not generally include <filled> elements.

This means that the <field> elements used in mixed initiative forms typically include only <prompt> and <catch> elements.

The fields still get filled in a mixed initiative form, however, in one of two ways:

  • When a match is found in the <initial> element, it is placed in the input item variable of the field that corresponds to that input item. For example, a match to the city in the <initial> element is placed in the city variable of the field named city, <field name="city">.

  • When an input (for example, the city) is not matched in the <initial> element, the FIA visits the field for that input and prompts for it there. The grammar with form scope is active in all the form's fields, so a match can be made in the field.

The exact mechanism for filling the field's input item variable is described in "Assigning the grammar match results to the mixed initiative form's fields", below.

The <filled> element

When used in a mixed initiative form, the <filled> element also has different behavior than we are familiar with. Among other things, it does not appear as a child of a <field> element. Instead, it appears as a child of the <form> element itself.

The <filled> element has mode and namelist attributes that are used in mixed initiative forms.

The mode attribute specifies whether all, or only some, of the input items listed in namelist must be filled before the contents of the <filled> element are executed. The mode attribute can have one of two values:

  • all: the contents of the <filled> element are executed only when all of the input items specified in the namelist are filled and at least one of them was filled by the caller's most recent utterance. This is the default.

  • any: the contents of the <filled> element are executed when any of the input items specified in the namelist is filled by the caller's most recent utterance.

The namelist attribute lists all the input items in the form that the mode attribute must take into consideration. If namelist is not specified, it defaults to the names of all of the form's input items. In this lesson, the only input items used are two <field> elements (one for the city and one for the state).
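As a sketch of how mode and namelist combine, the two form-level <filled> elements below assume the city and state fields used in this lesson; the log and prompt text is illustrative only. The first names only city, so it runs after any utterance that fills city, whether or not the same utterance also filled state. The second omits both attributes, so it falls back to mode="all" and a namelist of every input item in the form.

```xml
<!-- Runs after any utterance that fills city (alone or together with state). -->
<filled mode="any" namelist="city">
   <log>The last utterance filled city with <value expr="city"/></log>
</filled>

<!-- No attributes: mode defaults to "all" and namelist defaults to every
     input item in the form (here, city and state). -->
<filled>
   <prompt>
      Getting the weather for <value expr="city"/>, <value expr="state"/>.
   </prompt>
</filled>
```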

Warning

<initial> elements are control items, not input items, so they cannot appear in namelist. If you try to include the name of an <initial> element in a namelist attribute, you will get an error.

The grammar

Mixed initiative grammars differ from grammars used in directed forms both in scope (where they are placed) and in content.

Grammar scope

Mixed initiative grammars must have form scope.

When a grammar has form scope:

  • More than one input item can be filled as a result of a single caller utterance.

  • Its input items can be filled in any order.

Note

In directed forms, grammars are children of <field> elements and only have field scope.

The scope of a grammar is determined by three things:

  • Where the <grammar> element is placed.

  • The value of the scope attribute of the <form> element.

  • The value of the scope attribute of the <grammar> element.

When a grammar is placed in a <form> element, it does not automatically have form scope. It may have either form scope or document scope, depending on the values of the scope attributes of the <form> element and the <grammar> element. The optional scope attribute can have one of two values:

  • dialog—the scope of the grammar is the <form> element in which it is placed. On the Tellme platform, this is the default if the scope attribute is not specified.

  • document—the scope of the grammar is the entire document.

The scope attribute of a <grammar> element overrides the scope attribute of its parent <form> element.

Therefore, to give a grammar form scope, you place it in the <form> element in question and make sure that the grammar's scope attribute is set to scope="dialog", either explicitly or by default.
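A minimal sketch of the two cases follows. It assumes hypothetical external grammar files named city_state.grxml and menu.grxml and a hypothetical second form named main_menu. The first grammar has form scope because it sits in the <form> and states scope="dialog" (which it would also get by default). In the second form, the form's scope="document" attribute gives its grammar document scope, because that <grammar> element has no scope attribute of its own to override it.

```xml
<!-- Form scope: active only while the FIA is executing this form.
     scope="dialog" is written out for clarity; it is also the default. -->
<form id="mixed">
   <grammar mode="voice" scope="dialog" src="city_state.grxml"
            type="application/srgs+xml"/>
   <!-- <initial>, <field>, and <filled> elements go here -->
</form>

<!-- Document scope: inherited from the form's scope attribute, because the
     <grammar> element does not override it. -->
<form id="main_menu" scope="document">
   <grammar mode="voice" src="menu.grxml" type="application/srgs+xml"/>
   <!-- form items go here -->
</form>
```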

Constructing the grammar

Recall that we are creating a mixed initiative form that asks for both a city and a state. So, if the form-level grammar for this mixed initiative form returns a match, the match must be one of:

  • Both city and state.

  • Just the city.

  • Just the state.

To accomplish this, the grammar must have independent subrules for city and state. We already know how to use subrules: use a top rule that calls the subrule with a <ruleref> element.

In addition, the grammar must return two distinct values, one each for city and state. For this, we need to include "semantic interpretation," using the <tag> element. This is covered in Chapter 4 of the Tellme Speech Service Grammar Developer's Guide (https://msdn.microsoft.com/en-us/library/ee800148.aspx). After you have worked through this lesson once, read Chapter 4 to be sure you understand what the <tag> elements in this grammar do; a solid grasp of semantic interpretation is essential for writing mixed initiative grammars.

To use the <tag> element to return an object, a grammar must include the attribute tag-format="semantics/1.0".

Rule variables

Every <rule> element in a grammar has a "Rule Variable." When tag-format="semantics/1.0", the Rule Variable is named out. The variable out is implicitly declared as an empty object before the first tag in the rule is executed. The <tag> element can fill out in one of two alternative ways:

  • by assigning a primitive value such as a number or string to out (for example, out="george";), which replaces the object, so that out then holds that value rather than an object

  • by adding properties to the out object, for example, out.firstName="george";
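The two styles can be compared in a small sketch. The STATE rule below follows the lesson's object style; STATE_NAME is a hypothetical alternative, not used elsewhere in this lesson, that assigns a primitive value instead. When "Oregon" is matched, the Rule Variable of STATE is an object whose state property is "Oregon", while the Rule Variable of STATE_NAME is simply the string "Oregon".

```xml
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0"
         xml:lang="en-US"
         mode="voice"
         root="STATE"
         tag-format="semantics/1.0">
   <!-- Object style: out stays an object and gains a state property. -->
   <rule id="STATE">
      <one-of>
         <item>California <tag>out.state="California";</tag></item>
         <item>Oregon <tag>out.state="Oregon";</tag></item>
      </one-of>
   </rule>
   <!-- Primitive style (hypothetical rule): out is replaced by a string. -->
   <rule id="STATE_NAME">
      <one-of>
         <item>California <tag>out="California";</tag></item>
         <item>Oregon <tag>out="Oregon";</tag></item>
      </one-of>
   </rule>
</grammar>
```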

Retrieving values from referenced rules with the rules object

When using tag-format="semantics/1.0", there is also a rules object whose properties hold the Rule Variables of the rules referenced (with <ruleref>) from the current rule. The Rule Variable of a referenced rule is available as rules.rulename, where rulename is the name (id) of the rule. Therefore, in complex grammars, the Rule Variable (out) of each referenced rule is available to the rule that references it. Because a Rule Variable can itself be an object, you can read its properties through rules; for example, rules.STATE.state is the state property of the Rule Variable of the rule named STATE.
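The sketch below shows the rules object reading both kinds of Rule Variable. It assumes a cut-down CITY rule and a hypothetical primitive-style STATE_NAME rule; neither is the lesson's grammar. Because rules.CITY is an object, the top rule reads its city property; because rules.STATE_NAME is already a string, the top rule uses it directly. For the utterance "Portland Oregon", the top rule's Rule Variable would be an object with city set to "Portland" and state set to "Oregon".

```xml
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0"
         xml:lang="en-US"
         mode="voice"
         root="top"
         tag-format="semantics/1.0">
   <rule id="top">
      <ruleref uri="#CITY"/>
      <ruleref uri="#STATE_NAME"/>
      <!-- rules.CITY is an object: read its city property.
           rules.STATE_NAME is a string: use it as is. -->
      <tag>out.city=rules.CITY.city; out.state=rules.STATE_NAME;</tag>
   </rule>
   <rule id="CITY">
      <one-of>
         <item>Portland <tag>out.city="Portland";</tag></item>
         <item>Eugene <tag>out.city="Eugene";</tag></item>
      </one-of>
   </rule>
   <!-- Hypothetical primitive-style rule: its Rule Variable is a string. -->
   <rule id="STATE_NAME">
      <one-of>
         <item>Oregon <tag>out="Oregon";</tag></item>
         <item>California <tag>out="California";</tag></item>
      </one-of>
   </rule>
</grammar>
```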

Here is a sample grammar that achieves the results we want:

<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         mode="voice"
         root="top"
         tag-format="semantics/1.0"
         version="1.0"
         xml:lang="en-US">
   <rule id="top">
      <!-- top rule allows match of both city and state,
           just city, or just state -->
      <one-of>
         <item>
            <ruleref uri="#CITY"/> <tag>out.city=rules.CITY.city </tag>
            <ruleref uri="#STATE"/> <tag>out.state=rules.STATE.state </tag>
         </item>
         <item>
            <ruleref uri="#CITY"/> <tag>out.city=rules.CITY.city</tag>
         </item>
         <item>
            <ruleref uri="#STATE"/> <tag>out.state=rules.STATE.state</tag>
         </item>
      </one-of>
   </rule>
   <rule id="CITY">
      <one-of>
        <item>Los Angeles <tag>out.city="Los Angeles"</tag></item>
        <item>San Francisco <tag>out.city="San Francisco"</tag></item>
        <item>San Diego <tag>out.city="San Diego"</tag></item>
        <item>Portland <tag>out.city="Portland"</tag></item>
        <item>Medford <tag>out.city="Medford"</tag></item>
        <item>Eugene <tag>out.city="Eugene"</tag></item>
      </one-of>
   </rule>
   <rule id="STATE">
      <one-of>
         <item>California <tag>out.state="California"</tag></item>
         <item>Oregon <tag>out.state="Oregon"</tag></item>
      </one-of>
   </rule>
</grammar>

Note

This grammar will allow an incorrect pair of city and state (for example, Portland, California). We choose not to deal with that, as it does not detract from the objective of the lesson.

Here is how this grammar returns values:

The Rule Variable for the CITY subrule is out (inside the subrule). Since the variable out is created as an object, the <tag> element can create a property (out.city) and assign to it a string naming the matched city. At the same time, in the grammar as a whole, the Rule Variable for the CITY subrule is rules.CITY. This means that, in the grammar as a whole, the match to the CITY subrule is rules.CITY.city, for example, rules.CITY.city="San Francisco".

Similarly, the match to the STATE subrule is out.state inside the STATE subrule and rules.STATE.state in the grammar as a whole.

Note

If the caller says only a state, for example, the CITY subrule is never matched and the top rule never assigns out.city, so the returned object has no city property at all.

Now, the Rule Variable for the top rule is out and we want the top rule to return out.city and out.state. Therefore, we included out.city=rules.CITY.city and out.state=rules.STATE.state in the above grammar.

In summary, the grammar returns an object with a city property, a state property, or both, depending on what the caller said.
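To see which properties the grammar actually returned for a given utterance, a form-level <filled> element can inspect application.lastresult$.interpretation, the VoiceXML application variable that holds the semantic result of the most recent recognition. The fragment below is a debugging sketch, assuming the platform passes the grammar's object through unchanged; it logs which of the two properties was absent from the caller's last utterance.

```xml
<filled mode="any" namelist="city state">
   <!-- application.lastresult$.interpretation is the object the top rule
        returned for the most recent utterance. -->
   <if cond="typeof application.lastresult$.interpretation.city == 'undefined'">
      <log>The last utterance did not include a city.</log>
   </if>
   <if cond="typeof application.lastresult$.interpretation.state == 'undefined'">
      <log>The last utterance did not include a state.</log>
   </if>
</filled>
```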

Note

Grammars used in mixed initiative forms can have more than two subrules and therefore return more than two matches. In that case, the form must include one <field> element for each subrule.

Assigning the grammar match results to the mixed initiative form's fields

The grammar that is presented above returns an object variable with two properties: city and state.

How do we write the application code so that the values of these two properties fill the fields? In other words, how is such a result from a form-level grammar assigned to the various input item variables within the form?

The VoiceXML 2.x specifications require that: "By default an input item is assigned the top-level result property whose name matches the input item name."

So, if we create a field named "city" like this <field name="city">, then the value of the .city property of the returned match object will be placed in the field's form item variable (which is, of course, city).

Similarly, the .state property of the returned match object will be placed in the state variable of the field declared as <field name="state">.

So we can write the VoiceXML code like this:

<form id="mixed">
   <grammar mode="voice"
            root="top"
            tag-format="semantics/1.0"
            version="1.0"
            xml:lang="en-US">
      <!-- our mixed initiative grammar goes here -->
   </grammar>

   <initial name="weather_where">
      <prompt>
         You want the weather for what city and state?
      </prompt>

      <!-- Reprompt once, then try directed prompts. -->
      <catch event="noinput nomatch" count="1">
        Please say something like "Los Angeles, California".
      </catch>
      <catch event="noinput nomatch" count="2">
        I'm sorry, I still don't understand.
        I'll ask you for information one piece at a time.
        <assign name="weather_where" expr="true"/>
      </catch>
   </initial>
  
   <field name="city">
      <prompt>
         What is the name of the city for which you 
         want a weather report?
      </prompt>

      <catch event="noinput nomatch" count="3">
         Sorry you're having trouble. Please call again later.
         <exit/>
      </catch>
   </field>
   <field name="state">
      <prompt>
         What is the name of the state?
      </prompt>

      <catch event="noinput nomatch" count="3">
         Sorry you're having trouble. Please call again later.
         <exit/>
      </catch>
   </field>
   
   <filled>
      <!-- Give them the weather report for their city. -->
   </filled>
</form>

Important

The mixed initiative <form> element must contain one <field> element for every match that can be returned by the grammar (that is, for every subrule in the grammar).

How it works

The grammar can match a city, a state, or both in a single utterance. When a match occurs, it returns an object with a .city property, a .state property, or both. Since we have named our fields city and state:

  • The value of the returned .city property (for example, Portland) is assigned to the city variable in the field of that name.

  • The value of the returned .state property (for example, Oregon) is assigned to the state variable in the field named "state".
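This mapping relies on the default rule that a field takes the result property whose name matches its own. The <field> element's slot attribute overrides that default by naming the property explicitly, which is useful when the field names must differ from the grammar's property names. In the sketch below, destination_city and destination_state are hypothetical field names; the grammar is unchanged and still returns city and state. Any namelist that refers to these fields must then use the new names.

```xml
<!-- slot names the grammar result property; name is the field's own variable. -->
<field name="destination_city" slot="city">
   <prompt>What is the name of the city?</prompt>
</field>

<field name="destination_state" slot="state">
   <prompt>What is the name of the state?</prompt>
</field>

<filled mode="all" namelist="destination_city destination_state">
   <log>City: <value expr="destination_city"/>,
        state: <value expr="destination_state"/></log>
</filled>
```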

When a match of any kind to the form-level grammar is made, no match values are placed in the <initial> element's form item variable (weather_where), but weather_where is set to true. This means that the <initial> element will not be revisited by the FIA unless weather_where is subsequently cleared.

Now, the city and state fields contain only a prompt and a <catch> element; they have no grammar or <filled> element of their own. Why? Recall that the FIA only visits a form item whose form item variable has no value. So, if the caller's utterance in response to the <initial> element's prompt includes a city (for example), the city form item variable is filled immediately and the FIA skips the city field. If, on the other hand, the utterance contained no city, the FIA visits the city field and plays its prompt, and the form-level grammar, which is still active, can fill city from the caller's answer. The state field works the same way.

So, if the caller's response to the <initial> element's prompt does not supply the city, the state, or both, the application prompts for each missing piece of information until all the form item variables are filled.

Note

Until a match is made that fills the city variable, the FIA will keep revisiting the city field and replaying the prompt until the third count, when the application will exit. The same is true for the state field.

Adding a <filled> element

In a real-life application, there would be one <filled> section with the mode attribute set to mode="all". This means that the content of the <filled> element will not be executed until all of the input fields are filled. Then, in that <filled> element, the application might request that the caller confirm the city and state (using a subdialog) and then provide a weather report.

In our final sample application below, we use two <filled> elements. The first (mode="any") is executed whenever the caller's most recent utterance fills either of the input fields, and the second (mode="all") is executed once both input fields have been filled.

The reason we are doing this is so that you can run the application in Tellme Studio Scratchpad and see what happens when you speak either one or both of the city and state. The <log> elements in these <filled> sections write the contents of the input fields to the debug log, where you can look at them (see Appendix A -- Testing VoiceXML Applications on the Tellme Platform). The application will also speak the results to you using TTS.

A sample "city and state" mixed initiative application

To arrive at a simple but complete mixed initiative application based on the grammar and code examples above, we only need to put the grammar inside the form, add a <help> element to the <initial> element, add two <filled> elements, and put it all together. Here is the result:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" revision="4"
      xmlns="http://www.w3.org/2001/vxml">
<form id="mixed">
   <grammar mode="voice"
           root="top"
           tag-format="semantics/1.0"
           version="1.0"
           xml:lang="en-US">
   <rule id="top">
      <!-- top rule allows match of both city and state,
           just city, or just state -->
      <one-of>
         <item>
            <ruleref uri="#CITY"/> <tag>out.city=rules.CITY.city </tag>
            <ruleref uri="#STATE"/> <tag>out.state=rules.STATE.state </tag>
         </item>
         <item>
            <ruleref uri="#CITY"/> <tag>out.city=rules.CITY.city</tag>
         </item>
         <item>
            <ruleref uri="#STATE"/> <tag>out.state=rules.STATE.state</tag>
         </item>
      </one-of>
   </rule>
   <rule id="CITY">
      <one-of>
         <item>Los Angeles <tag>out.city="Los Angeles"</tag></item>
         <item>San Francisco <tag>out.city="San Francisco"</tag></item>
         <item>San Diego <tag>out.city="San Diego"</tag></item>
         <item>Portland <tag>out.city="Portland"</tag></item>
         <item>Medford <tag>out.city="Medford"</tag></item>
         <item>Eugene <tag>out.city="Eugene"</tag></item>
      </one-of>
   </rule>

   <rule id="STATE">
      <one-of>
         <item>California <tag>out.state="California"</tag></item>
         <item>Oregon <tag>out.state="Oregon"</tag></item>
      </one-of>
   </rule>
   </grammar>



   <initial name="weather_where">
      <prompt>
       You want the weather for what city and state?
      </prompt>
      <help>
        Please say the name of the city and
        state, such as Los Angeles, California.
      </help>
      <!-- Reprompt once, then try directed prompts. -->
      <catch event="noinput nomatch" count="1">
        Please say something like "Los Angeles, California".
      </catch>
      <catch event="noinput nomatch" count="2">
        I'm sorry, I still don't understand.
        I'll ask you for information one piece at a time.
        <assign name="weather_where" expr="true"/>
      </catch>
   </initial>

   <field name="city">
      <prompt>What is the name of the city for which you would like
              a weather report?
      </prompt>

      <catch event="noinput nomatch" count="3">
         Sorry you're having trouble. Please call again later.
         <exit/>
      </catch>

   </field>
   <field name="state">
      <prompt>What is the name of the state?
      </prompt>

      <catch event="noinput nomatch" count="3">
         Sorry you're having trouble. Please call again later.
         <exit/>
      </catch>

   </field>

   <filled mode="any" namelist="city state">
      Thank you. This is the filled section for any input. You said that the city is <value expr="city"/> and the state is <value expr="state"/> <break/>
      <log> This is the filled section for any input. In the "city" field, the form item variable is 
           <value expr="city"/></log>
      <log> This is the filled section for any input. In the "state" field, the form item variable is
           <value expr="state"/></log>
      <log> The value of weather_where, the form item variable of the initial element, is 
            <value expr="weather_where"/></log>
   </filled>

   <filled mode="all" namelist="city state">
      Thank you. This is the filled section for all input. You said that the city is <value expr="city"/> and the state is <value expr="state"/> <break/>
      <log> This is the filled section for all input. In the "city" field, the form item variable is 
           <value expr="city"/></log>
      <log> This is the filled section for all input. In the "state" field, the form item variable is
           <value expr="state"/></log>
      <log> The value of weather_where, the form item variable of the initial element, is 
            <value expr="weather_where"/></log>
   </filled>

</form>
</vxml>

Try running this document in the Tellme Studio Scratchpad to see how it works. Say only the city in response to the <initial> prompt. Then say only the state. Then say both. Look at the debug logs to see what happens in each case.

What's next?

The app-root.vxml application that we have developed in this tutorial uses "directed forms" exclusively. In this final lesson we have developed a "mixed initiative form" in order to introduce you to this important alternative form of dialog.

In the ten lessons of this tutorial, you have learned the basics of creating a VoiceXML application. The summary that follows this lesson will remind you of what you have learned and suggest topics for further study.