Intent recognition with Orchestrator in Composer

APPLIES TO: Composer v2.x

Using the Bot Framework, developers can craft a variety of conversational experiences and unify how end users reach them through a single entry point. This composition model is often referred to as skills.

Within an organization, an effective design pattern is to create one parent bot with multiple child bots, called skills, that are owned by different teams. These skills can more broadly leverage common capabilities provided by other developers, which helps automation scale horizontally across the organization.

Bot Framework offers two component-model approaches: Bot Components and skills.

  • Bot Components let you share any combination of declarative conversational assets (dialogs, language models and language generation responses) and code with other developers either in source-code form or through a package and associated package feed. This is then imported into your project and can be modified as needed to suit your scenario. This is directly analogous to a shared code library.

  • Skills let a bot be surfaced to other conversational experiences: the bot is shared with other bots, which then pass utterances to it for remote processing. This is directly analogous to how you might construct a lattice of web services.

It's important to choose the right architecture for your scenario. Bot Components provide a simple way to share common capabilities across projects, where each project can then modify them. Skills provide central ownership of a given experience, which other conversational experiences then use as-is.

Composer can help you use a bot as a skill. Doing so requires a skill manifest and some additional changes, which Composer can help automate.

In skill scenarios, the parent bot is responsible for dispatching each user utterance to the skill best suited to process it. This dispatcher must be trained with example utterances from each skill to build a dispatching model, which lets the parent bot identify the right skill and route this and subsequent utterances to it.
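The routing behavior described above can be sketched in Python. This is a minimal, framework-agnostic illustration, not the Bot Framework or Composer API: `ParentBot`, `toy_classify`, and `make_booking_skill` are hypothetical names, and the keyword classifier merely stands in for a trained dispatching model such as Orchestrator.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical signatures, for illustration only:
#   classifier(utterance) -> (skill_name, score)
#   skill handler(utterance) -> (reply, finished)
Classifier = Callable[[str], Tuple[str, float]]
SkillHandler = Callable[[str], Tuple[str, bool]]


class ParentBot:
    """Routes the first utterance to the best skill, then keeps routing
    subsequent utterances to that skill until it reports it is finished."""

    def __init__(self, classify: Classifier, skills: Dict[str, SkillHandler]):
        self.classify = classify
        self.skills = skills
        self.active_skill: Optional[str] = None

    def on_message(self, utterance: str) -> str:
        if self.active_skill is None:
            skill, _score = self.classify(utterance)
            if skill not in self.skills:
                return "Sorry, I didn't understand that."
            self.active_skill = skill
        reply, finished = self.skills[self.active_skill](utterance)
        if finished:
            self.active_skill = None  # the next utterance is dispatched again
        return reply


# Toy stand-ins for a trained dispatcher and a two-turn skill.
def toy_classify(utterance: str) -> Tuple[str, float]:
    return ("BookMeeting", 0.9) if "meeting" in utterance.lower() else ("None", 0.1)


def make_booking_skill() -> SkillHandler:
    state = {"turn": 0}

    def handle(utterance: str) -> Tuple[str, bool]:
        state["turn"] += 1
        if state["turn"] == 1:
            return "Who should attend?", False
        state["turn"] = 0
        return f"Booked a meeting with {utterance}.", True

    return handle


bot = ParentBot(toy_classify, {"BookMeeting": make_booking_skill()})
print(bot.on_message("Book a meeting"))  # Who should attend?
print(bot.on_message("Darren"))          # Booked a meeting with Darren.
```

Note that "Darren" on its own would not classify as BookMeeting; it reaches the skill only because the parent bot keeps routing to the active skill.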

Orchestrator replaces the now-deprecated Bot Framework Dispatcher. It provides an easy-to-use, robust skill-dispatching solution for Bot Framework customers. Bots built with Composer and bots created directly with the SDK can both use it, so existing Dispatch users can switch to Orchestrator easily.

Orchestrator uses natural language understanding methods while simplifying the process of language modeling. Using it doesn't require expertise in deep neural networks or natural language processing (NLP). This work was co-authored with industry experts in the field and includes some of the top methods on the General Language Understanding Evaluation (GLUE) leaderboard. Orchestrator will continue to evolve.

Dispatch user input with few training examples

Developers often need to properly define a language model with very few training examples. With the pre-trained models used by Orchestrator, this is less of a concern. Just one example for an intent or skill can often go far in making accurate predictions. For example, a "Greeting" intent defined with just one example, "hello", can be successfully predicted for examples like "how are you today" or "good morning to you".

The pre-trained models generalize well from very few simple, short examples. This ability is often called few-shot learning (including one-shot learning, which Orchestrator also supports), and it is made possible by pre-training the models on large data sets.

Multi-lingual

In addition to the English model, Orchestrator provides a multi-lingual model, which lets a model trained on, for example, English-only data process utterances in other languages.

In this example, using the CLI for ease of demonstration, we pull down the multi-lingual model rather than the default English model. You can retrieve a list of available models through the command bf orchestrator:basemodel:list.

bf orchestrator:basemodel:get --versionId=pretrained.20210205.microsoft.dte.00.06.unicoder_multilingual.onnx --out=model

Then create a snapshot from a .lu file that contains only English utterances.

bf orchestrator:create --in test.lu --model model --out generated

Then query the snapshot with the German utterance "Buchen Sie einen Termin mit Darren" ("book a meeting with Darren"). Orchestrator correctly returns BookMeeting as the top intent.

bf orchestrator:query -i="generated\test.blu" -m=model -q="Buchen Sie einen Termin mit Darren"
[
  {
    "label": {
      "name": "BookMeeting",
      "label_type": 1,
      "span": {
        "offset": 0,
        "length": 34
      }
    },
    "score": 0.24771601762059242,
    "closest_text": "book a meeting with darren"
  }
]
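If you capture the query output as a string, a few lines of Python are enough to read the top intent. This sketch relies only on the `label.name`, `score`, and `closest_text` fields visible in the output above; how you capture the CLI's output (for example, by redirecting it to a file) is left as an assumption.

```python
import json

# The query output shown above, captured as a string.
output = """
[
  {
    "label": {
      "name": "BookMeeting",
      "label_type": 1,
      "span": {"offset": 0, "length": 34}
    },
    "score": 0.24771601762059242,
    "closest_text": "book a meeting with darren"
  }
]
"""

results = json.loads(output)
# The output is a list of results; take the highest-scoring one as the top intent.
top = max(results, key=lambda r: r["score"])
print(top["label"]["name"], round(top["score"], 3), top["closest_text"])
# BookMeeting 0.248 book a meeting with darren
```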

Multi-intents

Orchestrator also supports multi-intent detection: if a user's utterance contains two instructions (for example, "book a meeting with Darren and add a task to pick up chocolate milk"), both can be identified and passed to the bot for subsequent processing.

The example below, using the CLI for ease of demonstration, shows two intents being extracted from a given utterance.

bf orchestrator:query -i="generated\test.blu" -m=model -q="book a meeting with darren and add a task to pickup chocolate milk"
[
  {
    "closest_text": "add task to pickup chocolate milk",
    "score": 0.7430192247281288,
    "label": {
      "name": "AddTask",
      "label_type": 1,
      "span": {
        "length": 56,
        "offset": 0
      }
    }
  },
  {
    "closest_text": "book a meeting with darren",
    "score": 0.6492044311828292,
    "label": {
      "name": "BookMeeting",
      "label_type": 1,
      "span": {
        "length": 56,
        "offset": 0
      }
    }
  }
]
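The same parsing approach extends to multi-intent results. In this sketch, the output above is reformatted compactly, and the results are sorted by score explicitly instead of relying on the order in which they are printed.

```python
import json

# Multi-intent query output from above, reformatted compactly.
output = """
[
  {"closest_text": "add task to pickup chocolate milk",
   "score": 0.7430192247281288,
   "label": {"name": "AddTask", "label_type": 1,
             "span": {"length": 56, "offset": 0}}},
  {"closest_text": "book a meeting with darren",
   "score": 0.6492044311828292,
   "label": {"name": "BookMeeting", "label_type": 1,
             "span": {"length": 56, "offset": 0}}}
]
"""

results = json.loads(output)
# Sort explicitly by score, highest first.
detected = sorted(results, key=lambda r: r["score"], reverse=True)
intent_names = [r["label"]["name"] for r in detected]
print(intent_names)  # ['AddTask', 'BookMeeting']
```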

Classify the unknown intent without additional examples

With Orchestrator, our goal is to ensure that a deep understanding of machine learning (ML) and NLP isn't required to create a robust dispatching model.

Another common challenge that developers face in intent classification is deciding whether the top-scoring intent should be triggered. Orchestrator provides a solution for this: its scores can be interpreted as calibrated probabilities, with a score of 0.5 defined as the maximum score for an unknown intent. This threshold was selected to balance precision and recall.

If the top intent's score is 0.5 or lower, the query or request should be considered an unknown intent and should probably trigger a follow-up question from the bot. If two intents both score above 0.5, both intents (skills) could be triggered. If the bot is designed to handle only one intent at a time, application rules or other priorities can pick which one is triggered.
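The decision rule above can be written as a small Python function. This is a sketch: the 0.5 threshold comes from the text, while `intents_to_trigger`, `pick_single`, the priority list, and the example scores are hypothetical and are not part of the Orchestrator library.

```python
from typing import Dict, List, Optional

UNKNOWN_THRESHOLD = 0.5  # scores at or below this are treated as unknown


def intents_to_trigger(scores: Dict[str, float]) -> List[str]:
    """Return every intent scoring above the threshold, highest first.
    An empty list means the utterance should be treated as unknown."""
    above = [name for name, s in scores.items() if s > UNKNOWN_THRESHOLD]
    return sorted(above, key=lambda name: scores[name], reverse=True)


def pick_single(scores: Dict[str, float], priority: List[str]) -> Optional[str]:
    """For a bot that handles one intent at a time: choose among the
    triggered intents using a hypothetical application-defined priority
    order, falling back to the highest score. None means unknown."""
    candidates = intents_to_trigger(scores)
    if not candidates:
        return None  # for example, ask the user a follow-up question
    ranked = [name for name in priority if name in candidates]
    return ranked[0] if ranked else candidates[0]


# Illustrative scores only.
scores = {"AddTask": 0.74, "BookMeeting": 0.65, "Greeting": 0.12}
print(intents_to_trigger(scores))                     # ['AddTask', 'BookMeeting']
print(pick_single(scores, priority=["BookMeeting"]))  # BookMeeting
print(pick_single({"Greeting": 0.31}, priority=[]))   # None
```

Treating a score of exactly 0.5 as unknown matches the rule that 0.5 is the maximum score for an unknown intent.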

The unknown intent is classified without any examples that define it (an approach often referred to as zero-shot learning). This would be hard to accomplish without a heavily pre-trained language model, especially since the bot may be extended with additional skills in the future.

Fast local library

The Orchestrator core is written in C++ and is currently available as a library in C# and Node.js. The library can be used directly by the bot code (the preferred approach), or it can be hosted out-of-process or on a remote server. Running locally eliminates the latency of additional service round trips, which is especially helpful when using Orchestrator to dispatch across disparate LU and QnA services.

As an example, the English pre-trained language model (pretrained.20200924.microsoft.dte.00.06.en.onnx) is roughly 260 MB. Classification of a new example with this initial model takes about 10 milliseconds (depending on the text length). These numbers are for illustration only to give a sense of performance. As we improve the models or include additional languages, these numbers will likely change.

Reports

Orchestrator also provides a test mechanism for evaluating the performance of an Orchestrator model, which in turn generates a report.

To achieve high-quality natural language processing (such as intent detection), you need to assess and refine the quality of the model. Because human language varies so widely, this requires an optimization cycle, which Orchestrator simplifies through its use of pre-trained models.
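Orchestrator's test mechanism generates the report for you. To illustrate the kind of per-intent quality measure such an evaluation involves, the sketch below computes precision and recall from paired expected and predicted labels. It is a generic calculation with made-up labels, not Orchestrator's report code or format.

```python
from collections import Counter
from typing import Dict, List, Tuple


def per_label_precision_recall(
    gold: List[str], predicted: List[str]
) -> Dict[str, Tuple[float, float]]:
    """Compute (precision, recall) for each label from paired expected and
    predicted labels. An undefined ratio (no predictions or no expected
    examples for a label) is reported as 0.0."""
    tp, pred_count, gold_count = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        gold_count[g] += 1
        pred_count[p] += 1
        if g == p:
            tp[g] += 1
    labels = set(gold_count) | set(pred_count)
    return {
        label: (
            tp[label] / pred_count[label] if pred_count[label] else 0.0,
            tp[label] / gold_count[label] if gold_count[label] else 0.0,
        )
        for label in labels
    }


gold = ["BookMeeting", "BookMeeting", "AddTask", "Greeting"]
pred = ["BookMeeting", "AddTask", "AddTask", "Greeting"]
metrics = per_label_precision_recall(gold, pred)
print(metrics["AddTask"])      # (0.5, 1.0)
print(metrics["BookMeeting"])  # (1.0, 0.5)
```

Here one BookMeeting example was misclassified as AddTask, which lowers BookMeeting's recall and AddTask's precision; that is the kind of confusion a refinement cycle would then address, for example by adding examples.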

For more information on how to generate and analyze the report, see Report Interpretation on GitHub.

Minimal or no model training required

Orchestrator uses an example-based approach in which the language model is defined as a set of labeled examples. Each model example is represented as a vector of numbers (an embedding) that the transformer model produces for a text the corresponding skill is capable of handling.

At runtime, the similarity between a new example and the existing model examples for each skill is calculated. The classification result is then determined with a K-nearest-neighbors (KNN) algorithm that takes the weighted average of the K closest examples. This approach doesn't require an explicit training step; only the embeddings for the model examples need to be calculated. The operation is performed locally, without a GPU and without round trips to a remote server.
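The runtime step can be illustrated with a generic similarity-weighted KNN classifier. This is a sketch under stated assumptions: cosine similarity and the normalized weighted vote are common choices rather than Orchestrator's documented internals, and the short hand-written vectors stand in for transformer embeddings. Its scores are therefore not calibrated like Orchestrator's.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_classify(
    query: Vector, examples: List[Tuple[str, Vector]], k: int = 3
) -> Dict[str, float]:
    """Score labels by a similarity-weighted vote over the k model examples
    closest to the query. Scores sum to 1 when any neighbor has positive
    similarity."""
    # Rank every labeled example by its similarity to the query.
    ranked = sorted(
        ((cosine(query, vec), label) for label, vec in examples), reverse=True
    )
    votes: Dict[str, float] = defaultdict(float)
    for similarity, label in ranked[:k]:
        votes[label] += max(similarity, 0.0)  # negative similarity adds nothing
    total = sum(votes.values())
    return {label: v / total for label, v in votes.items()} if total else dict(votes)


# Toy 3-dimensional "embeddings" standing in for transformer output.
examples = [
    ("Greeting", [0.9, 0.1, 0.0]),
    ("Greeting", [0.8, 0.2, 0.1]),
    ("BookMeeting", [0.1, 0.9, 0.2]),
    ("AddTask", [0.0, 0.2, 0.9]),
]
scores = knn_classify([0.85, 0.15, 0.05], examples, k=3)
print(max(scores, key=scores.get))  # Greeting
```

Because classification only compares the query against stored example embeddings, adding an example to a skill affects predictions immediately, with no retraining step.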

Additional Information