Tutorial: Get well-formatted data from the utterance

In this tutorial, you create an app that extracts consistently formatted data from an utterance by using the regular expression entity.

In this tutorial, you learn how to:

  • Create a new app
  • Add an intent
  • Add a regular expression entity
  • Train
  • Publish
  • Get intents and entities from the endpoint

For this article, you can use a free LUIS account to author your LUIS application.

Regular expression entities

This app uses the regular expression entity to pull well-formatted Human Resources (HR) form numbers out of an utterance. While the utterance's intent is always determined with machine learning, this specific entity type is not machine-learned.

Example utterances include:

Example utterances
Where is HRF-123456?
Who authored HRF-123234?
HRF-456098 is published in French?
HRF-456098
HRF-456098 date?

A regular expression is a good choice for this type of data when the data is well-formatted.

Create a new app

  1. Sign in to the LUIS portal at https://www.luis.ai.

  2. Select Create new app.

    Screenshot of Language Understanding (LUIS) My Apps page

  3. In the pop-up dialog, enter the name HumanResources and keep the default culture, English. Leave the description empty.

    Create LUIS new HumanResources app

  4. Select Done.

Create intent for finding form

  1. Make sure your Human Resources app is in the Build section of LUIS. You can change to this section by selecting Build in the top-right menu bar.

  2. Select Create new intent.

  3. Enter FindForm in the pop-up dialog box, then select Done.

    Screenshot of create new intent dialog with Utilities in the search box

  4. Add example utterances to the intent.

    Example utterances
    What is the URL for hrf-123456?
    Where is hrf-345678?
    When was hrf-456098 updated?
    Did John Smith update hrf-234639 last week?
    How many versions of hrf-345123 are there?
    Who needs to authorize form hrf-123456?
    How many people need to sign off on hrf-345678?
    hrf-234123 date?
    author of hrf-546234?
    title of hrf-456234?

    Screenshot of Intent page with new utterances highlighted

    These few utterances are for demonstration purposes only. A real-world app should have at least 15 utterances of varying length, word order, tense, grammatical correctness, punctuation, and word count.

Use the regular expression entity for well-formatted data

The regular expression that matches the form number is hrf-[0-9]{6}. It matches the literal characters hrf-, ignoring case and culture variants, followed by exactly six digits from 0 to 9.

HRF stands for human resources form.

LUIS tokenizes the utterance when it is added to an intent. The tokenization for these utterances adds spaces before and after the hyphen, for example: Where is HRF - 123456? The regular expression is applied to the utterance in its raw form, before it is tokenized. Because it is applied to the raw form, the regular expression doesn't have to deal with word boundaries.
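The pattern's behavior can be checked locally before it is entered in the portal. The following is a minimal sketch in Python, assuming that the standard re module with re.IGNORECASE is a reasonable stand-in for the case-insensitive matching described above. It does not reproduce how LUIS handles culture variants, and the helper name find_form_numbers is hypothetical.

```python
import re
from typing import List

# The pattern from this tutorial. re.IGNORECASE is an assumption that
# approximates the case-insensitive matching LUIS applies; culture
# variants are not modeled here.
HRF_PATTERN = re.compile(r"hrf-[0-9]{6}", re.IGNORECASE)


def find_form_numbers(utterance: str) -> List[str]:
    """Return every HRF form number found in the raw utterance text."""
    return HRF_PATTERN.findall(utterance)


if __name__ == "__main__":
    samples = [
        "Where is HRF-123456?",          # raw form, uppercase
        "When was hrf-456098 updated?",  # raw form, lowercase
        "Where is HRF - 123456?",        # tokenized form with spaces added
        "Order a pizza for me",          # no form number
    ]
    for text in samples:
        print(text, "->", find_form_numbers(text))
```

In this sketch the raw utterances match in either case, while the tokenized form with spaces around the hyphen does not, which illustrates why it matters that the expression is applied before tokenization.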

In the following steps, create a regular expression entity that tells LUIS what the HRF-number format is:

  1. Select Entities in the left panel.

  2. Select the Create new entity button on the Entities page.

  3. In the pop-up dialog, enter the new entity name HRF-number, select RegEx as the entity type, enter hrf-[0-9]{6} as the Regex value, and then select Done.

    Screenshot of pop-up dialog setting new entity properties

  4. Select Intents from the left menu, then select the FindForm intent to see the regular expression entity labeled in the utterances.

    Screenshot of Label utterance with existing entity and regex pattern

    Because the entity is not machine-learned, it is applied to the utterances and displayed on the LUIS website as soon as it is created.

Add example utterances to the None intent

The client application needs to know if an utterance is not meaningful or appropriate for the application. The None intent is added to each application as part of the creation process to determine if an utterance can't be answered by the client application.

If LUIS returns the None intent for an utterance, your client application can ask if the user wants to end the conversation or give more directions for continuing the conversation.
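A client application might branch on the None intent along the following lines. This is a minimal sketch, assuming the prediction has already been parsed into a Python dictionary with the topScoringIntent shape shown in the endpoint response later in this tutorial. The function name choose_reply and the reply text are hypothetical.

```python
def choose_reply(prediction: dict) -> str:
    """Pick a client-side reply from a parsed LUIS prediction.

    Hypothetical helper: a missing topScoringIntent is treated the same
    as the None intent, which is a design choice of this sketch.
    """
    intent = prediction.get("topScoringIntent", {}).get("intent", "None")
    if intent == "None":
        # The utterance is outside the app's domain, so offer a way forward.
        return ("Sorry, I can't help with that. "
                "Do you want to try another question or end the conversation?")
    return f"Handling intent: {intent}"


if __name__ == "__main__":
    print(choose_reply({"topScoringIntent": {"intent": "None", "score": 0.91}}))
    print(choose_reply({"topScoringIntent": {"intent": "FindForm", "score": 0.99}}))
```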

Caution

Do not leave the None intent empty.

  1. Select Intents from the left panel.

  2. Select the None intent. Add three utterances that your user might enter but that are not relevant to your Human Resources app:

    Example utterances
    Barking dogs are annoying
    Order a pizza for me
    Penguins in the ocean

Train the app before testing or publishing

  1. In the top-right corner of the LUIS website, select the Train button.

    Train button

  2. Training is complete when you see the green status bar at the top of the website confirming success.

    Trained status bar

Publish the app to query from the endpoint

To receive a LUIS prediction in a chat bot or other client application, you need to publish the app to the endpoint.

  1. Select Publish in the top right navigation.

    LUIS publish to endpoint button in top right menu

  2. Select the Production slot and the Publish button.

    LUIS publish to endpoint

  3. Publishing is complete when you see the green status bar at the top of the website confirming success.

    LUIS publish to endpoint

  4. Select the endpoints link in the green status bar to go to the Keys and endpoints page. The endpoint URLs are listed at the bottom.

Get intent and entity prediction from endpoint

  1. In the Manage section (top right menu), on the Keys and endpoints page (left menu), select the endpoint URL at the bottom of the page. This action opens another browser tab with the endpoint URL in the address bar.

    The endpoint URL looks like https://<region>.api.cognitive.microsoft.com/luis/v2.0/apps/<appID>?verbose=true&subscription-key=<YOUR_KEY>&<optional-name-value-pairs>&q=<user-utterance-text>.

  2. Go to the end of the URL in the address bar and enter the following utterance:

    When were HRF-123456 and hrf-234567 published in the last year?

    The last query string parameter is q, the utterance query. This utterance is not the same as any of the labeled utterances, so it is a good test. It should return the FindForm intent with the two form numbers, HRF-123456 and hrf-234567.

    {
      "query": "When were HRF-123456 and hrf-234567 published in the last year?",
      "topScoringIntent": {
        "intent": "FindForm",
        "score": 0.9988884
      },
      "intents": [
        {
          "intent": "FindForm",
          "score": 0.9988884
        },
        {
          "intent": "None",
          "score": 0.00204812363
        }
      ],
      "entities": [
        {
          "entity": "hrf-123456",
          "type": "HRF-number",
          "startIndex": 10,
          "endIndex": 19
        },
        {
          "entity": "hrf-234567",
          "type": "HRF-number",
          "startIndex": 25,
          "endIndex": 34
        }
      ]
    }
    

    By using a regular expression entity, LUIS extracts the form numbers as named data, which is easier for the client application to work with programmatically than the raw utterance text.
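The query and the response handling above can be sketched from a client application's point of view. This is a minimal illustration in Python, assuming the v2.0 URL shape from step 1 and the JSON response shown in step 2. The function names are hypothetical, and the region, app ID, and key in the usage lines are placeholders, not real values. No network request is made.

```python
from urllib.parse import urlencode


def build_query_url(region, app_id, subscription_key, utterance):
    """Build an endpoint URL in the v2.0 shape shown in step 1."""
    base = f"https://{region}.api.cognitive.microsoft.com/luis/v2.0/apps/{app_id}"
    params = {
        "verbose": "true",
        "subscription-key": subscription_key,
        "q": utterance,  # urlencode escapes spaces and punctuation
    }
    return f"{base}?{urlencode(params)}"


def extract_form_numbers(response):
    """Return (text, startIndex, endIndex) for each HRF-number entity."""
    query = response["query"]
    results = []
    for entity in response.get("entities", []):
        if entity["type"] != "HRF-number":
            continue
        start, end = entity["startIndex"], entity["endIndex"]
        # endIndex is inclusive in the response above, so add 1 when slicing.
        results.append((query[start:end + 1], start, end))
    return results


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_query_url("westus", "my-app-id", "MY_KEY", "Where is HRF-123456?"))

    sample = {
        "query": "When were HRF-123456 and hrf-234567 published in the last year?",
        "entities": [
            {"entity": "hrf-123456", "type": "HRF-number", "startIndex": 10, "endIndex": 19},
            {"entity": "hrf-234567", "type": "HRF-number", "startIndex": 25, "endIndex": 34},
        ],
    }
    print(extract_form_numbers(sample))
```

In the response above, the entity field is lowercase even where the query is not. Slicing the original query with startIndex and endIndex, as this sketch does, recovers the form number with the casing the user typed.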

Clean up resources

When no longer needed, delete the LUIS app. To do so, select My apps from the top-left menu. Select the ellipsis (...) to the right of the app name in the app list, then select Delete. In the Delete app? pop-up dialog, select Ok.

Next steps

This tutorial created a new intent, added example utterances, then created a regular expression entity to extract well-formatted data from the utterances. After training and publishing the app, a query to the endpoint identified the intent and returned the extracted data.