Extract data with entities

An entity extracts data from a user utterance at prediction runtime. An optional, secondary purpose is to boost the prediction of the intent or other entities by using the entity as a feature.

There are several types of entities:

  • Machine-learning entity - the primary entity type. Design your schema with this entity type before using other entities.
  • Non-machine-learning entity used as a required feature - for exact text matches, pattern matches, or detection by prebuilt entities.
  • Pattern.any - to extract free-form text, such as book titles, from a Pattern.

Machine-learning entities provide the widest range of data extraction choices. Non-machine-learning entities work by text matching and are used as a required feature for a machine-learning entity or intent.

Entities represent data

Entities are data you want to pull from the utterance, such as names, dates, product names, or any significant group of words. An utterance can include many entities or none at all. A client application may need the data to perform its task.

Entities need to be labeled consistently across all training utterances for each intent in a model.

You can define your own entities or use prebuilt entities to save time for common concepts such as datetimeV2, ordinal, email, and phone number.

Utterance | Entity | Data
Buy 3 tickets to New York | Prebuilt number | 3
Buy 3 tickets to New York | Destination | New York
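
A client application usually reads the extracted data from the prediction result. The following is a minimal sketch, assuming the result is available as a dictionary with a top intent and a map of entity names to extracted values. The field names, the `BuyTickets` intent name, and the `first_entity` helper are illustrative assumptions, not a definitive description of the LUIS response.

```python
# Hypothetical prediction result for "Buy 3 tickets to New York".
# The shape and field names are assumptions made for illustration.
prediction = {
    "query": "Buy 3 tickets to New York",
    "topIntent": "BuyTickets",
    "entities": {
        "number": [3],
        "Destination": ["New York"],
    },
}


def first_entity(result, name, default=None):
    """Return the first extracted value for an entity, or a default if absent."""
    values = result.get("entities", {}).get(name, [])
    return values[0] if values else default


ticket_count = first_entity(prediction, "number", default=1)
destination = first_entity(prediction, "Destination")
print(ticket_count, destination)
```

Because an utterance can include many entities or none at all, the helper returns a default instead of raising an error when an entity is missing.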

While intents are required, entities are optional. You do not need to create entities for every concept in your app, but only for those where the client application needs the data or the entity acts as a hint or signal to another entity or intent.

As your application develops and a new need for data is identified, you can add appropriate entities to your LUIS model later.

Entity represents data extraction

The entity represents a data concept inside the utterance. An intent classifies the entire utterance.

Consider the following four utterances:

Utterance | Intent predicted | Entities extracted | Explanation
Help | help | - | Nothing to extract.
Send something | sendSomething | - | Nothing to extract. The model does not have a required feature to extract something in this context, and there is no recipient stated.
Send Bob a present | sendSomething | Bob, present | The model extracts Bob by adding a required feature of prebuilt entity personName. A machine-learning entity has been used to extract present.
Send Bob a box of chocolates | sendSomething | Bob, box of chocolates | The two important pieces of data, Bob and the box of chocolates, have been extracted by machine-learning entities.

Label entities in all intents

Entities extract data regardless of the predicted intent. Make sure you label all example utterances in all intents. If the None intent is missing entity labels, the model becomes confused, even when the other intents have far more training utterances.
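
One way to catch missing labels before training is to scan your example utterances for text that is labeled as an entity in one place but left unlabeled in another. The following is a rough sketch under stated assumptions: the examples are held locally as tuples, the `recipient` and `item` entity names are hypothetical, and the check is a crude substring match rather than anything LUIS itself performs.

```python
# Hypothetical labeled examples: (intent, utterance, {entity name: labeled text}).
examples = [
    ("sendSomething", "Send Bob a present", {"recipient": "Bob", "item": "present"}),
    ("sendSomething", "Send Bob a box of chocolates",
     {"recipient": "Bob", "item": "box of chocolates"}),
    ("None", "Bob likes chocolates", {}),  # "Bob" is not labeled here
]


def find_unlabeled_mentions(examples):
    """Flag utterances that contain text labeled as an entity elsewhere but not here."""
    # Collect every piece of labeled text, keyed by entity name.
    labeled = {}
    for _, _, labels in examples:
        for entity, text in labels.items():
            labeled.setdefault(entity, set()).add(text.lower())

    issues = []
    for intent, utterance, labels in examples:
        lowered = utterance.lower()
        for entity, texts in labeled.items():
            for text in texts:
                # Crude substring check; it does not respect word boundaries.
                if text in lowered and labels.get(entity, "").lower() != text:
                    issues.append((intent, utterance, entity, text))
    return issues


issues = find_unlabeled_mentions(examples)
for issue in issues:
    print(issue)
```

Here the utterance in the None intent is flagged because it mentions Bob without a label. Each flagged utterance still needs a human review, since the same text is not always the same entity.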

Design entities for decomposition

Machine-learning entities allow you to design your app schema for decomposition, breaking a large concept into subentities.

Designing for decomposition allows LUIS to return a deep degree of entity resolution to your client application. This allows your client application to focus on business rules and leave data resolution to LUIS.

A machine-learning entity triggers based on the context learned through example utterances.

Machine-learning entities are the top-level extractors. Subentities are child entities of machine-learning entities.

Effective machine-learning entities

To build machine-learning entities effectively:

  • Label consistently across the intents. This includes utterances in the None intent that contain the entity. Otherwise, the model cannot determine the sequences effectively.
  • If you have a machine-learning entity with subentities, make sure the labeled utterances present the different orders and variants of the entity and its subentities. Labeled example utterances should cover all valid forms, including utterances where entities are present, absent, and reordered.
  • Avoid overfitting the entities to a very fixed set. Overfitting happens when the model doesn't generalize well, and it is a common problem in machine learning models. An overfitted app does not work adequately on new data. Vary the labeled example utterances so the app can generalize beyond the limited examples you provide, and vary the subentities enough that the model learns the concept instead of only the examples shown.

Effective prebuilt entities

To build effective entities that extract common data, such as those provided by the prebuilt entities, we recommend the following process.

Improve the extraction of data by adding your own data to an entity as a feature. That way, the additional labels from your data help the model learn the context in which the data, such as person names, appears in your application.

Types of entities

A subentity to a parent should be a machine-learning entity. The subentity can use a non-machine-learning entity as a feature.

Choose the entity based on how the data should be extracted and how it should be represented after it is extracted.

Entity type | Purpose
Machine-learned | Extract nested, complex data learned from labeled examples.
List | List of items and their synonyms extracted with exact text match.
Pattern.any | Entity whose end is difficult to determine because the entity is free-form. Only available in patterns.
Prebuilt | Already trained to extract a specific kind of data, such as a URL or email. Some of these prebuilt entities are defined in the open-source Recognizers-Text project. If your specific culture or entity isn't currently supported, contribute to the project.
Regular Expression | Uses a regular expression for exact text match.
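
The text-matching entity types can be pictured with plain string and regular expression matching. The following sketch is only an illustration of the idea, not how LUIS implements these entities. The city list, its synonyms, and the flight number format are assumptions made for the example.

```python
import re

# Hypothetical list entity: canonical value -> synonyms matched by exact text.
CITY_LIST = {
    "Seattle": ["seattle", "sea-tac"],
    "Cairo": ["cairo"],
}

# Hypothetical regular expression entity for a flight number such as "AB1234".
FLIGHT_NUMBER = re.compile(r"\b[A-Z]{2}\d{3,4}\b")


def match_list_entity(utterance, entity_list):
    """Return canonical values whose synonyms appear as whole words in the utterance."""
    found = []
    for canonical, synonyms in entity_list.items():
        for synonym in synonyms:
            pattern = r"\b" + re.escape(synonym) + r"\b"
            if re.search(pattern, utterance, re.IGNORECASE):
                found.append(canonical)
                break  # one match per canonical value is enough
    return found


utterance = "Is flight AB1234 from Seattle to Cairo on time?"
cities = match_list_entity(utterance, CITY_LIST)
flight_numbers = FLIGHT_NUMBER.findall(utterance)
print(cities, flight_numbers)
```

Neither matcher uses context. A city that is missing from the list, or a flight number in a different format, is simply not found, which is why these types work best as features for a machine-learning entity.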

Extraction versus resolution

Entities extract data as the data appears in the utterance. Entities do not change or resolve the data. The entity does not indicate whether the extracted text is a valid value for the entity.

There are ways to bring resolution into the extraction, but be aware that doing so makes the app less robust to variations and mistakes.

List entities and regular expression (text-matching) entities can be used as required features for a subentity, where they act as a filter on the extraction. Use this carefully so that you do not hinder the app's ability to predict.

An utterance may contain two or more occurrences of an entity where the meaning of the data is based on context within the utterance. An example is an utterance for booking a flight that has two geographical locations, origin and destination.

Book a flight from Seattle to Cairo

The two locations need to be extracted in a way that lets the client application know the type of each location in order to complete the ticket purchase.

To extract the origin and destination, create two subentities as part of the ticket order machine-learning entity. For each of the subentities, create a required feature that uses geographyV2.
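
With this design, the client application can tell the two locations apart by subentity name. The following is a minimal sketch, assuming the extracted entities arrive as nested dictionaries. The `TicketOrder`, `Origin`, and `Destination` names and the `read_route` helper are hypothetical.

```python
# Hypothetical result for "Book a flight from Seattle to Cairo".
# The entity and subentity names are assumptions made for illustration.
entities = {
    "TicketOrder": [
        {
            "Origin": ["Seattle"],
            "Destination": ["Cairo"],
        }
    ]
}


def read_route(entities):
    """Return (origin, destination) from the first ticket order; None for missing parts."""
    orders = entities.get("TicketOrder", [])
    if not orders:
        return None, None
    order = orders[0]
    origin = (order.get("Origin") or [None])[0]
    destination = (order.get("Destination") or [None])[0]
    return origin, destination


origin, destination = read_route(entities)
print(origin, destination)
```

Returning `None` for a missing part lets the client application ask the user a follow-up question instead of failing when only one location was stated.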

Using required features to constrain entities

Learn more about required features

Pattern.any entity

A Pattern.any is only available in a Pattern.

Exceeding app limits for entities

If you need more than the limit, contact support. To do so, gather detailed information about your system, go to the LUIS website, and then select Support. If your Azure subscription includes support services, contact Azure technical support.

Entity prediction status and errors

The LUIS portal shows when the predicted entity differs from the entity you selected for an example utterance. The prediction is based on the current trained model.

The erroring text is highlighted within the example utterance, and the example utterance line has an error indicator to the right, shown as a red triangle.

Use this information to resolve entity errors using one or more of the following:

  • If the highlighted text is mislabeled, review and correct the label, and then retrain.
  • Create a feature for the entity to help identify the entity's concept.
  • Add more example utterances and label them with the entity.
  • Review active learning suggestions for any utterances received at the prediction endpoint that can help identify the entity's concept.

Next steps

Learn concepts about good utterances.

See Add entities to learn more about how to add entities to your LUIS app.

See Tutorial: Extract structured data from user utterance with machine-learning entities in Language Understanding (LUIS) to learn how to extract structured data from an utterance using the machine-learning entity.