Extract data from utterance text with intents and entities

LUIS lets you extract information from a user's natural language utterances. The information is extracted in a form that a program, application, or chatbot can use to take action. The following sections show, with JSON examples, what data is returned from intents and entities.

The hardest data to extract is machine-learned data because it isn't an exact text match. Data extraction of machine-learned entities needs to be part of the authoring cycle until you're confident you receive the data you expect.

Data location and key usage

LUIS provides the data from the published endpoint. The HTTPS request (POST or GET) contains the utterance as well as some optional configurations, such as the staging or production environment.

https://westus.api.cognitive.microsoft.com/luis/v2.0/apps/<appID>?subscription-key=<subscription-key>&verbose=true&timezoneOffset=0&q=book 2 tickets to paris

The appID is available on the Settings page of your LUIS app, and also as part of the URL (after /apps/) when you're editing that LUIS app. The subscription-key is the endpoint key used for querying your app. While you can use your free authoring/starter key while you're learning LUIS, it is important to change the endpoint key to a key that supports your expected LUIS usage. The timezoneOffset unit is minutes.
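As a minimal sketch of how the query string above fits together, the following Python builds the same kind of URL without sending a request. The function name `build_luis_url`, the `region` default, and the placeholder app ID and key are assumptions made for illustration; only the parameter names come from the example URL.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_luis_url(app_id, subscription_key, utterance,
                   region="westus", verbose=True, timezone_offset=0):
    """Build a v2.0 endpoint URL for a GET query (no request is sent)."""
    base = f"https://{region}.api.cognitive.microsoft.com/luis/v2.0/apps/{app_id}"
    params = {
        "subscription-key": subscription_key,
        "verbose": "true" if verbose else "false",
        "timezoneOffset": timezone_offset,  # unit is minutes
        "q": utterance,                     # urlencode escapes the spaces
    }
    return f"{base}?{urlencode(params)}"

# Placeholder values; substitute your own app ID and endpoint key.
url = build_luis_url("YOUR_APP_ID", "YOUR_ENDPOINT_KEY", "book 2 tickets to paris")
print(url)
print(parse_qs(urlsplit(url).query)["q"])
```

Encoding the utterance with `urlencode` avoids sending raw spaces in the query string, which some HTTP clients reject.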

The HTTPS response contains all the intent and entity information LUIS can determine based on the current published model of either the staging or production endpoint. The endpoint URL is found on the LUIS website, in the Manage section, on the Keys and endpoints page.

Data from intents

The primary data is the top-scoring intent name. The endpoint response is:

{
  "query": "when do you open next?",
  "topScoringIntent": {
    "intent": "GetStoreInfo",
    "score": 0.984749258
  },
  "entities": []
}

| Data Object | Data Type | Data Location | Value |
|---|---|---|---|
| Intent | String | topScoringIntent.intent | "GetStoreInfo" |

If your chatbot or LUIS-calling app makes a decision based on more than one intent score, return all the intents' scores.

Set the querystring parameter verbose=true. The endpoint response is:

{
  "query": "when do you open next?",
  "topScoringIntent": {
    "intent": "GetStoreInfo",
    "score": 0.984749258
  },
  "intents": [
    {
      "intent": "GetStoreInfo",
      "score": 0.984749258
    },
    {
      "intent": "None",
      "score": 0.2040639
    }
  ],
  "entities": []
}

The intents are ordered from highest to lowest score.

| Data Object | Data Type | Data Location | Value | Score |
|---|---|---|---|---|
| Intent | String | intents[0].intent | "GetStoreInfo" | 0.984749258 |
| Intent | String | intents[1].intent | "None" | 0.2040639 |
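One way an app might act on more than one intent score is to accept the top intent only when it is both strong and clearly ahead of the runner-up. The sketch below assumes that policy; the function name `pick_intent` and the two threshold values are hypothetical and would need tuning for a real app.

```python
def pick_intent(response, min_score=0.5, min_margin=0.2):
    """Return the top intent name, or None when the prediction is weak or ambiguous."""
    # Without verbose=true only topScoringIntent is present.
    intents = response.get("intents") or [response["topScoringIntent"]]
    ranked = sorted(intents, key=lambda i: i["score"], reverse=True)
    top = ranked[0]
    runner_up_score = ranked[1]["score"] if len(ranked) > 1 else 0.0
    if top["score"] < min_score or top["score"] - runner_up_score < min_margin:
        return None  # caller can fall back to a clarifying question
    return top["intent"]

response = {
    "query": "when do you open next?",
    "topScoringIntent": {"intent": "GetStoreInfo", "score": 0.984749258},
    "intents": [
        {"intent": "GetStoreInfo", "score": 0.984749258},
        {"intent": "None", "score": 0.2040639},
    ],
    "entities": [],
}
print(pick_intent(response))
```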

If you add prebuilt domains, the intent name indicates the domain, such as Utilities or Communication, as well as the intent:

{
  "query": "Turn on the lights next monday at 9am",
  "topScoringIntent": {
    "intent": "Utilities.ShowNext",
    "score": 0.07842206
  },
  "intents": [
    {
      "intent": "Utilities.ShowNext",
      "score": 0.07842206
    },
    {
      "intent": "Communication.StartOver",
      "score": 0.0239675418
    },
    {
      "intent": "None",
      "score": 0.0168218873
    }],
  "entities": []
}

| Domain | Data Object | Data Type | Data Location | Value |
|---|---|---|---|---|
| Utilities | Intent | String | intents[0].intent | "Utilities.ShowNext" |
| Communication | Intent | String | intents[1].intent | "Communication.StartOver" |
| | Intent | String | intents[2].intent | "None" |
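If an app needs the domain and the intent separately, the dotted name can be split on its first period. This is a small sketch; `split_intent_name` is a hypothetical helper, and it assumes that intents without a domain (such as None) contain no period.

```python
def split_intent_name(name):
    """Split 'Domain.Intent' into (domain, intent); domain is None when absent."""
    domain, separator, intent = name.partition(".")
    if not separator:
        return None, name
    return domain, intent

for name in ["Utilities.ShowNext", "Communication.StartOver", "None"]:
    print(split_intent_name(name))
```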

Data from entities

Most chatbots and applications need more than the intent name. This additional, optional data comes from entities discovered in the utterance. Each type of entity returns different information about the match.

A single word or phrase in an utterance can match more than one entity. In that case, each matching entity is returned with its score.

All entities are returned in the entities array of the response from the endpoint:

"entities": [
  {
    "entity": "bob jones",
    "type": "Name",
    "startIndex": 0,
    "endIndex": 8,
    "score": 0.473899543
  },
  {
    "entity": "3",
    "type": "builtin.number",
    "startIndex": 16,
    "endIndex": 16,
    "resolution": {
      "value": "3"
    }
  }
]
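In the array above, the first entity carries a score and the second carries a resolution instead. A consumer can read both fields defensively, as in this sketch; `summarize_entities` is a hypothetical helper, not part of any LUIS SDK.

```python
def summarize_entities(entities):
    """Return one small dict per entity with whichever detail it carries."""
    summary = []
    for entity in entities:
        summary.append({
            "type": entity["type"],
            "text": entity["entity"],
            "score": entity.get("score"),            # None when no score is returned
            "resolution": entity.get("resolution"),  # None when no resolution is returned
        })
    return summary

entities = [
    {"entity": "bob jones", "type": "Name",
     "startIndex": 0, "endIndex": 8, "score": 0.473899543},
    {"entity": "3", "type": "builtin.number",
     "startIndex": 16, "endIndex": 16, "resolution": {"value": "3"}},
]
for item in summarize_entities(entities):
    print(item)
```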

Tokenized entity returned

Several cultures return the entity object with the entity value tokenized. The startIndex and endIndex returned by LUIS in the entity object do not map to the new, tokenized value but to the original query, so that you can extract the raw entity programmatically.

For example, in German, the phrase das Bauernbrot is tokenized into das bauern brot. The tokenized value, das bauern brot, is returned, and the original value can be determined programmatically from the startIndex and endIndex of the original query, giving you das Bauernbrot.
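Recovering the raw text is a single slice, as sketched below. The examples in this article show endIndex as inclusive (a one-character entity has equal start and end indexes), so the slice adds 1. The query, the entity object, and the type name Bread are made up for illustration.

```python
def raw_entity_text(query, entity):
    """Slice the original query using the inclusive startIndex and endIndex."""
    return query[entity["startIndex"]:entity["endIndex"] + 1]

# Hypothetical query and entity; LUIS would return the tokenized value in "entity".
query = "das Bauernbrot bitte"
entity = {"entity": "das bauern brot", "type": "Bread",
          "startIndex": 0, "endIndex": 13}
print(raw_entity_text(query, entity))
```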

Simple entity data

A simple entity is a machine-learned value. It can be a word or phrase.

Composite entity data

A composite entity is made up of other entities, such as prebuilt, simple, regular expression, and list entities. The separate entities form a whole entity.

List entity data

List entities represent a fixed, closed set of related words along with their synonyms. LUIS does not discover additional values for list entities. Use the Recommend feature to see suggestions for new words based on the current list. If there is more than one list entity with the same value, each entity is returned in the endpoint query.

Prebuilt entity data

Prebuilt entities are discovered based on a regular expression match using the open-source Recognizers-Text project. Prebuilt entities are returned in the entities array and use the type name prefixed with builtin. (for example, builtin.number). The following text is an example utterance with the returned prebuilt entities:

Dec 5th send to +1 360-555-1212

"entities": [
    {
      "entity": "dec 5th",
      "type": "builtin.datetimeV2.date",
      "startIndex": 0,
      "endIndex": 6,
      "resolution": {
        "values": [
          {
            "timex": "XXXX-12-05",
            "type": "date",
            "value": "2017-12-05"
          },
          {
            "timex": "XXXX-12-05",
            "type": "date",
            "value": "2018-12-05"
          }
        ]
      }
    },
    {
      "entity": "1",
      "type": "builtin.number",
      "startIndex": 18,
      "endIndex": 18,
      "resolution": {
        "value": "1"
      }
    },
    {
      "entity": "360",
      "type": "builtin.number",
      "startIndex": 20,
      "endIndex": 22,
      "resolution": {
        "value": "360"
      }
    },
    {
      "entity": "555",
      "type": "builtin.number",
      "startIndex": 26,
      "endIndex": 28,
      "resolution": {
        "value": "555"
      }
    },
    {
      "entity": "1212",
      "type": "builtin.number",
      "startIndex": 32,
      "endIndex": 35,
      "resolution": {
        "value": "1212"
      }
    },
    {
      "entity": "5th",
      "type": "builtin.ordinal",
      "startIndex": 4,
      "endIndex": 6,
      "resolution": {
        "value": "5"
      }
    },
    {
      "entity": "1 360 - 555 - 1212",
      "type": "builtin.phonenumber",
      "startIndex": 18,
      "endIndex": 35,
      "resolution": {
        "value": "1 360 - 555 - 1212"
      }
    }
  ]
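The datetimeV2 entity above resolves "dec 5th" to two candidate dates, so the calling app has to choose between them. This sketch only collects the candidates; `date_candidates` is a hypothetical helper, and the sample data is trimmed from the response above.

```python
def date_candidates(entities):
    """Collect every resolved value from builtin.datetimeV2 entities."""
    dates = []
    for entity in entities:
        if entity["type"].startswith("builtin.datetimeV2"):
            for value in entity.get("resolution", {}).get("values", []):
                dates.append(value["value"])
    return dates

entities = [
    {"entity": "dec 5th", "type": "builtin.datetimeV2.date",
     "startIndex": 0, "endIndex": 6,
     "resolution": {"values": [
         {"timex": "XXXX-12-05", "type": "date", "value": "2017-12-05"},
         {"timex": "XXXX-12-05", "type": "date", "value": "2018-12-05"},
     ]}},
    {"entity": "5th", "type": "builtin.ordinal",
     "startIndex": 4, "endIndex": 6, "resolution": {"value": "5"}},
]
print(date_candidates(entities))
```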

Regular expression entity data

A regular expression entity extracts an entity based on a regular expression pattern you provide.

Extracting names

Getting names from an utterance is difficult because a name can be almost any combination of letters and words. Depending on what type of name you're extracting, you have several options. The following suggestions are guidelines rather than rules.

Add prebuilt PersonName and GeographyV2 entities

PersonName and GeographyV2 entities are available in some language cultures.

Names of people

People's names can vary slightly in format depending on language and culture. Use either a prebuilt personName entity or a simple entity with roles of first and last name.

If you use the simple entity, make sure to give examples that use the first and last name in different parts of the utterance, in utterances of different lengths, and in utterances across all intents, including the None intent. Review endpoint utterances on a regular basis to label any names that were not predicted correctly.

Names of places

Location names are set and known, such as cities, counties, states, provinces, and countries/regions. Use the prebuilt entity geographyV2 to extract location information.

New and emerging names

Some apps need to be able to find new and emerging names, such as products or companies. These types of names are the most difficult to extract. Begin with a simple entity and add a phrase list. Review endpoint utterances on a regular basis to label any names that were not predicted correctly.

Pattern roles data

Roles are contextual differences of entities.

In the following example, the entity name is Location, with two roles, Origin and Destination.

"entities": [
  {
    "entity": "bob jones",
    "type": "Employee",
    "startIndex": 5,
    "endIndex": 13,
    "score": 0.922820568,
    "role": ""
  },
  {
    "entity": "seattle",
    "type": "Location",
    "startIndex": 20,
    "endIndex": 26,
    "score": 0.948008537,
    "role": "Origin"
  },
  {
    "entity": "redmond",
    "type": "Location",
    "startIndex": 31,
    "endIndex": 37,
    "score": 0.7047979,
    "role": "Destination"
  }
]
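Because the role distinguishes two entities of the same type, an app can index the entities by role. A minimal sketch, assuming each role appears at most once in the utterance; `entities_by_role` is a hypothetical helper, and entities with an empty role are skipped.

```python
def entities_by_role(entities):
    """Map each non-empty role to the matched entity text."""
    return {e["role"]: e["entity"] for e in entities if e.get("role")}

entities = [
    {"entity": "bob jones", "type": "Employee", "startIndex": 5,
     "endIndex": 13, "score": 0.922820568, "role": ""},
    {"entity": "seattle", "type": "Location", "startIndex": 20,
     "endIndex": 26, "score": 0.948008537, "role": "Origin"},
    {"entity": "redmond", "type": "Location", "startIndex": 31,
     "endIndex": 37, "score": 0.7047979, "role": "Destination"},
]
print(entities_by_role(entities))
```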

Pattern.any entity data

Pattern.any is a variable-length placeholder used only in a pattern's template utterance to mark where the entity begins and ends.

Sentiment analysis

If sentiment analysis is configured, the LUIS JSON response includes sentiment analysis. Learn more about sentiment analysis in the Text Analytics documentation.

Sentiment data

Sentiment data is a score between 0 and 1 indicating the positive (closer to 1) or negative (closer to 0) sentiment of the data.

When culture is en-us, the response is:

"sentimentAnalysis": {
  "label": "positive",
  "score": 0.9163064
}

For all other cultures, the response is:

"sentimentAnalysis": {
  "score": 0.9163064
}
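Since the label is present only for some cultures, a consumer may want to derive one from the score when it is missing. The sketch below does that; the helper name `describe_sentiment` and the 0.5 cut-off are assumptions, not values defined by LUIS or Text Analytics.

```python
def describe_sentiment(response, positive_threshold=0.5):
    """Return (label, score), deriving the label from the score when it is absent."""
    sentiment = response.get("sentimentAnalysis")
    if sentiment is None:
        return None  # sentiment analysis is not configured
    score = sentiment["score"]
    label = sentiment.get("label")
    if label is None:
        # Assumed cut-off: treat scores at or above the threshold as positive.
        label = "positive" if score >= positive_threshold else "negative"
    return label, score

print(describe_sentiment({"sentimentAnalysis": {"label": "positive", "score": 0.9163064}}))
print(describe_sentiment({"sentimentAnalysis": {"score": 0.9163064}}))
```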

Key phrase extraction entity data

The key phrase extraction entity returns key phrases in the utterance, provided by Text Analytics.

{
  "query": "Is there a map of places with beautiful views on a favorite trail?",
  "topScoringIntent": {
    "intent": "GetJobInformation",
    "score": 0.764368951
  },
  "intents": [
    ...
  ],
  "entities": [
    {
      "entity": "beautiful views",
      "type": "builtin.keyPhrase",
      "startIndex": 30,
      "endIndex": 44
    },
    {
      "entity": "map of places",
      "type": "builtin.keyPhrase",
      "startIndex": 11,
      "endIndex": 23
    },
    {
      "entity": "favorite trail",
      "type": "builtin.keyPhrase",
      "startIndex": 51,
      "endIndex": 64
    }
  ]
}
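The key phrases in this response are not listed in utterance order. A short sketch that filters them by type and sorts them by position follows; `key_phrases` is a hypothetical helper, and the sample is trimmed from the response above.

```python
def key_phrases(response):
    """Return key phrases in the order they appear in the utterance."""
    hits = [e for e in response.get("entities", [])
            if e["type"] == "builtin.keyPhrase"]
    return [e["entity"] for e in sorted(hits, key=lambda e: e["startIndex"])]

response = {
    "query": "Is there a map of places with beautiful views on a favorite trail?",
    "entities": [
        {"entity": "beautiful views", "type": "builtin.keyPhrase",
         "startIndex": 30, "endIndex": 44},
        {"entity": "map of places", "type": "builtin.keyPhrase",
         "startIndex": 11, "endIndex": 23},
        {"entity": "favorite trail", "type": "builtin.keyPhrase",
         "startIndex": 51, "endIndex": 64},
    ],
}
print(key_phrases(response))
```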

Data matching multiple entities

LUIS returns all entities discovered in the utterance. As a result, your chatbot may need to make a decision based on the results. An utterance can contain many entities:

book me 2 adult business tickets to paris tomorrow on air france

The LUIS endpoint can discover the same data in different entities.

{
  "query": "book me 2 adult business tickets to paris tomorrow on air france",
  "topScoringIntent": {
    "intent": "BookFlight",
    "score": 1.0
  },
  "intents": [
    {
      "intent": "BookFlight",
      "score": 1.0
    },
    {
      "intent": "Concierge",
      "score": 0.04216196
    },
    {
      "intent": "None",
      "score": 0.03610297
    }
  ],
  "entities": [
    {
      "entity": "air france",
      "type": "Airline",
      "startIndex": 54,
      "endIndex": 63,
      "score": 0.8291798
    },
    {
      "entity": "adult",
      "type": "Category",
      "startIndex": 10,
      "endIndex": 14,
      "resolution": {
        "values": [
          "adult"
        ]
      }
    },
    {
      "entity": "paris",
      "type": "Cities",
      "startIndex": 36,
      "endIndex": 40,
      "resolution": {
        "values": [
          "Paris"
        ]
      }
    },
    {
      "entity": "tomorrow",
      "type": "builtin.datetimeV2.date",
      "startIndex": 42,
      "endIndex": 49,
      "resolution": {
        "values": [
          {
            "timex": "2018-02-21",
            "type": "date",
            "value": "2018-02-21"
          }
        ]
      }
    },
    {
      "entity": "paris",
      "type": "Location::ToLocation",
      "startIndex": 36,
      "endIndex": 40,
      "score": 0.9730773
    },
    {
      "entity": "2",
      "type": "builtin.number",
      "startIndex": 8,
      "endIndex": 8,
      "resolution": {
        "value": "2"
      }
    },
    {
      "entity": "business",
      "type": "Seat",
      "startIndex": 16,
      "endIndex": 23,
      "resolution": {
        "values": [
          "business"
        ]
      }
    },
    {
      "entity": "2 adult business",
      "type": "TicketSeatOrder",
      "startIndex": 8,
      "endIndex": 23,
      "score": 0.8784727
    }
  ],
  "compositeEntities": [
    {
      "parentType": "TicketSeatOrder",
      "value": "2 adult business",
      "children": [
        {
          "type": "Category",
          "value": "adult"
        },
        {
          "type": "builtin.number",
          "value": "2"
        },
        {
          "type": "Seat",
          "value": "business"
        }
      ]
    }
  ]
}
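To see which text was matched by more than one entity, and to read a composite entity as one unit, a consumer might reshape the response as sketched here. Both helper names are hypothetical, and the sample is trimmed from the response above.

```python
from collections import defaultdict

def group_by_span(entities):
    """Map (startIndex, endIndex) to the list of entity types matched there."""
    spans = defaultdict(list)
    for entity in entities:
        spans[(entity["startIndex"], entity["endIndex"])].append(entity["type"])
    return dict(spans)

def flatten_composites(response):
    """Turn each composite entity into a dict of child type -> child value."""
    return [
        {"type": c["parentType"], "value": c["value"],
         "children": {child["type"]: child["value"] for child in c["children"]}}
        for c in response.get("compositeEntities", [])
    ]

response = {
    "entities": [
        {"entity": "paris", "type": "Cities", "startIndex": 36, "endIndex": 40},
        {"entity": "paris", "type": "Location::ToLocation",
         "startIndex": 36, "endIndex": 40, "score": 0.9730773},
    ],
    "compositeEntities": [
        {"parentType": "TicketSeatOrder", "value": "2 adult business",
         "children": [
             {"type": "Category", "value": "adult"},
             {"type": "builtin.number", "value": "2"},
             {"type": "Seat", "value": "business"},
         ]},
    ],
}
print(group_by_span(response["entities"]))
print(flatten_composites(response))
```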

Data matching multiple list entities

If a word or phrase matches more than one list entity, the endpoint query returns each list entity.

For the query when is the best time to go to red rock?, if the app has the word red in more than one list, LUIS recognizes all the entities and returns an array of entities as part of the JSON endpoint response:

{
  "query": "when is the best time to go to red rock?",
  "topScoringIntent": {
    "intent": "Calendar.Find",
    "score": 0.06701678
  },
  "entities": [
    {
      "entity": "red",
      "type": "Colors",
      "startIndex": 31,
      "endIndex": 33,
      "resolution": {
        "values": [
          "Red"
        ]
      }
    },
    {
      "entity": "red rock",
      "type": "Cities",
      "startIndex": 31,
      "endIndex": 38,
      "resolution": {
        "values": [
          "Destinations"
        ]
      }
    }
  ]
}
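LUIS returns both matches and leaves the choice to the caller. One possible policy, sketched below, is to keep the longest match among overlapping spans; this is an assumption about what an app might want, not LUIS behavior, and `longest_matches` is a hypothetical helper.

```python
def longest_matches(entities):
    """Among overlapping entities, keep the one covering the most characters."""
    kept = []
    by_length = sorted(entities,
                       key=lambda e: e["endIndex"] - e["startIndex"],
                       reverse=True)
    for entity in by_length:
        overlaps = any(entity["startIndex"] <= k["endIndex"]
                       and k["startIndex"] <= entity["endIndex"]
                       for k in kept)
        if not overlaps:
            kept.append(entity)
    return sorted(kept, key=lambda e: e["startIndex"])

entities = [
    {"entity": "red", "type": "Colors", "startIndex": 31, "endIndex": 33,
     "resolution": {"values": ["Red"]}},
    {"entity": "red rock", "type": "Cities", "startIndex": 31, "endIndex": 38,
     "resolution": {"values": ["Destinations"]}},
]
print([e["entity"] for e in longest_matches(entities)])
```

An app that cares about colors more than destinations would need a different rule, such as a priority order over entity types.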

Next steps

See Add entities to learn more about how to add entities to your LUIS app.