ai_extract function

Applies to: Databricks SQL

Important

This feature is in Public Preview.

In the preview:

  • The underlying language model can handle several languages, but these functions are tuned for English.
  • The underlying Foundation Model APIs are rate limited. See Foundation Model APIs limits to update these limits.

The ai_extract() function lets you invoke a state-of-the-art generative AI model from SQL to extract entities, specified by labels, from a given text. This function uses a chat model serving endpoint made available by Databricks Foundation Model APIs.

Requirements

Important

The underlying models that might be used at this time are licensed under the Apache 2.0 license or Llama 2 community license. Databricks recommends reviewing these licenses to ensure compliance with any applicable terms. If models emerge in the future that perform better according to Databricks’s internal benchmarks, Databricks may change the model (and the list of applicable licenses provided on this page).

Currently, Mixtral-8x7B Instruct is the underlying model that powers these AI functions.

Syntax

ai_extract(content, labels)

Arguments

  • content: A STRING expression.
  • labels: An ARRAY<STRING> literal. Each element is a type of entity to be extracted.
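
Because content accepts any STRING expression, it does not have to be a literal or a single column. The following is a minimal sketch that builds the input from two columns. The table support_tickets and its columns ticket_id, subject, and body are hypothetical names used only for illustration.

```sql
-- Hypothetical table and columns, assumed only for this sketch.
SELECT
  ticket_id,
  ai_extract(
    concat_ws(' ', subject, body),  -- content: any STRING expression
    array('email', 'time')          -- labels: an ARRAY<STRING> literal
  ) AS extracted
FROM support_tickets;
```

concat_ws skips NULL inputs, so a row with a missing subject still passes its body to the function.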

Returns

A STRUCT where each field corresponds to an entity type specified in labels. Each field contains a string representing the extracted entity. If more than one candidate for any entity type is found, only one is returned.

If content is NULL, the result is NULL.
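
Since the result is a STRUCT, individual entities can be read with dot notation. The sketch below reuses the text from the first example on this page. The CTE name extracted and the aliases entities and result_is_null are illustrative only, and the values returned depend on the underlying model.

```sql
-- Read individual fields from the returned STRUCT.
WITH extracted AS (
  SELECT ai_extract(
    'John Doe lives in New York and works for Acme Corp.',
    array('person', 'location', 'organization')
  ) AS entities
)
SELECT
  entities.person,
  entities.location,
  entities.organization
FROM extracted;

-- NULL content: per the rule above, the whole result is expected to be NULL.
SELECT ai_extract(CAST(NULL AS STRING), array('person')) IS NULL AS result_is_null;
```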

Examples

> SELECT ai_extract(
    'John Doe lives in New York and works for Acme Corp.',
    array('person', 'location', 'organization')
  );
 {"person": "John Doe", "location": "New York", "organization": "Acme Corp."}

> SELECT ai_extract(
    'Send an email to jane.doe@example.com about the meeting at 10am.',
    array('email', 'time')
  );
 {"email": "jane.doe@example.com", "time": "10am"}
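
In practice the function is often applied to a column rather than a literal. The following is a minimal sketch under stated assumptions: the table customer_reviews and its columns review_id and review_text are hypothetical, and the LIMIT value is an arbitrary choice for trying the query on a small sample, which may help while the preview rate limits apply.

```sql
-- Hypothetical table and columns, assumed only for this sketch.
SELECT
  review_id,
  extracted.person,
  extracted.organization
FROM (
  SELECT
    review_id,
    ai_extract(review_text, array('person', 'organization')) AS extracted
  FROM customer_reviews
  WHERE review_text IS NOT NULL  -- NULL content would only yield NULL results
  LIMIT 100                      -- arbitrary sample size; adjust as needed
) AS sampled_reviews;
```

Calling ai_extract once in the subquery and reading the fields in the outer query avoids repeating the function call for each extracted column.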