ai_similarity function

Applies to: Databricks SQL

Important

This feature is in Public Preview.

In the preview:

  • The underlying language model can handle several languages, but these functions are tuned for English.
  • The underlying Foundation Model APIs are rate limited. See Foundation Model APIs limits to update these limits.

The ai_similarity() function invokes a state-of-the-art generative AI model from Databricks Foundation Model APIs to compare two strings and compute a semantic similarity score, using SQL.

Requirements

Important

The underlying models that might be used at this time are licensed under the MIT License or Llama 2 community license. Databricks recommends reviewing these licenses to ensure compliance with any applicable terms. If models emerge in the future that perform better according to Databricks’s internal benchmarks, Databricks may change the model (and the list of applicable licenses provided on this page).

Currently, bge-large-en-v1.5 is the underlying model that powers this AI function.

Syntax

ai_similarity(expr1, expr2)

Arguments

  • expr1: A STRING expression.
  • expr2: A STRING expression.

Returns

A FLOAT value representing the semantic similarity between the two input strings. The output score is relative and should only be used for ranking. A score of 1 means the two texts are equal.

Examples

> SELECT ai_similarity('Apache Spark', 'Apache Spark');
  1.0

> SELECT
   company_name
  FROM
   customers
  ORDER BY ai_similarity(company_name, 'Databricks') DESC
  LIMIT 1;

  Databricks Inc.
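
Because the score is relative and intended only for ranking, a common pattern is to keep the top few matches per row rather than filter on a fixed score threshold. The following is a minimal sketch, not a verified query: the tables support_tickets(ticket_id, body) and kb_articles(article_id, title) are hypothetical, as are the column names score and match_rank. It returns the three most similar article titles for each ticket.

```sql
-- Hypothetical tables (not part of the original page):
--   support_tickets(ticket_id, body)
--   kb_articles(article_id, title)
WITH scored AS (
  -- Score every ticket/article pair once.
  SELECT
    t.ticket_id,
    a.article_id,
    a.title,
    ai_similarity(t.body, a.title) AS score
  FROM support_tickets AS t
  CROSS JOIN kb_articles AS a
)
SELECT ticket_id, article_id, title
FROM (
  -- Rank articles within each ticket by descending similarity.
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY ticket_id ORDER BY score DESC) AS match_rank
  FROM scored
) AS ranked
WHERE match_rank <= 3;
```

The cross join invokes the function once for every ticket and article pair, so on larger tables it may be worth narrowing the candidate pairs first to stay within the Foundation Model APIs rate limits.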