Analyze Text (Preview REST API)

API Version: 2021-04-30-Preview

Important

This preview adds "normalizer", which applies case-insensitive text processing to fields used in filters and sorts; this API lets you test it.

The Analyze Text API shows how an analyzer breaks text into tokens and how a normalizer preprocesses text. It's intended for interactive testing, so that you can evaluate a given analyzer or normalizer while debugging.

POST https://[service name].search.windows.net/indexes/[index name]/analyze?api-version=[api-version]
    Content-Type: application/json
    api-key: [admin key]

Testing an analyzer or normalizer is a standalone task. If you're using an analyzer or normalizer during indexing or query execution, you'll specify it in Create or Update Index on individual fields.
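As a sketch, the call above can be issued with Python's standard library. The service name, index name, and admin key below are placeholders; substitute your own values.

```python
import json
import urllib.request

# Placeholder values -- substitute your own service, index, and admin key.
service_name = "myservice"
index_name = "hotels"
api_version = "2021-04-30-Preview"
admin_key = "<admin-api-key>"

url = (
    f"https://{service_name}.search.windows.net"
    f"/indexes/{index_name}/analyze?api-version={api_version}"
)
body = json.dumps({"text": "Text to analyze", "analyzer": "standard"}).encode("utf-8")

request = urllib.request.Request(
    url,
    data=body,
    headers={"Content-Type": "application/json", "api-key": admin_key},
    method="POST",
)
# urllib.request.urlopen(request) would send the call against a live service.
print(request.full_url)
```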

URI parameters

| Parameter | Description |
| --- | --- |
| service name | Required. The name of your search service. |
| index name | Required. The name of the index containing the field you want to analyze. |
| api-version | Required. The current preview version is 2021-04-30-Preview. See API versions for the full list. |

Request headers

The following table describes the required and optional request headers.

| Field | Description |
| --- | --- |
| Content-Type | Required. Set this to application/json. |
| api-key | Required. The api-key authenticates the request to your search service. It's a string value, unique to your service. Analyze requests must include an api-key header set to your admin key (as opposed to a query key). You can find the API key in your search service dashboard in the Azure portal. |

Request body

{
  "text": "Text to analyze",
  "analyzer": "analyzer_name"
}

or

{
  "text": "Text to analyze",
  "tokenizer": "tokenizer_name",
  "tokenFilters": (optional) [ "token_filter_name" ],
  "charFilters": (optional) [ "char_filter_name" ]
}

or

{
  "text": "Text to normalize",
  "normalizer": "normalizer_name"
}

The request body contains the following properties:

| Property | Description |
| --- | --- |
| text | Required. The text to be analyzed or normalized. |
| analyzer | The analyzer used to break the text into tokens. Specify the name of a built-in analyzer, a language analyzer, or a custom analyzer defined in the index. To learn more about lexical analysis, see Analysis in Azure Cognitive Search. |
| tokenizer | The tokenizer used to break the text into tokens. Specify the name of a predefined tokenizer or a custom tokenizer defined in the index. |
| tokenFilters | A collection of token filters used to process the text. Each value must be the name of a predefined token filter or a custom token filter defined in the index. When testing analyzers, this property must be used together with the tokenizer property. When testing normalizers, it can be used on its own. |
| charFilters | A collection of character filters used to process the text. Each value must be the name of a predefined character filter or a custom character filter defined in the index. When testing analyzers, this property must be used together with the tokenizer property. When testing normalizers, it can be used on its own. |
| normalizer | The normalizer used to process the text. Specify the name of a predefined normalizer or a custom normalizer defined in the index. To learn more, see Text normalization for filtering, faceting, and sorting. |
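For instance, the three request shapes can be built as plain dictionaries. The names standard, whitespace, and lowercase below are illustrative; use whichever predefined or custom names your index defines.

```python
import json

# Test a complete analyzer.
analyzer_body = {"text": "The quick brown fox", "analyzer": "standard"}

# Test a tokenizer together with optional token and character filters.
component_body = {
    "text": "The quick brown fox",
    "tokenizer": "whitespace",
    "tokenFilters": ["lowercase"],
}

# Test a normalizer on its own.
normalizer_body = {"text": "Text To Normalize", "normalizer": "lowercase"}

for body in (analyzer_body, component_body, normalizer_body):
    print(json.dumps(body))
```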

Response

Status Code: 200 OK is returned for a successful response.

The response body is in the following format:

    {
      "tokens": [
        {
          "token": string (token),
          "startOffset": number (index of the first character of the token),
          "endOffset": number (index of the last character of the token),
          "position": number (position of the token in the input text)
        },
        ...
      ]
    }

Examples

The request body includes the string and the analyzer or normalizer you want to use.

     {
       "text": "The quick brown fox",
       "analyzer": "standard"
     }

The response shows the tokens emitted by the analyzer for the string you provide.

{
    "tokens": [
        {
            "token": "the",
            "startOffset": 0,
            "endOffset": 3,
            "position": 0
        },
        {
            "token": "quick",
            "startOffset": 4,
            "endOffset": 9,
            "position": 1
        },
        {
            "token": "brown",
            "startOffset": 10,
            "endOffset": 15,
            "position": 2
        },
        {
            "token": "fox",
            "startOffset": 16,
            "endOffset": 19,
            "position": 3
        }
    ]
}
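As the response shows, startOffset is inclusive and endOffset is exclusive, so slicing the input text with the two offsets recovers each token's source span. A quick check against the example above:

```python
text = "The quick brown fox"
tokens = [
    {"token": "the", "startOffset": 0, "endOffset": 3, "position": 0},
    {"token": "quick", "startOffset": 4, "endOffset": 9, "position": 1},
    {"token": "brown", "startOffset": 10, "endOffset": 15, "position": 2},
    {"token": "fox", "startOffset": 16, "endOffset": 19, "position": 3},
]
for t in tokens:
    span = text[t["startOffset"]:t["endOffset"]]
    # The standard analyzer lowercases, so compare case-insensitively.
    assert span.lower() == t["token"]
print("offsets verified")
```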

See also