
REST Tutorial: Index and search semi-structured data (JSON blobs) in Azure Cognitive Search

Azure Cognitive Search can index JSON documents and arrays in Azure Blob storage by using an indexer that knows how to read semi-structured data. Semi-structured data contains tags or markings that separate content within the data. It sits between unstructured data, which must be fully indexed, and formally structured data that adheres to a data model, such as a relational database schema, and can be indexed on a per-field basis.

In this tutorial, use the Azure Cognitive Search REST APIs and a REST client to perform the following tasks:

  • Configure an Azure Cognitive Search data source for an Azure blob container
  • Create an Azure Cognitive Search index to contain searchable content
  • Configure and run an indexer to read the container and extract searchable content from Azure Blob storage
  • Search the index you just created

Prerequisites

This tutorial uses the following services, tools, and data.

Create an Azure Cognitive Search service or find an existing service under your current subscription. You can use a free service for this tutorial.

Create an Azure storage account for storing the sample data.

Postman desktop app for sending requests to Azure Cognitive Search.

Clinical-trials-json.zip contains the data used in this tutorial. Download this file and unzip it to its own folder. The data originates from clinicaltrials.gov and has been converted to JSON for this tutorial.

Get a key and URL

REST calls require the service URL and an access key on every request. A search service is created with both, so if you added Azure Cognitive Search to your subscription, follow these steps to get the necessary information:

  1. Sign in to the Azure portal, and in your search service Overview page, get the URL. An example endpoint might look like https://mydemo.search.windows.net.

  2. In Settings > Keys, get an admin key for full rights on the service. There are two interchangeable admin keys, provided for business continuity in case you need to roll one over. You can use either the primary or secondary key on requests for adding, modifying, and deleting objects.

Every request sent to your service requires an api-key. Having a valid key establishes trust, on a per-request basis, between the application sending the request and the service that handles it.

Prepare sample data

  1. Sign in to the Azure portal, navigate to your Azure storage account, click Blobs, and then click + Container.

  2. Create a Blob container to contain the sample data. You can set the Public Access Level to any of its valid values.

  3. After the container is created, open it and select Upload on the command bar.

  4. Navigate to the folder containing the sample files. Select all of them and then click Upload.

After the upload completes, the files should appear in their own subfolder inside the data container.

Set up Postman

Start Postman and set up an HTTP request. If you are unfamiliar with this tool, see Explore Azure Cognitive Search REST APIs using Postman.

The request method for every call in this tutorial is POST. The header keys are "Content-type" and "api-key". Their values are "application/json" and your admin key, respectively ("admin key" is a placeholder for your search service's primary key). The body is where you place the actual contents of your call. Depending on the client you're using, there may be some variations in how you construct the request, but those are the basics.
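For readers who prefer a script to Postman, the request shape described above can be sketched in Python. This is a minimal illustration, not part of the original tutorial: the helper name `build_request` is hypothetical, and `mydemo` and `<admin key>` are placeholders. It only assembles the URL, headers, and body and does not send anything.

```python
import json
from urllib.parse import urlencode

API_VERSION = "2019-05-06"  # api-version used throughout this tutorial


def build_request(service_name, admin_key, collection, body):
    """Return (url, headers, data) for a POST that creates a search object.

    `collection` is "datasources", "indexes", or "indexers".
    This helper is a hypothetical convenience, not an SDK function.
    """
    url = "https://{0}.search.windows.net/{1}?{2}".format(
        service_name, collection, urlencode({"api-version": API_VERSION})
    )
    headers = {
        "Content-Type": "application/json",  # body is always JSON
        "api-key": admin_key,                # admin key from Settings > Keys
    }
    data = json.dumps(body).encode("utf-8")  # bytes, ready for an HTTP client
    return url, headers, data


url, headers, data = build_request("mydemo", "<admin key>", "indexes", {"name": "example"})
print(url)
```

The returned triple can be passed to any HTTP client, for example `urllib.request.Request(url, data=data, headers=headers, method="POST")` from the standard library.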

You will use Postman to make three API calls to your search service to create a data source, an index, and an indexer. The data source includes a pointer to your storage account and your JSON data. Your search service makes the connection when loading the data.

Query strings must specify an api-version, and each call should return 201 Created. The generally available api-version for using JSON arrays is 2019-05-06.

Execute the following three API calls from your REST client.

Create a data source

The Create Data Source API creates an Azure Cognitive Search object that specifies what data to index.

The endpoint of this call is https://[service name].search.windows.net/datasources?api-version=2019-05-06. Replace [service name] with the name of your search service.

For this call, the request body must include the name of your storage account, the storage account key, and the blob container name. You can find the storage account key in the Azure portal, under Access Keys in your storage account.

Make sure to replace [storage account name], [storage account key], and [blob container name] in the body of your call before executing it.

{
    "name" : "clinical-trials-json",
    "type" : "azureblob",
    "credentials" : { "connectionString" : "DefaultEndpointsProtocol=https;AccountName=[storage account name];AccountKey=[storage account key];" },
    "container" : { "name" : "[blob container name]"}
}

The response should look like:

{
    "@odata.context": "https://exampleurl.search.windows.net/$metadata#datasources/$entity",
    "@odata.etag": "\"0x8D505FBC3856C9E\"",
    "name": "clinical-trials-json",
    "description": null,
    "type": "azureblob",
    "subtype": null,
    "credentials": {
        "connectionString": "DefaultEndpointsProtocol=https;AccountName=[mystorageaccounthere];AccountKey=[myaccountkeyhere];"
    },
    "container": {
        "name": "[mycontainernamehere]",
        "query": null
    },
    "dataChangeDetectionPolicy": null,
    "dataDeletionDetectionPolicy": null
}
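The data source body can also be assembled programmatically, which makes it harder to send a request with a placeholder left in place. The following is a minimal sketch under that assumption: `build_data_source` is a hypothetical helper name, and the bracket check relies on the assumption that a real account name, key, or container name never contains `[` or `]`.

```python
def build_data_source(name, account_name, account_key, container_name):
    """Build the Create Data Source request body used in this tutorial.

    Hypothetical helper: it rejects values that still look like the
    tutorial's [placeholder] text.
    """
    for label, value in (
        ("storage account name", account_name),
        ("storage account key", account_key),
        ("blob container name", container_name),
    ):
        if "[" in value or "]" in value:
            raise ValueError("Replace the placeholder for the " + label)

    connection_string = (
        "DefaultEndpointsProtocol=https;"
        "AccountName={0};AccountKey={1};".format(account_name, account_key)
    )
    return {
        "name": name,
        "type": "azureblob",  # blob storage data source type
        "credentials": {"connectionString": connection_string},
        "container": {"name": container_name},
    }


body = build_data_source("clinical-trials-json", "examplestorage", "ZXhhbXBsZQ==", "data")
print(body["credentials"]["connectionString"])
```

The values `examplestorage`, `ZXhhbXBsZQ==`, and `data` are made-up stand-ins; substitute your own before sending the request.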

Create an index

The second call uses the Create Index API to create an Azure Cognitive Search index that stores all searchable data. An index specifies all of its fields and their attributes.

The URL for this call is https://[service name].search.windows.net/indexes?api-version=2019-05-06. Replace [service name] with the name of your search service.

First replace the URL. Then copy and paste the following code into the request body and send the request.

{
  "name": "clinical-trials-json-index",  
  "fields": [
  {"name": "FileName", "type": "Edm.String", "searchable": false, "retrievable": true, "facetable": false, "filterable": false, "sortable": true},
  {"name": "Description", "type": "Edm.String", "searchable": true, "retrievable": false, "facetable": false, "filterable": false, "sortable": false},
  {"name": "MinimumAge", "type": "Edm.Int32", "searchable": false, "retrievable": true, "facetable": true, "filterable": true, "sortable": true},
  {"name": "Title", "type": "Edm.String", "searchable": true, "retrievable": true, "facetable": false, "filterable": true, "sortable": true},
  {"name": "URL", "type": "Edm.String", "searchable": false, "retrievable": false, "facetable": false, "filterable": false, "sortable": false},
  {"name": "MyURL", "type": "Edm.String", "searchable": false, "retrievable": true, "facetable": false, "filterable": false, "sortable": false},
  {"name": "Gender", "type": "Edm.String", "searchable": false, "retrievable": true, "facetable": true, "filterable": true, "sortable": false},
  {"name": "MaximumAge", "type": "Edm.Int32", "searchable": false, "retrievable": true, "facetable": true, "filterable": true, "sortable": true},
  {"name": "Summary", "type": "Edm.String", "searchable": true, "retrievable": true, "facetable": false, "filterable": false, "sortable": false},
  {"name": "NCTID", "type": "Edm.String", "key": true, "searchable": true, "retrievable": true, "facetable": false, "filterable": true, "sortable": true},
  {"name": "Phase", "type": "Edm.String", "searchable": false, "retrievable": true, "facetable": true, "filterable": true, "sortable": false},
  {"name": "Date", "type": "Edm.String", "searchable": false, "retrievable": true, "facetable": false, "filterable": false, "sortable": true},
  {"name": "OverallStatus", "type": "Edm.String", "searchable": false, "retrievable": true, "facetable": true, "filterable": true, "sortable": false},
  {"name": "OrgStudyId", "type": "Edm.String", "searchable": true, "retrievable": true, "facetable": false, "filterable": true, "sortable": false},
  {"name": "HealthyVolunteers", "type": "Edm.String", "searchable": false, "retrievable": true, "facetable": true, "filterable": true, "sortable": false},
  {"name": "Keywords", "type": "Collection(Edm.String)", "searchable": true, "retrievable": true, "facetable": true, "filterable": false, "sortable": false},
  {"name": "metadata_storage_last_modified", "type":"Edm.DateTimeOffset", "searchable": false, "retrievable": true, "filterable": true, "sortable": false},
  {"name": "metadata_storage_size", "type":"Edm.String", "searchable": false, "retrievable": true, "filterable": true, "sortable": false},
  {"name": "metadata_content_type", "type":"Edm.String", "searchable": true, "retrievable": true, "filterable": true, "sortable": false}
  ],
  "suggesters": [
  {
    "name": "sg",
    "searchMode": "analyzingInfixMatching",
    "sourceFields": ["Title"]
  }
  ]
}

The response should look like:

{
    "@odata.context": "https://exampleurl.search.windows.net/$metadata#indexes/$entity",
    "@odata.etag": "\"0x8D505FC00EDD5FA\"",
    "name": "clinical-trials-json-index",
    "fields": [
        {
            "name": "FileName",
            "type": "Edm.String",
            "searchable": false,
            "filterable": false,
            "retrievable": true,
            "sortable": true,
            "facetable": false,
            "key": false,
            "indexAnalyzer": null,
            "searchAnalyzer": null,
            "analyzer": null,
            "synonymMaps": []
        },
        {
            "name": "Description",
            "type": "Edm.String",
            "searchable": true,
            "filterable": false,
            "retrievable": false,
            "sortable": false,
            "facetable": false,
            "key": false,
            "indexAnalyzer": null,
            "searchAnalyzer": null,
            "analyzer": null,
            "synonymMaps": []
        },
        ...
    "scoringProfiles": [],
    "defaultScoringProfile": null,
    "corsOptions": null,
    "suggesters": [],
    "analyzers": [],
    "tokenizers": [],
    "tokenFilters": [],
    "charFilters": []
}
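Writing five boolean attributes per field by hand is error-prone, so a small helper can generate field definitions in the same shape as the request above. This is a sketch only: `field` and `check_single_key` are hypothetical names, and the single-key check reflects the requirement that an index has exactly one key field (NCTID in this tutorial).

```python
ATTRIBUTES = ("searchable", "retrievable", "facetable", "filterable", "sortable")


def field(name, edm_type, *enabled, key=False):
    """Return one field definition; attributes not listed are set to False."""
    unknown = set(enabled) - set(ATTRIBUTES)
    if unknown:
        raise ValueError("Unknown attributes: " + ", ".join(sorted(unknown)))
    definition = {"name": name, "type": edm_type}
    if key:
        definition["key"] = True
    for attribute in ATTRIBUTES:
        definition[attribute] = attribute in enabled
    return definition


def check_single_key(fields):
    """Return the key field's name, or raise if there is not exactly one."""
    keys = [f["name"] for f in fields if f.get("key")]
    if len(keys) != 1:
        raise ValueError("Expected exactly one key field, found %d" % len(keys))
    return keys[0]


fields = [
    field("NCTID", "Edm.String", "searchable", "retrievable", "filterable", "sortable", key=True),
    field("MinimumAge", "Edm.Int32", "retrievable", "facetable", "filterable", "sortable"),
]
print(check_single_key(fields))
```

The two example fields reproduce the NCTID and MinimumAge entries from the request body. Note that the helper always writes all five attributes explicitly, unlike the metadata fields in the tutorial's body, which omit facetable.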

Create and run an indexer

An indexer connects to the data source, imports data into the target search index, and optionally provides a schedule to automate the data refresh. The REST API is Create Indexer.

The URL for this call is https://[service name].search.windows.net/indexers?api-version=2019-05-06. Replace [service name] with the name of your search service.

First replace the URL. Then copy and paste the following code into the request body and send the request. The request is processed immediately. When the response comes back, you will have an index that is full-text searchable.

{
  "name" : "clinical-trials-json-indexer",
  "dataSourceName" : "clinical-trials-json",
  "targetIndexName" : "clinical-trials-json-index",
  "parameters" : { "configuration" : { "parsingMode" : "jsonArray" } }
}

The response should look like:

{
    "@odata.context": "https://exampleurl.search.windows.net/$metadata#indexers/$entity",
    "@odata.etag": "\"0x8D505FDE143D164\"",
    "name": "clinical-trials-json-indexer",
    "description": null,
    "dataSourceName": "clinical-trials-json",
    "targetIndexName": "clinical-trials-json-index",
    "schedule": null,
    "parameters": {
        "batchSize": null,
        "maxFailedItems": null,
        "maxFailedItemsPerBatch": null,
        "base64EncodeKeys": null,
        "configuration": {
            "parsingMode": "jsonArray"
        }
    },
    "fieldMappings": [],
    "enrichers": [],
    "disabled": null
}
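The "parsingMode" : "jsonArray" setting tells the indexer to treat each element of a JSON array in a blob as a separate search document. The sketch below imitates that idea locally so the effect is easier to picture. It is not the service's implementation: `split_json_array` is a hypothetical function, the sample records are invented, and dropping properties that have no same-named index field is an assumption about default behavior when no field mappings are defined.

```python
import json


def split_json_array(blob_text, index_field_names):
    """Turn one JSON-array blob into a list of per-element documents.

    Local illustration only: keeps the properties whose names match an
    index field and ignores the rest.
    """
    records = json.loads(blob_text)
    if not isinstance(records, list):
        raise ValueError("jsonArray parsing expects the blob to hold a JSON array")
    documents = []
    for record in records:
        documents.append(
            {key: value for key, value in record.items() if key in index_field_names}
        )
    return documents


# Invented sample data; the real files come from Clinical-trials-json.zip.
sample_blob = json.dumps([
    {"NCTID": "EXAMPLE-0001", "Title": "Hypothetical trial A", "Unmapped": 1},
    {"NCTID": "EXAMPLE-0002", "Title": "Hypothetical trial B", "Unmapped": 2},
])
docs = split_json_array(sample_blob, {"NCTID", "Title"})
print(len(docs))
```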

Search your JSON files

You can start searching as soon as the first document is loaded. For this task, use Search explorer in the portal.

In the Azure portal, open the search service Overview page and find the index you created in the Indexes list.

Be sure to choose the index you just created.

The data can be queried in a number of ways: full text search, system properties, or user-defined metadata. System properties and user-defined metadata can be returned with the $select parameter only if they were marked as retrievable when the target index was created. Fields in the index generally cannot be altered once they are created. However, additional fields can be added.

An example of a basic query is $select=Gender,metadata_storage_size, which limits the results to those two fields.

An example of a more complex query is $filter=MinimumAge ge 30 and MaximumAge lt 75, which returns only results where MinimumAge is greater than or equal to 30 and MaximumAge is less than 75.

Feel free to experiment with a few more queries yourself. You can use logical operators (and, or, not) and comparison operators (eq, ne, gt, lt, ge, le). String comparisons are case-sensitive.

The $filter parameter only works with fields that were marked filterable when the index was created.
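The same $select and $filter expressions can be sent to the service's document search endpoint instead of being typed into Search explorer. The following sketch only builds the GET URL. `build_query_url` is a hypothetical helper, `mydemo` is a placeholder service name, and `search=*` (match all documents) is an assumed default. The request itself would still need an api-key header.

```python
from urllib.parse import quote, urlencode


def build_query_url(service_name, index_name, search="*", select=None, filter_=None):
    """Build a search URL for the given index with optional $select and $filter."""
    params = {"api-version": "2019-05-06", "search": search}
    if select:
        params["$select"] = ",".join(select)  # fields must be retrievable
    if filter_:
        params["$filter"] = filter_           # fields must be filterable
    # Percent-encode spaces, but keep "$", "*", and "," readable.
    query = urlencode(params, quote_via=quote, safe="$*,")
    return "https://{0}.search.windows.net/indexes/{1}/docs?{2}".format(
        service_name, index_name, query
    )


print(build_query_url(
    "mydemo",
    "clinical-trials-json-index",
    select=["Gender", "metadata_storage_size"],
))
print(build_query_url(
    "mydemo",
    "clinical-trials-json-index",
    filter_="MinimumAge ge 30 and MaximumAge lt 75",
))
```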

Clean up resources

The fastest way to clean up after a tutorial is to delete the resource group containing the Azure Cognitive Search service. You can delete the resource group now to permanently delete everything in it. In the portal, the resource group name is on the Overview page of the Azure Cognitive Search service.

Next steps

There are several approaches and multiple options for indexing JSON blobs. As a next step, review and test the various options to see what works best for your scenario.