Introduction to knowledge stores in Azure Cognitive Search

Important

Knowledge store is currently in public preview. Preview functionality is provided without a service level agreement, and is not recommended for production workloads. For more information, see Supplemental Terms of Use for Microsoft Azure Previews. The REST API version 2019-05-06-Preview provides preview features. There is currently limited portal support, and no .NET SDK support.

Knowledge store is a feature of Azure Cognitive Search that persists output from an AI enrichment pipeline for independent analysis or downstream processing. An enriched document is a pipeline's output, created from content that has been extracted, structured, and analyzed using AI processes. In a standard AI pipeline, enriched documents are transitory, used only during indexing and then discarded. With knowledge store, enriched documents are preserved.

If you have used cognitive skills in the past, you already know that skillsets move a document through a sequence of enrichments. The outcome can be a search index, or (new in this preview) projections in a knowledge store. The two outputs, search index and knowledge store, are products of the same pipeline. They are derived from the same inputs, but the resulting output is structured, stored, and used in very different ways.

Physically, a knowledge store is Azure Storage: Azure Table storage, Azure Blob storage, or both. Any tool or process that can connect to Azure Storage can consume the contents of a knowledge store.

Figure: Knowledge store in pipeline diagram

Benefits of knowledge store

A knowledge store gives you structure, context, and actual content, gleaned from unstructured and semi-structured data files like blobs, image files that have undergone analysis, or even structured data, and reshaped into new forms. In a step-by-step walkthrough, you can see first-hand how a dense JSON document is partitioned out into substructures, reconstituted into new structures, and otherwise made available for downstream processes like machine learning and data science workloads.

Although it's useful to see what an AI enrichment pipeline can produce, the real potential of a knowledge store is the ability to reshape data. You might start with a basic skillset, and then iterate over it to add increasing levels of structure, which you can then combine into new structures, consumable in other apps besides Azure Cognitive Search.

The benefits of knowledge store include the following:

  • Consume enriched documents in analytics and reporting tools other than search. Power BI with Power Query is a compelling choice, but any tool or app that can connect to Azure Storage can pull from a knowledge store that you create.

  • Refine an AI-indexing pipeline while debugging steps and skillset definitions. A knowledge store shows you the product of a skillset definition in an AI-indexing pipeline. You can use those results to design a better skillset because you can see exactly what the enrichments look like. You can use Storage Explorer in Azure Storage to view the contents of a knowledge store.

  • Shape the data into new forms. The reshaping is codified in skillsets, but the point is that a skillset can now provide this capability. The Shaper skill in Azure Cognitive Search has been extended to accommodate this task. Reshaping allows you to define a projection that aligns with your intended use of the data while preserving relationships.

Note

New to AI enrichment and cognitive skills? Azure Cognitive Search integrates with Cognitive Services Vision and Language features to extract and enrich source data using Optical Character Recognition (OCR) over image files, entity recognition and key phrase extraction from text files, and more. For more information, see AI enrichment in Azure Cognitive Search.

Physical storage

The physical expression of a knowledge store is articulated through the projections element of a knowledgeStore definition in a skillset. The projection defines the structure of the output so that it matches your intended use.

Projections can be articulated as tables, objects, or files.

"knowledgeStore": {
    "storageConnectionString": "<YOUR-AZURE-STORAGE-ACCOUNT-CONNECTION-STRING>",
    "projections": [
        {
            "tables": [ ],
            "objects": [ ],
            "files": [ ]
        },
        {
            "tables": [ ],
            "objects": [ ],
            "files": [ ]
        }
    ]
}

The type of projection you specify in this structure determines the type of storage used by knowledge store.

  • Table storage is used when you define tables. Define a table projection when you need tabular reporting structures as inputs to analytical tools, or when you export data frames to other data stores. You can specify multiple tables to get a subset or cross section of enriched documents. Within the same projection group, table relationships are preserved so that you can work with all of them.

  • Blob storage is used when you define objects or files. The physical representation of an object is a hierarchical JSON structure that represents an enriched document. A file is an image extracted from a document, transferred intact to Blob storage.
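
The rule above (tables go to Table storage, objects and files go to Blob storage) can be sketched in a few lines of Python. This is only an illustration of the mapping, not part of any SDK. The function name `storage_types_used` is hypothetical, and the sketch assumes a projection group is a dict shaped like the JSON in this section.

```python
def storage_types_used(projection_group):
    """Return the Azure Storage types a projection group would populate.

    Assumes `projection_group` is a dict with optional "tables", "objects",
    and "files" lists, mirroring the JSON structure in this article.
    """
    used = set()
    # Table projections land in Azure Table storage.
    if projection_group.get("tables"):
        used.add("Table storage")
    # Object and file projections both land in Azure Blob storage.
    if projection_group.get("objects") or projection_group.get("files"):
        used.add("Blob storage")
    return used


# Hypothetical group: one table projection and one file projection.
group = {
    "tables": [{"tableName": "Organizations"}],
    "objects": [],
    "files": [{"storageContainer": "images"}],
}
print(sorted(storage_types_used(group)))
```

A group whose three lists are all empty or null maps to no storage at all, which matches the idea that unused projection types can be left unspecified.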

A single projection object contains one set of tables, objects, and files, and for many scenarios, creating one projection might be enough.

However, it is possible to create multiple sets of table-object-file projections, and you might do that if you want different data relationships. Within a set, data is related, assuming those relationships exist and can be detected. If you create additional sets, the documents in one group are never related to those in another. For example, you might use multiple projection groups if you want the same data projected one way for use with your online system and projected a different way for use in a data science pipeline.
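
As a rough sketch of the multiple-group scenario, the following Python builds a knowledgeStore definition with two independent projection groups: one tabular group for reporting and one object group for a data science pipeline. The property names come from the request syntax later in this article. The table name, container name, and the `/document/shapedOutput` path are hypothetical placeholders, not values the service defines.

```python
import json

# Group 1: tabular slices for reporting tools (names are hypothetical).
reporting_group = {
    "tables": [
        {
            "tableName": "Sentiment",
            "generatedKeyName": "SentimentId",
            "source": "/document/mySentiment",
        }
    ],
    "objects": [],
    "files": [],
}

# Group 2: whole JSON objects for a data science pipeline.
# "/document/shapedOutput" stands in for the output of a Shaper skill.
data_science_group = {
    "tables": [],
    "objects": [
        {
            "storageContainer": "enriched-docs",
            "source": "/document/shapedOutput",
        }
    ],
    "files": [],
}

knowledge_store = {
    "storageConnectionString": "<YOUR-AZURE-STORAGE-ACCOUNT-CONNECTION-STRING>",
    # Data is related within a group, never across groups.
    "projections": [reporting_group, data_science_group],
}

print(json.dumps(knowledge_store, indent=4))
```

Keeping each scenario in its own group means changes to one shape do not affect the relationships preserved in the other.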

Requirements

Azure Storage is required. It provides physical storage. You can use Blob storage, Table storage, or both. Blob storage is used for intact enriched documents, usually when the output is going to downstream processes. Table storage is for slices of enriched documents, commonly used for analysis and reporting.

Skillset is required. It contains the knowledgeStore definition, and it determines the structure and composition of an enriched document. You cannot create a knowledge store using an empty skillset. You must have at least one skill in a skillset.

Indexer is required. A skillset is invoked by an indexer, which drives the execution. Indexers come with their own set of requirements and attributes. Several of these attributes have a direct bearing on a knowledge store:

  • Indexers require a supported Azure data source (the pipeline that ultimately creates the knowledge store starts by pulling data from a supported source on Azure).

  • Indexers require a search index. An indexer requires that you provide an index schema, even if you never plan to use it. A minimal index has one string field, designated as the key.

  • Indexers provide optional field mappings, used to alias a source field to a destination field. If a default field mapping needs modification (to use a different name or type), you can create a field mapping within an indexer. For knowledge store output, the destination can be a field in a blob object or table.

  • Indexers have schedules and other properties, such as change detection mechanisms provided by various data sources, that can also be applied to a knowledge store. For example, you can schedule enrichment at regular intervals to refresh the contents.
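
To make the "minimal index" requirement in the list above concrete, here is a small Python sketch that builds an index definition with a single string key field. The helper name `minimal_index` and the index name are hypothetical. The `fields`, `type`, and `key` properties follow the index schema used by the search service, but treat the result as a starting point rather than a complete definition.

```python
import json


def minimal_index(name, key_field="id"):
    """Build the smallest index schema an indexer can target:
    one string field, designated as the document key."""
    return {
        "name": name,
        "fields": [
            {"name": key_field, "type": "Edm.String", "key": True},
        ],
    }


# Hypothetical index name; the index exists only to satisfy the indexer.
index_definition = minimal_index("knowledge-store-placeholder")
print(json.dumps(index_definition, indent=2))
```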

How to create a knowledge store

To create a knowledge store, use the portal or the preview REST API (api-version=2019-05-06-Preview).

Use the Azure portal

The Import data wizard includes options for creating a knowledge store. For initial exploration, create your first knowledge store in four steps.

  1. Select a supported data source.

  2. Specify enrichment: attach a resource, select skills, and specify a knowledge store.

  3. Create an index schema. The wizard requires it and can infer one for you.

  4. Run the wizard. Extraction, enrichment, and storage occur in this last step.

Use Create Skillset and the preview REST API

A knowledgeStore is defined within a skillset, which in turn is invoked by an indexer. During enrichment, Azure Cognitive Search creates a space in your Azure Storage account and projects the enriched documents as blobs or into tables, depending on your configuration.

Currently, the preview REST API is the only mechanism by which you can create a knowledge store programmatically. An easy way to explore is to create your first knowledge store using Postman and the REST API.

Reference content for this preview feature is located in the API reference section of this article.

How to connect with tools and apps

Once the enrichments exist in storage, any tool or technology that connects to Azure Blob or Table storage can be used to explore, analyze, or consume the contents. Storage Explorer and Power BI with Power Query, both mentioned earlier in this article, are two places to start.

API reference

This section is a version of the Create Skillset (REST API) reference doc, modified to include a knowledgeStore definition.

Example - knowledgeStore embedded in a Skillset

The following example shows knowledgeStore at the bottom of a skillset definition.

  • Use POST or PUT to formulate the request.
  • Use the api-version=2019-05-06-Preview version of the REST API to access knowledge store functionality.

POST https://[servicename].search.windows.net/skillsets?api-version=2019-05-06-Preview
api-key: [admin key]
Content-Type: application/json

The body of the request is a JSON document that defines a skillset, which includes knowledgeStore.

{
  "name": "my-skillset-name",
  "description": "Extract organization entities and generate a positive-negative sentiment score from each document.",
  "skills":
  [
    {
      "@odata.type": "#Microsoft.Skills.Text.EntityRecognitionSkill",
      "categories": [ "Organization" ],
      "defaultLanguageCode": "en",
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "organizations",
          "targetName": "organizations"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.SentimentSkill",
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "score",
          "targetName": "mySentiment"
        }
      ]
    }
  ],
  "cognitiveServices": 
    {
    "@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
    "description": "mycogsvcs resource in West US 2",
    "key": "<YOUR-COGNITIVE-SERVICES-KEY>"
    },
    "knowledgeStore": { 
        "storageConnectionString": "<YOUR-AZURE-STORAGE-ACCOUNT-CONNECTION-STRING>", 
        "projections": [ 
            { 
                "tables": [  
                { "tableName": "Organizations", "generatedKeyName": "OrganizationId", "source": "/document/organizations/*"}, 
                { "tableName": "Sentiment", "generatedKeyName": "SentimentId", "source": "/document/mySentiment"}
                ], 
                "objects": [ ], 
                "files": [  ]       
            }    
        ]     
    } 
}
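
The request above can also be assembled in code. The following Python sketch uses only the standard library to build the same POST request without sending it. The helper name `build_create_skillset_request`, the service name `my-service`, and the trimmed-down skillset body are assumptions made for illustration; substitute your own service name, admin key, and full skillset definition.

```python
import json
import urllib.request

API_VERSION = "2019-05-06-Preview"


def build_create_skillset_request(service_name, admin_key, skillset):
    """Assemble (but do not send) the Create Skillset POST request."""
    url = (
        f"https://{service_name}.search.windows.net/skillsets"
        f"?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(skillset).encode("utf-8"),
        headers={"api-key": admin_key, "Content-Type": "application/json"},
        method="POST",
    )


# Trimmed-down skillset: one skill and one table projection from the example.
skillset = {
    "name": "my-skillset-name",
    "skills": [
        {
            "@odata.type": "#Microsoft.Skills.Text.SentimentSkill",
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "score", "targetName": "mySentiment"}],
        }
    ],
    "knowledgeStore": {
        "storageConnectionString": "<YOUR-AZURE-STORAGE-ACCOUNT-CONNECTION-STRING>",
        "projections": [
            {
                "tables": [
                    {
                        "tableName": "Sentiment",
                        "generatedKeyName": "SentimentId",
                        "source": "/document/mySentiment",
                    }
                ],
                "objects": [],
                "files": [],
            }
        ],
    },
}

request = build_create_skillset_request("my-service", "<admin key>", skillset)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before anything reaches the service.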

Request body syntax

The following JSON specifies a knowledgeStore, which is part of a skillset, which is invoked by an indexer (not shown). If you are already familiar with AI enrichment, you know that a skillset determines the composition of an enriched document. A skillset must contain at least one skill, most likely a Shaper skill if you are reshaping data structures.

The syntax for structuring the request payload is as follows.

{   
    "name" : "Required for POST, optional for PUT requests which sets the name on the URI",  
    "description" : "Optional. Anything you want, or null",  
    "skills" : "Required. An array of skills. Each skill has an odata.type, name, input and output parameters",
    "cognitiveServices": "A key to Cognitive Services, used for billing.",
    "knowledgeStore": { 
        "storageConnectionString": "<YOUR-AZURE-STORAGE-ACCOUNT-CONNECTION-STRING>", 
        "projections": [ 
            { 
                "tables": [ 
                    { "tableName": "<NAME>", "generatedKeyName": "<FIELD-NAME>", "source": "<DOCUMENT-PATH>" },
                    { "tableName": "<NAME>", "generatedKeyName": "<FIELD-NAME>", "source": "<DOCUMENT-PATH>" },
                    . . .
                ], 
                "objects": [ 
                    {
                    "storageContainer": "<BLOB-CONTAINER-NAME>",
                    "source": "<DOCUMENT-PATH>"
                    }
                ], 
                "files": [ 
                    {
                    "storageContainer": "<BLOB-CONTAINER-NAME>",
                    "source": "/document/normalized_images/*"
                    }
                ]  
            },
            {
                "tables": [ ],
                "objects": [ ],
                "files":  [ ]
            }  
        ]     
    } 
}

A knowledgeStore has two properties: a storageConnectionString to an Azure Storage account, and projections that define physical storage. You can use any storage account, but it's cost-effective to use services in the same region.

A projections collection contains projection objects. Each projection object must have tables, objects, and files (one of each), which are either specified or null. The syntax above shows two objects, one fully specified and the other with empty arrays. Within a projection object, once it is expressed in storage, any relationships among the data, if detected, are preserved.

Create as many projection objects as you need to support isolation and specific scenarios (for example, data structures used for exploration, versus those needed in a data science workload). You can get isolation and customization for specific scenarios by setting source and storageContainer or tableName to different values within an object. For more information and examples, see Working with projections in a knowledge store.

| Property | Applies to | Description |
|---|---|---|
| storageConnectionString | knowledgeStore | Required. In this format: DefaultEndpointsProtocol=https;AccountName=<ACCOUNT-NAME>;AccountKey=<ACCOUNT-KEY>;EndpointSuffix=core.windows.net |
| projections | knowledgeStore | Required. A collection of property objects consisting of tables, objects, files and their respective properties. Unused projections can be set to null. |
| source | All projections | The path to the node of the enrichment tree that is the root of the projection. This node is the output of any of the skills in the skillset. Paths start with /document/, representing the enriched document, but can be extended to /document/content/ or to nodes within the document tree. Examples: /document/countries/* (all countries), or /document/countries/*/states/* (all states in all countries). For more information on document paths, see Skillset concepts and composition. |
| tableName | tables | A table to create in Azure Table storage. |
| storageContainer | objects, files | Name of a container to create in Azure Blob storage. |
| generatedKeyName | tables | A column created in the table that uniquely identifies a document. The enrichment pipeline populates this column with generated values. |
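
The property list above lends itself to a quick local check before a skillset is submitted. The Python sketch below is one possible way to do that. The function `check_knowledge_store` is hypothetical, and it assumes that every property listed for a projection type must be present, which may be stricter than what the service itself enforces.

```python
# Properties expected on each entry, per projection type (an assumption
# derived from the "Applies to" column, not from service validation rules).
EXPECTED_PROPERTIES = {
    "tables": ("tableName", "generatedKeyName", "source"),
    "objects": ("storageContainer", "source"),
    "files": ("storageContainer", "source"),
}


def check_knowledge_store(definition):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if not definition.get("storageConnectionString"):
        problems.append("storageConnectionString is required")
    projections = definition.get("projections")
    if not isinstance(projections, list):
        problems.append("projections is required and must be a list")
        return problems
    for i, group in enumerate(projections):
        for kind, expected in EXPECTED_PROPERTIES.items():
            # A projection type may be null or empty when unused.
            for j, entry in enumerate(group.get(kind) or []):
                where = f"projections[{i}].{kind}[{j}]"
                for prop in expected:
                    if prop not in entry:
                        problems.append(f"{where}: missing {prop}")
                source = entry.get("source", "")
                if source and not source.startswith("/document"):
                    problems.append(f"{where}: source should start with /document")
    return problems


good = {
    "storageConnectionString": "<YOUR-AZURE-STORAGE-ACCOUNT-CONNECTION-STRING>",
    "projections": [
        {
            "tables": [
                {
                    "tableName": "Sentiment",
                    "generatedKeyName": "SentimentId",
                    "source": "/document/mySentiment",
                }
            ],
            "objects": None,
            "files": [],
        }
    ],
}
bad = {"projections": [{"tables": [{"tableName": "T", "source": "mySentiment"}]}]}

print(check_knowledge_store(good))
for problem in check_knowledge_store(bad):
    print(problem)
```

A check like this only catches structural slips. Whether a source path actually resolves depends on the skills in the skillset, which only an indexer run can confirm.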

Response

For a successful request, you should see status code "201 Created". By default, the response body will contain the JSON for the skillset definition that was created. Recall that the knowledge store is not created until you invoke an indexer that references this skillset.

Next steps

Knowledge store offers persistence of enriched documents, which is useful when designing a skillset, or when creating new structures and content for consumption by any client application capable of accessing an Azure Storage account.

The simplest approach for creating enriched documents is through the portal, but you can also use Postman and the REST API, which is more useful if you want insight into how objects are created and referenced.