Projections in a knowledge store in Azure Cognitive Search

Important

Knowledge store is currently in public preview. Preview functionality is provided without a service level agreement and is not recommended for production workloads. For more information, see Supplemental Terms of Use for Microsoft Azure Previews. The REST API version 2019-05-06-Preview provides preview features. There is currently limited portal support and no .NET SDK support.

Azure Cognitive Search enables content enrichment through built-in cognitive skills and custom skills as part of indexing. Enrichments create new information where none previously existed: extracting information from images, and detecting sentiment, key phrases, and entities in text, to name a few. Enrichments also add structure to undifferentiated text. All of these processes result in documents that make full text search more effective. In many instances, enriched documents are also useful for scenarios other than search, such as knowledge mining.

Projections, a component of knowledge store, are views of enriched documents that can be saved to physical storage for knowledge mining purposes. A projection lets you "project" your data into a shape that aligns with your needs, preserving relationships so that tools like Power BI can read the data with no additional effort.

Projections can be tabular, with data stored in rows and columns in Azure Table storage, or JSON objects stored in Azure Blob storage. You can define multiple projections of your data as it is being enriched. Multiple projections are useful when you want the same data shaped differently for individual use cases.

The knowledge store supports three types of projections:

  • Tables: For data that's best represented as rows and columns, table projections allow you to define a schematized shape or projection in Table storage. Only valid JSON objects can be projected as tables. The enriched document can contain nodes that are not named JSON objects; when projecting such nodes, create a valid JSON object with a Shaper skill or inline shaping.

  • Objects: When you need a JSON representation of your data and enrichments, object projections are saved as blobs. Only valid JSON objects can be projected as objects. The enriched document can contain nodes that are not named JSON objects; when projecting such nodes, create a valid JSON object with a Shaper skill or inline shaping.

  • Files: When you need to save the images extracted from the documents, file projections allow you to save the normalized images to Blob storage.

To see projections defined in context, step through Create a knowledge store in REST.

Projection groups

In some cases, you will need to project your enriched data in different shapes to meet different objectives. The knowledge store allows you to define multiple groups of projections. Projection groups have two key characteristics: mutual exclusivity and relatedness.

Mutual exclusivity

All content projected into a single group is independent of data projected into other projection groups. This independence means that you can have the same data shaped differently, yet repeated in each projection group.

Relatedness

Projection groups allow you to project your documents across projection types while preserving the relationships across those types. All content projected within a single projection group preserves relationships within the data across projection types. Within tables, relationships are based on a generated key, and each child node retains a reference to the parent node. Across types (tables, objects, and files), relationships are preserved when a single node is projected across different types. For example, consider a scenario where you have a document containing images and text. You could project the text to tables or objects and the images to files, where the tables or objects have a column or property containing the file URL.

Input shaping

Getting your data into the right shape or structure is key to using it effectively, whether you project to tables or objects. The ability to shape or structure your data based on how you plan to access and use it is a key capability exposed as the Shaper skill within the skillset.

Projections are easier to define when you have an object in the enrichment tree that matches the schema of the projection. The updated Shaper skill allows you to compose an object from different nodes of the enrichment tree and parent them under a new node. The Shaper skill also allows you to define complex types with nested objects.

When you have a new shape defined that contains all the elements you need to project out, you can use this shape as the source for your projections or as an input to another skill.
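As a rough illustration of the shaping described above, the following Python sketch builds a Shaper skill definition as a dictionary and prints it as JSON. The property names follow the Shaper skill reference as best understood here and should be checked against it. The node paths (/document/metadata_title, /document/pages/*/keyPhrases/*) and the helper name make_shaper_skill are assumptions made for this example only.

```python
import json


def make_shaper_skill(target_name: str) -> dict:
    """Build a Shaper skill definition that nests key phrases under one node.

    The source paths below are hypothetical; replace them with nodes that
    actually exist in your enrichment tree.
    """
    return {
        "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
        "context": "/document",
        "inputs": [
            # Simple input: copy a single node into the shape.
            {"name": "Title", "source": "/document/metadata_title"},
            # Nested input: each key phrase becomes a child object.
            {
                "name": "KeyPhrases",
                "sourceContext": "/document/pages/*/keyPhrases/*",
                "inputs": [
                    {
                        "name": "KeyPhrase",
                        "source": "/document/pages/*/keyPhrases/*",
                    }
                ],
            },
        ],
        # The shaped object is added to the enrichment tree under target_name.
        "outputs": [{"name": "output", "targetName": target_name}],
    }


skill = make_shaper_skill("EnrichedShape")
print(json.dumps(skill, indent=2))
```

With these assumed inputs, the shaped node would be available at /document/EnrichedShape and could serve as the source of a projection.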

Projection slicing

When defining a projection group, a single node in the enrichment tree can be sliced into multiple related tables or objects. Adding a projection with a source path that is a child of an existing projection results in the child node being sliced out of the parent node and projected into a new, yet related, table or object. This technique allows you to define a single node in a Shaper skill that can be the source for all of your projections.
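To make the slicing idea concrete, here is a small Python simulation. It is only a sketch of the concept and not how the service is implemented: the function name slice_shape, the integer keys, and the sample data are all assumptions for illustration. Child collections are removed from the parent row and written to their own tables, and each child row keeps a reference to the parent's generated key.

```python
from itertools import count


def slice_shape(shape: dict, main: tuple, children: dict) -> dict:
    """Split one shaped node into a parent table and related child tables.

    main     -- (table name, generated key name) for the parent table
    children -- maps a field of the shape to (table name, generated key name)
    """
    ids = count(1)  # stand-in for service-generated keys (assumption)
    main_table, main_key = main
    parent_id = next(ids)

    # The parent row keeps only the fields that are not sliced out.
    parent_row = {main_key: parent_id}
    parent_row.update({k: v for k, v in shape.items() if k not in children})
    tables = {main_table: [parent_row]}

    for field, (table_name, key_name) in children.items():
        rows = []
        for item in shape.get(field, []):
            # Each child row gets its own key plus a reference to the parent.
            row = {key_name: next(ids), main_key: parent_id}
            row.update(item)
            rows.append(row)
        tables[table_name] = rows
    return tables


shape = {
    "Title": "Hotel review",
    "KeyPhrases": [{"KeyPhrase": "clean room"}, {"KeyPhrase": "friendly staff"}],
    "Entities": [{"Entity": "Seattle"}],
}
tables = slice_shape(
    shape,
    main=("MainTable", "SomeId"),
    children={
        "KeyPhrases": ("KeyPhrases", "KeyPhraseId"),
        "Entities": ("Entities", "EntityId"),
    },
)
for name, rows in tables.items():
    print(name, rows)
```

In this simulation the parent table holds only the title, while every row in the child tables carries the parent's SomeId, which is the relationship a tool such as Power BI can follow.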

Table projections

Because it makes importing easier, we recommend table projections for data exploration with Power BI. Additionally, table projections allow for changing the cardinality between table relationships.

You can project a single document in your index into multiple tables, preserving the relationships. When projecting to multiple tables, the complete shape is projected into each table, unless a child node is the source of another table within the same group.

Defining a table projection

When defining a table projection within the knowledgeStore element of your skillset, start by mapping a node on the enrichment tree to the table source. Typically this node is the output of a Shaper skill that you added to the list of skills to produce a specific shape that you need to project into tables. The node you choose to project can be sliced to project into multiple tables. The tables definition is a list of tables that you want to project.

Each table requires three properties:

  • tableName: The name of the table in Azure Storage.

  • generatedKeyName: The column name for the key that uniquely identifies this row.

  • source: The node from the enrichment tree you are sourcing your enrichments from. This node is usually the output of a Shaper skill, but could be the output of any skill.
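Because each table definition needs all three properties, a quick local check before submitting a skillset can catch omissions early. The following Python sketch is a hypothetical helper, not part of any SDK; it simply reports which of the three properties are missing or empty in a table definition.

```python
REQUIRED_TABLE_PROPERTIES = ("tableName", "generatedKeyName", "source")


def missing_table_properties(table: dict) -> list:
    """Return the required properties that are absent or empty in a table definition."""
    return [prop for prop in REQUIRED_TABLE_PROPERTIES if not table.get(prop)]


table_definitions = [
    {
        "tableName": "MainTable",
        "generatedKeyName": "SomeId",
        "source": "/document/EnrichedShape",
    },
    # This definition deliberately omits generatedKeyName.
    {
        "tableName": "KeyPhrases",
        "source": "/document/EnrichedShape/*/KeyPhrases/*",
    },
]

for definition in table_definitions:
    print(definition.get("tableName"), missing_table_properties(definition))
```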

Here is an example of table projections.

{
    "name": "your-skillset",
    "skills": [
      …your skills
    ],
    "cognitiveServices": {
      … your cognitive services key info
    },

    "knowledgeStore": {
      "storageConnectionString": "an Azure storage connection string",
      "projections" : [
        {
          "tables": [
            { "tableName": "MainTable", "generatedKeyName": "SomeId", "source": "/document/EnrichedShape" },
            { "tableName": "KeyPhrases", "generatedKeyName": "KeyPhraseId", "source": "/document/EnrichedShape/*/KeyPhrases/*" },
            { "tableName": "Entities", "generatedKeyName": "EntityId", "source": "/document/EnrichedShape/*/Entities/*" }
          ]
        },
        {
          "objects": [ ]
        },
        {
          "files": [ ]
        }
      ]
    }
}

As demonstrated in this example, the key phrases and entities are modeled into different tables, and each row contains a reference back to its parent row in MainTable.
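A skillset definition like this one is submitted to the service over REST. The following Python sketch builds, but does not send, a PUT request using only the standard library. The service name, the admin key placeholder, and the helper name build_skillset_request are assumptions for illustration; the URL pattern and api-key header follow the usual Azure Cognitive Search REST conventions and should be verified against the REST reference for the preview API version.

```python
import json
import urllib.request

API_VERSION = "2019-05-06-Preview"  # preview version named in this article


def build_skillset_request(service_name: str, skillset: dict, admin_key: str):
    """Prepare a PUT request that creates or updates a skillset (not sent here)."""
    url = (
        f"https://{service_name}.search.windows.net"
        f"/skillsets/{skillset['name']}?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(skillset).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json", "api-key": admin_key},
    )


# Hypothetical skillset with an empty skills list, for illustration only.
skillset = {
    "name": "your-skillset",
    "skills": [],
    "knowledgeStore": {
        "storageConnectionString": "an Azure storage connection string",
        "projections": [
            {
                "tables": [
                    {
                        "tableName": "MainTable",
                        "generatedKeyName": "SomeId",
                        "source": "/document/EnrichedShape",
                    }
                ]
            }
        ],
    },
}

request = build_skillset_request("my-search-service", skillset, "<admin-key>")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```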

Object projections

Object projections are JSON representations of the enrichment tree that can be sourced from any node. In many cases, the same Shaper skill that creates a table projection can be used to generate an object projection.

{
    "name": "your-skillset",
    "skills": [
      …your skills
    ],
    "cognitiveServices": {
      … your cognitive services key info
    },

    "knowledgeStore": {
      "storageConnectionString": "an Azure storage connection string",
      "projections" : [
        {
          "tables": [ ]
        },
        {
          "objects": [
            {
              "storageContainer": "hotelreviews",
              "source": "/document/hotel"
            }
          ]
        },
        {
          "files": [ ]
        }
      ]
    }
}

Generating an object projection requires a few object-specific attributes:

  • storageContainer: The blob container where the objects will be saved
  • source: The path to the node of the enrichment tree that is the root of the projection

File projections

File projections are similar to object projections and only act on the normalized_images collection. Similar to object projections, file projections are saved in the blob container with a folder prefix of the base64-encoded value of the document ID. File projections cannot share the same container as object projections and need to be projected into a different container.

{
    "name": "your-skillset",
    "skills": [
      …your skills
    ],
    "cognitiveServices": {
      … your cognitive services key info
    },

    "knowledgeStore": {
      "storageConnectionString": "an Azure storage connection string",
      "projections" : [
        {
          "tables": [ ]
        },
        {
          "objects": [ ]
        },
        {
          "files": [
            {
              "storageContainer": "reviewimages",
              "source": "/document/normalized_images/*"
            }
          ]
        }
      ]
    }
}
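Since file projections and object projections must use different containers, a simple local check can flag a clash before the skillset is submitted. The Python sketch below is a hypothetical helper that scans a knowledgeStore definition and returns any container name used by both projection types. It checks across all projection groups, which is a conservative assumption rather than a documented rule.

```python
def shared_containers(knowledge_store: dict) -> set:
    """Return container names used by both object and file projections."""
    object_containers = set()
    file_containers = set()
    for group in knowledge_store.get("projections", []):
        object_containers.update(
            projection["storageContainer"] for projection in group.get("objects", [])
        )
        file_containers.update(
            projection["storageContainer"] for projection in group.get("files", [])
        )
    return object_containers & file_containers


# Hypothetical definition in which both projection types target one container.
conflicting_store = {
    "projections": [
        {"objects": [{"storageContainer": "hotelreviews", "source": "/document/hotel"}]},
        {"files": [{"storageContainer": "hotelreviews", "source": "/document/normalized_images/*"}]},
    ]
}
# The same definition with the files moved to their own container.
valid_store = {
    "projections": [
        {"objects": [{"storageContainer": "hotelreviews", "source": "/document/hotel"}]},
        {"files": [{"storageContainer": "reviewimages", "source": "/document/normalized_images/*"}]},
    ]
}

print(shared_containers(conflicting_store))
print(shared_containers(valid_store))
```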

Projection lifecycle

Your projections have a lifecycle that is tied to the source data in your data source. As your data is updated and reindexed, your projections are updated with the results of the enrichments, ensuring that your projections are eventually consistent with the data in your data source. The projections inherit the delete policy you've configured for your index. Projections are not deleted when the indexer or the search service itself is deleted.

Using projections

After the indexer is run, you can read the projected data in the containers or tables you specified through projections.

For analytics, exploration in Power BI is as simple as setting Azure Table storage as the data source. You can easily create a set of visualizations on your data using the relationships within it.

Alternatively, if you need to use the enriched data in a data science pipeline, you could load the data from blobs into a pandas DataFrame.
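As a minimal sketch of that last option, the Python below assumes the object-projection blobs have already been downloaded as JSON bytes (for example with the Azure Storage SDK, which is not shown here). The helper names and the sample field names are invented for illustration. Parsing uses only the standard library; the conversion step relies on pandas.json_normalize and imports pandas lazily, so it requires pandas to be installed only when it is called.

```python
import json


def records_from_blobs(blob_payloads) -> list:
    """Parse downloaded object-projection blobs (JSON bytes) into dictionaries."""
    return [json.loads(payload.decode("utf-8")) for payload in blob_payloads]


def to_dataframe(records):
    """Flatten nested records into a pandas DataFrame, one row per record."""
    import pandas as pd  # imported lazily so parsing works without pandas

    return pd.json_normalize(records)


# Hypothetical payloads standing in for blobs read from the projection container.
payloads = [
    b'{"name": "Hotel A", "review": {"sentiment": 0.9}}',
    b'{"name": "Hotel B", "review": {"sentiment": 0.4}}',
]
records = records_from_blobs(payloads)
print(records)
```

Calling to_dataframe(records) on these sample records should give one row per blob, with nested fields flattened into dotted column names such as review.sentiment.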

Finally, if you need to export your data from the knowledge store, Azure Data Factory has connectors to export the data and land it in the database of your choice.

Next steps

As a next step, create your first knowledge store using sample data and instructions.