Attach domain or composite domain to reference data - Data Quality Services (DQS)

Applies to: SQL Server (all supported versions)

This topic describes how to attach domains or composite domains in a data quality knowledge base to a reference data service in Azure Marketplace, so that you can build knowledge against high-quality reference data. Each reference data service contains a schema (data columns). After attaching a domain or a composite domain to a reference data service, you must map the attached domain, or the individual domains within the attached composite domain, to the appropriate columns in the reference data service schema. Attaching a composite domain lets you attach just one domain to the reference data service and then map the individual domains within it to the appropriate schema columns.

Important

This article mentions third-party reference data services that were previously available from the Azure DataMarket. DataMarket and Data Services, including Melissa address data, were discontinued after 12/31/2016. As a result, you can no longer run the examples in this article with the specified services from DataMarket. You can still use reference data services that are available directly online from third-party reference data providers.

Warning

The composite domain attached to a reference data service appears in the domains drop-down list while you map domains to the columns in the reference data service schema. Do not map the composite domain itself to a column in the reference data service schema; map only the individual domains within the composite domain to the appropriate columns. Mapping the composite domain itself results in an error.

A reference data service schema can have a mandatory column that must be mapped to an appropriate domain if you choose to use the reference data service. A mandatory column in a reference data schema is identified with "(M)" next to the column name. For example, AddressLine is the mandatory schema column in Melissa Data - Address Data, and CompanyName is the mandatory schema column in Digital Trowel Inc. - Us companies and professional data for SQL users.

In this topic, we will create four domains (Address Line, City, State, and Zip) under a composite domain named Address Verification, attach the composite domain to the Melissa Data - Address Check reference data service, and then map the individual domains within the composite domain to the appropriate columns in the reference data service schema.
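The mapping rules described above (mandatory columns marked with "(M)", and no mapping of the composite domain itself) can be illustrated with a small sketch. This is not part of DQS or any DQS API; it is a hypothetical Python model for reasoning about a mapping before entering it in the client. The names `CompositeDomain`, `parse_schema`, and `validate_mapping` are made up for illustration, and apart from AddressLine the schema column names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CompositeDomain:
    """A composite domain and the individual domains it contains."""
    name: str
    domains: list = field(default_factory=list)


def parse_schema(labels):
    """Split schema column labels into (all columns, mandatory columns).

    A trailing "(M)" marks a mandatory column, as in the client UI.
    """
    all_columns, mandatory = set(), set()
    for label in labels:
        is_mandatory = label.endswith("(M)")
        name = label[:-3].strip() if is_mandatory else label.strip()
        all_columns.add(name)
        if is_mandatory:
            mandatory.add(name)
    return all_columns, mandatory


def validate_mapping(composite, schema_labels, mapping):
    """Return a list of problems; an empty list means the mapping is acceptable.

    mapping: dict of reference data service column -> domain name.
    """
    all_columns, mandatory = parse_schema(schema_labels)
    problems = []
    for column, domain in mapping.items():
        if column not in all_columns:
            problems.append(f"unknown schema column: {column}")
        if domain == composite.name:
            # The composite domain itself must never be mapped to a column.
            problems.append(f"composite domain mapped to column: {column}")
        elif domain not in composite.domains:
            problems.append(f"domain not in composite domain: {domain}")
    for column in sorted(mandatory - set(mapping)):
        problems.append(f"mandatory column not mapped: {column}")
    return problems


# Hypothetical schema: only AddressLine (mandatory) is named in this article.
address = CompositeDomain("Address Verification",
                          ["Address Line", "City", "State", "Zip"])
schema = ["AddressLine (M)", "City", "State", "Zip"]

good = {"AddressLine": "Address Line", "City": "City",
        "State": "State", "Zip": "Zip"}
bad = {"City": "Address Verification"}  # maps the composite domain itself

print(validate_mapping(address, schema, good))
print(validate_mapping(address, schema, bad))
```

The first call reports no problems. The second reports both the composite-domain mapping and the unmapped mandatory column.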

Before You Begin

Prerequisites

You must have configured Data Quality Services (DQS) to use reference data services. See Configure DQS to Use Reference Data.

Security

Permissions

You must have the dqs_kb_editor role on the DQS_MAIN database to map domains to reference data.

Map domains to reference data from Melissa Data

  1. Start Data Quality Client. For information about doing so, see Run the Data Quality Client Application.

  2. In the Data Quality Client home screen, under Knowledge Base Management, click New knowledge base.

  3. In the New knowledge base screen, type a name for the new knowledge base, click the Domain Management activity, and click Create.

  4. In the Domain Management screen, click the Create a domain icon to create a domain. Create the following four domains: Address Line, City, State, and Zip.

  5. Click the Create a composite domain icon to create a composite domain. In the Create a composite domain dialog box, type Address Verification in the Composite Domain Name box, and include all the domains created in step 4 in the composite domain. Click OK.

  6. In the Domain pane on the left side, select the composite domain by clicking Address Verification, and then click the Reference Data tab on the right side.

  7. Click the Browse icon.

  8. In the Online Reference Data Providers Catalog dialog box:

    1. Under DataMarket Data Quality Services, select the check box for Melissa Data - Address Check.

    2. Map the columns of the Melissa Data - Address Check reference data service to the appropriate domains (Address Line, City, State, and Zip). To map a column, select a reference data service column in the RDS Schema column, and then select the appropriate domain in the Domain column. To add more rows to the table, click the Add Schema Entry icon.

    3. Click OK to save the changes and close the Online Reference Data Providers Catalog dialog box.

      Online Reference Data Providers Catalog dialog box

      Note

      • In the Online Reference Data Providers Catalog dialog box, the DataMarket Data Quality Services node displays all the reference data service providers that you have subscribed to in Azure Marketplace. If you have configured direct online third-party reference data service providers in DQS, they appear under another node called 3rd Party Direct Online Providers. That node is not shown in this example because no direct online third-party reference data service providers are configured in DQS.
  9. You will return to the Reference Data tab. In the Provider Settings area, change values in the following boxes, if required:

    • Auto Correction Threshold: Corrections from the reference data service with a confidence level above this threshold value are applied automatically. Enter the value as the decimal form of the corresponding percentage. For example, enter 0.9 for 90%.

    • Suggested Candidates: Number of suggested candidates to display from the reference data service.

    • Min Confidence: Suggestions from the reference data service with a confidence level lower than this value are ignored. Enter the value as the decimal form of the corresponding percentage. For example, enter 0.6 for 60%.

  10. Click Finish to publish the knowledge base. A confirmation message appears after the knowledge base is published successfully.

You can now use this knowledge base for the cleansing activity in a data quality project to standardize and cleanse US addresses in your source data, based on the knowledge provided by Melissa Data through Azure Marketplace.
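To make the Provider Settings from step 9 concrete, the following Python sketch models how the three values could interact. It is an illustration only, not DQS code: the function names are hypothetical, and two points are assumptions that the text above does not state, namely that a confidence exactly equal to a threshold is treated as a suggestion, and that results between Min Confidence and the Auto Correction Threshold are offered as suggestions.

```python
def percent_to_decimal(percent):
    """Convert a percentage such as 90 to the decimal notation 0.9."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return percent / 100


def classify(confidence, auto_correction_threshold, min_confidence):
    """Decide what happens to one result returned by the reference data service."""
    if confidence > auto_correction_threshold:
        return "auto-correct"   # above the threshold: corrected automatically
    if confidence < min_confidence:
        return "ignore"         # below Min Confidence: suggestion is ignored
    return "suggest"            # assumption: everything in between is suggested


def select_candidates(candidates, suggested_candidates, min_confidence):
    """Keep at most `suggested_candidates` results, highest confidence first.

    candidates: list of (value, confidence) tuples.
    """
    kept = [c for c in candidates if c[1] >= min_confidence]
    kept.sort(key=lambda c: c[1], reverse=True)
    return kept[:suggested_candidates]


threshold = percent_to_decimal(90)   # entered as 0.9 in Provider Settings
minimum = percent_to_decimal(60)     # entered as 0.6 in Provider Settings

for confidence in (0.95, 0.75, 0.40):
    print(confidence, classify(confidence, threshold, minimum))

# Hypothetical candidate values, for illustration only.
results = [("1 Main St", 0.72), ("1 Main Street", 0.88), ("1 Maine St", 0.35)]
print(select_candidates(results, suggested_candidates=2, min_confidence=minimum))
```

With these settings, a result at 0.95 is corrected automatically, one at 0.75 is offered as a suggestion, and one at 0.40 is ignored.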

Follow Up: After Mapping a Domain to Reference Data

Create a data quality project, and run the cleansing activity on your source data containing US addresses by comparing it against the knowledge base created in this topic. See Cleanse Data Using Reference Data (External) Knowledge.

See Also

Reference Data Services in DQS
Data Cleansing