CREATE EXTERNAL DATA SOURCE (Transact-SQL)

Creates an external data source for querying using SQL Server, SQL Database, SQL Data Warehouse, or Analytics Platform System (Parallel Data Warehouse or PDW).

This article provides the syntax, arguments, remarks, permissions, and examples for whichever SQL product you choose.

For more information about the syntax conventions, see Transact-SQL Syntax Conventions.

Overview: SQL Server

Creates an external data source for PolyBase queries. External data sources are used to establish connectivity and support these primary use cases:

  • Data virtualization and data load using PolyBase
  • Bulk load operations using BULK INSERT or OPENROWSET

APPLIES TO: SQL Server 2016 (or higher)

Syntax

CREATE EXTERNAL DATA SOURCE <data_source_name>  
WITH
(    LOCATION                  = '<prefix>://<path>[:<port>]'
[,   CONNECTION_OPTIONS        = '<name_value_pairs>']
[,   CREDENTIAL                = <credential_name> ]
[,   PUSHDOWN                  = ON | OFF]
[,   TYPE                      = HADOOP | BLOB_STORAGE ]
[,   RESOURCE_MANAGER_LOCATION = '<resource_manager>[:<port>]' ]
)
[;]

Arguments

data_source_name

Specifies the user-defined name for the data source. The name must be unique within the database in SQL Server.

LOCATION = '<prefix>://<path[:port]>'

Provides the connectivity protocol and path to the external data source.

External Data Source | Location prefix | Location path | Supported locations by product / service
Cloudera or Hortonworks | hdfs | <Namenode>[:port] | SQL Server (2016+)
Azure Blob Storage | wasb[s] | <container>@<storage_account>.blob.core.windows.net | SQL Server (2016+)
SQL Server | sqlserver | <server_name>[\<instance_name>][:port] | SQL Server (2019+)
Oracle | oracle | <server_name>[:port] | SQL Server (2019+)
Teradata | teradata | <server_name>[:port] | SQL Server (2019+)
MongoDB or CosmosDB | mongodb | <server_name>[:port] | SQL Server (2019+)
ODBC | odbc | <server_name>[:port] | SQL Server (2019+) - Windows only
Bulk Operations | https | <storage_account>.blob.core.windows.net/<container> | SQL Server (2017+)

Location path:

  • <Namenode> = the machine name, name service URI, or IP address of the Namenode in the Hadoop cluster. PolyBase must resolve any DNS names used by the Hadoop cluster.
  • port = The port that the external data source is listening on. In Hadoop, the port can be found using the fs.defaultFS configuration parameter. The default is 8020.
  • <container> = the container of the storage account holding the data. Root containers are read-only; data can't be written back to the container.
  • <storage_account> = the storage account name of the Azure resource.
  • <server_name> = the host name.
  • <instance_name> = the name of the SQL Server named instance. Used if you have the SQL Server Browser Service running on the target instance.

Additional notes and guidance when setting the location:

  • The SQL engine doesn't verify the existence of the external data source when the object is created. To validate, create an external table using the external data source.
  • Use the same external data source for all tables when querying Hadoop to ensure consistent querying semantics.
  • You can use the sqlserver location prefix to connect SQL Server 2019 to SQL Server, SQL Database, or SQL Data Warehouse (see the sketch after this list).
  • Specify Driver={<Name of Driver>} when connecting via ODBC.
  • wasb is the default protocol for Azure Blob storage. wasbs is optional but recommended, as data will be sent using a secure SSL connection.
  • To ensure successful PolyBase queries during a Hadoop Namenode fail-over, consider using a virtual IP address for the Namenode of the Hadoop cluster. If you don't, execute an ALTER EXTERNAL DATA SOURCE command to point to the new location.
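
The following minimal sketch illustrates the sqlserver location prefix noted above. The server name (WINSQL2019) and the database scoped credential (SQLServerCredentials) are hypothetical placeholders for illustration only; the credential must already exist and hold a login valid on the remote instance.

-- Minimal sketch (assumed server and credential names): connect SQL Server 2019
-- to another SQL Server instance through the sqlserver prefix.
CREATE EXTERNAL DATA SOURCE SQLServerInstance2
WITH
(    LOCATION   = 'sqlserver://WINSQL2019'   -- append ':<port>' for a non-default port
,    CREDENTIAL = SQLServerCredentials
,    PUSHDOWN   = ON
)
;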

CONNECTION_OPTIONS = key_value_pair

Specifies additional options when connecting over ODBC to an external data source.

The name of the driver is required as a minimum, but there are other options such as APP='<your_application_name>' or ApplicationIntent= ReadOnly|ReadWrite that are also useful to set and can assist with troubleshooting.

Refer to the ODBC product documentation for a list of permitted CONNECTION_OPTIONS.
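
As an illustration, the sketch below passes CONNECTION_OPTIONS for a generic ODBC source. The driver name, host, port, and credential name are assumptions for illustration only and must be replaced with values that are valid for your ODBC driver.

-- Minimal sketch (assumed driver, host, and credential names): an ODBC external
-- data source that supplies extra driver settings through CONNECTION_OPTIONS.
CREATE EXTERNAL DATA SOURCE MyODBCSource
WITH
(    LOCATION           = 'odbc://myserver.example.com:4444'
,    CONNECTION_OPTIONS = 'Driver={My ODBC Driver};Host=myserver.example.com;Port=4444'
,    CREDENTIAL         = ODBCCredentials
,    PUSHDOWN           = ON
)
;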

PUSHDOWN = ON | OFF

States whether computation can be pushed down to the external data source. It is on by default.

PUSHDOWN is supported when connecting to SQL Server, Oracle, Teradata, MongoDB, or ODBC at the external data source level.

Enabling or disabling push-down at the query level is achieved through a hint.
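
The relevant hints are FORCE EXTERNALPUSHDOWN and DISABLE EXTERNALPUSHDOWN. The sketch below assumes a hypothetical external table named ext.Orders; it is an illustration only.

-- Minimal sketch (assumed external table name): control push-down per query with a hint.
SELECT COUNT(*)
FROM ext.Orders
WHERE OrderDate >= '2019-01-01'
OPTION (DISABLE EXTERNALPUSHDOWN);   -- or OPTION (FORCE EXTERNALPUSHDOWN)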

CREDENTIAL = credential_name

Specifies a database-scoped credential for authenticating to the external data source.

Additional notes and guidance when creating a credential:

  • CREDENTIAL is only required if the blob has been secured. CREDENTIAL isn't required for data sets that allow anonymous access.
  • When TYPE = BLOB_STORAGE, the credential must be created using SHARED ACCESS SIGNATURE as the identity. Furthermore, the SAS token should be configured as follows:
    • Exclude the leading ? when configured as the secret
    • Have at least read permission on the file that should be loaded (for example, srt=o&sp=r)
    • Use a valid expiration period (all dates are in UTC time).

For an example of using a CREDENTIAL with SHARED ACCESS SIGNATURE and TYPE = BLOB_STORAGE, see Create an external data source to execute bulk operations and retrieve data from Azure Blob Storage into SQL Database.

To create a database scoped credential, see CREATE DATABASE SCOPED CREDENTIAL (Transact-SQL).

TYPE = [ HADOOP | BLOB_STORAGE ]

Specifies the type of the external data source being configured. This parameter isn't always required.

  • Use HADOOP when the external data source is Cloudera, Hortonworks, or Azure Blob Storage.
  • Use BLOB_STORAGE when executing bulk operations with BULK INSERT or OPENROWSET with SQL Server 2017 (14.x).

Important

Do not set TYPE if using any other external data source.

For an example of using TYPE = HADOOP to load data from Azure Blob Storage, see Create external data source to reference Azure blob storage.

RESOURCE_MANAGER_LOCATION = 'ResourceManager_URI[:port]'

Configure this optional value when connecting to Hortonworks or Cloudera.

When the RESOURCE_MANAGER_LOCATION is defined, the query optimizer makes a cost-based decision to improve performance. A MapReduce job can be used to push down the computation to Hadoop. Specifying the RESOURCE_MANAGER_LOCATION can significantly reduce the volume of data transferred between Hadoop and SQL, which can lead to improved query performance.

If the Resource Manager isn't specified, pushing compute to Hadoop is disabled for PolyBase queries.

If the port isn't specified, the default value is chosen using the current setting for the 'hadoop connectivity' configuration.

Hadoop Connectivity | Default Resource Manager Port
1 | 50300
2 | 50300
3 | 8021
4 | 8032
5 | 8050
6 | 8032
7 | 8050

For a complete list of supported Hadoop versions, see PolyBase Connectivity Configuration (Transact-SQL).
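
If you need to inspect or change the 'hadoop connectivity' setting referenced above, a minimal sketch using sp_configure follows; the value 7 is only an example and must match your Hadoop distribution.

-- Minimal sketch (example value only): set the 'hadoop connectivity' configuration.
EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
RECONFIGURE;
-- A restart of SQL Server (and the PolyBase services) is typically required
-- before the new setting takes effect.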

Important

The RESOURCE_MANAGER_LOCATION value is not validated when you create the external data source. Entering an incorrect value may cause query failure at execution time whenever push-down is attempted, because the provided value would not be able to resolve.

Create external data source to reference Hadoop with push-down enabled provides a concrete example and further guidance.

Permissions

Requires CONTROL permission on the database in SQL Server.

Locking

Takes a shared lock on the EXTERNAL DATA SOURCE object.

Security

PolyBase supports proxy-based authentication for most external data sources. Create a database scoped credential to create the proxy account.

When you connect to the storage or data pool in a SQL Server big data cluster, the user's credentials are passed through to the back-end system. Create logins in the data pool itself to enable pass-through authentication.

Currently, a SAS token with type HADOOP is unsupported. It's only supported with a storage account access key. Attempting to create an external data source with type HADOOP and a SAS credential fails with the following error:

Msg 105019, Level 16, State 1 - EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect. Java exception message: Parameters provided to connect to the Azure storage account are not valid.: Error [Parameters provided to connect to the Azure storage account are not valid.] occurred while accessing external file.'

Examples: SQL Server (2016+)

A. Create external data source in SQL 2019 to reference Oracle

To create an external data source that references Oracle, ensure you have a database scoped credential. You may optionally also enable or disable push-down of computation against this data source.

-- Create a database master key if one does not already exist, using your own password. This key is used to encrypt the credential secret in next step.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '!MyC0mpl3xP@ssw0rd!'
;

-- Create a database scoped credential with the Oracle username and password as the secret.
CREATE DATABASE SCOPED CREDENTIAL OracleProxyAccount
WITH
     IDENTITY   = 'oracle_username'
,    SECRET     = 'oracle_password'
;

CREATE EXTERNAL DATA SOURCE MyOracleServer
WITH
(    LOCATION   = 'oracle://145.145.145.145:1521'
,    CREDENTIAL = OracleProxyAccount
,    PUSHDOWN   = ON
)
;

For additional examples for other data sources such as MongoDB, see Configure PolyBase to access external data in MongoDB.

B. Create external data source to reference Hadoop

To create an external data source to reference your Hortonworks or Cloudera Hadoop cluster, specify the machine name or IP address of the Hadoop Namenode and port.

CREATE EXTERNAL DATA SOURCE MyHadoopCluster
WITH
(    LOCATION = 'hdfs://10.10.10.10:8050'
,    TYPE     = HADOOP
)
;

C. Create external data source to reference Hadoop with push-down enabled

Specify the RESOURCE_MANAGER_LOCATION option to enable push-down computation to Hadoop for PolyBase queries. Once enabled, PolyBase makes a cost-based decision to determine whether the query computation should be pushed to Hadoop.

CREATE EXTERNAL DATA SOURCE MyHadoopCluster
WITH
(    LOCATION                  = 'hdfs://10.10.10.10:8020'
,    TYPE                      = HADOOP
,    RESOURCE_MANAGER_LOCATION = '10.10.10.10:8050'
)
;

D. Create external data source to reference Kerberos-secured Hadoop

To verify whether the Hadoop cluster is Kerberos-secured, check the value of the hadoop.security.authentication property in Hadoop core-site.xml. To reference a Kerberos-secured Hadoop cluster, you must specify a database scoped credential that contains your Kerberos username and password. The database master key is used to encrypt the database scoped credential secret.

-- Create a database master key if one does not already exist, using your own password. This key is used to encrypt the credential secret in next step.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'S0me!nfo'
;

-- Create a database scoped credential with Kerberos user name and password.
CREATE DATABASE SCOPED CREDENTIAL HadoopUser1
WITH
     IDENTITY   = '<hadoop_user_name>'
,    SECRET     = '<hadoop_password>'
;

-- Create an external data source with CREDENTIAL option.
CREATE EXTERNAL DATA SOURCE MyHadoopCluster
WITH
(    LOCATION                  = 'hdfs://10.10.10.10:8050'
,    CREDENTIAL                = HadoopUser1
,    TYPE                      = HADOOP
,    RESOURCE_MANAGER_LOCATION = '10.10.10.10:8050'
)
;

E. Create external data source to reference Azure blob storage

In this example, the external data source is an Azure blob storage container called daily under the Azure storage account named logs. The Azure storage external data source is for data transfer only. It doesn't support predicate push-down.

This example shows how to create the database scoped credential for authentication to Azure storage. Specify the Azure storage account key in the database credential secret. You can specify any string in the database scoped credential identity, as it isn't used during authentication to Azure storage.

-- Create a database master key if one does not already exist, using your own password. This key is used to encrypt the credential secret in next step.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'S0me!nfo'
;

-- Create a database scoped credential with Azure storage account key as the secret.
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH
     IDENTITY   = '<my_account>'
,    SECRET     = '<azure_storage_account_key>'
;

-- Create an external data source with CREDENTIAL option.
CREATE EXTERNAL DATA SOURCE MyAzureStorage
WITH
(    LOCATION   = 'wasbs://daily@logs.blob.core.windows.net/'
,    CREDENTIAL = AzureStorageCredential
,    TYPE       = HADOOP
)
;

Examples: Bulk Operations

Note

Do not put a trailing /, file name, or shared access signature parameters at the end of the LOCATION URL when configuring an external data source for bulk operations.

F. Create an external data source for bulk operations retrieving data from Azure Blob storage

Applies to: SQL Server 2017 (14.x). Use the following data source for bulk operations using BULK INSERT or OPENROWSET. The credential must set SHARED ACCESS SIGNATURE as the identity, mustn't have the leading ? in the SAS token, must have at least read permission on the file that should be loaded (for example, srt=o&sp=r), and the expiration period should be valid (all dates are in UTC time). For more information on shared access signatures, see Using Shared Access Signatures (SAS).

CREATE DATABASE SCOPED CREDENTIAL AccessAzureInvoices
WITH
     IDENTITY = 'SHARED ACCESS SIGNATURE'
--   REMOVE ? FROM THE BEGINNING OF THE SAS TOKEN
,    SECRET = '******srt=sco&sp=rwac&se=2017-02-01T00:55:34Z&st=2016-12-29T16:55:34Z***************'
;

CREATE EXTERNAL DATA SOURCE MyAzureInvoices
WITH
(    LOCATION   = 'https://newinvoices.blob.core.windows.net/week3'
,    CREDENTIAL = AccessAzureInvoices
,    TYPE       = BLOB_STORAGE
)
;

To see this example in use, see BULK INSERT.

另请参阅See Also

SQL ServerSQL Server SQL 数据库 **  * SQL Database *   SQL 数据
数据仓库
SQL Data
Warehouse
Analytics Platform
System (PDW)
Analytics Platform
System (PDW)

 

Overview: Azure SQL Database

Creates an external data source for elastic queries. External data sources are used to establish connectivity and support these primary use cases:

  • Bulk load operations using BULK INSERT or OPENROWSET
  • Query remote SQL Database or SQL Data Warehouse instances using SQL Database with elastic query
  • Query a sharded Azure SQL Database using elastic query

Syntax

CREATE EXTERNAL DATA SOURCE <data_source_name>  
WITH
(    LOCATION                  = '<prefix>://<path>[:<port>]'
[,   CREDENTIAL                = <credential_name> ]
[,   TYPE                      = BLOB_STORAGE | RDBMS | SHARD_MAP_MANAGER ]
[,   DATABASE_NAME             = '<database_name>' ]
[,   SHARD_MAP_NAME            = '<shard_map_manager>' ]
)
[;]

Arguments

data_source_name

Specifies the user-defined name for the data source. The name must be unique within the database in SQL Database (SQL DB).

LOCATION = '<prefix>://<path[:port]>'

Provides the connectivity protocol and path to the external data source.

External Data Source | Location prefix | Location path
Bulk Operations | https | <storage_account>.blob.core.windows.net/<container>
Elastic Query (shard) | Not required | <shard_map_server_name>.database.windows.net
Elastic Query (remote) | Not required | <remote_server_name>.database.windows.net

Location path:

  • <shard_map_server_name> = The logical server name in Azure that hosts the shard map manager. The DATABASE_NAME argument provides the database used to host the shard map, and SHARD_MAP_NAME is used for the shard map itself.
  • <remote_server_name> = The target logical server name for the elastic query. The database name is specified using the DATABASE_NAME argument.

Additional notes and guidance when setting the location:

  • The SQL Database engine doesn't verify the existence of the external data source when the object is created. To validate, create an external table using the external data source (see the sketch below).
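
As a hedged illustration of that validation step, the sketch below maps an external table onto a hypothetical remote table through the RDBMS data source created in example B later in this article; the table and column definitions are assumptions and must match the remote table.

-- Minimal sketch (assumed table and column names): validate the external data
-- source by creating an external table over a remote table of the same name.
CREATE EXTERNAL TABLE [dbo].[CustomerInformation]
(    CustomerID   INT         NOT NULL
,    CustomerName VARCHAR(50) NOT NULL
)
WITH
(    DATA_SOURCE = MyElasticDBQueryDataSrc
)
;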

CREDENTIAL = credential_name

Specifies a database-scoped credential for authenticating to the external data source.

Additional notes and guidance when creating a credential:

  • To load data from Azure Blob storage into SQL Database, use an Azure Storage Key.
  • CREDENTIAL is only required if the blob has been secured. CREDENTIAL isn't required for data sets that allow anonymous access.
  • When TYPE = BLOB_STORAGE, the credential must be created using SHARED ACCESS SIGNATURE as the identity. Furthermore, the SAS token should be configured as follows:
    • Exclude the leading ? when configured as the secret
    • Have at least read permission on the file that should be loaded (for example, srt=o&sp=r)
    • Use a valid expiration period (all dates are in UTC time).

For an example of using a CREDENTIAL with SHARED ACCESS SIGNATURE and TYPE = BLOB_STORAGE, see Create an external data source to execute bulk operations and retrieve data from Azure Blob Storage into SQL Database.

To create a database scoped credential, see CREATE DATABASE SCOPED CREDENTIAL (Transact-SQL).

TYPE = [ BLOB_STORAGE | RDBMS | SHARD_MAP_MANAGER ]

Specifies the type of the external data source being configured. This parameter isn't always required.

  • Use RDBMS for cross-database queries using elastic query from SQL Database.
  • Use SHARD_MAP_MANAGER when creating an external data source that connects to a sharded SQL Database.
  • Use BLOB_STORAGE when executing bulk operations with BULK INSERT or OPENROWSET.

Important

Do not set TYPE if using any other external data source.

DATABASE_NAME = database_name

Configure this argument when TYPE is set to RDBMS or SHARD_MAP_MANAGER.

TYPE | Value of DATABASE_NAME
RDBMS | The name of the remote database on the server provided using LOCATION
SHARD_MAP_MANAGER | Name of the database operating as the shard map manager

For an example showing how to create an external data source where TYPE = RDBMS, refer to Create an RDBMS external data source.

SHARD_MAP_NAME = shard_map_name

Used only when the TYPE argument is set to SHARD_MAP_MANAGER, to set the name of the shard map.

For an example showing how to create an external data source where TYPE = SHARD_MAP_MANAGER, refer to Create a shard map manager external data source.

Permissions

Requires CONTROL permission on the database in SQL Database.

Locking

Takes a shared lock on the EXTERNAL DATA SOURCE object.

Examples:

A. Create a shard map manager external data source

To create an external data source to reference a SHARD_MAP_MANAGER, specify the SQL Database server name that hosts the shard map manager in SQL Database or a SQL Server database on a virtual machine.

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>'
;

CREATE DATABASE SCOPED CREDENTIAL ElasticDBQueryCred
WITH
     IDENTITY   = '<username>'
,    SECRET     = '<password>'
;

CREATE EXTERNAL DATA SOURCE MyElasticDBQueryDataSrc
WITH
(    TYPE             = SHARD_MAP_MANAGER
,    LOCATION         = '<server_name>.database.windows.net'
,    DATABASE_NAME    = 'ElasticScaleStarterKit_ShardMapManagerDb'
,    CREDENTIAL       = ElasticDBQueryCred
,    SHARD_MAP_NAME   = 'CustomerIDShardMap'
)
;

For a step-by-step tutorial, see Getting started with elastic queries for sharding (horizontal partitioning).

B. Create an RDBMS external data source

To create an external data source to reference an RDBMS, specify the SQL Database server name of the remote database in SQL Database.

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>'
;

CREATE DATABASE SCOPED CREDENTIAL SQL_Credential  
WITH
     IDENTITY  = '<username>'
,    SECRET    = '<password>'
;

CREATE EXTERNAL DATA SOURCE MyElasticDBQueryDataSrc
WITH
(    TYPE          = RDBMS
,    LOCATION      = '<server_name>.database.windows.net'
,    DATABASE_NAME = 'Customers'
,    CREDENTIAL    = SQL_Credential
)
;

For a step-by-step tutorial on RDBMS, see Getting started with cross-database queries (vertical partitioning).

Examples: Bulk Operations

Note

Do not put a trailing /, file name, or shared access signature parameters at the end of the LOCATION URL when configuring an external data source for bulk operations.

C. Create an external data source for bulk operations retrieving data from Azure Blob storage

Use the following data source for bulk operations using BULK INSERT or OPENROWSET. The credential must set SHARED ACCESS SIGNATURE as the identity, mustn't have the leading ? in the SAS token, must have at least read permission on the file that should be loaded (for example, srt=o&sp=r), and the expiration period should be valid (all dates are in UTC time). For more information on shared access signatures, see Using Shared Access Signatures (SAS).

CREATE DATABASE SCOPED CREDENTIAL AccessAzureInvoices
WITH
     IDENTITY = 'SHARED ACCESS SIGNATURE'
--   REMOVE ? FROM THE BEGINNING OF THE SAS TOKEN
,    SECRET = '******srt=sco&sp=rwac&se=2017-02-01T00:55:34Z&st=2016-12-29T16:55:34Z***************'
;

CREATE EXTERNAL DATA SOURCE MyAzureInvoices
WITH
(    LOCATION   = 'https://newinvoices.blob.core.windows.net/week3'
,    CREDENTIAL = AccessAzureInvoices
,    TYPE       = BLOB_STORAGE
)
;

To see this example in use, see BULK INSERT.

另请参阅See Also

SQL ServerSQL Server SQL 数据库SQL Database *SQL 数据
仓库*
 
* SQL Data
Warehouse *
 
Analytics Platform
System (PDW)
Analytics Platform
System (PDW)

 

Overview: Azure SQL Data Warehouse

Creates an external data source for PolyBase. External data sources are used to establish connectivity and support the following primary use case: data virtualization and data load using PolyBase.

Important

To create an external data source to query a SQL Data Warehouse instance using SQL Database with elastic query, see SQL Database.

Syntax

CREATE EXTERNAL DATA SOURCE <data_source_name>  
WITH
(    LOCATION                  = '<prefix>://<path>[:<port>]'
[,   CREDENTIAL                = <credential_name> ]
[,   TYPE                      = HADOOP ]
)
[;]

Arguments

data_source_name

Specifies the user-defined name for the data source. The name must be unique within the database in SQL Data Warehouse (SQL DW).

LOCATION = '<prefix>://<path[:port]>'

Provides the connectivity protocol and path to the external data source.

External Data Source | Location prefix | Location path
Azure Blob Storage | wasb[s] | <container>@<storage_account>.blob.core.windows.net
Azure Data Lake Store Gen 1 | adl | <storage_account>.azuredatalake.net
Azure Data Lake Store Gen 2 | abfs[s] | <container>@<storage_account>.dfs.core.windows.net

Location path:

  • <container> = the container of the storage account holding the data. Root containers are read-only; data can't be written back to the container.
  • <storage_account> = the storage account name of the Azure resource.

Additional notes and guidance when setting the location:

  • The default option is to enable secure SSL connections when provisioning Azure Data Lake Storage Gen 2. When this is enabled, you must use abfss when a secure SSL connection is selected. Note that abfss works for unsecure SSL connections as well.
  • The SQL Data Warehouse engine doesn't verify the existence of the external data source when the object is created. To validate, create an external table using the external data source.
  • Use the same external data source for all tables when querying Hadoop to ensure consistent querying semantics.
  • wasb is the default protocol for Azure Blob storage. wasbs is optional but recommended, as data will be sent using a secure SSL connection.

CREDENTIAL = credential_name

Specifies a database-scoped credential for authenticating to the external data source.

Additional notes and guidance when creating a credential:

  • To load data from Azure Blob Storage or Azure Data Lake Store (ADLS) Gen 2 into SQL DW, use an Azure Storage Key.
  • CREDENTIAL is only required if the blob has been secured. CREDENTIAL isn't required for data sets that allow anonymous access.

To create a database scoped credential, see CREATE DATABASE SCOPED CREDENTIAL (Transact-SQL).

TYPE = HADOOP

Specifies the type of the external data source being configured. This parameter isn't always required.

  • Use HADOOP when the external data source is Azure Blob Storage, ADLS Gen 1, or ADLS Gen 2.

Important

Do not set TYPE if using any other external data source.

For an example of using TYPE = HADOOP to load data from Azure Blob Storage, see Create external data source to reference Azure blob storage.

Permissions

Requires CONTROL permission on the database in SQL Data Warehouse.

Locking

Takes a shared lock on the EXTERNAL DATA SOURCE object.

Security

PolyBase supports proxy-based authentication for most external data sources. Create a database scoped credential to create the proxy account.

When you connect to the storage or data pool in a SQL Server big data cluster, the user's credentials are passed through to the back-end system. Create logins in the data pool itself to enable pass-through authentication.

Currently, a SAS token with type HADOOP is unsupported. It's only supported with a storage account access key. Attempting to create an external data source with type HADOOP and a SAS credential fails with the following error:

Msg 105019, Level 16, State 1 - EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect. Java exception message: Parameters provided to connect to the Azure storage account are not valid.: Error [Parameters provided to connect to the Azure storage account are not valid.] occurred while accessing external file.'

Examples:

A. Create external data source to reference Azure blob storage

In this example, the external data source is an Azure blob storage container called daily under the Azure storage account named logs. The Azure storage external data source is for data transfer only. It doesn't support predicate push-down.

This example shows how to create the database scoped credential for authentication to Azure storage. Specify the Azure storage account key in the database credential secret. You can specify any string in the database scoped credential identity, as it isn't used during authentication to Azure storage.

-- Create a database master key if one does not already exist, using your own password. This key is used to encrypt the credential secret in next step.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'S0me!nfo'
;

-- Create a database scoped credential with Azure storage account key as the secret.
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH
     IDENTITY   = '<my_account>'
,    SECRET     = '<azure_storage_account_key>'
;

-- Create an external data source with CREDENTIAL option.
CREATE EXTERNAL DATA SOURCE MyAzureStorage
WITH
(    LOCATION   = 'wasbs://daily@logs.blob.core.windows.net/'
,    CREDENTIAL = AzureStorageCredential
,    TYPE       = HADOOP
)
;

B. Create external data source to reference Azure Data Lake Store Gen 1 or 2 using a service principal

Azure Data Lake Store connectivity can be based on your ADLS URI and your Azure Active Directory application's service principal. Documentation for creating this application can be found at Data Lake Store authentication using Active Directory.

-- If you do not have a Master Key on your DW you will need to create one.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>'
;

-- These values come from your Azure Active Directory Application used to authenticate to ADLS
CREATE DATABASE SCOPED CREDENTIAL ADLS_credential
WITH
--   IDENTITY   = '<clientID>@<OAuth2.0TokenEndPoint>'
     IDENTITY   = '536540b4-4239-45fe-b9a3-629f97591c0c@https://login.microsoftonline.com/42f988bf-85f1-41af-91ab-2d2cd011da47/oauth2/token'
--,  SECRET     = '<KEY>'
,    SECRET     = 'BjdIlmtKp4Fpyh9hIvr8HJlUida/seM5kQ3EpLAmeDI='
;

-- For Gen 1 - Create an external data source
-- TYPE: HADOOP - PolyBase uses Hadoop APIs to access data in Azure Data Lake Storage.
-- LOCATION: Provide Data Lake Storage Gen 1 account name and URI
-- CREDENTIAL: Provide the credential created in the previous step
CREATE EXTERNAL DATA SOURCE AzureDataLakeStore
WITH
(    LOCATION       = 'adl://newyorktaxidataset.azuredatalakestore.net'
,    CREDENTIAL     = ADLS_credential
,    TYPE           = HADOOP
)
;

-- For Gen 2 - Create an external data source
-- TYPE: HADOOP - PolyBase uses Hadoop APIs to access data in Azure Data Lake Storage.
-- LOCATION: Provide Data Lake Storage Gen 2 account name and URI
-- CREDENTIAL: Provide the credential created in the previous step
CREATE EXTERNAL DATA SOURCE AzureDataLakeStore
WITH
(    LOCATION       = 'abfss://newyorktaxidataset.azuredatalakestore.net' -- Please note the abfss endpoint when your account has secure transfer enabled
,    CREDENTIAL     = ADLS_credential
,    TYPE           = HADOOP
)
;

C. Create external data source to reference Azure Data Lake Store Gen 2 using the storage account key

-- If you do not have a Master Key on your DW you will need to create one.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>'
;

CREATE DATABASE SCOPED CREDENTIAL ADLS_credential
WITH
--   IDENTITY   = '<storage_account_name>'
     IDENTITY   = 'newyorktaxidata'
--,  SECRET     = '<storage_account_key>'
,    SECRET     = 'yz5N4+bxSb89McdiysJAzo+9hgEHcJRJuXbF/uC3mhbezES/oe00vXnZEl14U0lN3vxrFKsphKov16C0w6aiTQ=='
;

-- Note this example uses a Gen 2 secured endpoint (abfss)
CREATE EXTERNAL DATA SOURCE <data_source_name>
WITH
(    LOCATION   = 'abfss://2013@newyorktaxidataset.dfs.core.windows.net'
,    CREDENTIAL = ADLS_credential
,    TYPE       = HADOOP
)
[;]

D. Create external data source to reference PolyBase connectivity to Azure Data Lake Store Gen 2

There is no need to specify SECRET when connecting to an Azure Data Lake Store Gen2 account with the Managed Identity mechanism.

-- If you do not have a Master Key on your DW you will need to create one
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>'

--Create database scoped credential with **IDENTITY = 'Managed Service Identity'**

CREATE DATABASE SCOPED CREDENTIAL msi_cred WITH IDENTITY = 'Managed Service Identity';

--Create external data source with abfss:// scheme for connecting to your Azure Data Lake Store Gen2 account

CREATE EXTERNAL DATA SOURCE ext_datasource_with_abfss
WITH
(    TYPE       = HADOOP
,    LOCATION   = 'abfss://myfile@mystorageaccount.dfs.core.windows.net'
,    CREDENTIAL = msi_cred
)
;

另请参阅See Also

SQL ServerSQL Server SQL 数据库SQL Database SQL 数据
数据仓库
SQL Data
Warehouse
* Analytics
Platform System (PDW) *
 
* Analytics
Platform System (PDW) *
 

 

Overview: Analytics Platform System

Creates an external data source for PolyBase queries. External data sources are used to establish connectivity and support the following use case: data virtualization and data load using PolyBase.

Syntax

CREATE EXTERNAL DATA SOURCE <data_source_name>  
WITH
(    LOCATION                  = '<prefix>://<path>[:<port>]'
[,   CREDENTIAL                = <credential_name> ]
[,   TYPE                      = HADOOP ]
[,   RESOURCE_MANAGER_LOCATION = '<resource_manager>[:<port>]' ]
)
[;]

Arguments

data_source_name

Specifies the user-defined name for the data source. The name must be unique within the server in Analytics Platform System (Parallel Data Warehouse or PDW).

LOCATION = '<prefix>://<path[:port]>'

Provides the connectivity protocol and path to the external data source.

External Data Source | Location prefix | Location path
Cloudera or Hortonworks | hdfs | <Namenode>[:port]
Azure Blob Storage | wasb[s] | <container>@<storage_account>.blob.core.windows.net

Location path:

  • <Namenode> = the machine name, name service URI, or IP address of the Namenode in the Hadoop cluster. PolyBase must resolve any DNS names used by the Hadoop cluster.
  • port = The port that the external data source is listening on. In Hadoop, the port can be found using the fs.defaultFS configuration parameter. The default is 8020.
  • <container> = the container of the storage account holding the data. Root containers are read-only; data can't be written back to the container.
  • <storage_account> = the storage account name of the Azure resource.

Additional notes and guidance when setting the location:

  • The PDW engine doesn't verify the existence of the external data source when the object is created. To validate, create an external table using the external data source.
  • Use the same external data source for all tables when querying Hadoop to ensure consistent querying semantics.
  • wasb is the default protocol for Azure Blob storage. wasbs is optional but recommended, as data will be sent using a secure SSL connection.
  • To ensure successful PolyBase queries during a Hadoop Namenode fail-over, consider using a virtual IP address for the Namenode of the Hadoop cluster. If you don't, execute an ALTER EXTERNAL DATA SOURCE command to point to the new location.

CREDENTIAL = credential_name

Specifies a database-scoped credential for authenticating to the external data source.

Additional notes and guidance when creating a credential:

  • To load data from either Azure Blob storage or Azure Data Lake Store (ADLS) Gen 2 into SQL DW or PDW, use an Azure Storage Key.
  • CREDENTIAL is only required if the blob has been secured. CREDENTIAL isn't required for data sets that allow anonymous access.

TYPE = [ HADOOP ]

Specifies the type of the external data source being configured. This parameter isn't always required.

  • Use HADOOP when the external data source is Cloudera, Hortonworks, or Azure Blob Storage.

Important

Do not set TYPE if using any other external data source.

For an example of using TYPE = HADOOP to load data from Azure Blob Storage, see Create external data source to reference Azure blob storage.

RESOURCE_MANAGER_LOCATION = 'ResourceManager_URI[:port]'

Configure this optional value when connecting to Hortonworks or Cloudera.

When the RESOURCE_MANAGER_LOCATION is defined, the query optimizer makes a cost-based decision to improve performance. A MapReduce job can be used to push down the computation to Hadoop. Specifying the RESOURCE_MANAGER_LOCATION can significantly reduce the volume of data transferred between Hadoop and SQL, which can lead to improved query performance.

If the Resource Manager isn't specified, pushing compute to Hadoop is disabled for PolyBase queries.

If the port isn't specified, the default value is chosen using the current setting for the 'hadoop connectivity' configuration.

Hadoop Connectivity | Default Resource Manager Port
1 | 50300
2 | 50300
3 | 8021
4 | 8032
5 | 8050
6 | 8032
7 | 8050

For a complete list of supported Hadoop versions, see PolyBase Connectivity Configuration (Transact-SQL).

Important

The RESOURCE_MANAGER_LOCATION value is not validated when you create the external data source. Entering an incorrect value may cause query failure at execution time whenever push-down is attempted, because the provided value would not be able to resolve.

Create external data source to reference Hadoop with push-down enabled provides a concrete example and further guidance.

Permissions

Requires CONTROL permission on the database in Analytics Platform System (Parallel Data Warehouse or PDW).

Note

In previous releases of PDW, creating an external data source required ALTER ANY EXTERNAL DATA SOURCE permission.

Locking

Takes a shared lock on the EXTERNAL DATA SOURCE object.

Security

PolyBase supports proxy-based authentication for most external data sources. Create a database scoped credential to create the proxy account.

Currently, a SAS token with type HADOOP is unsupported. It's only supported with a storage account access key. Attempting to create an external data source with type HADOOP and a SAS credential fails with the following error:

Msg 105019, Level 16, State 1 - EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect. Java exception message: Parameters provided to connect to the Azure storage account are not valid.: Error [Parameters provided to connect to the Azure storage account are not valid.] occurred while accessing external file.'

Examples:

A. Create external data source to reference Hadoop

To create an external data source to reference your Hortonworks or Cloudera Hadoop cluster, specify the machine name or IP address of the Hadoop Namenode and port.

CREATE EXTERNAL DATA SOURCE MyHadoopCluster
WITH
(    LOCATION = 'hdfs://10.10.10.10:8050'
,    TYPE     = HADOOP
)
;

B. Create external data source to reference Hadoop with push-down enabled

Specify the RESOURCE_MANAGER_LOCATION option to enable push-down computation to Hadoop for PolyBase queries. Once enabled, PolyBase makes a cost-based decision to determine whether the query computation should be pushed to Hadoop.

CREATE EXTERNAL DATA SOURCE MyHadoopCluster
WITH
(    LOCATION                  = 'hdfs://10.10.10.10:8020'
,    TYPE                      = HADOOP
,    RESOURCE_MANAGER_LOCATION = '10.10.10.10:8050'
)
;

C. Create external data source to reference Kerberos-secured Hadoop

To verify whether the Hadoop cluster is Kerberos-secured, check the value of the hadoop.security.authentication property in Hadoop core-site.xml. To reference a Kerberos-secured Hadoop cluster, you must specify a database scoped credential that contains your Kerberos username and password. The database master key is used to encrypt the database scoped credential secret.

-- Create a database master key if one does not already exist, using your own password. This key is used to encrypt the credential secret in next step.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'S0me!nfo'
;

-- Create a database scoped credential with Kerberos user name and password.
CREATE DATABASE SCOPED CREDENTIAL HadoopUser1
WITH
     IDENTITY   = '<hadoop_user_name>'
,    SECRET     = '<hadoop_password>'
;

-- Create an external data source with CREDENTIAL option.
CREATE EXTERNAL DATA SOURCE MyHadoopCluster
WITH
(    LOCATION                  = 'hdfs://10.10.10.10:8050'
,    CREDENTIAL                = HadoopUser1
,    TYPE                      = HADOOP
,    RESOURCE_MANAGER_LOCATION = '10.10.10.10:8050'
)
;

D. Create external data source to reference Azure blob storage

In this example, the external data source is an Azure blob storage container called daily under the Azure storage account named logs. The Azure storage external data source is for data transfer only. It doesn't support predicate push-down.

This example shows how to create the database scoped credential for authentication to Azure storage. Specify the Azure storage account key in the database credential secret. You can specify any string in the database scoped credential identity, as it isn't used during authentication to Azure storage.

-- Create a database master key if one does not already exist, using your own password. This key is used to encrypt the credential secret in next step.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'S0me!nfo'
;

-- Create a database scoped credential with Azure storage account key as the secret.
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH
     IDENTITY   = '<my_account>'
,    SECRET     = '<azure_storage_account_key>'
;

-- Create an external data source with CREDENTIAL option.
CREATE EXTERNAL DATA SOURCE MyAzureStorage
WITH
(    LOCATION   = 'wasbs://daily@logs.blob.core.windows.net/'
,    CREDENTIAL = AzureStorageCredential
,    TYPE       = HADOOP
)
;
