Configure PolyBase to access external data in S3-compatible object storage

Applies to: SQL Server 2022 (16.x) Preview

This article explains how to use PolyBase to query external data stored in S3-compatible object storage.

Prerequisites

To use the S3-compatible object storage integration features, you will need the following tools and resources:

  • Install the PolyBase feature for SQL Server.
  • Install SQL Server Management Studio (SSMS) or Azure Data Studio.
  • S3-compatible storage.
  • An S3 bucket created. Buckets cannot be created or configured from SQL Server.
  • A user (Access Key ID) and its secret (Secret Key ID) have been configured and are known to you. You need both to authenticate against the S3 object storage endpoint.
  • The S3 user has ListBucket permission for browse privileges.
  • The S3 user has ReadOnly permission for read privileges.
  • The S3 user has WriteOnly permission for write privileges.
  • TLS must be configured. All connections are assumed to be securely transmitted over HTTPS, not HTTP. The endpoint is validated by a certificate installed on the SQL Server OS host.

Permissions

For the proxy user to read the content of an S3 bucket, the user must be allowed to perform the following actions against the S3 endpoint:

  • ListBucket
  • ReadOnly

Pre-configuration

  1. Enable PolyBase in sp_configure:

EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;
EXEC sp_configure @configname = 'polybase enabled';
  2. Before you create a database scoped credential, the user database must have a master key to protect the credential, as shown in the sketch below. For more information, see CREATE MASTER KEY.
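
The following is a minimal sketch for creating the master key if the database doesn't already have one; [database_name] and the '<StrongPassword>' placeholder are assumptions to replace with your own values.

USE [database_name];
GO
-- Create a database master key only if one doesn't already exist.
IF NOT EXISTS (SELECT * FROM sys.symmetric_keys WHERE name = '##MS_DatabaseMasterKey##')
BEGIN
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPassword>';
END
GO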

Create a database scoped credential

The following sample script creates a database scoped credential s3_dc in the source user database in SQL Server. For more information, see CREATE DATABASE SCOPED CREDENTIAL (Transact-SQL).

USE [database_name];
GO
IF NOT EXISTS (SELECT * FROM sys.database_scoped_credentials WHERE name = 's3_dc')
BEGIN
    CREATE DATABASE SCOPED CREDENTIAL s3_dc
    WITH IDENTITY = 'S3 Access Key',
    SECRET = '<AccessKeyID>:<SecretKeyID>';
END
GO

Verify the new database-scoped credential with sys.database_scoped_credentials (Transact-SQL):

SELECT * FROM sys.database_scoped_credentials;

Create an external data source

The following sample script creates an external data source s3_ds in the source user database in SQL Server. The external data source references the s3_dc database scoped credential. For more information, see CREATE EXTERNAL DATA SOURCE.

CREATE EXTERNAL DATA SOURCE s3_ds
WITH
(   LOCATION = 's3://<ip_address>:<port>/'
,   CREDENTIAL = s3_dc
);
GO

Verify the new external data source with sys.external_data_sources.

SELECT * FROM sys.external_data_sources;
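
With the credential and data source in place, you can query files in the bucket. The following is a minimal sketch using OPENROWSET; the bucket name, folder, and Parquet file name are hypothetical placeholders, and the BULK path and FORMAT should be adjusted to match your data.

-- Query a Parquet file in the S3 bucket through the s3_ds external data source.
SELECT *
FROM OPENROWSET
(   BULK '/<bucket_name>/<folder>/<file_name>.parquet'
,   FORMAT = 'PARQUET'
,   DATA_SOURCE = 's3_ds'
) AS [s3_data];
GO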

Limitations

  1. SQL Server queries on an external table backed by S3-compliant object storage are limited to 1000 objects per prefix. This is because S3-compliant object listing is limited to 1000 object keys per prefix.
  2. For S3-compliant object storage, the access key ID cannot contain a : character.
  3. The total URL length is limited to 259 characters. This means s3://<hostname>/<objectkey> shouldn't exceed 259 characters. The s3:// counts towards this limit, so the path length cannot exceed 259-5 = 254 characters.
  4. The SQL credential name is limited to 128 characters in UTF-16 format.
  5. The credential name created must contain the bucket name unless this credential is for a new external data source.
  6. Access Key ID and Secret Key ID must only contain alphanumeric values.

Next steps