Configure PolyBase to access external data in SQL Server

APPLIES TO: yesSQL Server (Windows only) noAzure SQL Database noAzure SQL Data Warehouse noParallel Data Warehouse

This article explains how to use PolyBase on a SQL Server instance to query external data in another SQL Server instance.

Prerequisites

If you haven't installed PolyBase, see PolyBase installation. The installation article explains the prerequisites.

Configure an external table

To query the data from a SQL Server data source, you must create external tables to reference the external data. This section provides sample code to create these external tables.

For optimal query performance, create statistics on external table columns, especially for the ones used for joins, filters, and aggregates.

These objects are created in this section:

  • CREATE DATABASE SCOPED CREDENTIAL (Transact-SQL)
  • CREATE EXTERNAL DATA SOURCE (Transact-SQL)
  • CREATE EXTERNAL TABLE (Transact-SQL)
  • CREATE STATISTICS (Transact-SQL)
  1. Create a master key on the database. A master key is required to encrypt the credential secret.

     CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'S0me!nfo';  
    
  2. Create a database scoped credential.

    /*  specify credentials to external data source
    *  IDENTITY: user name for external source.  
    *  SECRET: password for external source.
    */
    CREATE DATABASE SCOPED CREDENTIAL SqlServerCredentials   
    WITH IDENTITY = 'username', Secret = 'password';
    
  3. Create an external data source with CREATE EXTERNAL DATA SOURCE. Specify the external data source location and credentials for SQL Server.

    /*  LOCATION: Location string should be of format '<vendor>://<server>[:<port>]'.
    *  PUSHDOWN: specify whether computation should be pushed down to the source. ON by default.
    *  CREDENTIAL: the database scoped credential, created above.
    */  
    CREATE EXTERNAL DATA SOURCE SQLServerInstance
    WITH ( 
    LOCATION = sqlserver://SqlServer,
    -- PUSHDOWN = ON | OFF,
     CREDENTIAL = SQLServerCredentials
    );
    
  4. Create schemas for external data.

    CREATE SCHEMA sqlserver;
    GO
    
  5. Create external tables that represent data stored in an external SQL Server instance with CREATE EXTERNAL TABLE.

    /*  LOCATION: sql server table/view in 'database_name.schema_name.object_name' format
    *  DATA_SOURCE: the external data source, created above.
    */
    CREATE EXTERNAL TABLE sqlserver.customer(
    C_CUSTKEY INT NOT NULL,
    C_NAME VARCHAR(25) NOT NULL,
    C_ADDRESS VARCHAR(40) NOT NULL,
    C_NATIONKEY INT NOT NULL,
    C_PHONE CHAR(15) NOT NULL,
    C_ACCTBAL DECIMAL(15,2) NOT NULL,
    C_MKTSEGMENT CHAR(10) NOT NULL,
    C_COMMENT VARCHAR(117) NOT NULL
     )
     WITH (
     LOCATION='tpch_10.dbo.customer',
     DATA_SOURCE=SqlServerInstance
    );
    
  6. Create statistics on an external table.

     CREATE STATISTICS CustomerCustKeyStatistics ON sqlserver.customer (C_CUSTKEY) WITH FULLSCAN; 
    

SQL Server connector compatible types

You can make a connection to other data sources that recognizes a SQL Server connection. Use the SQL Server PolyBase connector to create an external table of both Azure SQL Data Warehouse and Azure SQL Database. To accomplish this task, follow the same steps listed previously. Make sure the database scoped credential, server address, port, and location string correlate to that of the compatible data source you want to connect to.

Next steps

To learn more about PolyBase, see Overview of SQL Server PolyBase.