Configure PolyBase to access external data in SQL Server

APPLIES TO: SQL Server (yes), Azure SQL Database (no), Azure SQL Data Warehouse (no), Parallel Data Warehouse (no)

PolyBase in SQL Server 2019 allows you to connect to ODBC-compatible data sources through the ODBC connector.

Prerequisites

Note

This feature only works on SQL Server on Windows.

If you haven't installed PolyBase, see PolyBase installation.
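
As a quick check, the following sketch reports whether PolyBase is installed on the instance and then enables the feature, which SQL Server 2019 requires after installation. It assumes you have permission to change server configuration options.

    -- Returns 1 if PolyBase is installed on this instance, 0 if it is not.
    SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

    -- Enable PolyBase at the instance level.
    EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
    RECONFIGURE;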

Before creating a database scoped credential, you must create a database master key.
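
A minimal sketch of creating the master key in the current database if one doesn't already exist. The password shown is a placeholder; substitute a strong password of your own.

    -- Create a database master key only if the database doesn't have one yet.
    IF NOT EXISTS (SELECT 1 FROM sys.symmetric_keys WHERE name = '##MS_DatabaseMasterKey##')
        CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';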

First, download and install the ODBC driver for the data source you want to connect to on each of the PolyBase nodes. Once the driver is properly installed, you can view and test it from the ODBC Data Source Administrator.

Important

To improve query performance, make sure that connection pooling is enabled for the driver. You can enable it from the ODBC Data Source Administrator.

Note

The name of the driver, as listed in the ODBC Data Source Administrator, must be specified when creating the external data source (step 2 below).

Create an External Table

To query the data from an ODBC data source, you must create external tables to reference the external data. This section provides sample code to create external tables.

The following Transact-SQL commands are used in this section:

  1. Create a database scoped credential for accessing the ODBC source.

    /*  Specify credentials for the external data source.
    *  IDENTITY: user name for the external source.
    *  SECRET: password for the external source.
    */
    CREATE DATABASE SCOPED CREDENTIAL credential_name
    WITH IDENTITY = 'username', SECRET = 'password';
    
  2. Create an external data source with CREATE EXTERNAL DATA SOURCE.

    /*  LOCATION: Location string should be of format '<type>://<server>[:<port>]'.
    *  PUSHDOWN: Specify whether computation should be pushed down to the source. ON by default.
    *  CONNECTION_OPTIONS: Specify the name of the installed driver and additional connection options.
    *  CREDENTIAL: The database scoped credential created above.
    */
    CREATE EXTERNAL DATA SOURCE external_data_source_name
    WITH ( LOCATION = 'odbc://<ODBC server address>[:<port>]',
    CONNECTION_OPTIONS = 'Driver={<Name of Installed Driver>}; ServerNode = <name of server address>:<Port>',
    -- PUSHDOWN = ON | OFF,
    CREDENTIAL = credential_name );
    
  3. Optional: Create statistics on an external table.

    For optimal query performance, we recommend creating statistics on external table columns, especially those used for joins, filters, and aggregates.

    CREATE STATISTICS statistics_name ON customer (C_CUSTKEY) WITH FULLSCAN; 
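
To confirm that the objects from steps 1 and 2 exist, you can query the catalog views. This is a sketch that assumes the placeholder name external_data_source_name used above; replace it with your own data source name.

    -- List the external data source with its location and associated credential.
    SELECT ds.name, ds.location, ds.type_desc, cr.name AS credential_name
    FROM sys.external_data_sources AS ds
    LEFT JOIN sys.database_scoped_credentials AS cr
        ON ds.credential_id = cr.credential_id
    WHERE ds.name = 'external_data_source_name';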
    

Important

Once you have created an external data source, you can use the CREATE EXTERNAL TABLE command to create a queryable table over that source.
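
For illustration, here is a sketch of an external table over the data source created in step 2. The table name customer and the column C_CUSTKEY come from the statistics example; the C_NAME column, the data types, and the LOCATION value are assumptions, and the LOCATION format that your source expects depends on the ODBC driver.

    -- Define the external table; column names and types must match the remote table.
    CREATE EXTERNAL TABLE customer (
        C_CUSTKEY INT NOT NULL,
        C_NAME VARCHAR(25) NOT NULL  -- hypothetical column, for illustration only
    )
    WITH (
        LOCATION = '<remote table name>',  -- placeholder: the table as named at the ODBC source
        DATA_SOURCE = external_data_source_name
    );

    -- Once created, the external table can be queried like a local table.
    SELECT TOP (10) C_CUSTKEY, C_NAME FROM customer;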

Next steps

To learn more about PolyBase, see Overview of SQL Server PolyBase.