Configure PolyBase to access external data in SQL Server

APPLIES TO: yesSQL Server noAzure SQL Database noAzure SQL Data Warehouse noParallel Data Warehouse

PolyBase in SQL Server 2019 allows you to connect to ODBC -compatible data sources through the ODBC connector.

Prerequisites

Note = feature only works on SQL Server on Windows.

If you haven't installed PolyBase, see PolyBase installation.

Before creating a database scoped credential, a Master Key must be created.

First download and install the ODBC driver of the data source you want to connect to on each of the PolyBase nodes. Once the driver is properly installed, you can view and test the driver from the "ODBC Data Source Administrator".

PolyBase scale-out groups

IMPORTANT!

In order to improve query performance make sure that the driver has connection pooling enabled. This can be accomplished from the "ODBC Data Source Administrator".

Note

The name of the driver (example circled above) will need to be specified when creating the external data source (Step 3 below).

Create an External Table

To query the data from an ODBC data source, you must create external tables to reference the external data. This section provides sample code to create external tables.

The following Transact-SQL commands are used in this section:

  1. Create a database scoped credential for accessing the ODBC source.

    /*  specify credentials to external data source
    *  IDENTITY: user name for external source. 
    *  SECRET: password for external source.
    */
    CREATE DATABASE SCOPED CREDENTIAL credential_name WITH IDENTITY = 'username', Secret = 'password';
    
  2. Create an external data source with CREATE EXTERNAL DATA SOURCE.

    /*  LOCATION: Location string should be of format '<type>://<server>[:<port>]'.
    *  PUSHDOWN: specify whether computation should be pushed down to the source. ON by default.
    *CONNECTION_OPTIONS: Specify driver location
    *  CREDENTIAL: the database scoped credential, created above.
    */  
    CREATE EXTERNAL DATA SOURCE external_data_source_name
    WITH ( LOCATION = odbc://<ODBC server address>[:<port>],
    CONNECTION_OPTIONS = 'Driver={<Name of Installed Driver>};
    ServerNode = <name of server  address>:<Port>',
    -- PUSHDOWN = ON | OFF,
    CREDENTIAL = credential_nam );
    
  3. Optional: Create statistics on an external table.

We are listening: If you find something outdated or incorrect in this article, such as a step or a code example, please tell us. You can click the This page button in the Feedback section at the bottom of this page. We read every item of feedback about SQL, typically the next day. Thanks.

For optimal query performance, we recommend creating statistics on external table columns especially the ones used for joins, filters, and aggregates.

```sql
CREATE STATISTICS statistics_name ON customer (C_CUSTKEY) WITH FULLSCAN; 
```

Important

Once you have created an external data source, you can use the CREATE EXTERNAL TABLE command to create a queryable table over that source.

Next steps

To learn more about PolyBase, see Overview of SQL Server PolyBase.