Configure PolyBase scale-out groups on Windows
This article describes how to set up a PolyBase scale-out group on Windows. A scale-out group is a cluster of SQL Server instances that processes large data sets from external data sources, such as Hadoop or Azure Blob Storage, in a scale-out fashion for better query performance.
More than one machine in the same domain
A domain user account to run PolyBase services
The following steps summarize the process for creating a PolyBase scale-out group. The next section provides a more detailed walk-through of each step.
Install the same version of SQL Server with PolyBase on N machines.
Select one SQL Server instance as the head node. A head node can only be designated on an instance running SQL Server Enterprise.
Add remaining SQL Server instances as compute nodes using sp_polybase_join_group.
Monitor nodes in the group using sys.dm_exec_compute_nodes (Transact-SQL).
Optional. Remove a compute node from the group using sp_polybase_leave_group (Transact-SQL).
This example walks through the steps of configuring a PolyBase group using:
Two machines in the domain PQTH4A. The machine names are PQTH4A-CMP01 and PQTH4A-CMP02.
Domain account: PQTH4A\PolyBaseUser
Install SQL Server with PolyBase on all machines
On the Feature Selection page, select PolyBase Query Service for External Data.
On the Server Configuration page, use the domain account PQTH4A\PolyBaseUser for SQL Server PolyBase Engine and SQL Server PolyBase Data Movement Service.
On the PolyBase Configuration page, select the option Use the SQL Server instance as part of a PolyBase scale-out group. This opens the firewall to allow incoming connections to the PolyBase services.
After setup is complete, run services.msc. Verify that SQL Server, PolyBase Engine and PolyBase Data Movement Service are running.
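As a cross-check alongside services.msc, the instance itself can report whether the PolyBase feature was installed. The sketch below assumes you are connected to each newly installed instance in turn. It confirms installation only, not that the services are running.

```sql
-- Returns 1 if the PolyBase feature is installed on this instance, 0 otherwise.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;
```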
Select one SQL Server as head node
After setup is complete, both machines can function as PolyBase Group head nodes. In this example, we will choose "MSSQLSERVER" on PQTH4A-CMP01 as the head node.
Add other SQL Server instances as compute nodes
Connect to SQL Server on PQTH4A-CMP02.
Run the stored procedure sp_polybase_join_group.
-- Enter head node details:
-- head node machine name, head node dms control channel port, head node sql server name
EXEC sp_polybase_join_group 'PQTH4A-CMP01', 16450, 'MSSQLSERVER';
Run services.msc on the compute node (PQTH4A-CMP02).
Shut down the PolyBase Engine and restart the PolyBase Data Movement Service.
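Once the services have been restarted, the join can be confirmed from the head node. This is a minimal sketch, assuming you are connected to PQTH4A-CMP01 and that the DMV exposes the columns named below. PQTH4A-CMP02 should appear in the result as a compute node.

```sql
-- Run on the head node (PQTH4A-CMP01).
-- Lists every node the head node currently knows about.
SELECT compute_node_id, type, name, address
FROM sys.dm_exec_compute_nodes;
```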
Optional: Remove a compute node
Connect to the compute node SQL Server (PQTH4A-CMP02).
Run the stored procedure sp_polybase_leave_group.
Run services.msc on the compute node that is being removed (PQTH4A-CMP02).
Start the PolyBase Engine. Restart the PolyBase Data Movement Service.
Verify that the node has been removed by querying the DMV sys.dm_exec_compute_nodes on PQTH4A-CMP01. PQTH4A-CMP02 now functions as a standalone head node.
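The removal steps above can be sketched in Transact-SQL as follows. This assumes the example machine names used in this walk-through, and that the two statements are run on separate connections with the service restarts done in between. The name filter is an illustrative choice, since the exact format of the name column is not shown in this article.

```sql
-- Step 1: run on the compute node being removed (PQTH4A-CMP02).
EXEC sp_polybase_leave_group;

-- Step 2: after restarting the PolyBase services on PQTH4A-CMP02,
-- run on the head node (PQTH4A-CMP01).
-- An empty result suggests the node is no longer part of the group.
SELECT compute_node_id, type, name, address
FROM sys.dm_exec_compute_nodes
WHERE name LIKE N'%PQTH4A-CMP02%';
```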
For troubleshooting, see PolyBase troubleshooting with dynamic management views.
For more information about PolyBase, see the PolyBase overview.