What are data pools in a SQL Server big data cluster?

Applies to: SQL Server 2019 (15.x)

This article describes the role of SQL Server data pools in SQL Server 2019 Big Data Clusters. The following sections describe the architecture and functionality of a SQL data pool.

Data pool architecture

A data pool consists of one or more SQL Server data pool instances. SQL data pool instances provide persistent SQL Server storage for the cluster. A data pool is used to ingest data from SQL queries or Spark jobs. To provide better performance across large data sets, data in a data pool is distributed into shards across the member SQL data pool instances.
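As a rough illustration of how sharding spreads ingested rows across the member instances, the following Python sketch assigns rows in round-robin order. It is a conceptual model only, not SQL Server code: the function name `distribute_round_robin`, the instance count, and the sample rows are hypothetical, and round-robin is assumed here as the distribution scheme.

```python
from typing import Any


def distribute_round_robin(rows: list[Any], instance_count: int) -> list[list[Any]]:
    """Assign rows to shards in round-robin order (hypothetical model)."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    # One empty shard per data pool instance.
    shards: list[list[Any]] = [[] for _ in range(instance_count)]
    for position, row in enumerate(rows):
        # Row 0 goes to instance 0, row 1 to instance 1, and so on, wrapping around.
        shards[position % instance_count].append(row)
    return shards


# Hypothetical sample data: five rows spread across two instances.
sample_rows = [("user1", 3), ("user2", 5), ("user3", 1), ("user4", 7), ("user5", 2)]
shards = distribute_round_robin(sample_rows, instance_count=2)
for index, shard in enumerate(shards):
    print(f"instance {index}: {len(shard)} rows")
```

With this scheme each instance holds a roughly equal share of the rows, so no single instance has to scan the whole data set.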

Scale-out data marts

Data pools enable the creation of scale-out data marts, where external data from multiple sources is ingested into the data pool. Because data is distributed across data pool instances, parallel queries against the curated data are more efficient.
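To make the efficiency argument concrete, the following Python sketch models a parallel query over sharded data: each shard computes a partial aggregate at the same time, and the partial results are then merged. It is a simplified analogy under stated assumptions, not a description of how SQL Server executes queries; the shard contents and the names `partial_clicks` and `query_all_shards` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each inner list stands in for the rows held by one instance.
shards = [
    [("books", 3), ("music", 5)],
    [("books", 1), ("games", 7)],
    [("music", 2), ("books", 4)],
]


def partial_clicks(shard):
    """Sum clicks per category within a single shard."""
    totals = {}
    for category, clicks in shard:
        totals[category] = totals.get(category, 0) + clicks
    return totals


def query_all_shards(all_shards):
    """Aggregate every shard concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(all_shards)) as pool:
        partials = list(pool.map(partial_clicks, all_shards))
    merged = {}
    for partial in partials:
        for category, clicks in partial.items():
            merged[category] = merged.get(category, 0) + clicks
    return merged


totals = query_all_shards(shards)
print(totals)
```

Because each partial aggregate touches only its own shard, the per-shard work shrinks as instances are added, and only the small partial results need to be combined at the end.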

Next steps

To learn more about SQL Server Big Data Clusters, see the following resources: