The architecture of SQL Data Warehouse separates storage and compute, allowing each to scale independently. As a result, compute can be scaled to meet performance demands independent of the amount of data. A natural consequence of this architecture is that billing for compute and storage is separate.
This overview describes how scale out works with SQL Data Warehouse and how to utilize the pause, resume, and scale capabilities of SQL Data Warehouse. Consult the data warehouse units (DWUs) page to learn how DWUs and performance are related.
How compute management operations work in SQL Data Warehouse
The architecture for SQL Data Warehouse consists of a control node, compute nodes, and the storage layer spread across 60 distributions.
During a normal active session in SQL Data Warehouse, your system's head node manages the metadata and contains the distributed query optimizer. Beneath this head node are your compute nodes and your storage layer. For a DWU 400, your system has one head node, four compute nodes, and the storage layer, consisting of 60 distributions.
When you undergo a scale or pause operation, the system first kills all incoming queries and then rolls back transactions to ensure a consistent state. For scale operations, scaling will only occur once this transactional rollback has completed. For a scale-up operation, the system provisions the extra desired number of compute nodes, and then begins reattaching the compute nodes to the storage layer. For a scale-down operation, the unneeded nodes are released and the remaining compute nodes reattach themselves to the appropriate number of distributions. For a pause operation, all compute nodes are released and your system will undergo a variety of metadata operations to leave your final system in a stable state.
|DWU||#of compute nodes||# of distributions per node|
The three primary functions for managing compute are:
Each of these operations may take several minutes to complete. If you are scaling/pausing/resuming automatically, you may want to implement logic to ensure that certain operations have been completed before proceeding with another action.
Checking the database state through various endpoints will allow you to correctly implement automation of such operations. The portal will provide notification upon completion of an operation and the databases current state but does not allow for programmatic checking of state.
Compute management functionality does not exist across all endpoints.
|Pause/Resume||Scale||Check database state|
Performance in SQL Data Warehouse is measured in data warehouse units (DWUs) which is an abstracted measure of compute resources such as CPU, memory, and I/O bandwidth. A user who wishes to scale their system's performance can do so through various means, such as through the portal, T-SQL, and REST APIs.
How do I scale compute?
Compute power is managed for you SQL Data Warehouse by changing the DWU setting. Performance increases linearly as you add more DWU for certain operations. We offer DWU offerings that ensure that your performance will change noticeably when you scale your system up or down.
To adjust DWUs, you can use any of these individual methods.
- Scale compute power with Azure portal
- Scale compute power with PowerShell
- Scale compute power with REST APIs
- Scale compute power with TSQL
How many DWUs should I use?
To understand what your ideal DWU value is, try scaling up and down, and running a few queries after loading your data. Since scaling is quick, you can try various performance levels in an hour or less.
SQL Data Warehouse is designed to process large amounts of data. To see its true capabilities for scaling, especially at larger DWUs, you want to use a large data set which approaches or exceeds 1 TB.
Recommendations for finding the best DWU for your workload:
- For a data warehouse in development, begin by selecting a smaller DWU performance level. A good starting point is DW400 or DW200.
- Monitor your application performance, observing the number of DWUs selected compared to the performance you observe.
- Determine how much faster or slower performance should be for you to reach the optimum performance level for your requirements by assuming linear scale.
- Increase or decrease the number of DWUs in proportion to how much faster or slower you want your workload to perform.
- Continue making adjustments until you reach an optimum performance level for your business requirements.
Query performance only increases with more parallelization if the work can be split between compute nodes. If you find that scaling is not changing your performance, please check out our performance tuning articles to check whether your data is unevenly distributed or if you are introducing a large amount of data movement.
When should I scale DWUs?
Scaling DWUs alters the following important scenarios:
- Linearly changing performance of the system for scans, aggregations, and CTAS statements
- Increasing the number of readers and writers when loading with PolyBase
- Maximum number of concurrent queries and concurrency slots
Recommendations for when to scale DWUs:
- Before you perform a heavy data loading or transformation operation, scale up DWUs so that your data is available more quickly.
- During peak business hours, scale to accommodate larger numbers of concurrent queries.
To save costs, you can pause and resume compute resources on-demand. For example, if you won't be using the database during the night and on weekends, you can pause it during those times, and resume it during the day. You won't be charged for DWUs while the database is paused.
When you pause a database:
- Compute and memory resources are returned to the pool of available resources in the data center
- DWU costs are zero for the duration of the pause.
- Data storage is not affected and your data stays intact.
- SQL Data Warehouse cancels all running or queued operations.
To pause a database, use any of these individual methods.
When you resume a database:
- SQL Data Warehouse acquires compute and memory resources for your DWU setting.
- Compute charges for your DWUs resume.
- Your data will be available.
- You will need to restart your workload queries.
To resume a database, use any of these individual methods.
Check database state
To resume a database, use any of these individual methods.
- Check database state with T-SQL
- Check database state with PowerShell
- Check database state with REST APIs
Refer to the following articles to help you understand some additional key performance concepts: