Query timeout expired - Synapse serverless built-in pool

victorcuevasv
2021-06-08T01:38:17.083+00:00

I am evaluating some large queries on parquet data accessed via OPENROWSET views on the built-in Synapse serverless pool. Some of them fail after about 30 minutes of execution with a "query timeout expired" error. The queries time out whether I run them from Synapse Studio, SSMS, or JDBC; in the JDBC case, the setQueryTimeout method has no effect on the server side.
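
For context, the views are defined along the lines of the sketch below (the storage URL, view name, and file layout here are placeholders, not the actual setup):

    -- Hypothetical example of the kind of OPENROWSET view being queried;
    -- the storage URL and view name are placeholders.
    CREATE VIEW dbo.vw_events AS
    SELECT *
    FROM OPENROWSET(
        BULK 'https://<storage-account>.dfs.core.windows.net/<container>/events/*.parquet',
        FORMAT = 'PARQUET'
    ) AS rows;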

I would like to increase the query timeout, but so far I have not been able to find out from the documentation how to do so, or whether it is possible at all. For instance, using sp_configure or a resource class does not work, since neither is supported on serverless. Thank you very much for your help.

Azure Synapse Analytics

Accepted answer
  PRADEEPCHEEKATLA-MSFT Microsoft Employee
    2021-06-08T09:26:37.397+00:00

    Hello @victorcuevasv ,

    Welcome to the Microsoft Q&A platform.

    Unfortunately, the query timeout cannot be changed on the Synapse serverless SQL pool.

    The best practice in this case is to optimize the query so that it runs faster.

    If the query has a lot of logical steps, breaking it into multiple queries with CETAS (CREATE EXTERNAL TABLE AS SELECT) is a good approach as well, as sketched below.
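
    For example, an expensive intermediate result can be materialized to parquet with CETAS, and the remaining logic can then run against that staged table, so each step stays well under the timeout. A minimal sketch, assuming an external data source and a parquet file format have already been created; the object and column names are illustrative:

        -- Stage 1: materialize the expensive intermediate result as parquet.
        CREATE EXTERNAL TABLE dbo.EventsAggStage
        WITH (
            LOCATION = 'staging/events_agg/',
            DATA_SOURCE = MyDataLake,       -- assumed external data source
            FILE_FORMAT = ParquetFormat     -- assumed parquet file format
        )
        AS
        SELECT CustomerId, SUM(Amount) AS TotalAmount
        FROM dbo.vw_events
        GROUP BY CustomerId;

        -- Stage 2: run the remaining logic against the staged parquet,
        -- instead of re-scanning and re-aggregating the raw data.
        SELECT TOP 100 *
        FROM dbo.EventsAggStage
        ORDER BY TotalAmount DESC;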

    Follow the guidelines below for better query performance.

    • Convert large CSV and JSON files to Parquet. Parquet is a columnar format, and because it is compressed, its files are smaller than CSV or JSON files that contain the same data. When reading Parquet files, serverless SQL pool can skip the columns and rows that a query does not need, so it needs less time and fewer storage requests to read the data.
    • If a query targets a single large file, you'll benefit from splitting it into multiple smaller files.
    • Try to keep your CSV file size between 100 MB and 10 GB.
    • It's better to have equally sized files for a single OPENROWSET path or an external table LOCATION.
    • Partition your data by storing partitions in different folders or file names. Use the filename and filepath functions to target specific partitions (see the sketch after this list).
    • Make sure that your client applications are collocated with the serverless SQL pool.
    • If you are using client applications outside of Azure (for example Power BI Desktop, SSMS, ADS), make sure that you are using the serverless pool in a region close to your client computer.
    • Make sure that the storage (Azure Data Lake, Cosmos DB) and serverless SQL pool are in the same region.
    • Try to optimize storage layout using partitioning and keeping your files in the range between 100 MB and 10 GB.
    • If you are returning a large number of results, make sure that you are using SSMS or ADS and not Synapse Studio. Synapse Studio is a web tool that is not designed for large result sets.
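
    To illustrate the partition-targeting guideline above, the filepath function can restrict a wildcard OPENROWSET read to specific folders, so only the matching partitions are scanned. A minimal sketch, assuming a hypothetical year=/month= folder layout; the storage URL is a placeholder:

        -- Hypothetical layout: .../events/year=YYYY/month=MM/*.parquet
        SELECT COUNT(*) AS row_count
        FROM OPENROWSET(
            BULK 'https://<storage-account>.dfs.core.windows.net/<container>/events/year=*/month=*/*.parquet',
            FORMAT = 'PARQUET'
        ) AS rows
        WHERE rows.filepath(1) = '2021'         -- value matched by the first wildcard (year=*)
          AND rows.filepath(2) IN ('05', '06'); -- values matched by the second wildcard (month=*)

    Only the folders that satisfy the filepath predicates are read, which reduces both scan time and storage requests.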

    Please go through the below article for reference.

    Best practices for serverless SQL pool - Azure Synapse Analytics | Microsoft Learn.

    Hope this helps. Do let us know if you have any further queries.

    ---------------------------------------------------------------------------

    Please "Accept the answer" if the information helped you. This will help us and others in the community as well.

