question

serverlesssql avatar image
0 Votes"
serverlesssql asked ·

Data Processed when using aggregate query

Hi, I have ~3GB of CSV data in Azure Storage and am using Serverless SQL Pools to query this data using a View. If I do a query such as this I get a data processed total of around 3GB (which is to be expected as it's scanning all the data)

SELECT COUNT()
FROM
OPENROWSET
(
BULK 'sourcedatadesktopdataflow/salesorderline/OrderYear=
/OrderMonth=/OrderDatePartition=/*.csv',
DATA_SOURCE = 'ExternalDataSourceDataLake',
FORMAT = 'CSV',
PARSER_VERSION = '2.0',
HEADER_ROW = TRUE,
FIELDTERMINATOR =','
)
WITH
(
OrderLineID INT,
OrderDate DATE
)
AS fct

However, if I group by a column in the CSV data, the data processed is doubled to 6GB and I can't work out why this would be the case. Does anyone have any suggestions?

SELECT OrderDate,COUNT()
FROM
OPENROWSET
(
BULK 'sourcedatadesktopdataflow/salesorderline/OrderYear=
/OrderMonth=/OrderDatePartition=/*.csv',
DATA_SOURCE = 'ExternalDataSourceDataLake',
FORMAT = 'CSV',
PARSER_VERSION = '2.0',
HEADER_ROW = TRUE,
FIELDTERMINATOR =','
)
WITH
(
OrderLineID INT,
OrderDate DATE
)
AS fct
GROUP BY OrderDate

azure-synapse-analytics
· 2
10 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hello @serverlesssql,
We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet .In case if you have any resolution please do share that same with the community as it can be helpful to others . Otherwise, will respond back with the more details and we will try to help .
Thanks
Himanshu

0 Votes 0 ·

Hello @serverlesssql,
We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet .In case if you have any resolution please do share that same with the community as it can be helpful to others . Otherwise, will respond back with the more details and we will try to help .
Thanks
Himanshu

0 Votes 0 ·
HimanshuSinha-MSFT avatar image
0 Votes"
HimanshuSinha-MSFT answered ·

Hello @serverlesssql ,
Thanks for the ask and using the Microsoft Q&A platform .

Please check the query plan in both cases and see what the difference . It may be sorting/spooling/hashing + spilling .

Please do let me know how it goes .
Thanks
Himanshu
Please do consider clicking on "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members

10 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

serverlesssql avatar image
0 Votes"
serverlesssql answered ·

Hi, this is serverless sql pools, is there an option to show the query plan?

10 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.