question

arkiboys asked · ShaikMaheer-MSFT commented

Parquet files

Hello,
Is it better to use Parquet files rather than Azure SQL Server?
Looking at the system already in place: in Synapse pipelines, we pull data from an on-prem SQL Server into Parquet files and then query the Parquet files.
My question is:
Why not use Azure SQL Server to hold the data rather than Parquet files?
Our goal is to provide reports in Power BI.
I think it is easier to query Azure SQL Server than Parquet files.
What are your thoughts, please?

Thank you

azure-data-factory

Hi @arkiboys ,

Could you please accept the answer provided below? Accepting the answer will help the community too. Thank you.


1 Answer

ShaikMaheer-MSFT answered · ShaikMaheer-MSFT edited

Hi @arkiboys ,

Thank you for posting your query on the Microsoft Q&A platform.

Is it better to use parquet files rather than azure sql server?

Relational databases work well for small and medium datasets, but they are not designed for big data. When you want to store and process big data, file-based formats such as CSV or Parquet are a better fit.

Parquet files also store the schema alongside the data. To learn more about Parquet files and the advantages of using them, please check the link below.
https://databricks.com/glossary/what-is-parquet

Why not use Azure sql server to hold the data rather than parquet files?

In practice, data engineering teams gather data from different source systems (on-prem systems, APIs, etc.) and land all of it in a big-data store such as a data lake. When they land the data, they write it in big-data file formats such as Parquet or CSV, because those formats are built to hold and process data at scale.

Once all the source data is available in the data lake, data engineers process it, take a subset of it, and load that subset into tables in a data warehouse. The Power BI team then builds reports from those tables.

So the data flow is: Source systems --> Data lake --subset of data--> Data warehouse --> Power BI report.

With Azure Synapse, however, Power BI can connect directly to the data lake store, so we no longer need a data warehouse at the end of the pipeline just to hold the subset of data.

Hope this helps. Thank you.


  • Please accept an answer if correct. Original posters help the community find answers faster by identifying the correct answer. Here is how.

  • Want a reminder to come back and check responses? Here is how to subscribe to a notification
