manishverma-7371 asked HimanshuSinha-MSFT commented

Is there any limitation on writing billions of records from Azure Databricks to Azure SQL using JDBC?

Hi All,

We are following this URL:
https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/sql-databases

We want to know whether there is any limitation on writing billions of records from Azure Databricks to Azure SQL using JDBC.

Our requirement is very simple: we want to write DataFrame data to an Azure SQL table.
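For reference, a plain JDBC write along the lines of the linked doc can be sketched as below. This is a minimal sketch, not from the thread; the server, database, table, and credential names are placeholders, and in practice credentials should come from a secret scope rather than plain text.

```python
def build_jdbc_url(server: str, database: str) -> str:
    """Build an Azure SQL JDBC URL (server/database names are placeholders)."""
    return (
        f"jdbc:sqlserver://{server}.database.windows.net:1433;"
        f"database={database};encrypt=true;loginTimeout=30"
    )


def write_dataframe(df, url: str, table: str, user: str, password: str) -> None:
    """Append a Spark DataFrame to an Azure SQL table over plain JDBC."""
    (df.write
       .format("jdbc")
       .option("url", url)
       .option("dbtable", table)          # e.g. "dbo.target_table"
       .option("user", user)
       .option("password", password)      # prefer dbutils.secrets.get(...) in practice
       .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
       .mode("append")
       .save())
```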


azure-databricks

HimanshuSinha-MSFT answered VermaManishKumar-8902 commented

Hello @manishverma-7371 ,

Thanks for the question and for using the forum.

Well, ADB is architected in such a way that it can read a lot of data at the same time using executors. On the SQL side, however, I think you should plan ahead and implement partitions; otherwise, inserting huge volumes of data into the same table at the same time can turn out to be a nightmare. There is no limitation as such.
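The partitioning advice above can be sketched like this: repartition the DataFrame before the write so inserts are spread across several concurrent connections, and batch rows per round trip via the JDBC `batchsize` option. The one-million-rows-per-partition heuristic below is an illustrative assumption, not a recommendation from the thread.

```python
def choose_partitions(total_rows: int, rows_per_partition: int = 1_000_000) -> int:
    """Rough heuristic: one write task per ~1M rows (illustrative assumption)."""
    return max(1, -(-total_rows // rows_per_partition))  # ceiling division


def partitioned_write(df, url: str, table: str, props: dict, total_rows: int) -> None:
    """Repartition, then write each partition concurrently with batched inserts."""
    n = choose_partitions(total_rows)
    (df.repartition(n)
       .write
       .format("jdbc")
       .option("url", url)
       .option("dbtable", table)
       .option("batchsize", 10_000)   # rows sent per JDBC batch round trip
       .options(**props)              # user, password, driver, ...
       .mode("append")
       .save())
```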

Thanks Himanshu
Please consider clicking "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members.


Hi,
I am still not clear on why it would be a nightmare; doesn't Microsoft test properly before release?

What we are planning: we create a table, insert the data, and copy the data from this table to the DWH using a Copy activity. Once the copy is done, we drop the table.

This is the approach we are thinking of implementing. Please advise: if there is still a chance of a nightmare, we will not go with this solution.


Hi,

This is an urgent case and we need to make a decision, as the client is waiting for us. Please confirm that there is no problem transferring the data.

Otherwise we will take the other approach: write the data to Data Lake Gen2 and then transfer it, but that approach carries additional cost for Data Lake Gen2.


Hello @manishverma-7371 ,

Sorry for the confusion; in the initial post you mentioned Azure SQL, not the DWH. The DWH is designed for huge loads, so you should be good.

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/massively-parallel-processing-mpp-architecture
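Since the target here turned out to be the DWH (a Synapse SQL pool), writes there are usually done through the Databricks Synapse connector, which stages the data in storage and bulk-loads it rather than inserting row by row. A minimal sketch, assuming placeholder storage account and container names:

```python
def staging_path(container: str, account: str, folder: str = "tmp") -> str:
    """Build the abfss:// staging URI the connector needs (placeholder names)."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{folder}"


def write_to_synapse(df, jdbc_url: str, table: str, temp_dir: str) -> None:
    """Write a DataFrame to a Synapse SQL pool via the Databricks connector,
    which stages data in storage and bulk-loads it instead of row-by-row JDBC."""
    (df.write
       .format("com.databricks.spark.sqldw")
       .option("url", jdbc_url)
       .option("dbTable", table)
       .option("tempDir", temp_dir)  # e.g. staging_path("stage", "mylake")
       .option("forwardSparkAzureStorageCredentials", "true")
       .mode("append")
       .save())
```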

Thanks
Himanshu

DavidBrowne-msft answered DavidBrowne-msft edited

There are purpose-built Spark Connectors for SQL Server and Azure SQL Database and for Azure Synapse SQL Pools (SQL DW).

You should use these for large loads, rather than the generic JDBC Spark Connector. But the JDBC Connector will work; it just may be slower than the purpose-built ones.
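A sketch of what a write through the purpose-built connector might look like, using the Apache Spark connector for SQL Server's `com.microsoft.sqlserver.jdbc.spark` format. The table-lock and batch-size values below are illustrative assumptions, not recommendations from this thread:

```python
def bulk_options(user: str, password: str, batch_size: int = 100_000) -> dict:
    """Connector options; tableLock trades concurrency for bulk-load speed."""
    return {
        "user": user,
        "password": password,
        "tableLock": "true",            # take a bulk-load table lock
        "batchsize": str(batch_size),   # rows per bulk-copy batch
    }


def bulk_write(df, url: str, table: str, user: str, password: str) -> None:
    """Write via the purpose-built SQL Server Spark connector (bulk copy API)."""
    (df.write
       .format("com.microsoft.sqlserver.jdbc.spark")
       .option("url", url)
       .option("dbtable", table)
       .options(**bulk_options(user, password))
       .mode("append")
       .save())
```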




Thanks for the reply.

I am not sure the customer will agree to use an open-source connector.


Can we close this with the final decision not to use the JDBC connector?
Please confirm so we can make a decision.

RagunathanRamanujam-3564 answered HimanshuSinha-MSFT commented

Hi @VermaManishKumar-8902/@manishverma-7371

With all due respect, I would like to share some thoughts about your requirement. My understanding is that you are reading billions of records (assuming the source data arrives as files) using Databricks and writing them into SQL Server Hyperscale.
I could not see any limitation in the JDBC driver against writing billions of records; issues start to occur when you cannot fit all of your source records into your Databricks Spark cluster's memory, or because of the writing technique used. As developers, we need to come up with better design patterns to handle large volumes of data inside Databricks. One good design pattern, as @HimanshuSinha-MSFT suggested, is partitioning the data: it splits the data into multiple partitions and also improves write speed, because each partition is written independently.

Again with due respect, I am not a Microsoft employee, but I have not seen Microsoft release partially tested components without relevant documentation. We will all try to help if you share the error that occurs when you load large volumes using Databricks, so we can find a better design approach.


Thanks, all, for your feedback. So we can use table partitioning, which will help improve write speed. Otherwise, we have another option: write the data to the data lake as multiple partitioned part files, then load those into Azure SQL (Hyperscale) using a Data Factory Copy activity.
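That staging alternative can be sketched as follows: write the DataFrame to ADLS Gen2 as partitioned parquet part files, which an ADF Copy activity can then bulk-load into Azure SQL Hyperscale. The container, storage account, and folder names are placeholders, and the partition count is an illustrative assumption.

```python
def lake_path(container: str, account: str, folder: str) -> str:
    """ADLS Gen2 abfss:// path (container/account names are placeholders)."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{folder}"


def stage_as_parquet(df, path: str, partitions: int = 64) -> None:
    """Write the DataFrame as partitioned parquet part files, ready for a
    Data Factory Copy activity to bulk-load into Azure SQL Hyperscale."""
    df.repartition(partitions).write.mode("overwrite").parquet(path)
```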


Thanks @DavidBrowne-msft & @RagunathanRamanujam-3564 for your invaluable thoughts on this thread.

@VermaManishKumar-8902: Please let us know how it goes on your side when you implement the solution.

Thanks Himanshu
Please consider clicking "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members.
