question

AnkitDubey-2488 avatar image
0 Votes"
AnkitDubey-2488 asked COZYROC Suspended answered

How to improve the insert performance of pyodbc insert for million records csv.

Hello everyone,
I am totally exhausted as a 100MB CSV file is taking 1 hour to be get inserted into the MSSQL database when I am using pyodbc cursor.execute() method.
I searched a lot to improve the speed but still, nothing is working as expected.
I am wondering how the SSIS packages were inserting the same CSV file in 10 minutes and why the python implementation is taking an hour to do the same task.

I really appreciate any help you can provide.

Thanks,
Ankit

sql-server-integration-services
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

ZoeHui-MSFT avatar image
0 Votes"
ZoeHui-MSFT answered

Hi @AnkitDubey-2488,

Not familiar with python.

Just from SSIS side, SSIS is a platform for building enterprise-level data integration and data transformations solutions. Use Integration Services to solve complex business problems by copying or downloading files, loading data warehouses, cleansing and mining data, and managing SQL Server objects and data.

Integration Services can extract and transform data from a wide variety of sources such as XML data files, flat files, and relational data sources, and then load the data into one or more destinations.

It is great for large ETL between heterogenous sources (and great for large insertions i.e. Bulk).

sql-server-integration-services

Regards,

Zoe


If the answer is helpful, please click "Accept Answer" and upvote it.

Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.
Hot issues October


5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

COZYROC avatar image
0 Votes"
COZYROC Suspended answered

You are most probably doing row-by-row insertion. That's why it is slow. Microsoft SSIS drivers that you use support bulk-load and can insert 5000-10000 rows at a time. That's why the performance difference is so big.

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.