Importing Data in Parallel with Table Level Locking
Microsoft SQL Server allows multiple clients to bulk import data in parallel into a single unindexed table. This can improve the performance of bulk import operations. Parallel data importing is supported by all three bulk import commands: bcp, BULK INSERT, and INSERT ... SELECT * FROM OPENROWSET(BULK...).
Only applications that use the ODBC-based or SQL OLE DB-based APIs can perform parallel data loading into a single table. Any application, including the bcp utility, that is based on the DB-Library client library supplied with Microsoft SQL Server version 6.5 or earlier cannot participate in parallel data loads into an instance of SQL Server.
If you do not specify the TABLOCK option/hint, multiple concurrent bulk import streams acquire locks at a granularity below the table level. Depending on the data distribution, the concurrent bulk load streams might block each other. Therefore, bulk importing data in parallel into an unindexed table is usually performed with TABLOCK.
When bulk importing data in parallel into an instance of SQL Server by using TABLOCK, consider the following:
The simplest scenario for importing data in parallel is loading data into an unindexed table (a heap).
If the table has no indexes, specify the TABLOCK option for the bulk-import operation. For more information, see Controlling the Locking Behavior for Bulk Import.
When indexes exist on a table, you cannot perform a parallel load operation by using the TABLOCK option. Also, the concurrent threads block each other if TABLOCK is not specified. Before a bulk-import operation, consider removing indexes from the table. For information about whether to keep or remove indexes, see Guidelines for Optimizing Bulk Import.
Divide the data to be imported among the clients into the same number of data files as there are clients. Place one of the files on each of the clients.
Best Practice To use the processor most efficiently, distribute the data evenly among the clients. Make sure that data files are similar in size if you want to import them in parallel from multiple clients into an instance of SQL Server. Otherwise, a lightly loaded client thread might finish early and, therefore, use that client's CPU inefficiently.
To achieve maximum performance, the batch size specified for each client should be the same as the size of the client's data file. For more information, see Managing Batches for Bulk Import.
After data has been bulk imported into the table, you can create any required indexes as follows:
- Create each clustered index, in turn, from a single client. For more information, see Creating Clustered Indexes.
- Create the nonclustered indexes. These can be created concurrently from different clients. For more information, see Creating Nonclustered Indexes.
If you do not specify TABLOCK, you can bulk import data in parallel regardless of the number of indexes on the target. But in this case, bulk optimizations are not possible, and you might encounter blocking when obtaining locks on individual rows or pages.