question

JediSQL avatar image
0 Votes"
JediSQL asked EchoLiu-msft commented

INSERT... WHERE NOT EXISTS, or IGNORE_DUP_KEY?

I'm looking for some factual knowledge about INSERT statement performance, not conjecture with "probably" or "I imagine."
Does one of these have better performance, or does it not really matter?

A) Using an "INSERT A... SELECT... FROM B... WHERE NOT EXISTS (SELECT FROM A... WHERE B.PrimaryKey = A.PrimaryKey);" style statement.

B) Create table A with IGNORE_DUP_KEY on the primary key and let the database engine reject the duplicates for me?

Thank you in advance,
Dan Jameson

sql-server-transact-sql
· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Do you have any update?

0 Votes 0 ·
ErlandSommarskog avatar image
0 Votes"
ErlandSommarskog answered ErlandSommarskog commented

Former SQL Server MVP Alex Kuznetsov investigated this quite a few years and back and published a blog post about his results. Unfortunately, that post does not seem to be available anymore.

But I have summarised his findings in one of my own article where I discuss IGNORE_DUP_KEY, and I quote that section here:

You may have a situation where key values are largely unique, but there are occasional duplicates, and you don't care which row you get. In this case, IGNORE_DUP_KEY not only makes the coding a little simpler, it is also likely to perform better than regular ways to remove duplicates such as such as DISTINCT or NOT EXISTS. Beware that as the frequency of duplicates increases, the performance benefit of IGNORE_DUP_KEY diminishes, as discussed this blog post from SQL Server MVP Alex Kuznetsov.

· 2
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.


Though, “likely” is a synonym of “probably”.

0 Votes 0 ·

Yeah, but I preferred to post in verbatim what I wrote a couple of years back, when Alex's blog post was still available. As I recall, the outcome was that as long as duplicates are occasional, it is a quicker way.

But as they say, "it depends". The only way to get an answer the workload at hand, is to run your own workload.

Now, when I think of it, I recall that Dmitri Korotkevitch has written about a certain type of deadlock which can occur when you use IGNORE_DUP_KEY in one of this books.

0 Votes 0 ·
EchoLiu-msft avatar image
0 Votes"
EchoLiu-msft answered EchoLiu-msft edited

Hi @JediSQL,

For the two methods you mentioned, you should determine which method performs better based on your actual data.

One suggestion is to look at the execution plan and time of the two methods during execution.Open the execution plan,then execute the following statements to view the messages:

     DBCC dropcleanbuffers --Empty the cache
     set statistics io on --Open IO statistics 
     INSERT A... SELECT... FROM B... 
     WHERE NOT EXISTS (SELECT FROM A... WHERE B.PrimaryKey = A.PrimaryKey);
        
        
     --When IGNORE_DUP_KEY is OFF
     DBCC dropcleanbuffers --Empty the cache
     set statistics io on --Open IO statistics
     INSERT A...

If you have any question, please feel free to let me know.


Regards
Echo


If the answer is helpful, please click "Accept Answer" and upvote it.
Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.



5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

TomPhillips-1744 avatar image
0 Votes"
TomPhillips-1744 answered

As with most of these questions, "it depends" and you would have to test it for your exact situation.

There are several variables which may determine which is better performing:
How many duplicate keys are expected
How many rows in the table
How many rows are being inserted
How big is the PK
Is the PK the clustered index


5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.