INSERT... WHERE NOT EXISTS, or IGNORE_DUP_KEY?

JediSQL 76 Reputation points
2021-04-05T16:38:12.797+00:00

I'm looking for some factual knowledge about INSERT statement performance, not conjecture with "probably" or "I imagine."
Does one of these have better performance, or does it not really matter?

A) Using an "INSERT A... SELECT... FROM B... WHERE NOT EXISTS (SELECT FROM A... WHERE B.PrimaryKey = A.PrimaryKey);" style statement.

B) Create table A with IGNORE_DUP_KEY on the primary key and let the database engine reject the duplicates for me?

Thank you in advance,
Dan Jameson

Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,552 questions
{count} votes

3 answers

Sort by: Most helpful
  1. Erland Sommarskog 100.9K Reputation points MVP
    2021-04-05T17:23:50.083+00:00

    Former SQL Server MVP Alex Kuznetsov investigated this quite a few years and back and published a blog post about his results. Unfortunately, that post does not seem to be available anymore.

    But I have summarised his findings in one of my own article where I discuss IGNORE_DUP_KEY, and I quote that section here:

    You may have a situation where key values are largely unique, but there are occasional duplicates, and you don't care which row you get. In this case, IGNORE_DUP_KEY not only makes the coding a little simpler, it is also likely to perform better than regular ways to remove duplicates such as such as DISTINCT or NOT EXISTS. Beware that as the frequency of duplicates increases, the performance benefit of IGNORE_DUP_KEY diminishes, as discussed this blog post from SQL Server MVP Alex Kuznetsov.


  2. EchoLiu-MSFT 14,571 Reputation points
    2021-04-06T06:44:15.493+00:00

    Hi @JediSQL ,

    For the two methods you mentioned, you should determine which method performs better based on your actual data.

    One suggestion is to look at the execution plan and time of the two methods during execution.Open the execution plan,then execute the following statements to view the messages:

        DBCC dropcleanbuffers --Empty the cache  
        set statistics io on --Open IO statistics   
        INSERT A... SELECT... FROM B...   
        WHERE NOT EXISTS (SELECT FROM A... WHERE B.PrimaryKey = A.PrimaryKey);  
          
          
        --When IGNORE_DUP_KEY is OFF  
        DBCC dropcleanbuffers --Empty the cache  
        set statistics io on --Open IO statistics  
        INSERT A...  
    

    If you have any question, please feel free to let me know.

    Regards
    Echo


    If the answer is helpful, please click "Accept Answer" and upvote it.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    0 comments No comments

  3. Tom Phillips 17,716 Reputation points
    2021-04-06T14:11:00.813+00:00

    As with most of these questions, "it depends" and you would have to test it for your exact situation.

    There are several variables which may determine which is better performing:
    How many duplicate keys are expected
    How many rows in the table
    How many rows are being inserted
    How big is the PK
    Is the PK the clustered index

    0 comments No comments