Intermittent slow inserts on a large table with lots of data churn

JasonW-5564 161 Reputation points
2020-10-16T03:30:13.3+00:00

I have a rather large table that has anywhere from 400 million to 700 million rows in it at a time. It contains individual business transactions, each with a unique id field generated by a third-party system. We keep the last 15 days' worth of transactions in this table, and we have a unique index on the id field. We have to keep this unique index as a check: because the identifier is generated by a third party, we need to make sure we don't import the same record twice.

The data is imported continuously around the clock by an SSIS job that reads it from CSV files. Every couple of hours we run a process to aggregate the new data and write the aggregates to a separate table. Every evening we delete all rows older than 15 days so that only the most recent 15 days of data remain. We ingest approximately 40 million rows each business day and substantially fewer on weekends, which leaves us with a range of 400-700 million rows in the table at all times.
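Conceptually, the evening purge looks like this (simplified for illustration, not the actual job; it assumes the cutoff is evaluated against importedOn, and it deletes in batches so no single transaction grows too large):

-- Purge everything older than 15 days, in batches, to keep each
-- transaction (and therefore log usage) small.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (100000) FROM dbo.BizUnitProd
    WHERE importedOn < DATEADD(DAY, -15, SYSDATETIME());
    SET @rows = @@ROWCOUNT;
END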

The problem is that sometimes the inserts are fast and other times they are slow, and I cannot figure out why. I have a maintenance job that rebuilds the index (not a reorganize) twice a day with a 50% fill factor (I have played with this setting and seem to get my longest run of good throughput at 50%). Even when I rebuild, the index is less than 15% fragmented, but the rebuild seems to help import performance each time anyway.
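The rebuild boils down to something like this (names match the definition below):

ALTER INDEX IX_BizUnitProd_ID ON dbo.BizUnitProd
REBUILD WITH (FILLFACTOR = 50);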

I also have a clustered index on the ImportID column, which is just a sequential identity column added to each row. We delete in order, so this index never goes above 4% fragmentation. Any ideas on what could be causing this? I have run traces, checked Resource Monitor, etc., and I am not seeing blocking, overtaxed disks, high CPU, or the like. Rarely, if ever, does anything else touch this table; it exists only to prevent duplicates and to generate the aggregates.
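The fragmentation and page-fullness numbers come from sys.dm_db_index_physical_stats, roughly like this (SAMPLED mode keeps it cheap on a table this size):

SELECT i.name AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.BizUnitProd'), NULL, NULL, 'SAMPLED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id;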

Below is the table definition. I am open to any and all ideas, as this has me stumped! I would like to solve this in the short term, but I am open to better long-term solutions as well!

USE [BizUnit_Dev]
GO

/****** Object: Table [dbo].[BizUnitProd] Script Date: 10/15/2020 11:23:34 PM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[BizUnitProd](
[order] varchar NULL,
[account] varchar NULL,
[name] varchar NULL,
[network] varchar NULL,
[cpm] [float] NULL,
[fixedCost] [float] NULL,
[leaseId] [uniqueidentifier] NULL,
[creativeId] varchar NULL,
[issued] datetime2 NULL,
[impressions] [int] NULL,
[completed] datetime2 NULL,
[channelKey] varchar NULL,
[playerId] [uniqueidentifier] NOT NULL,
[leaseProgress] [float] NULL,
[flightProgress] [float] NULL,
[bonus] [bit] NULL,
[id] varchar NULL,
[orderName] varchar NULL,
[creative] varchar NULL,
[importedOn] datetime2 NOT NULL,
[importId_orig] [bigint] NOT NULL,
[localtime] datetime2 NULL,
[RAdded] datetime2 NULL,
[orderNumber] varchar NULL,
[__FileName] varchar NULL,
[ImportID] [bigint] IDENTITY(1,1) NOT NULL
) ON [PRIMARY]
GO
CREATE CLUSTERED INDEX [idxBizUnitProd__importId] ON [dbo].[BizUnitProd]
(
[importId] ASC
)
WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
CREATE UNIQUE NONCLUSTERED INDEX [IX_BizUnitProd_ID] ON [dbo].[BizUnitProd]
(
[id] ASC
)WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 50, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

Azure SQL Database
Transact-SQL

5 answers
  1. EchoLiu-MSFT 14,571 Reputation points
    2020-10-16T06:14:06.26+00:00

    Hi @JasonW-5564

    Rebuilding the index and setting the fill factor can reduce index fragmentation, thereby improving query efficiency and reducing the impact the index has on inserts.
    Regarding the inserts sometimes being slow, could you provide an execution plan?

    In addition, you can check whether there are uncommitted transactions or deadlocks (deadlock tracking can be turned on in advance). When inserting a large amount of data, you can also insert it into an intermediate table first and then insert from there into the current table, as sketched below.
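    A minimal sketch of the intermediate-table idea (the staging table name is made up, and the column list is abbreviated):

    -- Bulk load into an unindexed staging table first, then move only
    -- rows whose third-party id is not already in the main table.
    INSERT INTO dbo.BizUnitProd (id, importedOn /* , ...remaining columns... */)
    SELECT s.id, s.importedOn /* , ... */
    FROM dbo.BizUnitProd_Staging AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.BizUnitProd AS t WHERE t.id = s.id);

    TRUNCATE TABLE dbo.BizUnitProd_Staging;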

    I found a similar problem, which may help you:
    Insert statement sometimes runs slow and sometimes fast SQL Server 2014

    If you have any questions, please feel free to let me know.
    If the response is helpful, please click "Accept Answer" and upvote it.

    Regards
    Echo



  2. tibor_karaszi@hotmail.com 4,301 Reputation points
    2020-10-16T08:06:54.917+00:00

    Checking the execution plan might be helpful, if possible.

    You have already played with the fill factor, reducing page splits when inserts happen. The clustered index is on something ever-increasing, so that shouldn't become fragmented. But the other indexes might, of course.

    Can you spot a pattern where the slowness is more prominent the longer it has been since the last defrag? I.e., is it defrag-related?

    You checked for blocking already. It shouldn't be deadlocks, since I'd assume you'd spot the 1205 errors for the selected victim.

    How about autogrow of database files? That can take a while! You can capture that in an XE trace.

    Or auto update statistics? Again, you can capture that in a trace.
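    Both of the above can be caught in one session, something along these lines (a sketch; the session name and file path are placeholders, and the event names are as they exist in SQL Server 2012 and later):

    -- Capture file autogrow and auto-update-statistics events to a file target.
    CREATE EVENT SESSION LookForAutogrowAndAutostats ON SERVER
    ADD EVENT sqlserver.database_file_size_change,
    ADD EVENT sqlserver.auto_stats
    ADD TARGET package0.event_file (SET filename = N'C:\temp\LookForAutogrowAndAutostats.xel');
    GO
    ALTER EVENT SESSION LookForAutogrowAndAutostats ON SERVER STATE = START;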

    You can, for instance, use my "looking for strange" trace found here: https://karaszi.com/looking-for-strange. Perhaps run it in two phases. First, see which events actually happen. Then modify the trace: remove the events that aren't interesting and add a file target, so you don't just see how many of each event occurred but also capture the actual events.

    Another thing is to check what you are waiting for. I don't know what your insert job looks like, but you can get session-level wait stats and possibly insert them into a table so you can check them at a later time. Wait stats for the whole instance are less relevant, since your import's wait stats will be muddled by everything else on the box.
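    For example (SQL Server 2016 or later; the logging table is a hypothetical one you would create yourself):

    -- Run this from the session that does the insert, right after it
    -- finishes, to persist that session's waits for later analysis.
    INSERT INTO dbo.ImportWaitStats (captured_at, wait_type, waiting_tasks_count, wait_time_ms)
    SELECT SYSDATETIME(), wait_type, waiting_tasks_count, wait_time_ms
    FROM sys.dm_exec_session_wait_stats
    WHERE session_id = @@SPID;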


  3. JasonW-5564 161 Reputation points
    2020-10-21T15:26:48.743+00:00

    I am sorry for the delayed response; I did not get a notice of replies! I am new to this format and am used to the previous MSDN forums. Not much change.

    I want to reiterate that I have used perfmon, Resource Monitor, etc. to ensure I do not have a disk-system bottleneck, and I ran multiple benchmark tests to confirm. I now rebuild this index 4 times a day, taking a 30-minute penalty each time: I stop the import/inserts, rebuild, then turn the import job back on. The index is never more than 2% fragmented, and pages are never more than 53% full (the index specifies a 50% fill factor). This seems to help, but I am still seeing occasional periods of slowness. In fact, I saw a period of slowness last night lasting about 5 hours, and when I looked there was no other job running and no other active queries at the time (via sp_who/sp_who2).

    I have never looked at query plans in terms of bulk inserts, only selects really. It has the feeling of this insert job occasionally getting a bad query plan for whatever reason. Does this sound feasible? If so, how do I correct it? It is a straight bulk insert, with no WHERE clause or conditions. I don't understand what is happening!


  4. Erland Sommarskog 101K Reputation points MVP
    2020-10-21T20:54:19.747+00:00

    First, you can set up email notifications for replies from this forum as well. Click your avatar in the upper-right corner, select your Profile, and then Settings.

    What does "SELECT @@version" report? That is always good to know.

    To be honest, I don't have that much input to offer; this kind of problem is difficult to assess without access to the system, and I have to rely on what you tell me you have already done.

    But one thing about the query plans: there are two indexes on the table, which means SQL Server has a choice when building query plans, so the slowness can indeed be hiding in the plan.

    If you have SQL 2016 or later, Query Store is a great tool to track this kind of thing, not least since the issue is intermittent. By default, Query Store aggregates data by the hour, which permits you to go back and check the plans from when there was a slowness issue.
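    Something like this, as a sketch (the database name is taken from your script, and the LIKE filter is just one way to find the statement):

    ALTER DATABASE BizUnit_Dev SET QUERY_STORE = ON;
    GO
    -- After a slow period, look up the plans and hourly runtime stats
    -- for statements touching the table.
    SELECT qt.query_sql_text, p.plan_id, rs.avg_duration, rs.last_execution_time
    FROM sys.query_store_query_text AS qt
    JOIN sys.query_store_query AS q ON q.query_text_id = qt.query_text_id
    JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
    JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
    WHERE qt.query_sql_text LIKE N'%BizUnitProd%'
    ORDER BY rs.last_execution_time DESC;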


  5. tibor_karaszi@hotmail.com 4,301 Reputation points
    2020-10-22T06:35:16.523+00:00

    I don't think that bulk loading operations have execution plans, at least not in the same way as traditional DML commands.

    What can affect a bulk load are things like the recovery model of the database, which indexes you have on the table, whether you specify TABLOCK, etc. So think long and hard about whether it could be anything like that.
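    For example, a sketch of a load that has the preconditions for minimal logging in place (the file path and options are placeholders; whether you actually get minimal logging depends on the recovery model and the indexes on the table):

    -- TABLOCK is one of the preconditions for minimal logging; note that
    -- the unique nonclustered index is still maintained row by row.
    BULK INSERT dbo.BizUnitProd
    FROM 'C:\import\batch.csv'
    WITH (TABLOCK, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);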

    I reiterate my earlier suggestions:

    Wait stats. Unless you know what you are waiting for, you are in the dark. I believe sys.dm_exec_session_wait_stats came in 2016, so before that you have to rely on sys.dm_os_wait_stats. Grab a snapshot into a table before and after the import and see what you waited for. Compare fast and slow runs to see what differs. Use Glenn Berry's diagnostic scripts to filter out the stuff that is just noise.
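    A sketch of the snapshot-and-diff approach:

    -- Before the import: snapshot instance-level waits.
    SELECT wait_type, waiting_tasks_count, wait_time_ms
    INTO #waits_before
    FROM sys.dm_os_wait_stats;

    -- ... run the import ...

    -- After the import: see what accumulated while it ran.
    SELECT w.wait_type,
           w.wait_time_ms - ISNULL(b.wait_time_ms, 0) AS wait_ms_delta
    FROM sys.dm_os_wait_stats AS w
    LEFT JOIN #waits_before AS b ON b.wait_type = w.wait_type
    WHERE w.wait_time_ms - ISNULL(b.wait_time_ms, 0) > 0
    ORDER BY wait_ms_delta DESC;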

    And also use my "looking for strange" trace. Start it before a run, and check it in the morning to see if anything strange happened during the import. If it did and you need more info, just modify the trace and add a file destination; then you don't just count how many of each event occurred, you also capture the actual events.

    See my earlier reply for some more info.
