Large Variation in Bulk Insert time

Question

So I have a simple Bulk Insert process to take data from our staging table and move it into our datamart.

The process is a simple data flow task with default settings for "Rows per batch" and the options are "tablock" and "no check constraint".

The table is fairly large: 587,162,986 rows, with a data size of 201GB and 49GB of index space. The clustered index for the table is:

CREATE CLUSTERED INDEX ImageData ON dbo.ImageData
(
    DOC_ID ASC,
    ACCT_NUM ASC,
    MasterID ASC
)

And the Primary Key is:

ALTER TABLE dbo.ImageData 
ADD CONSTRAINT ImageData 
PRIMARY KEY NONCLUSTERED 
(
    ImageID ASC,
    DT_CRTE_DOC ASC
)

Now we've been having an issue where BULK INSERT via SSIS is running incredibly slowly: 1 hour to insert a million rows. The source query that populates the table returns its rows already sorted and takes under a minute to run.

When the process is running I can see the query waiting on BULK INSERT, with each wait taking anywhere from 5 to 20 seconds and showing a wait type of PAGEIOLATCH_EX. The process is only able to insert about a thousand rows at a time.
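For anyone wanting to reproduce this observation, a minimal sketch of one way to watch the waits while the load runs is below. It relies only on the standard DMV sys.dm_exec_requests; the filter on the command text is an assumption about how the SSIS load shows up on a given server and may need adjusting.

```sql
-- Sketch: show what currently running bulk-load requests are waiting on.
-- The command filter below is an assumption; remove it to see all requests.
SELECT r.session_id,
       r.command,
       r.status,
       r.wait_type,        -- e.g. PAGEIOLATCH_EX while a page is read from disk
       r.wait_time,        -- milliseconds spent in the current wait
       r.last_wait_type,
       r.wait_resource     -- database:file:page for page latch waits
FROM sys.dm_exec_requests AS r
WHERE r.session_id <> @@SPID
  AND r.command LIKE N'BULK INSERT%';
```

PAGEIOLATCH_EX waits mean the session is waiting for a data page to be read from disk into the buffer pool, so the wait_resource column can help show which file and page the insert is stalled on.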

Yesterday, while testing this process against my UAT environment, I was running into the same issue. I ran the process a few times, trying to determine the root cause of the slow insert. Then all of a sudden it started running in under 5 minutes. So I ran it a few more times, all with the same result. Also, the number of bulk inserts that waited for 5 seconds or more dropped from hundreds to about 4.

Now this is perplexing because it's not like we had some huge drop off in activity.

CPU during the duration is low.

When it's slower, there appear to be fewer waits on disk.

Disk latency actually increased during the time frame when the process was running in under 5 minutes.

And the I/O was much lower during the times when this process ran poorly.

I've already checked and there was no file growth, as the data files are only 70% full. The log file still has 50% to go. The DB uses the Simple recovery model. The DB only has one file group, but it is spread across 4 files.

So what I'm wondering is: A) why was I seeing such large wait times on those bulk inserts, and B) what sort of magic happened that made it run faster?

Side note. It runs like crap again today.

UPDATE: The table is currently partitioned. However, it's done in a way that is at best silly.

CREATE PARTITION SCHEME [ps_Image] AS PARTITION [pf_Image] 
TO ([FG_Image], [FG_Image], [FG_Image], [FG_Image])

CREATE PARTITION FUNCTION [pf_Image](datetime) AS 
RANGE RIGHT FOR VALUES (
      N'2011-12-01T00:00:00.000'
    , N'2013-04-01T00:00:00.000'
    , N'2013-07-01T00:00:00.000'
);

This leaves essentially all of the data in the 4th partition. However, since every partition maps to the same file group, the data is currently split pretty evenly across those files.
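A quick way to confirm how the rows are distributed across the partitions is sketched below. It uses the standard catalog view sys.partitions and assumes the table name dbo.ImageData from the index definition above.

```sql
-- Sketch: row count per partition for the base table
-- (index_id 0 = heap, 1 = clustered index).
SELECT p.partition_number,
       p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.ImageData')
  AND p.index_id IN (0, 1)
ORDER BY p.partition_number;
```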

UPDATE 2: These are the overall waits when the process is running poorly.

These are the waits during the period when the process was running well.

The storage subsystem is locally attached RAID, no SAN involved. The logs are on a different drive. The RAID controller is a PERC H800 with a 1 GB cache (for UAT). Prod is a PERC(810).

We're using simple recovery with no backups. It's restored from a production copy nightly.

We also have set IsSorted property = TRUE in SSIS since the data is already sorted.

score 1 · Answer 1 · answered May 25 '17 at 16:32

I can't point to the cause but I believe the default rows-per-batch for a BULK INSERT operation is "all". Setting a limit in rows could make the operation more digestible: that's why it's an option. (Here and going on, I'm looking at the Transact-SQL "BULK INSERT" documentation, so it could be way off for SSIS.)

It'll have the effect of splitting the operation into multiple batches of X rows, each operating as a separate transaction. If there's an error, the batches that finished will remain committed in the destination table, and the batch that was stopped will roll back. If that's tolerable for what you're doing, i.e. you can re-run it later and catch up, then try that.
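As a rough illustration in plain T-SQL (not SSIS), the batching described above could look like the sketch below. In the T-SQL syntax, the option that commits every N rows as its own transaction is BATCHSIZE; ROWS_PER_BATCH is only a hint about the approximate total row count. The file path and the batch size are hypothetical placeholders, and the ORDER clause assumes the file is sorted by the clustered index keys shown in the question.

```sql
-- Sketch only: the path and batch size are made-up values for illustration.
BULK INSERT dbo.ImageData
FROM 'C:\staging\imagedata.dat'      -- hypothetical data file
WITH (
    BATCHSIZE = 100000,              -- each 100,000-row batch commits separately
    TABLOCK,                         -- table-level lock, as in the SSIS options
    ORDER (DOC_ID ASC, ACCT_NUM ASC, MasterID ASC)  -- data pre-sorted by the clustered key
);
```

In the SSIS OLE DB Destination, the closest equivalent setting is probably "Maximum insert commit size" rather than "Rows per batch", but that is worth verifying against the SSIS documentation for the version in use.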

It's not wrong to have a partition function that puts all current inserts into one table partition, but I don't see how it's useful to partition at all with every partition in the same filegroup. And using datetime is poor, and actually kind of fragile: a 'YYYY-MM-DD' literal converted to datetime without an explicit CONVERT style depends on the session's language and DATEFORMAT settings, so SQL Server may cheerfully treat it as YYYY-DD-MM. Not kidding. Don't panic, just change it to 'YYYYMMDD', or use CONVERT(datetime, 'YYYY-MM-DDT00:00:00', 126), which I think is the right style. But I think partitioning on a proxy for the date value (year as int, or year + quarter) will work better.
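The literal-format issue can be seen with a small experiment like the sketch below, best run in a scratch session. It switches the session language to British, which implies DATEFORMAT dmy; the reset at the end assumes the session was using us_english to begin with.

```sql
-- Sketch: how datetime literals are parsed under a dmy language setting.
SET LANGUAGE British;   -- implies DATEFORMAT dmy

SELECT CAST('2013-04-01' AS datetime) AS dashed_literal;           -- parsed as year-day-month: 4 January 2013
SELECT CAST('20130401' AS datetime) AS compact_literal;            -- unambiguous: 1 April 2013
SELECT CAST('2013-04-01T00:00:00.000' AS datetime) AS iso_literal; -- ISO 8601 with 'T': 1 April 2013
SELECT CONVERT(datetime, '2013-04-01T00:00:00', 126) AS explicit_style;

SET LANGUAGE us_english; -- assumption: restores the session's original language
```

Note that the boundary values in the question's partition function already use the 'YYYY-MM-DDThh:mm:ss.mmm' form, which is not affected by the language setting; the risk applies to the dashed form without the 'T'.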

Maybe it's a design copied from elsewhere, or duplicated across several datamarts. If this -is- a true datamart, i.e. a dump from the data warehouse that gives department managers some data to play with, that you aren't sending on elsewhere, and that is probably read-only as far as data users are concerned, then it seems to me that you could remove the partition function -or- change it to explicitly put all new data into the fourth partition no matter what, and no one would care. (Perhaps you should check that no one cares.)

It feels like a design where the plan is to drop the contents of partition 1 some time in the future and create another new partition for more new data, but it doesn't sound like that's happening here. At least it hasn't happened since 2013.

score 0 · Answer 2 · answered Apr 27 '17 at 15:35

I have seen this same sporadic extreme slowness on inserts to large partitioned tables on occasion myself. Have you tried updating the destination table's statistics and then running again? The extreme wait time could be due to poor stats, and if a stats update was triggered at some point during your testing, that would explain the speed increase. Just a thought, and an easy test to verify.
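A minimal sketch of that test is below, assuming the destination table is dbo.ImageData as in the question. The first query uses sys.dm_db_stats_properties, which is only available on newer builds (SQL Server 2008 R2 SP2, 2012 SP1, and later), to show when each statistic was last updated; checking it before and after a fast run would show whether a stats update coincided with the speed-up.

```sql
-- Sketch: inspect when statistics on the destination table were last updated.
SELECT s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter   -- changes since the last update
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.ImageData');

-- Then refresh the statistics (default sampling) and re-run the load.
-- WITH FULLSCAN is also possible but may take a long time on ~587 million rows.
UPDATE STATISTICS dbo.ImageData;
```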
