I am trying to run a script that will batch insert over 20 million records into a table, 10,000 at a time. At the start of the run it seemed to be working fine, but once a number of records had been inserted (270,000) the script began to slow down considerably: it took 23 hours to insert the next 30,000 records. My best guess is that as the number of new records grows, the part of the script that checks whether a record already exists takes much longer. I have created indexes on the tables used in the script, but I still need to shave run time off this script. Any help would be much appreciated. My script is below.
CREATE NONCLUSTERED INDEX [plan2TMP] ON [dbo].[plan2]
(
    [l_dr_plan1] ASC
)
INCLUDE
(
    [l_address],
    [l_provider]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [nameTMP] ON [dbo].[name]
(
    [dr_id],
    [nationalid]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [plan1TMP] ON [dbo].[plan1]
(
    [dr_id],
    [cmt]
) ON [PRIMARY]
GO
DECLARE @BatchSize int = 10000

WHILE 1 = 1
BEGIN
    INSERT INTO plan2
    (
        l_dr_plan1
      , l_address
      , l_provider
    )
    SELECT TOP (@BatchSize)
        dr1.link
      , ad1.link
      , dr.link
    FROM plan1 dr1
    INNER JOIN name     dr  ON dr1.dr_id = dr.dr_id
    INNER JOIN name2    dr2 ON dr2.nationalid = dr1.cmt
    INNER JOIN address1 ad1 ON ad1.dr_id = dr2.dr_id
    WHERE NOT EXISTS
    (
        SELECT 1
        FROM plan2
        WHERE ltrim(rtrim(dr1.link)) + ltrim(rtrim(ad1.link)) + ltrim(rtrim(dr.link))
            = ltrim(rtrim(l_dr_plan1)) + ltrim(rtrim(l_address)) + ltrim(rtrim(l_provider))
    )
    AND dr1.cmt <> ''

    IF @@ROWCOUNT < @BatchSize
        BREAK
END
The source data isn't actually dirty; I could drop the LTRIM/RTRIM calls without any problems. The only reason for the concatenation is that I couldn't come up with a better way to compare the values being inserted against the values that had already been inserted.
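For reference, this is roughly what I think the duplicate check would look like if each column were compared directly instead of comparing the concatenated, trimmed strings (I haven't verified that this version is actually faster):

DECLARE @BatchSize int = 10000

-- same source query as above, but the existence check compares
-- each column on its own rather than one concatenated string
SELECT TOP (@BatchSize)
    dr1.link
  , ad1.link
  , dr.link
FROM plan1 dr1
INNER JOIN name     dr  ON dr1.dr_id = dr.dr_id
INNER JOIN name2    dr2 ON dr2.nationalid = dr1.cmt
INNER JOIN address1 ad1 ON ad1.dr_id = dr2.dr_id
WHERE dr1.cmt <> ''
AND NOT EXISTS
(
    SELECT 1
    FROM plan2 p2
    WHERE p2.l_dr_plan1 = dr1.link
      AND p2.l_address  = ad1.link
      AND p2.l_provider = dr.link
)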
The link column is unique in each table (but it is not the PK or an FK). The plan2 table references the link values of the plan1, address1, and name tables to connect those records in the back end and show them together in the front end.
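For context, here is a simplified sketch of the columns the script touches in each table (data types are placeholders, and the primary keys and other columns are omitted):

-- simplified sketch of the relevant columns only; types are placeholders
CREATE TABLE dbo.plan1    (dr_id int, cmt varchar(50), link varchar(50))         -- link is unique, not the PK
CREATE TABLE dbo.[name]   (dr_id int, nationalid varchar(50), link varchar(50))  -- link is unique, not the PK
CREATE TABLE dbo.name2    (dr_id int, nationalid varchar(50))
CREATE TABLE dbo.address1 (dr_id int, link varchar(50))                          -- link is unique, not the PK
CREATE TABLE dbo.plan2    (l_dr_plan1 varchar(50), l_address varchar(50), l_provider varchar(50))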
The reason for inserting 10k at a time is that, without batching, the log file grew so large it brought the available server memory down to about 10 MB. With batching, the script commits records in chunks and keeps the log file from growing out of control.
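In case it's relevant, this is how I check (from another session) that the log is actually being reused between batches rather than growing; the CHECKPOINT part assumes the database is in SIMPLE recovery:

-- reports log file size and percent used for every database,
-- so I can confirm the log isn't growing between batches
DBCC SQLPERF(LOGSPACE)

-- assuming SIMPLE recovery, a manual CHECKPOINT lets the inactive
-- portion of the log be truncated and reused between batches
CHECKPOINT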