
I'm currently working on a project which bulk imports data from flat (CSV) files: about 18 different files, each loaded into a specific table through its own stored procedure.

I followed the steps advised in the Data Loading Performance Guide.

The database is in the BULK_LOGGED recovery model to minimize logging. When executing the stored procedure below on a file containing 600,000 rows, I get an error:

Msg 9002, Level 17, State 4, Procedure SP_Import__DeclarationClearanceHistory_FromCSV, Line 34
The transaction log for database is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases

(For testing purposes, I take a full backup before starting the import.)

Looking at log_reuse_wait_desc, I see the following:

log_reuse_wait_desc: CHECKPOINT

All the other files get imported successfully.
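
For reference, here is how log_reuse_wait_desc can be queried (a minimal sketch; MyImportDb stands in for the actual database name):

SELECT name ,
       log_reuse_wait_desc
FROM   sys.databases
WHERE  name = 'MyImportDb' ;    -- placeholder database name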

Any input on solving this would be welcome.

CREATE PROCEDURE [dbo].[SP_Import_DeclarationClearanceHistory_FromCSV]
    @FilePath [nvarchar](1000)
AS
BEGIN
    -- Enable trace flag 610 (minimal logging for inserts into indexed tables).
    DBCC TRACEON(610)

    -- Create a temporary table for importing the data from the csv file.
    CREATE TABLE #DeclarationClearanceHistory
    (
          [ItemID] [int] IDENTITY(1, 1) NOT NULL ,
          [CMSDeclarationID] [bigint] NOT NULL ,
          [StatusCode] [nvarchar](10) NOT NULL ,
          [SubStatus] [nvarchar](10) NULL ,
          [DepartmentCode] [nvarchar](10) NULL ,
          [StartDate] [datetime] NULL ,
          [EndDate] [datetime] NULL ,
          PRIMARY KEY CLUSTERED ( [ItemID] ASC )
            WITH ( PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
                   IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
                   ALLOW_PAGE_LOCKS = ON ) ON [PRIMARY]
        )
    ON  [PRIMARY]

    -- Insert all the data from the csv file into the temporary table using BULK INSERT.
    EXEC ('BULK INSERT #DeclarationClearanceHistory
    FROM ''' + @FilePath + '''
    WITH ( FIELDTERMINATOR = ''<,>'', ROWTERMINATOR = ''\n'', FIRSTROW = 2, KEEPIDENTITY, CODEPAGE = ''ACP'', ORDER ( ItemID ASC ) );') ;

    -- Using a MERGE statement: insert the record if not present, update it if it exists.
    MERGE dbo.DeclarationClearanceHistory AS TargetTable                            -- Insert into or update the target table.
        USING #DeclarationClearanceHistory AS SourceTable                           -- Records from the temporary table (records from the csv file).
        ON ( TargetTable.ItemID = SourceTable.ItemID )      -- Join condition deciding which records are already present
        WHEN NOT MATCHED BY TARGET 
            THEN INSERT (
                          ItemID ,
                          CMSDeclarationID ,
                          StatusCode ,
                          SubStatus ,
                          DepartmentCode ,
                          StartDate ,
                          EndDate
                        )
               VALUES   ( SourceTable.ItemID ,
                          SourceTable.CMSDeclarationID ,
                          SourceTable.StatusCode ,
                          SourceTable.SubStatus ,
                          SourceTable.DepartmentCode ,
                          SourceTable.StartDate ,
                          SourceTable.EndDate
                        )
        WHEN MATCHED -- If matched, then UPDATE
            THEN UPDATE
               SET      TargetTable.ItemID = SourceTable.ItemID ,
                        TargetTable.CMSDeclarationID = SourceTable.CMSDeclarationID ,
                        TargetTable.StatusCode = SourceTable.StatusCode ,
                        TargetTable.SubStatus = SourceTable.SubStatus ,
                        TargetTable.DepartmentCode = SourceTable.DepartmentCode ,
                        TargetTable.StartDate = SourceTable.StartDate ,
                        TargetTable.EndDate = SourceTable.EndDate ;
    DBCC TRACEOFF(610)
END
marc_s
Raymond

2 Answers


When you see CHECKPOINT as the log_reuse_wait_desc for that database, it is because no checkpoint has happened since the last time the log was truncated.

You can alleviate this issue by manually kicking off a CHECKPOINT command.
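
A minimal sketch of that, assuming the import database is named MyImportDb (substitute the real name); afterwards log_reuse_wait_desc should no longer show CHECKPOINT:

USE MyImportDb ;    -- placeholder name for the import database
CHECKPOINT ;        -- flush dirty pages so the inactive portion of the log can be reused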

Supporting references:
Factors That Can Delay Log Truncation
Checkpoints and the Active Portion of the Log

Thomas Stringer

My first comment is that you are doing ELT (Extract, Load, Transform) rather than ETL (Extract, Transform, Load). While ELTs leverage set-based relational advantages and can be very fast, they are sometimes very write-intensive (hard on storage), specifically on the t-log, because the transform is done on disk (typically as an update or insert). I prefer ETL when possible, as the transform is done in the buffer and, when done correctly, requires minimal t-log writes. Buffer is cheap; fast storage is not. For some bulk operations, the t-log is a non-value-adding bottleneck.

Here are a few things that you're doing that I wouldn't recommend:

  1. Bulk loading into tempdb. I'd recommend doing the bulk load into a real table in the destination database (see the sketch after this list). Then you can size your files accordingly and not worry about impacting tempdb.
  2. Bundling independent procedures together. Split these two procedures up. The bulk load and the merge are independent of each other; splitting them into individual procedures makes them more modular and unit-testable.
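
To illustrate point 1, here is a minimal sketch of loading into a permanent staging table in the destination database instead of tempdb. The table name Staging_DeclarationClearanceHistory and the file path are assumptions; the columns mirror the temp table from the question:

CREATE TABLE dbo.Staging_DeclarationClearanceHistory
    (
      [ItemID] [int] NOT NULL ,
      [CMSDeclarationID] [bigint] NOT NULL ,
      [StatusCode] [nvarchar](10) NOT NULL ,
      [SubStatus] [nvarchar](10) NULL ,
      [DepartmentCode] [nvarchar](10) NULL ,
      [StartDate] [datetime] NULL ,
      [EndDate] [datetime] NULL ,
      PRIMARY KEY CLUSTERED ( [ItemID] ASC )
    ) ;

-- Bulk load into the empty staging table. TABLOCK helps the load qualify
-- for minimal logging; the terminators match the question's file format.
BULK INSERT dbo.Staging_DeclarationClearanceHistory
FROM 'C:\Imports\DeclarationClearanceHistory.csv'    -- hypothetical path
WITH ( FIELDTERMINATOR = '<,>', ROWTERMINATOR = '\n', FIRSTROW = 2,
       CODEPAGE = 'ACP', TABLOCK, ORDER ( ItemID ASC ) ) ;

Because the staging table lives in the destination database, its writes hit that database's data files and log, which you can size and monitor directly.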

It looks like you have the minimal logging rules covered pretty well. You're loading into an empty B-tree with no nonclustered indexes, using trace flag 610, with the ordering key specified, in bulk-logged mode. Outside of the temp table, everything looks OK here. As long as the file is actually ordered by the key, you should be good. Are you filling the log on tempdb or on the user database?
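
One quick way to answer that question is to check log usage for every database at once:

DBCC SQLPERF(LOGSPACE) ;    -- reports log size and percent of log space used per database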

On the merge statement:

UPDATEs will always be fully logged. Are you changing a pretty significant portion of your table? If so, you might consider doing the merge in memory (an SSIS data flow task or .NET) and then bulk loading into a new table. This is more work, but most of it is done in the buffer and minimal t-log is used. A minimally logged insert can be faster than a fully logged update if the portion changed is significant.

Since you're using TF 610, the insert can be minimally logged when you use a TABLOCK hint. See here for more information on MERGE with TABLOCK: Link. Note that the update will still be fully logged if you go this route.
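
A minimal sketch of where the hint goes, reusing the table and column names from the question (the redundant update of the join key from the original has been dropped):

MERGE dbo.DeclarationClearanceHistory WITH ( TABLOCK ) AS TargetTable
    USING #DeclarationClearanceHistory AS SourceTable
    ON ( TargetTable.ItemID = SourceTable.ItemID )
    WHEN NOT MATCHED BY TARGET
        THEN INSERT ( ItemID, CMSDeclarationID, StatusCode, SubStatus,
                      DepartmentCode, StartDate, EndDate )
             VALUES ( SourceTable.ItemID, SourceTable.CMSDeclarationID,
                      SourceTable.StatusCode, SourceTable.SubStatus,
                      SourceTable.DepartmentCode, SourceTable.StartDate,
                      SourceTable.EndDate )
    WHEN MATCHED    -- the update branch remains fully logged even with TABLOCK
        THEN UPDATE
           SET  TargetTable.CMSDeclarationID = SourceTable.CMSDeclarationID ,
                TargetTable.StatusCode = SourceTable.StatusCode ,
                TargetTable.SubStatus = SourceTable.SubStatus ,
                TargetTable.DepartmentCode = SourceTable.DepartmentCode ,
                TargetTable.StartDate = SourceTable.StartDate ,
                TargetTable.EndDate = SourceTable.EndDate ;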

Glorfindel
brian