3

I need to automate a weekly import of .txt log files into a SQL database. I have already created my table, format file and can get some of the data to work using OPENROWSET BULK, but the data from the log files is not consistent.

The table has 10 columns 9 of which are populated from the log file, last 3 columns are all descriptions but the log file may only have a single description column populated (the max is always 3).

In the image below the highlighted rows work because they have the valid 3 column worth of data separated by a comma. Is there a way to force BCP to always populate the 3 columns or mark them as NULL then move onto the next row? enter image description here

Stockburn
  • 501
  • 4
  • 22

2 Answers2

5

You can achieve most things with OPENROWSET BULK because the SELECT can be part of an arbitrarily complex query.

In your case, you could map the always-present columns as usual and consume everything else on the line as a single string column. That ending string can then be split in your SELECT using any convenient method.

Example

Given a 'ragged' input file with 3 required fields and up to 3 optional ones:

a,b,c,1
d,e,f,2,3
g,h,i,4
j,k,l,5,6,7
m,n,o,8
p,q,r,9

The format file consumes everything after the three fixed fields in a single fourth field:

9.0
4
1 SQLCHAR 0 0 ","    1 c1 Latin1_General_CI_AS
2 SQLCHAR 0 0 ","    2 c2 Latin1_General_CI_AS
3 SQLCHAR 0 0 ","    3 c3 Latin1_General_CI_AS
4 SQLCHAR 0 0 "\r\n" 4 c4 Latin1_General_CI_AS

The OPENROWSET query with string splitting for the fourth field:

SELECT
    BI.c1,
    BI.c2,
    BI.c3,
    S.c4, 
    S.c5, 
    S.c6
FROM OPENROWSET
(
    BULK 'C:\Temp\data.txt',
    FORMATFILE = 'C:\Temp\data_format.txt'
) AS BI
CROSS APPLY 
(
    SELECT
        P.*
    FROM 
    (
        -- Split the catch-all column
        SELECT 
            r = ROW_NUMBER() OVER (ORDER BY @@SPID),
            v = [value]
        FROM STRING_SPLIT(BI.c4, ',')
    ) AS SN
    -- Pivot split elements to new columns
    PIVOT 
    (
        MAX(v) 
        FOR r IN ([1], [2], [3])
    ) AS P
) AS S (c4, c5, c6);

Output:

c1 c2 c3 c4 c5 c6
a b c 1 NULL NULL
d e f 2 3 NULL
g h i 4 NULL NULL
j k l 5 6 7
m n o 8 NULL NULL
p q r 9 NULL NULL
Paul White
  • 94,921
  • 30
  • 437
  • 687
-1

Ended up using a combination of Power Shell and SQL, PS cleans the files first, adds the correct headers and formatting combined with a format file I have been able to get this to work and it is reliable.

If anyone does have any thoughts or suggestions please post them.

Stockburn
  • 501
  • 4
  • 22