0

I have multiple tables in an Oracle schema that have columns with incorrect values:

  • Number columns where the value is zero, instead of null.
  • Text columns where the value is a space, instead of null.

I’m in the process of cleaning up the data. I will eventually create constraints on the columns to prevent bad data being entered in the first place. But before I do that, I need to find the columns that have existing problems and investigate them.

To start, I would like to produce a list of columns that have incorrect values.

  • For each table in a schema, provide a list of columns that have zeros or spaces as values.
table_name    column_name    datatype    value    count
—————————————————————————————————-————————————-———————-
roads         width          number      0        500
sidewalks     description    varchar2    [space]  10000
sidewalks     const_year     number      0        2000

Question:

What would it take to produce a list like this in Oracle?

(Note: I’m a public works technician by day, not a database administrator. Layman’s terms would be appreciated.)

User1974
  • 1,517
  • 25
  • 54

2 Answers2

1

You should look into querying the information schema based tables (such as ALL_TAB_COLUMNS or DBA_TAB_COLUMNS) to get a list of all tables and all columns: https://docs.oracle.com/en/database/oracle/oracle-database/18/refrn/ALL_TAB_COLUMNS.html#GUID-F218205C-7D76-4A83-8691-BFD2AD372B63

https://docs.oracle.com/en/database/oracle/oracle-database/18/refrn/DBA_TAB_COLUMNS.html#GUID-91F945AE-5778-45A6-A07E-775A006A9AA1

Then look into how to write dynamic SQL to easily generate the code you need to query every column of every table: https://docs.oracle.com/en/database/oracle/oracle-database/18/lnpls/dynamic-sql.html#GUID-7E2F596F-9CA3-4DC8-8333-0C117962DB73

Here is some psuedo-code on what your thought process should be when creating the dynamic SQL (*note this is not Oracle PL/SQL syntax, so just take it as a general example on how to build dynamic SQL, and focus mostly on step 3):

-- 1) Create a temp table to hold the results
DROP TABLE IF EXISTS #Results
CREATE TABLE #Results (TableName VARCHAR(100), ColumnName VARCHAR(100), [Value] VARCHAR(100) NULL)

-- 2) Dynamic SQL variable used to build the dynamic SQL string DECLARE @DynamicSQL AS NVARCHAR(MAX) = ''

-- 3) Build the dynamic SQL string using the DBA_TAB_COLUMNS information schema view. First query gets any row whose column has 0 as a value, second query gets any row whose column has a NULL value SELECT @DynamicSQL = @DynamicSQL + ' INSERT INTO #Results SELECT ''' + TABLE_NAME+ ''' AS TableName, ''' + COLUMN_NAME + ''' AS ColumnName, [' + COLUMN_NAME + '] AS ColumnValue FROM [' + TABLE_NAME+ '] WHERE [' + COLUMN_NAME + '] = ''0'';' + ' INSERT INTO #Results SELECT ''' + TABLE_NAME + ''' AS TableName, ''' + COLUMN_NAME + ''' AS ColumnName, [' + COLUMN_NAME + '] AS ColumnValue FROM [' + TABLE_NAME + '] WHERE [' + COLUMN_NAME + '] IS NULL;' FROM sys.DBA_TAB_COLUMNS

-- 4) Execute the dynamic SQL string EXEC sp_ExecuteSQL @DynamicSQL

-- 5) Select the final results, grouping up on TableName, ColumnName, and Value to get the counts of 0 and NULL values per column per table SELECT R.TableName, R.ColumnName, TC.DATA_TYPE AS DataType, R.[Value], COUNT(1) AS [Count] FROM #Results AS R INNER JOIN sys.DBA_TAB_COLUMNS AS TC ON R.TableName = TC.TableName AND R.ColumnName = R.ColumnName GROUP BY R.TableName, R.ColumnName, R.[Value]

J.D.
  • 40,776
  • 12
  • 62
  • 141
0

Here's what I came up with:

--DROP TABLE incorrect_value_results;
--CREATE TABLE incorrect_value_results (id NUMBER, table_name VARCHAR2(30), column_name VARCHAR2(30), val_count NUMBER, value varchar2(30));
TRUNCATE TABLE incorrect_value_results;

DECLARE l_count NUMBER; l_inv_char_str VARCHAR2(2000); TYPE invalid_char_val_rec IS RECORD( cnt NUMBER, inv_char VARCHAR2(20) ); TYPE invalid_char_val_tab IS TABLE OF invalid_char_val_rec INDEX BY PLS_INTEGER; l_inv_char_vals invalid_char_val_tab; l_index NUMBER; BEGIN l_index := 0; -- Loop through each table in the schema FOR i IN (SELECT table_name FROM user_tables) LOOP -- Loop through each relevant column for this table -- Exclude column EVT_FROM FOR j IN (SELECT column_name, data_type FROM user_tab_cols WHERE table_name = i.table_name AND column_name NOT IN ('EVT_FROM','ANGLE','UND','OFFSET') AND table_name NOT LIKE 'A_%' AND table_name NOT LIKE 'SDE%' AND table_name NOT LIKE '%ANNO%' AND table_name NOT IN ('INCORRECT_VALUE_RESULTS','NUMBERS') AND data_type IN ('VARCHAR2', 'CHAR', 'NCHAR', 'NVARCHAR2', 'NUMBER')) LOOP IF j.data_type IN ('VARCHAR2', 'CHAR', 'NCHAR', 'NVARCHAR2') THEN EXECUTE IMMEDIATE 'SELECT COUNT(1), '||j.column_name|| ' FROM '||i.table_name|| ' WHERE UPPER('||j.column_name||') IN('' '', '' '', ''0'', ''-'', ''NULL'', ''<NULL>'' ) GROUP BY '||j.column_name BULK COLLECT INTO l_inv_char_vals;

     ELSIF j.data_type = 'NUMBER' THEN
        EXECUTE IMMEDIATE 
           'SELECT COUNT(1), '||j.column_name||
            ' FROM '||i.table_name|| 
            ' WHERE UPPER('||j.column_name||') &lt;= 0
              GROUP BY '||j.column_name
             BULK COLLECT INTO l_inv_char_vals;
     END IF;     
     -- If there are results then log them
     l_index := l_index + 1;
     FORALL k IN 1..l_inv_char_vals.COUNT
        INSERT INTO incorrect_value_results (id, table_name, column_name, val_count, value)
        VALUES (l_index,
                i.table_name, 
                j.column_name, 
                l_inv_char_vals(k).cnt,
                l_inv_char_vals(k).inv_char);
  END LOOP;

END LOOP; END; / COMMIT;

User1974
  • 1,517
  • 25
  • 54