4

After some research I found an example where I can find tables and its columns with NULL values. But the function also returns true when the column is empty. It gives me errors when I try to add an or statement. How can I modify the function so it returns also true when the column contains blank values? This is the function I found:

create function has_nulls(p_schema in text, p_table in text, p_column in text)
                returns boolean language plpgsql as $$
declare 
  b boolean;
begin
  execute 'select exists(select * from '||
          p_table||' where '||p_column||' is null)' into b;
  return b;
end;$$;
mylktea
  • 41
  • 1
  • 2

1 Answers1

2

Assuming "empty" and "blank values" means empty string ('').

This function checks whether the passed table has any NULL or empty values ('') in the passed column (which must be a string type or some other type where the empty string is valid; not the case for numeric types for instance):

CREATE FUNCTION f_has_missing(_tbl regclass, _col text, OUT has_missing boolean)
  LANGUAGE plpgsql AS
$func$
BEGIN
   EXECUTE
   format($$SELECT EXISTS (SELECT FROM %s WHERE %2$I = '' OR %2$I IS NULL)$$, _tbl, _col)
   INTO has_missing;
END
$func$;

Call:

SELECT f_has_missing('tablename', 'column')

Or, optionally, schema-qualified:

SELECT f_has_missing('schema.tablename', 'column')

db<>fiddle here

Probably most important: Never concatenate parameter values into SQL code blindly. That's begging for SQL injection. I sanitized the code with format(). See:

There are a couple of smart expressions to check for both:

(string_col = '') IS NOT FALSE
(string_col <> '') IS NOT TRUE
coalesce(string_col, '') = ''

See:

But I chose this plain and more verbose expression for two reason:

string_col = '' OR string_col IS NULL

I have grown fond of simple, obvious code, and none of the above is as clear as this. But more importantly, this expression can use an index on (string_col), while the above cannot - which makes a big difference for big tables. See the added demo in the fiddle!

db<>fiddle here

Obviously, we want a different name for the function than "has_nulls", now.

I use an OUT parameter for convenience and short code. Now we can assign to it and be done.

And I use regclass as IN parameter for the table name. This way I can provide a schema explicitly or not. Again, see:

Aside, one might loop through all columns of a table or a whole db to find any such column at once. Related:

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633