1

Using PostgreSQL 9.3.4 on Mac OSX 10.9.2 with Postgres.app.

I want to apply a function to all columns in a table, specifically I want to trim white space. So far I've got this in a file test.sql: (ed. fixed code after answer below)

do $$
declare
target text;
begin
    for target in
        select quote_ident(column_name)
        from information_schema.columns
        where table_name = 'cds' and data_type = 'character varying'
    loop
        RAISE NOTICE 'Calling trim(%)', target;
        execute 'update cds set '
            || quote_ident(target)
            || ' = trim('
            || quote_ident(target)
            || ')';
    end loop;
end
$$;

The select gives my a nice list of all the column names in my table and I was hoping to for-loop through all of those:

gakera=# select column_name from information_schema.columns
         where table_name = 'cds' and data_type = 'character varying';

 column_name 
-------------
 userID
 junk1
 dates
 times
 junk2
 junk3
 servID
 junk4
 junk5

Sidetrack

Not the question, but just extra confusing.

The fun starts at \i ~/path/test.sql with this error:

psql:/path/test.sql:18: NOTICE:  Calling trim((anum))
psql:/path/test.sql:18: ERROR:  function quote_ident(record) does not exist
LINE 2:    || quote_ident(target)
              ^
HINT:  No function matches the given name and argument types.
       You might need to add explicit type casts.

What's going on with quote_ident? I'm gonna try a sanity check, I copied this from http://www.postgresql.org/docs/9.3/static/functions-string.html

gakera=# select quote_ident('test');
 quote_ident 
-------------
 test

Ok, insanity check then:

gakera=# select quote_indeed('test');
ERROR:  function quote_indeed(unknown) does not exist
LINE 1: select quote_indeed('test');
               ^
HINT:  No function matches the given name and argument types.
       You might need to add explicit type casts.

Ugh, whatever, I'll just remove the quote_identfunction from the offending lines (it didn't complain about the select quote_ident part so I keep that. Now I have:

    execute 'update cds set '
        || target
        || ' = trim('
        || target
        || ')';

Question

Here is where things start to get interesting. when I run this like before I get:

psql:/path/test.sql:18: NOTICE:  Calling trim((anum))
psql:/path/test.sql:18: ERROR:  syntax error at or near "trim"
LINE 1: update cds set (anum) = trim((anum))
                                 ^
QUERY:  update cds set (anum) = trim((anum))
CONTEXT:  PL/pgSQL function inline_code_block line 11 at EXECUTE statement

Why is it adding () around anum? How can I get rid of them? The EXECUTE statement should read update cds set anum = trim(anum)

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
gakera
  • 175
  • 1
  • 7

2 Answers2

3

@klin already addressed the bug you had been asking about. I am going to address your general approach.

Your DO statement would run a separate UPDATE for every column - which is almost n times as expensive as updating n columns at once.
Due to PostgreSQL's MVCC model this also produces n dead tuples in the table, leading to excessive table bloat and a lot of work for VACUUM.

What's worse, it updates every row n times even if nothing changes at all.

I suggest a much more efficient approach:

  • Update every row once.
  • Only update if anything actually changes.

CREATE OR REPLACE FUNCTION f_trim_cols(IN _tbl regclass, OUT row_ct integer
                                                       , OUT col_ct integer)
  RETURNS record AS
$func$
DECLARE
   _sql text;
BEGIN
   SELECT format('
      UPDATE %s SET (%2$s) = (trim(%3$s))
      WHERE (%2$s) IS DISTINCT FROM (trim(%3$s))' -- test if anything changes
                , _tbl
                , string_agg (quote_ident(attname), ', ')
                , string_agg (quote_ident(attname), '), trim(')
                )
        , count(*)::int  -- count columns
   INTO   _sql, col_ct
   FROM   pg_attribute
   WHERE  attrelid = _tbl
   AND    atttypid IN ('text'::regtype, 'varchar'::regtype) -- basic string types
   AND    NOT attisdropped 
   AND    attnum > 0;

   -- RAISE NOTICE '%', _sql;  --  debug
   EXECUTE _sql;  GET DIAGNOSTICS row_ct = ROW_COUNT;
END
$func$  LANGUAGE plpgsql;

COMMENT ON FUNCTION f_trim_cols(regclass) IS 'Trim all columns of table _tbl ($1).
Return count of applicable columns (col_ct) and rows actually changed (row_ct) .
Call:
   SELECT * FROM f_trim_cols($$city$$)   --';

Produces SQL code like:

 UPDATE city SET (city, country) = (trim(city), trim(country))
 WHERE (city, country) IS DISTINCT FROM (trim(city), trim(country));

I also target text as well as varchar columns.
Everything is sanitized and safe against SQL injection.
More explanation, links and a fiddle in this related answer on SO:

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
2

The first listed function is almost correct. Variable target should be of type text. Try this:

do $$
declare
    target text;
begin
    for target in
        select column_name
        from information_schema.columns
        where table_name = 'cds' and data_type = 'character varying'
    loop
        RAISE NOTICE 'Calling trim(%)', target;
        execute 'update cds set '
            || target
            || ' = trim('
            || target
            || ')';
    end loop;
end
$$;
klin
  • 2,194
  • 18
  • 16