14

I have a software that generates transactions and executes them. I would like to asynchronously inspect about the transaction progress in term of the simple proportion of queries executed over total number of queries. I'm able to read both stdout and stderr generated by psql, so I was thinking that a solution could be printing to stdout some custom messages like "Progress: 3/8" (or another custom text string).

I've thought about executing a "logging query" which stores in a proper table the informations about query progress, but this table wouldn't be available until the transaction completes, making it unuseful for me to inspect transaction progress.

At the moment I tried the following (suppose to have 3 queries in the transaction which do some stuff):

BEGIN;
CREATE OR REPLACE FUNCTION progress(curr int, total int) RETURNS float AS $$
BEGIN
  RAISE NOTICE '___PROGRESS___%', curr/total::float;
  RETURN curr/total::float;
END;

-- First query
SELECT ... FROM ...;
-- Log the progress
SELECT * FROM progress(1,3);

-- Second query
SELECT ... FROM ...;
-- Log progress
SELECT * FROM progress(2,3);

-- Third query
SELECT ... FROM ...;
-- Log progress
SELECT * FROM progress(3,3);

COMMIT;

I execute the script from a bash file using the following syntax:

res=$((psql -U username -d db -h localhost --log-file=plog.log -f "the transaction above") 2>&1>clog.log)

What I get from a transaction like the one above using the previous bash command are three outputs: plog.log, clog.log and $res bash variable. I'll post some extract of them to make it clear what's their content.

plog.log

....
********* QUERY **********
SELECT * from progress(1, 3);
**************************

     progress      
-------------------
 0.333333333333333
(1 row)

...
********* QUERY **********
SELECT * from progress(2, 3);
**************************

     progress      
-------------------
 0.6666666666666667
(1 row)

...
********* QUERY **********
SELECT * from progress(3, 3);
**************************

     progress      
-------------------
 1
(1 row)

clog.log

...
     progress      
-------------------
 0.333333333333333
(1 row)

...
     progress      
-------------------
 0.6666666666666667
(1 row)

...
     progress      
-------------------
 1
(1 row)    

$res variabile (psql return value)

psql:/path/to/script.sql:11: NOTICE:  ___PROGRESS___0.333333333333333
psql:/path/to/script.sql:16: NOTICE:  ___PROGRESS___0.666666666666667
psql:/path/to/script.sql:21: NOTICE:  ___PROGRESS___1

The problem is that I can't real-time access the content of $res variable. I can instead access the content of clog.log and plog.log, so the usage of RAISE NOTICE doesn't make sense at all because it doesn't appear in both clog.log and plog.log but only in $res.

Should I stick to parse clog.log or plog.log in order to find the SELECT output value from the progress() function (removing the RAISE NOTICE statement in that function because of it's uselessness) or there's an alternative method? Am I missing something?

Thank you

pietrop
  • 756
  • 1
  • 7
  • 16

2 Answers2

7

RAISE NOTICE is the right method.

The reason RAISE NOTICE output does not appear in clog.log is that your standard error redirection is wrong at the shell level.

You want:

psql --log-file=plog.log -f file.sql  >clog.log 2>&1

That way it redirects the standard output into clog.log and then the standard error to that standard output.

The way it's done in the question: 2>&1>clog.log does not achieve that because of the wrong order, as described in the bash manual:

Note that the order of redirections is significant. For example, the command

ls > dirlist 2>&1

directs both standard output (file descriptor 1) and standard error (file descriptor 2) to the file dirlist, while the command

ls 2>&1 > dirlist

directs only the standard output to file dirlist, because the standard error was made a copy of the standard output before the standard output was redirected to dirlist.

Daniel Vérité
  • 32,662
  • 3
  • 78
  • 84
0

In our system we use dblink's and write to the same DB. This is an alternative approach for Oracle Autonomous Transaction.

Here I use the concept of DB link to achieve the asynchronous commit for logging. This is as same as the concept of Autonomous Transaction in Oracle. Commits done inside the db link will not cause a commit in the main transaction.

Actual data is written to log table via command "PERFORM PUBLIC.dblink_exec" and is committed immediately.

You can track the progress of your main procedure by querying the log table from a different session when the main procedure is still under execution, no matter if it errors out after hours of execution, log data remains committed.

#1: Confirm dblink extension

#2: Create DDLs
CREATE TABLE dblink_log (ID serial, message TEXT);--Table to store the logs

GRANT ALL ON TABLE core.dblink_log TO PUBLIC;--Permission to write as a dblink

GRANT ALL ON TABLE core.dblink_log_id_seq TO PUBLIC;

#3
/*Test block to write to a table and commit while 
  the main transaction is still in progress*/
DO$$ 
DECLARE
        open_connections TEXT [] := PUBLIC .dblink_get_connections () ; 
        --Cannot connect more than once before closing
        my_dblink_name TEXT := 'logging_dblink' ;
        --Define a name for your db link connection
    BEGIN

    --If the connection doest not exists, create one
    IF open_connections IS NULL 
       OR NOT open_connections @> ARRAY [ my_dblink_name ] THEN
        PERFORM PUBLIC .dblink_connect (
            my_dblink_name,
            'host=127.0.0.1  port=5432 dbname=dbname user=user password=password'
        ) ; 
        raise notice 'New db link connection made' ;
    ELSE
        raise notice 'Db link connection exists, re-using' ;
    END IF ; 

    FOR i IN 1..10 loop 

        PERFORM PUBLIC.dblink_exec (
                my_dblink_name,
                'BEGIN;
                 INSERT INTO core.dblink_log (message) 
                 VALUES (''Executing loop #' || i || ''') ;
                 COMMIT;'
            ) ; 

        perform pg_sleep(10) ;-- Execute select * from core.dblink_log
                              -- in a different session
    END loop ;
END $$;

#4: From a different session verify record core.dblink_log is populated with data
András Váczi
  • 31,778
  • 13
  • 102
  • 151
Aneesh Mon N
  • 363
  • 2
  • 13