I have a procedure that loops through a query that returns a set of dates, recreates a set of views using the date in the WHERE clause, and the calls a series of other procedures that use these views. So, if the loop contains three dates, it runs the procedures with each date in turn coded into the WHERE clause of the views. The tables that the views select are partitioned on the date in the WHERE clause, the views exist in order to eliminate all the unnecessary partitions.
On the first iteration around the loop, everything is fine. The views get created, the procedures get called, and the data is processed.
On the second and subsequent iterations, the procedures behave as if the views still had the date from the first iteration. After the loop finishes, the views have the last of the dates in their WHERE clauses, but the procedures had clearly just been processing the first date over and over.
I have created a trivial test that just creates a view, selects the count from it, creates it again with a different date and selects the count from that, and I get the two correct counts, so redefining and using a view within a procedure clearly is possible.
Any ideas why it doesn't work in a loop?
Here is a demonstration script, the first run should process 2 records and the second should process 4, but both runs process 2:
create schema dv;
create schema rpt;
drop table dv.batch_data;
create table dv.batch_data
(
std_date date
, dv_load_dt timestamp with time zone
)
WITH (APPENDONLY=false, OIDS=FALSE)
DISTRIBUTED BY (std_date)
;
insert into dv.batch_data values( '2016-01-02'::date, current_timestamp );
insert into dv.batch_data values( '2016-01-04'::date, current_timestamp );
drop table dv.table_1;
create table dv.table_1
(
std_date date
, col_1 text
, col_2 text
)
WITH (APPENDONLY=false, OIDS=FALSE)
DISTRIBUTED BY (std_date)
;
insert into dv.table_1 values( '2016-01-01'::date, 'First', 'Record' );
insert into dv.table_1 values( '2016-01-02'::date, 'Second', 'Record' );
insert into dv.table_1 values( '2016-01-02'::date, 'Third', 'Record' );
insert into dv.table_1 values( '2016-01-03'::date, 'Fourth', 'Record' );
insert into dv.table_1 values( '2016-01-03'::date, 'Fifth', 'Record' );
insert into dv.table_1 values( '2016-01-03'::date, 'Sixth', 'Record' );
insert into dv.table_1 values( '2016-01-04'::date, 'Seventh', 'Record' );
insert into dv.table_1 values( '2016-01-04'::date, 'Eighth', 'Record' );
insert into dv.table_1 values( '2016-01-04'::date, 'Ninth', 'Record' );
insert into dv.table_1 values( '2016-01-04'::date, 'Tenth', 'Record' );
create table rpt.stage_table_a
(
std_date date
, col text
)
WITH (APPENDONLY=false, OIDS=FALSE)
DISTRIBUTED BY (std_date)
;
CREATE OR REPLACE FUNCTION rpt.custom_proc_rpt_loop()
RETURNS integer AS
$BODY$
DECLARE
v_count integer;
rec RECORD;
v_days integer;
v_query varchar(10000);
v_return_msg varchar(256);
BEGIN
v_count := 0;
v_days := 0;
BEGIN
FOR rec IN
SELECT DISTINCT std_date
FROM dv.batch_data
ORDER BY std_date
LOOP
v_days := v_days + 1;
v_query := 'CREATE OR REPLACE VIEW rpt.view_batch_table_1 AS ' ||
'SELECT std_date ' ||
', col_1 ' ||
', col_2 ' ||
'FROM dv.table_1 ' ||
'WHERE std_date = ''' || rec.std_date || '''';
RAISE NOTICE '%',v_query;
EXECUTE v_query;
v_count := v_count + 1;
PERFORM rpt.update_stage_table_a();
ANALYZE rpt.stage_table_a;
END LOOP;
END;
v_return_msg := 'custom_proc_rpt_loop completed. '
|| CAST(v_days AS VARCHAR(64)) || ' loops performed. ';
RAISE NOTICE '%',v_return_msg;
RETURN 1;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
CREATE OR REPLACE FUNCTION rpt.update_stage_table_a()
RETURNS integer AS
$BODY$
DECLARE
v_count integer;
v_return_msg varchar(256);
BEGIN
TRUNCATE rpt.stage_table_a;
INSERT INTO rpt.stage_table_a
(
std_date
, col
)
SELECT DISTINCT
std_date
, COALESCE(col_1,'') || CASE WHEN length(col_1) > 1 AND length(col_2) > 1 THEN ' ' ELSE '' || COALESCE(col_1,'') END
FROM rpt.view_batch_table_1
;
GET DIAGNOSTICS v_count = row_count;
v_return_msg := 'stage_table_a completed. '
|| CAST(v_count AS VARCHAR(64)) || ' records processed. ';
RAISE NOTICE '%',v_return_msg;
RETURN 1;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
select rpt.custom_proc_rpt_loop();
select count(*) from rpt.view_batch_table_1;
In the real world, the inner procedures are generated by a data warehouse automation tool, so I don't want to change them. I can do whatever I want in the calling procedure that contains the loop.