Join 2 tables by closest time, PostgreSQL 9.6

Question

I have 2 tables: tbl1, tbl2.

CREATE TABLE tbl1(time_1)
AS VALUES
  ( '2017-09-06 15:26:03'::timestamp ),
  ( '2017-09-06 15:26:02' ),
  ( '2017-09-06 15:28:01' ),
  ( '2017-09-06 15:40:00' );

CREATE TABLE tbl2(time_2)
AS VALUES
  ( '2017-09-06 15:29:01'::timestamp ),
  ( '2017-09-06 15:40:00' ),
  ( '2017-09-06 15:23:59' ),
  ( '2017-09-06 15:45:58' );

I want to join the table, thus for every row in tbl1 match closest time in tbl2. The output is :

     time_1                     time_2
---------------------      --------------------
 2017-09-06 15:26:03      2017-09-06 15:23:59      
 2017-09-06 15:26:02      2017-09-06 15:23:59 
 2017-09-06 15:28:01      2017-09-06 15:29:01
 2017-09-06 15:40:00      2017-09-06 15:45:58

I know how to find single value for closest time:

SELECT * from tbl1 where time_1=INPUT_TIME ORDER BY case when time_1 > INPUT_TIME then time_1 - INPUT_TIME else INPUT_TIME - time_1 end limit 1;

Evan Carroll · Answer 1 · 2017-09-08T05:10:03.957

Using `btree_gist` and `<->`

You really want <-> which will actually work on GiST indexes through the internal tstz_dist. This works because you don't really care about the interval. First you need to add the extension,

CREATE EXTENSION btree_gist;

then

SELECT DISTINCT ON (time_1) time_1, time_2
FROM tbl1
CROSS JOIN tbl2
ORDER BY time_1, time_1 <-> time_2;

If there are too many rows to do a cross-join, you may have better luck with the lateral solution.

SELECT time_1, time_2
FROM tbl1
CROSS JOIN LATERAL (
  SELECT time_2
  FROM tbl2
  ORDER BY time_1<->time_2
  LIMIT 1
) AS tbl2;

Would be interested in seeing a report back on the benchmarks. =) Especially if you have a GIST index on time_1 and time_2.

score 2 · Answer 2 · answered Sep 06 '17 at 13:01

Here's one attempt using a lateral join:

select time_1, time_2
from tbl1 
cross join lateral (
    select time_2 
    from tbl2 
    order by case when time_1 > time_2 then age(time_1,time_2) 
                                       else age(time_2,time_1) 
             end 
    fetch first 1 rows only
) as t;

Another possibility is to use a sub-query:

select time_1, (
    select time_2 
    from tbl2 
    order by case when time_1 > time_2 then age(time_1,time_2) 
                                       else age(time_2,time_1) 
             end 
    fetch first 1 rows only
)
from tbl1;

There is probably a more elegant way to determine the absolute value of the timestamp difference. Worth noting is that the result will be different if the order of the tables is changed.

markp-fuso · Answer 3 · 2017-09-07T23:14:32.127

select dt1.time_1, 
       dt1.time_2 
from
(select row_number() over(partition by time_1 order by abs(extract('epoch' from time_1) - extract('epoch' from time_2)) asc) as rn,
        time_1,
        time_2
from    tbl1
cross
join    tbl2
) dt1
where dt1.rn = 1
order by dt1.time_1

within the derived table dt1 ...
perform a cross join (aka cartesian product) across the 2 tables ...
using the absolute value of the differences in epoch times to order row numbering for a group of rows with the same time_1 value; then ...
in the outer query limit our output to the first row (dt1.rn = 1) for each time_1 value

Results:

time_1              | time_2             
------------------- | -------------------
2017-09-06 15:26:02 | 2017-09-06 15:23:59
2017-09-06 15:26:03 | 2017-09-06 15:23:59
2017-09-06 15:28:01 | 2017-09-06 15:29:01
2017-09-06 15:40:00 | 2017-09-06 15:40:00

NOTE: For the last time_1 = 2017-09-06 15:40:00 we get a different time_2 match than suggested in the question, which I'm assuming is incorrect:

-- suggested in the question:

     time_1                     time_2
---------------------      --------------------
 2017-09-06 15:40:00      2017-09-06 15:45:58

Here's a dbfiddle

Join 2 tables by closest time, PostgreSQL 9.6

3 Answers3

Using `btree_gist` and `<->`

Linked

Join 2 tables by closest time, PostgreSQL 9.6

3 Answers3

Using btree_gist and <->

Linked

Using `btree_gist` and `<->`