Identify if the sequence of visits to an area id was interrupted at any point

Question

I have a PostgreSQL table df with the following information:

area_id	trip_nr	timestamp
a	3	29/06/2022 17:18:03
a	2	29/06/2022 17:18:02
a	1	29/06/2022 17:18:01
b	1	28/06/2022 19:21:03
c	3	28/06/2022 19:21:02
e	3	28/06/2022 19:21:01
e	2	27/06/2022 17:36:03
c	2	27/06/2022 17:36:02
c	1	27/06/2022 17:36:01
e	1	27/06/2022 16:55:03

I would like to add a column continuous that checks if the trip_nr to a particular area_id follows chronologically from the first visit based on the order of timestamp.

In other words, continuous=True if an area_id is visited for the nth time and the previous visits were to the same area_id all the way from n, n-1,...,1.

And, continuous=False if an area_id is visited for the n’th time, but a different area_id was visited somewhere in between the visits n, n-1,...,1 (i.e. the sequence of trip_nr’s is broken at some point).

An example of the desired output:

area_id	trip_nr	timestamp	continous
a	3	29/06/2022 17:18:03	True
a	2	29/06/2022 17:18:02	True
a	1	29/06/2022 17:18:01	True
b	1	28/06/2022 19:21:03	True
c	3	28/06/2022 19:21:02	False
e	3	28/06/2022 19:21:01	False
e	2	27/06/2022 17:36:03	False
c	2	27/06/2022 17:36:02	True
c	1	27/06/2022 17:36:01	True
e	1	27/06/2022 16:55:03	True

How can I achieve this optimally? Clickhouse/SQL answers also welcome.

Note that the timestamp is not necessarily stored in order.

score 2 · Accepted Answer · answered Mar 03 '24 at 23:51

Assuming all involved columns NOT NULL, and no duplicate timestamps per area_id.

Undefined aspect: what if the same area_id is visited again (after a gap) and it starts over with trip_nr = 1. Still "continuous"?

Only the first trip per area is "continuous"

SELECT area_id, trip_nr, timestamp
     , row_number() OVER (PARTITION BY area_id, grp ORDER BY timestamp) = trip_nr
       AND grp = min(grp) OVER (PARTITION BY area_id) AS continuous
FROM  (
   SELECT *
        , row_number() OVER (ORDER BY timestamp)
        - row_number() OVER (PARTITION BY area_id ORDER BY timestamp) AS grp
   FROM   df
   ) sub
ORDER  BY timestamp DESC;

Multiple "continuous" trips per area

SELECT area_id, trip_nr, timestamp
     , row_number() OVER (PARTITION BY area_id, grp ORDER BY timestamp) = trip_nr AS continuous
FROM  (
   SELECT *
        , row_number() OVER (ORDER BY timestamp)
        - row_number() OVER (PARTITION BY area_id ORDER BY timestamp) AS grp
   FROM   df
   ) sub
ORDER  BY timestamp DESC;

fiddle

How?

Subtract the row number per partition of area_id from the total row number, then continuous rows produce the same group (grp) per area_id. Each gap starts a new group.

Then check whether the row number within each of those groups matches the trip_nr.
(And whether it's the first group per area_id?)

Select longest continuous sequence

Identify if the sequence of visits to an area id was interrupted at any point

1 Answers1

Only the first trip per area is "continuous"

Multiple "continuous" trips per area

How?