1

I have a PostgreSQL table df with the following information:

area_id trip_nr timestamp
a 3 29/06/2022 17:18:03
a 2 29/06/2022 17:18:02
a 1 29/06/2022 17:18:01
b 1 28/06/2022 19:21:03
c 3 28/06/2022 19:21:02
e 3 28/06/2022 19:21:01
e 2 27/06/2022 17:36:03
c 2 27/06/2022 17:36:02
c 1 27/06/2022 17:36:01
e 1 27/06/2022 16:55:03

I would like to add a column continuous that checks if the trip_nr to a particular area_id follows chronologically from the first visit based on the order of timestamp.

In other words, continuous=True if an area_id is visited for the nth time and the previous visits were to the same area_id all the way from n, n-1,...,1.

And, continuous=False if an area_id is visited for the n’th time, but a different area_id was visited somewhere in between the visits n, n-1,...,1 (i.e. the sequence of trip_nr’s is broken at some point).

An example of the desired output:

area_id trip_nr timestamp continous
a 3 29/06/2022 17:18:03 True
a 2 29/06/2022 17:18:02 True
a 1 29/06/2022 17:18:01 True
b 1 28/06/2022 19:21:03 True
c 3 28/06/2022 19:21:02 False
e 3 28/06/2022 19:21:01 False
e 2 27/06/2022 17:36:03 False
c 2 27/06/2022 17:36:02 True
c 1 27/06/2022 17:36:01 True
e 1 27/06/2022 16:55:03 True

How can I achieve this optimally? Clickhouse/SQL answers also welcome.

Note that the timestamp is not necessarily stored in order.

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
Ruan
  • 35
  • 4

1 Answers1

2

Assuming all involved columns NOT NULL, and no duplicate timestamps per area_id.

Undefined aspect: what if the same area_id is visited again (after a gap) and it starts over with trip_nr = 1. Still "continuous"?

Only the first trip per area is "continuous"

SELECT area_id, trip_nr, timestamp
     , row_number() OVER (PARTITION BY area_id, grp ORDER BY timestamp) = trip_nr
       AND grp = min(grp) OVER (PARTITION BY area_id) AS continuous
FROM  (
   SELECT *
        , row_number() OVER (ORDER BY timestamp)
        - row_number() OVER (PARTITION BY area_id ORDER BY timestamp) AS grp
   FROM   df
   ) sub
ORDER  BY timestamp DESC;

Multiple "continuous" trips per area

SELECT area_id, trip_nr, timestamp
     , row_number() OVER (PARTITION BY area_id, grp ORDER BY timestamp) = trip_nr AS continuous
FROM  (
   SELECT *
        , row_number() OVER (ORDER BY timestamp)
        - row_number() OVER (PARTITION BY area_id ORDER BY timestamp) AS grp
   FROM   df
   ) sub
ORDER  BY timestamp DESC;

fiddle

How?

Subtract the row number per partition of area_id from the total row number, then continuous rows produce the same group (grp) per area_id. Each gap starts a new group.

Then check whether the row number within each of those groups matches the trip_nr.
(And whether it's the first group per area_id?)

Related:

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633