I have a PostgreSQL table df with the following information:
| area_id | trip_nr | timestamp |
|---|---|---|
| a | 3 | 29/06/2022 17:18:03 |
| a | 2 | 29/06/2022 17:18:02 |
| a | 1 | 29/06/2022 17:18:01 |
| b | 1 | 28/06/2022 19:21:03 |
| c | 3 | 28/06/2022 19:21:02 |
| e | 3 | 28/06/2022 19:21:01 |
| e | 2 | 27/06/2022 17:36:03 |
| c | 2 | 27/06/2022 17:36:02 |
| c | 1 | 27/06/2022 17:36:01 |
| e | 1 | 27/06/2022 16:55:03 |
I would like to add a column continuous that checks if the trip_nr to a particular area_id follows chronologically from the first visit based on the order of timestamp.
In other words, continuous=True if an area_id is visited for the nth time and the previous visits were to the same area_id all the way from n, n-1,...,1.
And, continuous=False if an area_id is visited for the n’th time, but a different area_id was visited somewhere in between the visits n, n-1,...,1 (i.e. the sequence of trip_nr’s is broken at some point).
An example of the desired output:
| area_id | trip_nr | timestamp | continous |
|---|---|---|---|
| a | 3 | 29/06/2022 17:18:03 | True |
| a | 2 | 29/06/2022 17:18:02 | True |
| a | 1 | 29/06/2022 17:18:01 | True |
| b | 1 | 28/06/2022 19:21:03 | True |
| c | 3 | 28/06/2022 19:21:02 | False |
| e | 3 | 28/06/2022 19:21:01 | False |
| e | 2 | 27/06/2022 17:36:03 | False |
| c | 2 | 27/06/2022 17:36:02 | True |
| c | 1 | 27/06/2022 17:36:01 | True |
| e | 1 | 27/06/2022 16:55:03 | True |
How can I achieve this optimally? Clickhouse/SQL answers also welcome.
Note that the timestamp is not necessarily stored in order.