I want to select rows within each group following a condition in time. I have the following table trajectory_data in PostgreSQL 10 with daily points for each group.

| traj_id | timestamp  | height | lat   | Lon    |
|---------|------------|--------|-------|--------|
| 1       | 2006-01-01 | 10     | 34.55 | -70.80 |
| 1       | 2006-01-02 | 9      | 34.56 | -70.88 |
| 1       | 2006-01-03 | 0      | 34.56 | -70.89 |
| 1       | 2006-01-04 | 5      | 34.57 | -70.90 |
| 1       | 2006-01-05 | 20     | 34.60 | -70.94 |
| 2       | 2006-01-01 | 10     | 34.55 | -70.80 |
| 2       | 2006-01-02 | 5      | 34.55 | -70.82 |
| 2       | 2006-01-03 | 3      | 34.54 | -70.89 |
| 2       | 2006-01-04 | 0      | 34.53 | -70.90 |
| 2       | 2006-01-05 | 10     | 34.51 | -70.95 |
| 3       | 2006-01-01 | 10     | 34.55 | -70.80 |
| 3       | 2006-01-01 | 9      | 34.56 | -70.75 |
| 3       | 2006-01-01 | 11     | 34.54 | -70.90 |
| 3       | 2006-01-01 | 15     | 34.60 | -70.95 |
| 3       | 2006-01-01 | 30     | 34.67 | -70.90 |

I want to filter the data to get the rows within each group whose height is greater than zero. But it is not as simple as removing the zeros with a WHERE clause. I want to keep only the rows before the group reaches zero for the first time:

| traj_id | timestamp  | height | lat   | Lon    |
|---------|------------|--------|-------|--------|
| 1       | 2006-01-01 | 10     | 34.55 | -70.80 |
| 1       | 2006-01-02 | 9      | 34.56 | -70.88 |
| 2       | 2006-01-01 | 10     | 34.55 | -70.80 |
| 2       | 2006-01-02 | 5      | 34.55 | -70.82 |
| 2       | 2006-01-03 | 3      | 34.54 | -70.89 |
| 3       | 2006-01-01 | 10     | 34.55 | -70.80 |
| 3       | 2006-01-01 | 9      | 34.56 | -70.75 |
| 3       | 2006-01-01 | 11     | 34.54 | -70.90 |
| 3       | 2006-01-01 | 15     | 34.60 | -70.95 |
| 3       | 2006-01-01 | 30     | 34.67 | -70.90 |

In this example, groups 1 and 2 are filtered and the rows after zero are removed.

I know that I can run something like:

SELECT *
FROM trajectory_data
GROUP BY traj_id
HAVING height > 0

But this query will keep rows that I do not necessarily want (i.e. rows after the group has already reached zero), and I have no intention of aggregating the data, I just want to select rows. Is there another way of tackling this?

topcat

1 Answer

A single window function in a subquery should do the trick:

SELECT *
FROM  (
   SELECT *, count(*) FILTER (WHERE height = 0)
                      OVER (PARTITION BY traj_id ORDER BY timestamp) AS ct0
   FROM   trajectory_data
   ) sub
WHERE  ct0 = 0;

This keeps a running count of rows with height = 0 along the timeline of each traj_id, and only keeps rows before the first one pops up. Voilà.
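
To see what the running count looks like, here is a minimal sketch that runs the inner query on its own. The temporary table, its column types and the lowercase lon column name are assumptions made for illustration. Only the rows for traj_id 1 and 2 from the question are loaded.

```sql
-- Assumed schema: the question does not show the real column types.
CREATE TEMP TABLE trajectory_data (
    traj_id     integer,
    "timestamp" date,
    height      integer,
    lat         numeric,
    lon         numeric
);

-- Sample rows copied from the question (traj_id 1 and 2 only).
INSERT INTO trajectory_data VALUES
    (1, '2006-01-01', 10, 34.55, -70.80),
    (1, '2006-01-02',  9, 34.56, -70.88),
    (1, '2006-01-03',  0, 34.56, -70.89),
    (1, '2006-01-04',  5, 34.57, -70.90),
    (1, '2006-01-05', 20, 34.60, -70.94),
    (2, '2006-01-01', 10, 34.55, -70.80),
    (2, '2006-01-02',  5, 34.55, -70.82),
    (2, '2006-01-03',  3, 34.54, -70.89),
    (2, '2006-01-04',  0, 34.53, -70.90),
    (2, '2006-01-05', 10, 34.51, -70.95);

-- Inner query only: ct0 is the number of zero-height rows seen so far
-- within each traj_id, ordered by timestamp.
SELECT traj_id, "timestamp", height,
       count(*) FILTER (WHERE height = 0)
                OVER (PARTITION BY traj_id ORDER BY "timestamp") AS ct0
FROM   trajectory_data
ORDER  BY traj_id, "timestamp";
```

For traj_id 1, ct0 is 0 for the first two days and 1 from 2006-01-03 onward. For traj_id 2 it is 0 for the first three days and 1 from 2006-01-04 onward. The outer WHERE ct0 = 0 therefore keeps exactly the rows before the first zero. Note that with the default window frame, rows sharing the same timestamp within a group are peers and are counted together, so a zero on a tied timestamp would also exclude the other rows with that timestamp.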

Erwin Brandstetter