22

I have an InnoDB table 'idtimes' (MySQL 5.0.22-log) with columns

`id` int(11) NOT NULL,
`time` int(20) NOT NULL, [...]

with a compound unique key

UNIQUE KEY `id_time` (`id`,`time`)

so there can be multiple timestamps per id and multiple ids per timestamp.

I'm trying to set up a query where I get all entries plus the next greater time for each entry, if it exists, so it should return e.g.:

+-----+------------+------------+
| id  | time       | nexttime   |
+-----+------------+------------+
| 155 | 1300000000 | 1311111111 |
| 155 | 1311111111 | 1322222222 |
| 155 | 1322222222 |       NULL |
| 156 | 1312345678 | 1318765432 |
| 156 | 1318765432 |       NULL |
+-----+------------+------------+

This is what I have so far:

SELECT l.id, l.time, r.time FROM 
    idtimes AS l LEFT JOIN idtimes AS r ON l.id = r.id
    WHERE l.time < r.time ORDER BY l.id ASC, l.time ASC;

but of course this returns all rows with r.time > l.time and not only the first one...

I guess I'll need a subselect like

SELECT outer.id, outer.time, 
    (SELECT time FROM idtimes WHERE id = outer.id AND time > outer.time 
        ORDER BY time ASC LIMIT 1)
    FROM idtimes AS outer ORDER BY outer.id ASC, outer.time ASC;

but I don't know how to refer to the current time (I know the above is not valid SQL).

How do I do this with a single query? I'd prefer not to use @variables that depend on stepping through the table one row at a time and remembering the last value.

Martin Hennings

5 Answers

22

Doing a JOIN is one thing you might need.

SELECT l.id, l.time, r.time FROM 
    idtimes AS l LEFT JOIN idtimes AS r ON l.id = r.id

I suppose the outer join is deliberate and you want to get NULLs. More on that later.

WHERE l.time < r.time ORDER BY l.id ASC, l.time ASC;

You only want the r row that has the lowest (MIN) time that is higher than l.time. That is where you need a subquery.

WHERE r.time = (SELECT MIN(time) FROM idtimes r2 where r2.id = l.id AND r2.time > l.time)

Now to the NULLs. If there is no next-higher time, the SELECT MIN() evaluates to NULL (or worse), and NULL never compares equal to anything, so your WHERE clause is never satisfied and the highest time for each id can never appear in the result set.

You solve it by eliminating your JOIN and moving the scalar subquery into the SELECT list:

SELECT id, time, 
    (SELECT MIN(time) FROM idtimes sub 
        WHERE sub.id = main.id AND sub.time > main.time) as nxttime
  FROM idtimes AS main 
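
To try this without a MySQL server, here is a minimal sketch that runs the same scalar-subquery idea against the question's sample rows in an in-memory SQLite database, using Python's sqlite3 module. SQLite standing in for MySQL is my assumption, not part of the answer; the function name next_times_subquery, the aliases cur and nxt, the simplified column types, and the added ORDER BY are also just for illustration.

```python
import sqlite3

# Sample rows taken from the expected output in the question.
ROWS = [
    (155, 1300000000),
    (155, 1311111111),
    (155, 1322222222),
    (156, 1312345678),
    (156, 1318765432),
]


def next_times_subquery():
    """Return (id, time, next greater time or None) for every row."""
    conn = sqlite3.connect(":memory:")
    # Simplified schema (assumption): only the two columns and the unique key.
    conn.execute(
        "CREATE TABLE idtimes ("
        "id INTEGER NOT NULL, time INTEGER NOT NULL, UNIQUE (id, time))"
    )
    conn.executemany("INSERT INTO idtimes (id, time) VALUES (?, ?)", ROWS)
    rows = conn.execute(
        """
        SELECT cur.id, cur.time,
            (SELECT MIN(nxt.time) FROM idtimes nxt
                WHERE nxt.id = cur.id AND nxt.time > cur.time) AS nxttime
          FROM idtimes AS cur
          ORDER BY cur.id ASC, cur.time ASC
        """
    ).fetchall()
    conn.close()
    return rows


for row in next_times_subquery():
    print(row)
```

The last row of each id comes back with None in the third position, which is how sqlite3 reports the NULL produced by MIN() over an empty set.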
Erwin Smout
7

I always avoid using subqueries in either the SELECT block or the FROM block, because they make the code "dirtier" and sometimes less efficient.

I think a more elegant way to do it is to:

1. Find the times greater than the time of the row

You can do this with a JOIN between idtimes table with itself, constraining the join to the same id and to times greater than the time of current row.

You should use LEFT JOIN to avoid excluding rows where there are no times greater than the one of the current row.

SELECT
    i1.id,
    i1.time AS time,
    i2.time AS greater_time
FROM
    idtimes AS i1
    LEFT JOIN idtimes AS i2 ON i1.id = i2.id AND i2.time > i1.time

The problem, as you mentioned, is that you get multiple rows wherever more than one greater_time exists for a given time.

+-----+------------+--------------+
| id  | time       | greater_time |
+-----+------------+--------------+
| 155 | 1300000000 | 1311111111   |
| 155 | 1300000000 | 1322222222   |
| 155 | 1311111111 | 1322222222   |
| 155 | 1322222222 |       NULL   |
| 156 | 1312345678 | 1318765432   |
| 156 | 1318765432 |       NULL   |
+-----+------------+--------------+

2. Find the rows where greater_time is not only greater but also the next time

The best way to filter out all of these useless rows is to check whether, for this id, any time lies strictly between time and greater_time.

SELECT
    i1.id,
    i1.time AS time,
    i2.time AS next_time,
    i3.time AS intrudor_time
FROM
    idtimes AS i1
    LEFT JOIN idtimes AS i2 ON i1.id = i2.id AND i2.time > i1.time
    LEFT JOIN idtimes AS i3 ON i2.id = i3.id AND i3.time > i1.time AND i3.time < i2.time

Oops, we still have a false next_time!

+-----+------------+--------------+---------------+
| id  | time       | next_time    | intrudor_time |
+-----+------------+--------------+---------------+
| 155 | 1300000000 | 1311111111   |         NULL  |
| 155 | 1300000000 | 1322222222   |    1311111111 |
| 155 | 1311111111 | 1322222222   |         NULL  |
| 155 | 1322222222 |       NULL   |         NULL  |
| 156 | 1312345678 | 1318765432   |         NULL  |
| 156 | 1318765432 |       NULL   |         NULL  |
+-----+------------+--------------+---------------+

Just filter out the rows where this happens by adding the WHERE constraint below:

WHERE
    i3.time IS NULL

Voilà, we have what we need!

+-----+------------+--------------+---------------+
| id  | time       | next_time    | intrudor_time |
+-----+------------+--------------+---------------+
| 155 | 1300000000 | 1311111111   |         NULL  |
| 155 | 1311111111 | 1322222222   |         NULL  |
| 155 | 1322222222 |       NULL   |         NULL  |
| 156 | 1312345678 | 1318765432   |         NULL  |
| 156 | 1318765432 |       NULL   |         NULL  |
+-----+------------+--------------+---------------+
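
For anyone who wants to check the finished query on the sample data, here is a minimal sketch using Python's sqlite3 module and an in-memory SQLite database. Using SQLite instead of MySQL is my assumption for the sake of a self-contained example, and the function name next_times_antijoin and the simplified schema are made up. The intrudor_time column is dropped from the SELECT list because it is always NULL once the filter is applied.

```python
import sqlite3

# Sample rows taken from the tables shown in this thread.
ROWS = [
    (155, 1300000000),
    (155, 1311111111),
    (155, 1322222222),
    (156, 1312345678),
    (156, 1318765432),
]


def next_times_antijoin():
    """Return (id, time, next_time or None) using two LEFT JOINs and a NULL filter."""
    conn = sqlite3.connect(":memory:")
    # Simplified schema (assumption): only the two columns and the unique key.
    conn.execute(
        "CREATE TABLE idtimes ("
        "id INTEGER NOT NULL, time INTEGER NOT NULL, UNIQUE (id, time))"
    )
    conn.executemany("INSERT INTO idtimes (id, time) VALUES (?, ?)", ROWS)
    rows = conn.execute(
        """
        SELECT
            i1.id,
            i1.time AS time,
            i2.time AS next_time
        FROM
            idtimes AS i1
            LEFT JOIN idtimes AS i2
                ON i1.id = i2.id AND i2.time > i1.time
            LEFT JOIN idtimes AS i3
                ON i2.id = i3.id AND i3.time > i1.time AND i3.time < i2.time
        WHERE
            i3.time IS NULL  -- keep only pairs with no time in between
        ORDER BY i1.id ASC, i1.time ASC
        """
    ).fetchall()
    conn.close()
    return rows


for row in next_times_antijoin():
    print(row)
```

Rows without any greater time survive the filter as well: i2 is NULL for them, so the second join finds nothing and i3.time is NULL too.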

I hope that you still need an answer after 4 years!

luisfsns
2

You can also get what you want from a min() and GROUP BY with no inner select:

SELECT l.id, l.time, min(r.time) 
FROM idtimes l 
LEFT JOIN idtimes r on (r.id = l.id and r.time > l.time)
GROUP BY l.id, l.time;

I would almost bet a large sum of money that the optimiser turns this into the same thing as Erwin Smout's answer anyway, and it's debatable whether it's any clearer, but there it is for completeness...
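
As a quick sanity check, the sketch below runs the GROUP BY query and a scalar-subquery query side by side on the question's sample rows and compares the results. It uses Python's sqlite3 module with an in-memory SQLite database, which is my stand-in for MySQL; the helper run_on_sample, the aliases cur and nxt, and the added ORDER BY clauses are assumptions for illustration. Equal results say nothing about whether an optimiser produces the same execution plan for both.

```python
import sqlite3

# Sample rows taken from the question.
ROWS = [
    (155, 1300000000),
    (155, 1311111111),
    (155, 1322222222),
    (156, 1312345678),
    (156, 1318765432),
]

GROUP_BY_SQL = """
SELECT l.id, l.time, MIN(r.time)
FROM idtimes l
LEFT JOIN idtimes r ON (r.id = l.id AND r.time > l.time)
GROUP BY l.id, l.time
ORDER BY l.id, l.time
"""

SUBQUERY_SQL = """
SELECT cur.id, cur.time,
    (SELECT MIN(nxt.time) FROM idtimes nxt
        WHERE nxt.id = cur.id AND nxt.time > cur.time)
FROM idtimes AS cur
ORDER BY cur.id, cur.time
"""


def run_on_sample(sql):
    """Load the sample rows into a fresh in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE idtimes ("
        "id INTEGER NOT NULL, time INTEGER NOT NULL, UNIQUE (id, time))"
    )
    conn.executemany("INSERT INTO idtimes (id, time) VALUES (?, ?)", ROWS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


grouped = run_on_sample(GROUP_BY_SQL)
correlated = run_on_sample(SUBQUERY_SQL)
print(grouped == correlated)
```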

2

Before presenting the solution, I should note it is not pretty. It would be much easier if you had some AUTO_INCREMENT column on your table (do you?)

SELECT 
  l.id, l.time, 
  SUBSTRING_INDEX(GROUP_CONCAT(r.time ORDER BY r.time), ',', 1)
FROM 
  idtimes AS l 
  -- the time filter sits in ON so the LEFT JOIN keeps rows that have no later time
  LEFT JOIN idtimes AS r ON (l.id = r.id AND l.time < r.time)
GROUP BY
  l.id, l.time

Explanation:

  • Same self-join as yours: the right side only supplies the higher times.
  • GROUP BY both columns from the left table: this ensures we get all (id, time) combinations (which are also known to be unique).
  • For each (l.id, l.time), get the first r.time which is greater than l.time. This is done by first ordering the r.times via GROUP_CONCAT(r.time ORDER BY r.time), then slicing off the first token via SUBSTRING_INDEX.

Good luck, and don't expect good performance if this table is large.

Shlomi Noach
0

You should be careful with the complexity of your queries. With JOINs you will usually have a complexity in O(n²).

You can achieve linear complexity by using window functions (available in MySQL from version 8.0 onwards, so not on the 5.0.22 server from the question). lead() can be used to obtain subsequent rows efficiently:

SELECT id, time, lead(time, 1) OVER (PARTITION BY id ORDER BY time)
FROM idtimes
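
As a rough illustration of why this needs only one pass over ordered rows, here is a plain-Python emulation of what lead(time, 1) OVER (PARTITION BY id ORDER BY time) computes. It is a sketch of the idea, not how any database engine implements window functions, and the function name with_next_time is hypothetical. The sort costs O(n log n) unless the rows already arrive ordered, for example from an index on (id, time); the pairing step afterwards is linear.

```python
from itertools import groupby
from operator import itemgetter


def with_next_time(rows):
    """Pair each (id, time) with the next greater time for the same id, or None."""
    result = []
    ordered = sorted(rows)  # tuples sort by id first, then by time
    # Each group corresponds to one PARTITION BY id partition.
    for row_id, group in groupby(ordered, key=itemgetter(0)):
        times = [t for _, t in group]
        # Shift the list by one position; the last time has no successor.
        for current, following in zip(times, times[1:] + [None]):
            result.append((row_id, current, following))
    return result


sample = [
    (156, 1318765432),
    (155, 1311111111),
    (155, 1300000000),
    (156, 1312345678),
    (155, 1322222222),
]
for row in with_next_time(sample):
    print(row)
```

Because (id, time) is unique in the table, the successor within a partition is always strictly greater than the current time.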
Rohit Gupta
scand1sk