In the following example I have a table foo from which I'd like to pick out at random a row per group.
CREATE TABLE foo (
line INT
);
INSERT INTO foo (line)
SELECT generate_series(0, 999, 1);
Let's say that I'd like to group by line % 10. I could do this with:
SELECT DISTINCT ON (bin) bin, line
FROM (
SELECT line, line % 10 AS bin, random() x
FROM foo
ORDER BY x
) X
What I'd like to do is get random picks from each bin several times. I had thought I'd be able to do this with generate_series() and LATERAL
SELECT i, line, bin
FROM
(
SELECT generate_series(1,3) i
) m,
LATERAL
(SELECT DISTINCT ON (bin) bin, line
FROM (
SELECT line, line % 10 bin, random() x
FROM foo
ORDER BY x
) X
ORDER BY bin) Q
ORDER BY bin, i;
However, when I do this in PostgreSQL 9.5 I find I get the same line for a given bin for each iteration i, e.g.,
i;line;bin
1;530;0
2;530;0
3;530;0
1;611;1
2;611;1
3;611;1
...
I'm confused, as I thought the subquery containing the random() would be run differently for each line from the generate_series().
EDIT: I realised that I can achieve the same objective by generating more combinations and choosing from these with
SELECT DISTINCT ON (bin, round) round, bin, line
FROM (
SELECT line, line % 10 as bin, round
FROM foo, generate_series(1,3) round
ORDER BY bin, random()
) X;
So my question is simply why didn't the first way work?
EDIT: The problem appears to be that LATERAL only acts like a for-loop if the subqueries are correlated in some way (thanks to @ypercube's comment). Hence my original approach can be fixed by adding the following small change
SELECT i, line, bin
FROM
(
SELECT generate_series(1,3) i
) m,
LATERAL
(
SELECT DISTINCT ON (bin) bin, line
FROM (
SELECT line, line % 10 bin, m.i, random() x -- <NOTE m.i HERE
FROM foo
ORDER BY x
) X
ORDER BY bin
LIMIT 3
) Q
ORDER BY bin, i;