3

Whenever I need to return a random record from my tables and performance matters, instead of:

SELECT column FROM table ORDER BY random() LIMIT 1;

I always do:

SELECT column FROM table TABLESAMPLE BERNOULLI(1) LIMIT 1;

This is much faster but it seems like it's not very random? It looks like a lot of the same records get returned when using this method repeatedly. Is it just me or is this method much less random (and less useful because of that)?

Peter Vandivier
  • 5,485
  • 1
  • 25
  • 49

2 Answers2

7

The probability of a row to be returned from TABLESAMPLE BERNOULLI(1) is 1/100, that is, 0.01. The presence of LIMIT 1 tells Postgres to stop reading the table after the first match. This means that on average one of the first 100 rows (in whichever order they are scanned by the engine) in the table will be retrieved.

The probability of any particular row to be returned after ORDER BY random() LIMIT 1 is, on the other hand, 1/C, where C is the table cardinality. The entire table will be read and randomly ordered, then one row returned. This means that if your table has more than 100 rows, the ORDER BY random() will return a row from a wider selection (or, as you put it, will be "more random").

Some background info here.

mustaccio
  • 28,207
  • 24
  • 60
  • 76
2

TABLESAMPLE BERNOULLI(1) returns a random sampling of the rows, but does not return that sample in a random order. It is your imposition of the LIMIT 1 on top of the table sample which biases the end result.

You can get greater speed without biasing the result by combining those options:

TABLESAMPLE BERNOULLI(1) order by random() limit 1;

You do have to make sure that the percentage of the table sampled is high enough that it will "practically never" return zero rows.

jjanes
  • 42,332
  • 3
  • 44
  • 54