Is `count(*)` ever guaranteed to return the same result in a transaction at any transaction level?

Question

If I create a table.

CREATE TABLE foo AS
SELECT CASE WHEN random() > 0.5 THEN x END AS x
FROM generate_series(1,10) AS x;

And, then I run the following in a transaction

BEGIN;
  SELECT count(*)
  FROM foo
  WHERE x IS NOT NULL;

  --time

  SELECT count(*)
  FROM foo
  WHERE x IS NOT NULL;
END;

Under what transaction level is my result guaranteed to remain the same in the transaction?

joanolo · Answer 1 · 2017-06-13T02:55:34.000

My ten cents, and based on the PostgreSQL documentation:

Assumption: your own transaction does not change any relevant value from table foo.

Having the same count(*) in both your queries means:

You cannot have dirty reads, because another transaction might have written more rows into your foo table between the first and the second select; and you should not see them.
You are not really reading any column, so nonrepeatable reads are not an issue.
You cannot have phantom reads, that is, if another transaction changes any row whose x column is NULL to a non-null value, your transaction does not have to notice those changes affecting the WHERE condition.
I don't really know how to judge serialization anomalies. My educated guess is that this isn't required. But this is really very open for debate.

Under these conditions, the table from Transaction Isolation Levels makes it clear that Repeatable read complies with the criteria of:

Not having dirty reads
Not having non-repeatable reads
Not having phantom reads (in PostgreSQL, but not demanded by the standard)

... as such, it would give you the same count(*) in both select statements.

score 1 · Answer 2 · edited Jun 13 '17 at 05:11

guaranteed to remain the same "in the transaction"

That has two meanings, and that's the problem. You have what's happening outside of the transaction and what's happening inside with the "snapshot". Two things can happen to the result of count(*):

The count can change
Transaction level READ COMMITTED (and READ UNCOMMITTED^†).
The count cannot change
Transaction levels REPEATABLE READ and SERIALIZABLE.

Changing and not changing isn't everything you need to know. REPEATABLE READ and SERIALIZABLE work on snapshots. That is to say, the counts(*) won't change, but that doesn't mean anything with regard to what may already changed in the database.

Let's simplify and review some things and call the initialization code above REINIT. We will play with only these two statements.

SELECT count(*) FROM foo WHERE x IS NULL;
UPDATE foo SET x = 1 WHERE x IS NULL;

Now, let's say we run two sessions

REINIT
1#     BEGIN; SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
1#     SELECT count(*) FROM foo WHERE x IS NULL;
   2#  UPDATE foo SET x = 1 WHERE x IS NULL;
1#     SELECT count(*) FROM foo WHERE x IS NULL;

Now what does count(*) show in 1#'s transaction? And after both transactions commit, #1 and then #2? Spoiler alert:

In the transaction, it will show the same number. Outside of the transaction, it will show 0 because the UPDATE has already committed.

Now, in a lower transaction level then the REPEATABLE READ, like READ COMMITTED the second SELECT by #1 see the committed rows. And, in a higher transaction level. The first SELECT count(*) will obtain a predicate lock on just the rows where x IS NULL. So then what happens when we move up a level to SERIALIZABLE and run the same sequence, now with predicate lock?

REINIT
1#      BEGIN; SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
1#      SELECT count(*) FROM foo WHERE x IS NULL;
   2#   UPDATE foo SET x = 1 WHERE x IS NULL;
1#      SELECT count(*) FROM foo WHERE x IS NULL;

Spoiler alert:

Nothing. Same thing.

Why? The concurrency model doesn't care about UPDATEs to things unless something also tries to modify those rows and both intend to commit. Both are SELECTing and not modifying their snapshots. So by extension,

if (SELECT count(*) 
      FROM foo
     WHERE x IS NULL
   ) < arg THEN
    RETURN 0 ;
end if ;

Then in the default transaction level, READ COMMITTED it may see a number that is not the same as the rest of the transaction -- a concurrent transaction could have committed between the second SELECT. That makes it useless WITHOUT REPEATABLE READ or SERIALIZABLE.
But in the other transaction modes you could either face a lock issue in REPEATABLE READ, or a failure upon commit in SERIALIZABLE if anything else touches the table from outside of the transaction. So again, there is limited point. If the count(*) query return rows, and, then, for example, other code may attempt to update those rows just to find that they've already been modified or that they're no longer there when the working snapshot goes to commit.

So don't conditionally do stuff in transactions. Do stuff, and then handle it.

† The SQL standard provides for a level READ UNCOMMITTED. In PostgreSQL that level aliases that to the tighter isolation level READ COMMITTED.

Is `count(*)` ever guaranteed to return the same result in a transaction at any transaction level?

2 Answers2

Linked