26

Each row in a table has a system column ctid of type tid that represents the physical location of the row:

create table t(id serial);
insert into t default values;
insert into t default values;
select ctid
     , id
from t;
ctid  | id
:---- | -:
(0,1) |  1
(0,2) |  2

dbfiddle here

What's the best way of getting just the page number as from the ctid in the most appropriate type (eg integer, bigint or numeric(1000,0))?

The only way I can think of is very ugly.

Jack Douglas
  • 40,517
  • 16
  • 106
  • 178

1 Answers1

36
SELECT (ctid::text::point)[0]::bigint AS block_number FROM t;

db<>fiddle here

@bma suggested something similar in his comment. Here is a ...

Rationale for the type

ctid is of type tid (tuple identifier), called ItemPointer in the C-language source code. The manual:

This is the data type of the system column ctid. A tuple ID is a pair (block number, tuple index within block) that identifies the physical location of the row within its table.

Bold emphasis mine. And:

(ItemPointer, also known as CTID)

A block is 8 KB in standard installations. Maximum Table Size is 32 TB. It follows logically that block numbers must accommodate at least a maximum of (fixed per @Daniel's comment):

SELECT (2^45 / 2^13)::int      -- = 2^32 = 4294967294

Which would fit into an unsigned integer. On further investigation I found in the source code that ...

blocks are numbered sequentially, 0 to 0xFFFFFFFE.

Bold emphasis mine. Which confirms the first calculation:

SELECT 'xFFFFFFFE'::bit(32)::int8 -- max page number: 4294967294

Postgres uses signed integer and is therefore one bit short. I couldn't pin down, yet, whether the text representation is shifted to accommodate signed integer. Until somebody can clear this up, I would fall back to bigint, which works in any case.

Cast

There is no registered cast for the tid type in Postgres 9.3 (still true in Postgres 13):

SELECT *
FROM   pg_cast
WHERE  castsource = 'tid'::regtype
OR     casttarget = 'tid'::regtype;

castsource | casttarget | castfunc | castcontext | castmethod ------------+------------+----------+-------------+------------ (0 rows)

You can still cast to text. There is a text representation for every type in Postgres:

Another important exception is that "automatic I/O conversion casts", those performed using a data type's own I/O functions to convert to or from text or other string types, are not explicitly represented in pg_cast.

The text representation matches that of a point, which consists of two float8 numbers, that cast is lossless.

You can access the first number of a point with index 0. Cast to bigint. Voilá.

Performance

I ran a quick test in Postgres 9.4 on a table with 30k rows (best of 5) with a couple of expressions that came to mind, including your original:

SELECT (ctid::text::point)[0]::int                              --  25 ms
     , right(split_part(ctid::text, ',', 1), -1)::int           --  28 ms
     , ltrim(split_part(ctid::text, ',', 1), '(')::int          --  29 ms
     , (ctid::text::t_tid).page_number                          --  31 ms
     , (translate(ctid::text,'()', '{}')::int[])[1]             --  45 ms
     , (replace(replace(ctid::text,'(','{'),')','}')::int[])[1] --  51 ms
     , substring(right(ctid::text, -1), '^\d+')::int            --  52 ms
     , substring(ctid::text, '^\((\d+),')::int                  -- 143 ms
FROM   tbl;

int instead of bigint, mostly irrelevant for the purpose of the test. I eventually repeated the test in Postgres 13 with bigint on a table with 50k rows. Results are largely the same!

The cast to t_tid builds on a user-defined composite type, like @Jake commented.
The gist of it: Casting tends to be faster than string manipulation. Regular expressions are expensive. The above solution is shortest and fastest.

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633