14

What is the difference between the smallint type and the bool type for storing boolean values?

This question arose in the comments to a question on Geographic Information Systems Stack Exchange.

Evan Carroll
  • 65,432
  • 50
  • 254
  • 507

2 Answers2

29

Always store boolean data as boolean. Only exotic exception imaginable.
Just to address the storage angle in addition to what you posted as answer:

boolean requires 1 byte on disk, smallint requires 2. But that's not the whole story.

smallint (like other integer types and unlike boolean) also has special needs for alignment padding. It can only start at an even offset from the start of the tuple data. So another byte is consumed every odd time (literally).

In a worst case scenario, when mixing with types that require 8-byte alignment like bigint or timestamp / timestamptz:

SELECT pg_column_size(row("char" 'a', FALSE   )) AS char_bool
     , pg_column_size(row("char" 'a', int2 '1')) AS char_int2
     , pg_column_size(row(text 'abcdef', TRUE    , now())) AS text7_bool_ts
     , pg_column_size(row(text 'abcdef', int2 '1', now())) AS text7_int2_ts;  -- worst case
 char_bool | char_int2 | text7_bool_ts | text7_int2_ts
-----------+-----------+---------------+---------------
        26 |        28 |            40 |            48

Details:

If you have many boolean NOT NULL values and want to optimize space on disk:

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
4

First, the size of a smallint is two bytes which is twice the size of a bool:

SELECT
  pg_column_size(1::bool) AS "bool"
  , pg_column_size(1::smallint) AS "si";

 bool | si 
------+-----
    1 |   2

Let's create a small table with 1000 rows and check it out:

CREATE TABLE foo AS
  SELECT
    CASE WHEN x>0.5 THEN 1::smallint ELSE 0::smallint END AS si,
    CASE when x>0.5 THEN true ELSE false END AS b
  FROM ( SELECT random() AS x FROM generate_series(1,1e3) ) AS t;

CREATE INDEX si ON foo (si);
CREATE INDEX b ON foo (b);

Now we can see the table with 40kB:

test=# \dt+ foo
            List of relations
 Schema | Name | Type  | Size 
--------+------+-------+-------
 public | foo  | table | 40 kB

Both indexes are the same size (40kB). And all self-joins (returning 500050 rows in my case) using a seq scan and index scan (with SET enable_seqscan=off) complete in the same amount of time:

EXPLAIN ANALYZE SELECT * FROM foo JOIN foo AS f2 USING (b);
EXPLAIN ANALYZE SELECT * FROM foo JOIN foo AS f2 USING (si);

So really the benefits are that:

  • the col size is one byte smaller
  • the table shows bool, rather than smallint which is more descriptive
  • eliminates room for error if you store 3.
  • it's likely to work in third party libraries a bit better.
  • it permits slightly more terse SQL, WHERE isEmpty rather than WHERE empty = 1
Paul White
  • 94,921
  • 30
  • 437
  • 687
Evan Carroll
  • 65,432
  • 50
  • 254
  • 507