Difference between LC_COLLATE='C' and LC_COLLATE='C.UTF-8' in PostgreSQL

Question

I need guidance on the exact distinction, if any, between LC_COLLATE='C' and LC_COLLATE='C.UTF-8' in PostgreSQL. It seems to me that they behave exactly the same and that the codeset (with bearing on character classification for character-case conversion, etc.) is picked up from LC_CTYPE alone.

Here is an experiment with PostgreSQL 9.6 that seems to support this assessment ('Ä' is the upper-case version of the German Umlaut 'ä'):

SHOW LC_COLLATE; -- 'en_US.UTF-8': cluster default
SHOW LC_CTYPE; -- 'en_US.UTF-8': cluster default

CREATE DATABASE test WITH TEMPLATE='template0' ENCODING='UTF8' 
  LC_COLLATE='C.UTF-8' LC_CTYPE='C';
\c test
SELECT upper('ä'); -- 'ä': codeset 'UTF-8' apparently not picked up from LC_COLLATE
\c postres
DROP DATABASE test;

CREATE DATABASE test WITH TEMPLATE='template0' ENCODING='UTF8' 
  LC_COLLATE='C' LC_CTYPE='C.UTF-8';
\c test
SELECT upper('ä'); -- 'Ä': codeset 'UTF-8' apparently picked up from LC_CTYPE
\c postres
DROP DATABASE test;

CREATE DATABASE test WITH TEMPLATE='template0' ENCODING='UTF8' 
  LC_COLLATE='C.UTF-8';
SELECT datcollate, datctype FROM pg_database 
  WHERE datname='test'; -- 'C.UTF-8', 'en_US.UTF-8'
\c test
SELECT upper('ä'); -- 'Ä': codeset 'UTF-8' apparently picked up from cluster default `LC_CTYPE`
\c postres
DROP DATABASE test;

So it would seem to me that LC_COLLATE='C' and LC_COLLATE='C.UTF-8' are synonymous. The rationale could be that LC_COLLATE determines string sort order, which does not depend on any codeset in the case of C. Am I correct or is there a ready counter-example?

Difference between LC_COLLATE='C' and LC_COLLATE='C.UTF-8' in PostgreSQL

0 Answers0