Equivalent of utf8_general_ci in Postgres/ICU?

Question

In MySQL there is a collation utf8_general_ci which provides case-insensitive comparisons in a variety of languages. For example, these are all 1 (true):

SELECT 'ı' = 'I' SOLLATE 'utf8_general_ci';
SELECT 'i' = 'I' COLLATE 'utf8_general_ci';
SELECT 'ä' = 'Ä' COLLATE 'utf8_general_ci';

Can I define a similar collation using PostgreSQL's ICU?

I tried

CREATE COLLATION "undefined_ci_nondet_old" (
  PROVIDER = 'icu',
  LOCALE = "@colStrength=secondary",
  DETERMINISTIC = false
);

But that doesn't seem to include the Turkish I/ı conversion:

SELECT 'ı' = 'I' COLLATE undefined_ci_nondet_old; -- false

Daniel Vérité · Answer 1 · 2021-05-10T13:12:08.547

The dotless I is a special case. It's processed by the ICU collation service with rules that depend on the language.

If the locale refered to the Turkish or Azerbaijani languages, it would produce the result that speakers of these languages might expect (that is, i and ı are two different letters with İ and I being their respective uppercase counterparts; cross-comparisons return false). Otherwise the result is normally that i is the lowercase version of I, whereas ı is not.

postgres=# CREATE COLLATION "undefined_ci_nondet_old" (
  PROVIDER = 'icu',
  LOCALE = 'tr@colStrength=secondary',
  DETERMINISTIC = false
);
CREATE COLLATION
postgres=# SELECT 'ı' = 'I' COLLATE undefined_ci_nondet_old;
 ?column?

t
(1 row)
postgres=# select 'i'='İ' COLLATE undefined_ci_nondet_old;
 ?column?

t
(1 row)
postgres=# select 'i'='I' COLLATE undefined_ci_nondet_old;
 ?column?

f
(1 row)
postgres=# select 'ı'='İ' COLLATE undefined_ci_nondet_old;
 ?column?

f
(1 row)

Equivalent of utf8_general_ci in Postgres/ICU?

1 Answers1