2

In MySQL there is a collation utf8_general_ci which provides case-insensitive comparisons in a variety of languages. For example, these are all 1 (true):

SELECT 'ı' = 'I' SOLLATE 'utf8_general_ci';
SELECT 'i' = 'I' COLLATE 'utf8_general_ci';
SELECT 'ä' = 'Ä' COLLATE 'utf8_general_ci';

Can I define a similar collation using PostgreSQL's ICU?

I tried

CREATE COLLATION "undefined_ci_nondet_old" (
  PROVIDER = 'icu',
  LOCALE = "@colStrength=secondary",
  DETERMINISTIC = false
);

But that doesn't seem to include the Turkish I/ı conversion:

SELECT 'ı' = 'I' COLLATE undefined_ci_nondet_old; -- false
AndreKR
  • 607
  • 9
  • 19

1 Answers1

3

The dotless I is a special case. It's processed by the ICU collation service with rules that depend on the language.

If the locale refered to the Turkish or Azerbaijani languages, it would produce the result that speakers of these languages might expect (that is, i and ı are two different letters with İ and I being their respective uppercase counterparts; cross-comparisons return false). Otherwise the result is normally that i is the lowercase version of I, whereas ı is not.

postgres=# CREATE COLLATION "undefined_ci_nondet_old" (
  PROVIDER = 'icu',
  LOCALE = 'tr@colStrength=secondary',
  DETERMINISTIC = false
);
CREATE COLLATION

postgres=# SELECT 'ı' = 'I' COLLATE undefined_ci_nondet_old; ?column?


t (1 row)

postgres=# select 'i'='İ' COLLATE undefined_ci_nondet_old; ?column?


t (1 row)

postgres=# select 'i'='I' COLLATE undefined_ci_nondet_old; ?column?


f (1 row)

postgres=# select 'ı'='İ' COLLATE undefined_ci_nondet_old; ?column?


f (1 row)

Daniel Vérité
  • 32,662
  • 3
  • 78
  • 84