Trigrams don't work for non-Latin languages on the Mac.
On the Mac, I'm getting the following behavior:
select show_trgm('peace') -> {" p"," pe",ace,"ce ",eac,pea}
select show_trgm('über') -> {0xb1fa7d,0x0d7f89,0x4f84fe,ber,"er "}
select show_trgm('שלום') -> {}
After trying with the default collations and getting the empty results, I tried changing to a UTF-8 collation, which still doesn't work.
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+-------+----------+-------------+-------------+-------------------
testdb | tomer | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
postgres | tomer | UTF8 | C | C |
template0 | tomer | UTF8 | C | C | =c/tomer +
| | | | | tomer=CTc/tomer
template1 | tomer | UTF8 | C | C | =c/tomer +
| | | | | tomer=CTc/tomer
I'm using postgresql 14 on macOS 13.4.
Running the third query works on a postgres 13 instance in AWS RDS, with the UTF-8 encoding, collation and ctype.
select show_trgm('שלום') -> {0x46e146,0x64182e,0x8c40cc,0x9c1fe2,0xda89ec}
I tried running this statement on both machines:
select * FROM ts_debug('english', 'שלום');
On the Mac, I got
alias │ description │ token │ dictionaries │ dictionary │ lexemes
═══════╪═══════════════╪═══════╪══════════════╪════════════╪═════════
blank │ Space symbols │ שלום │ {} │ NULL │ NULL
On the cloud server, I got
alias │ description │ token │ dictionaries │ dictionary │ lexemes
═══════╪═══════════════════╪═══════╪════════════════╪══════════════╪═════════
word | Word, all letters | שלום | {english_stem} | english_stem | {שלום}
Ideas?