40

I have Postgres 9.4.4 running on Debian and I get the following ORDER BY behavior:

veure_test=# show LC_COLLATE;
 lc_collate  
-------------
 en_US.UTF-8
(1 row)

veure_test=# SELECT regexp_split_to_table('D d a A c b CD Capacitor', ' ') ORDER BY 1;
 regexp_split_to_table 
-----------------------
 a
 A
 b
 c
 Capacitor
 CD
 d
 D
(8 rows)

And uname -a:

Linux ---- 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1 x86_64 GNU/Linux

However, on my iMac, with Postgres 9.3.4, I get the following:

veure_test=# show LC_COLLATE;
 lc_collate  
-------------
 en_US.UTF-8
(1 row)

veure_test=# SELECT regexp_split_to_table('D d a A c b CD Capacitor', ' ') ORDER BY 1;
 regexp_split_to_table 
-----------------------
 A
 CD
 Capacitor
 D
 a
 b
 c
 d
(8 rows)

And the uname -a:

Darwin ---- 14.4.0 Darwin Kernel Version 14.4.0: Thu May 28 11:35:04 PDT 2015; root:xnu-2782.30.5~1/RELEASE_X86_64 x86_64

I'm mystified by why the Debian version appears to be case-insensitive and the OS X version is not. What am I missing, or what other information do I need to provide?

Update: On my Mac, the pg_collation table shows I have an en_US.UTF-8 collation, but on Debian, I have an en_US.utf8 collation. Thus, on my Mac:

veure_test=# with foo as (
SELECT regexp_split_to_table('D d a A c b CD Capacitor', ' ') as bar
   )
SELECT bar FROM foo
ORDER BY bar collate "en_US.UTF-8";                                                                                                                                                                                      
    bar    
-----------
 A
 CD
 Capacitor
 D
 a
 b
 c
 d
(8 rows)

And on Debian:

veure_test=# with foo as (
SELECT regexp_split_to_table('D d a A c b CD Capacitor', ' ') as bar
   )
SELECT bar FROM foo
ORDER BY bar collate "en_US.utf8";
    bar    
-----------
 a
 A
 b
 c
 Capacitor
 CD
 d
 D
(8 rows)

So en_US.UTF-8 and en_US.utf8 have different sort orders?

Curtis Poe
  • 515
  • 1
  • 4
  • 6

1 Answers1

32

So en_US.UTF-8 and en_US.utf8 have different sort orders?

No, these both are the same, just a different naming convention.

I'm mystified by why the Debian version appears to be case-insensitive and the OS X version is not.

Yes, you are correct. This is the default behavior on Mac. Collations don't work on any BSD-ish OS (incl. OSX) for UTF8 encoding.

Here is a reference to prove that:

Problems with sort order (UTF8 locales don't work

As a_horse_with_no_name said, Postgres uses the collation implementation from the OS. There is no way to get the same result on both operating systems.

In your case you may(I said maybe)do like this: ORDER BY lower(fieldname).

atokpas
  • 8,680
  • 1
  • 18
  • 27