I am looking for a charset/collation such that when I run

SELECT * FROM table_name WHERE username = "Warrior"

it returns only the rows where username is "Warrior", "warrior" or "WARRIOR", and not "WÂRRÎOR", "Wârrîor", etc.

I found a partial solution: after changing the charset to "utf8mb4" and the collation to "utf8mb4_bin", comparisons are accent sensitive, so "Wârrîor" is different from "Warrior". But they are also case sensitive, so "Warrior" is different from "WARRIOR", which is not what I want.

I tried several other collations but couldn't find one that does exactly what I want. Any ideas?

Below is a screenshot of the different Collations available to me in the "utf8mb4" Charset :

[Screenshot: list of collations available for utf8mb4]

lyeaf

2 Answers

Note that _bin means both case sensitive and accent sensitive, so in your case you shouldn't use utf8mb4_bin.

You could use:

utf8mb4_0900_as_ci

Here _as means accent sensitive and _ci means case insensitive.

Demo:

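-- Column uses an accent-sensitive, case-insensitive collation
CREATE TABLE t (
    s1 VARCHAR(15) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_ci
);

INSERT INTO t VALUES ('WÂRRÎOR'), ('Warrior'), ('warrior'), ('WARRIOR'), ('Wârrîor');

-- Single quotes also work when the ANSI_QUOTES SQL mode is enabled
SELECT * FROM t WHERE s1 = 'Warrior';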

Result:

s1
Warrior
warrior
WARRIOR
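
If redefining the column is not convenient, the same comparison can be sketched per query with a COLLATE clause. This is only a sketch: it assumes MySQL 8.0 or later, a username column already stored as utf8mb4, and the table_name from the question. The VARCHAR(15) length in the ALTER statement is an assumption carried over from the demo, not the asker's real definition.

```sql
-- Apply the collation to a single comparison only.
-- Assumes username is already a utf8mb4 column.
SELECT *
FROM table_name
WHERE username COLLATE utf8mb4_0900_as_ci = 'Warrior';

-- Or make it the column's own collation.
-- VARCHAR(15) is an assumed length; use the column's real definition.
ALTER TABLE table_name
    MODIFY username VARCHAR(15)
    CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_ci;
```

The per-query form is handy for testing, but overriding the collation inside the WHERE clause will likely keep MySQL from using an index on username, so changing the column is usually the better long-term choice.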
Ergest Basha
  • _bin -- accent sensitive and case sensitive
  • _as_ci (MySQL 8.0 only) -- accent sensitive and case insensitive
  • _ci -- accent insensitive and case insensitive

This lets you see what will compare equal and what won't:

(Caveat: Those were taken from specific versions; the set of available collations changes between versions, but the behavior of an existing collation does not.)

Most, maybe not all, _ci and _ai_ci collations will treat "Wârrîor" = "Warrior"

All _ci or _ai_ci collations will treat "WARRIOR" = "Warrior"
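
A quick way to check these rules on your own server is to compare string literals under an explicit collation. The sketch below assumes MySQL 8.0 or later and a client that sends UTF-8 (hence the SET NAMES line); the three collations are just examples of the _as_ci, _ai_ci and _bin suffixes.

```sql
-- Make sure literals are interpreted as utf8mb4 (assumes a UTF-8 client).
SET NAMES utf8mb4;

-- List the accent-sensitive, case-insensitive collations this server offers.
SHOW COLLATION WHERE Charset = 'utf8mb4' AND Collation LIKE '%\_as\_ci';

-- Each comparison returns 1 (equal) or 0 (not equal).
SELECT
    'Wârrîor' = 'Warrior' COLLATE utf8mb4_0900_as_ci AS accent_as_ci,  -- expected 0
    'WARRIOR' = 'Warrior' COLLATE utf8mb4_0900_as_ci AS case_as_ci,    -- expected 1
    'Wârrîor' = 'Warrior' COLLATE utf8mb4_0900_ai_ci AS accent_ai_ci,  -- expected 1
    'WARRIOR' = 'Warrior' COLLATE utf8mb4_0900_ai_ci AS case_ai_ci,    -- expected 1
    'Wârrîor' = 'Warrior' COLLATE utf8mb4_bin        AS accent_bin,    -- expected 0
    'WARRIOR' = 'Warrior' COLLATE utf8mb4_bin        AS case_bin;      -- expected 0
```

Swapping in any other collation name from the SHOW COLLATION output shows how that collation treats the same pairs.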

Rick James