Questions tagged [character-encoding]
60 questions
585
votes
1 answer
Is the use of "utf8=✓" preferable to "utf8=true"?
I have recently seen a few URIs containing the query parameter "utf8=✓". My first impression (after thinking "mmm, looks cool") was that this could be used to detect a broken character encoding.
So, is this a better way to resolve potential…
Gary
- 24,440
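A minimal sketch of the detection idea raised in this question, assuming the server still has the raw, percent-encoded parameter value. The helper name `submitted_as_utf8` is hypothetical and not taken from any framework; the sketch only checks whether the received bytes decode to a check mark under UTF-8.

```python
from urllib.parse import quote, unquote

CHECK_MARK = "\u2713"  # the character U+2713 (check mark)

# urllib percent-encodes using UTF-8 by default.
ENCODED_CHECK_MARK = quote(CHECK_MARK)  # '%E2%9C%93'


def submitted_as_utf8(raw_value: str) -> bool:
    """Return True if the percent-encoded value decodes to the check mark as UTF-8.

    Hypothetical helper: raw_value is the parameter value exactly as it
    appeared in the query string, before any decoding.
    """
    try:
        return unquote(raw_value, encoding="utf-8", errors="strict") == CHECK_MARK
    except UnicodeDecodeError:
        # The bytes are not valid UTF-8, so the form was sent in another encoding.
        return False
```

A value such as `%E2%9C%93` passes the check, while bytes from a legacy encoding or an HTML numeric character reference do not.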
169
votes
6 answers
How to detect the encoding of a file?
On my filesystem (Windows 7) I have some text files (these are SQL script files, if that matters).
When opened with Notepad++, in the "Encoding" menu some of them are reported to have an encoding of "UCS-2 Little Endian" and some of "UTF-8 without…
Marcel
- 3,152
121
votes
5 answers
What is the advantage of choosing ASCII encoding over UTF-8?
All characters in ASCII can be encoded using UTF-8 without an increase in storage (both require one byte of storage per character).
UTF-8 has the added benefit of character support beyond "ASCII-characters". If that's the case, why would we ever choose ASCII…
Pacerier
- 5,053
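The storage claim in this question can be checked directly. The sketch below uses made-up sample strings; it shows that ASCII-only text produces identical bytes under both encodings, and that the two diverge only once a non-ASCII character appears.

```python
# Sample strings are illustrative assumptions, not from the question.
ascii_text = "SELECT 1;"
ascii_bytes = ascii_text.encode("ascii")
utf8_bytes = ascii_text.encode("utf-8")

# For ASCII-only text the two encodings yield the same byte sequence.
identical = ascii_bytes == utf8_bytes

# A non-ASCII character (U+00EF) takes two bytes in UTF-8 ...
non_ascii_text = "na\u00efve"
utf8_length = len(non_ascii_text.encode("utf-8"))

# ... and cannot be represented in ASCII at all.
try:
    non_ascii_text.encode("ascii")
    ascii_can_encode = True
except UnicodeEncodeError:
    ascii_can_encode = False
```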
76
votes
2 answers
Why do so many hashed and encrypted strings end in an equals sign?
I work in C# and MSSQL and as you'd expect I store my passwords salted and hashed.
When I look at the hash stored in an nvarchar column (for example, by the out-of-the-box aspnet membership provider), I've always been curious why the generated Salt and…
Liath
- 3,436
41
votes
3 answers
Why do we need to put N before strings in Microsoft SQL Server?
I'm learning T-SQL. From the examples I've seen, to insert text in a varchar() cell, I can write just the string to insert, but for nvarchar() cells, every example prefixes the strings with the letter N.
I tried the following query on a table which…
qinking126
- 551
34
votes
8 answers
Should character encodings besides UTF-8 (and maybe UTF-16/UTF-32) be deprecated?
A pet peeve of mine is looking at so many software projects that have mountains of code for character set support. Don't get me wrong, I'm all for compatibility, and I'm happy that text editors let you open and save files in multiple character…
Joey Adams
- 5,615
28
votes
5 answers
What issues lead people to use Japanese-specific encodings rather than Unicode?
At work I come across a lot of Japanese text files in Shift-JIS and other encodings. It causes many mojibake (unreadable character) problems for all computer users. Unicode was intended to solve this sort of problem by defining a single character…
Nicolas Raoul
- 1,072
27
votes
7 answers
Is the carriage-return char considered obsolete?
I wrote an open source library that parses structured data but intentionally left out carriage-return detection because I don't see the point. It adds additional complexity and overhead for little/no benefit.
To my surprise, a user submitted a bug…
Evan Plaice
- 5,785
21
votes
4 answers
Why does UTF-8 waste several bits in its encoding?
According to the Wikipedia article, UTF-8 has this format:
First code point  Last code point  Bytes used  Byte 1    Byte 2    Byte 3  Byte 4
U+0000            U+007F           1           0xxxxxxx
U+0080            U+07FF           2           110xxxxx  10xxxxxx
U+0800            U+FFFF…
qbt937
- 321
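The bit layout in this excerpt can be sketched as a small encoder. The one-byte and two-byte cases follow the rows shown; the three-byte and four-byte cases are filled in from the standard UTF-8 definition rather than from the truncated excerpt. The function name `utf8_encode_code_point` is a hypothetical illustration; in practice `str.encode("utf-8")` does this work.

```python
def utf8_encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point to UTF-8 by hand (illustrative sketch)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp <= 0x7F:
        # 0xxxxxxx: 7 payload bits
        return bytes([cp])
    if cp <= 0x7FF:
        # 110xxxxx 10xxxxxx: 5 + 6 payload bits
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx: 4 + 6 + 6 payload bits
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 3 + 6 + 6 + 6 payload bits
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])
```

The fixed prefix bits (`0`, `110`, `10`, and so on) are the bits the question calls wasted: they carry no payload, but they mark each byte as a lead byte of a given sequence length or as a continuation byte.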
17
votes
2 answers
Is UTF-16 fixed-width or variable-width? Why doesn't UTF-8 have a byte-order problem?
Is UTF-16 fixed-width or variable-width? I got different results
from different sources:
From http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF:
UTF-16 stores Unicode characters in sixteen-bit chunks.
From…
Tim
- 5,545
16
votes
4 answers
Why should C++ uint8_t data not be printable?
On this github C++ related page the writer said
Note that the value_type of those two containers is uint8_t which is not a printable character, make sure to cast it to int before you print.
Why should this be so?
For numbers < 128 decimal the sign…
Russell McMahon
- 273
12
votes
3 answers
Should my source code be in UTF-8?
I feel that often you don't really choose what format your code is in. I mean, most of my tools in the past have decided for me, or I haven't really even thought about it. I was using TextPad on Windows the other day and as I was saving a file, it…
Parris
- 241
9
votes
2 answers
I can type ⅓, ⅔ and ½ but can I type 3/3 and 2/2 using unicode?
I can type ⅓, ⅔ and ½ but can I type 3/3 and 2/2 using unicode? I know that from a mathematical point of view the fractions 2/2 = 3/3 = 1 but I am typing a list where I want to indicate that you have reached the final step (third step out of three…
d-b
- 215
8
votes
1 answer
Is the BOM optional for UTF-16 and UTF-32?
I used to think that the BOM was optional for UTF-8, but mandatory for UTF-16 and UTF-32.
But then I read the following (in this article):
Let's look just at the ones that Notepad supports.
8-bit ANSI (of which 7-bit ASCII is a subset). These…
user9002947
- 249
8
votes
2 answers
How relevant is UTF-7 when it comes to parsing emails?
I recently implemented incoming emails for an application and boy, did I open the gates of hell! Since then, every other day an email arrives that makes the app fail in a different way.
One of those things is emails encoded as UTF-7. Most emails come…
Pablo Fernandez
- 313