Highest Voted 'character-encoding' Questions - Software Engineering Stack Exchange

585

votes

1 answer

Is the use of "utf8=✓" preferable to "utf8=true"?

I have recently seen a few URIs containing the query parameter "utf8=✓". My first impression (after thinking "mmm, looks cool") was that this could be used to detect a broken character encoding. So, is this a better way to resolve potential…

asked Oct 13 '12 at 11:57

Gary

24,440
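
As a rough illustration of the idea raised in the "utf8=✓" question above, the sketch below shows how a check-mark parameter could reveal whether a query string was submitted as UTF-8. The helper name `looks_like_utf8` and the sample inputs are hypothetical, not taken from the question; the only facts relied on are that U+2713 encodes to the bytes E2 9C 93 in UTF-8 and that Python's `urllib.parse` percent-encodes using UTF-8 by default.

```python
from urllib.parse import quote, unquote

CHECK = "\u2713"  # U+2713 CHECK MARK

# quote() percent-encodes the UTF-8 bytes of the character by default.
encoded = quote(CHECK)


def looks_like_utf8(raw_value: str) -> bool:
    """Hypothetical helper: True if the raw percent-encoded value
    decodes, as strict UTF-8, to exactly the check mark."""
    try:
        return unquote(raw_value, encoding="utf-8", errors="strict") == CHECK
    except UnicodeDecodeError:
        # The bytes were not valid UTF-8 at all.
        return False
```

A server could compare the received raw value against `encoded`; any other value suggests the form was submitted in a different encoding, which is one plausible reading of why such a parameter exists.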

169

votes

6 answers

How to detect the encoding of a file?

On my filesystem (Windows 7) I have some text files (these are SQL script files, if that matters). When opened with Notepad++, in the "Encoding" menu some of them are reported to have an encoding of "UCS-2 Little Endian" and some of "UTF-8 without…

file-systems character-encoding utf-8 notepad++

asked Feb 15 '13 at 09:45

Marcel

3,152

121

votes

5 answers

What is the advantage of choosing ASCII encoding over UTF-8?

All characters in ASCII can be encoded using UTF-8 without an increase in storage (both require one byte of storage per character). UTF-8 has the added benefit of character support beyond "ASCII-characters". If that's the case, why would we ever choose ASCII…

character-encoding utf-8 ascii

asked Jul 30 '11 at 13:08

Pacerier

5,053
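
The storage claim in the question above can be checked directly. This is a minimal sketch with made-up sample strings: it confirms that pure-ASCII text produces identical bytes under both encodings, and shows what changes once a non-ASCII character appears.

```python
ascii_text = "Hello, world!"

# Pure-ASCII text yields byte-for-byte identical output in both encodings.
ascii_bytes = ascii_text.encode("ascii")
utf8_bytes = ascii_text.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes
one_byte_per_char = len(utf8_bytes) == len(ascii_text)

# A non-ASCII character (here U+00E9) takes two bytes in UTF-8 ...
accented = "caf\u00e9"
accented_utf8_len = len(accented.encode("utf-8"))

# ... and cannot be represented in ASCII at all.
try:
    accented.encode("ascii")
    ascii_rejected = False
except UnicodeEncodeError:
    ascii_rejected = True
```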

76

votes

2 answers

Why do so many hashed and encrypted strings end in an equals sign?

I work in C# and MSSQL and, as you'd expect, I store my passwords salted and hashed. When I look at the hash stored in an nvarchar column (for example, with the out-of-the-box aspnet membership provider), I've always been curious why the generated Salt and…

hashing character-encoding

asked Jun 17 '14 at 09:15

Liath

3,436

41

votes

3 answers

Why do we need to put N before strings in Microsoft SQL Server?

I'm learning T-SQL. From the examples I've seen, to insert text in a varchar() cell, I can write just the string to insert, but for nvarchar() cells, every example prefixes the strings with the letter N. I tried the following query on a table which…

sql sql-server character-encoding unicode

asked Jul 06 '12 at 14:47

qinking126

551

34

votes

8 answers

Should character encodings besides UTF-8 (and maybe UTF-16/UTF-32) be deprecated?

A pet peeve of mine is looking at so many software projects that have mountains of code for character set support. Don't get me wrong, I'm all for compatibility, and I'm happy that text editors let you open and save files in multiple character…

unicode utf-8 character-encoding

asked Jan 26 '11 at 03:32

Joey Adams

5,615

28

votes

5 answers

What issues lead people to use Japanese-specific encodings rather than Unicode?

At work I come across a lot of Japanese text files in Shift-JIS and other encodings. It causes many mojibake (unreadable character) problems for all computer users. Unicode was intended to solve this sort of problem by defining a single character…

legacy unicode character-encoding

asked Jun 08 '11 at 06:36

Nicolas Raoul

1,072

27

votes

7 answers

Is the carriage-return char considered obsolete?

I wrote an open source library that parses structured data but intentionally left out carriage-return detection because I don't see the point. It adds additional complexity and overhead for little/no benefit. To my surprise, a user submitted a bug…

mac osx character-encoding software-obsolescence

asked Dec 13 '12 at 06:24

Evan Plaice

5,785

21

votes

4 answers

Why does UTF-8 waste several bits in its encoding?

According to the Wikipedia article, UTF-8 has this format:

First code point | Last code point | Bytes used | Byte 1   | Byte 2   | Byte 3 | Byte 4
U+0000           | U+007F          | 1          | 0xxxxxxx |
U+0080           | U+07FF          | 2          | 110xxxxx | 10xxxxxx |
U+0800           | U+FFFF…

character-encoding utf-8 text-encoding

asked Nov 09 '14 at 19:50

qbt937

321
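
The byte layout quoted in the question above can be sketched as a small hand-written encoder. The function name `utf8_encode` is hypothetical, and the three- and four-byte branches follow the standard UTF-8 layout rather than the truncated excerpt. The fixed prefix bits (`0`, `110`, `1110`, `11110`, and `10` on continuation bytes) are the "wasted" bits the question asks about.

```python
def utf8_encode(cp: int) -> bytes:
    """Hypothetical helper: encode one Unicode code point as UTF-8 by hand."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        # Out of range, or a surrogate, which UTF-8 does not encode.
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp < 0x80:
        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])
```

Comparing the result with Python's built-in encoder, for example `utf8_encode(0x20AC) == chr(0x20AC).encode("utf-8")`, is a quick way to check the sketch against a real implementation.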

17

votes

2 answers

Is UTF-16 fixed-width or variable-width? Why doesn't UTF-8 have a byte-order problem?

Is UTF-16 fixed-width or variable-width? I got different results from different sources: From http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF: UTF-16 stores Unicode characters in sixteen-bit chunks. From…

unicode character-encoding utf-8

asked Jul 22 '11 at 23:45

Tim

5,545

16

votes

4 answers

Why should C++ uint8_t data not be printable?

On this GitHub C++-related page the writer said: "Note that the value_type of those two containers is uint8_t which is not a printable character, make sure to cast it to int before you print." Why should this be so? For numbers < 128 decimal the sign…

type-casting character-encoding

asked Dec 01 '24 at 12:26

Russell McMahon

273

12

votes

3 answers

Should my source code be in UTF-8?

I feel that often you don't really choose what format your code is in. I mean most of my tools in the past have decided for me. Or I haven't really even thought about it. I was using TextPad on Windows the other day and as I was saving a file, it…

coding-standards source-code character-encoding utf-8

asked Jun 13 '12 at 19:55

Parris

241

9

votes

2 answers

I can type ⅓, ⅔ and ½ but can I type 3/3 and 2/2 using Unicode?

I can type ⅓, ⅔ and ½ but can I type 3/3 and 2/2 using Unicode? I know that from a mathematical point of view the fractions 2/2 = 3/3 = 1 but I am typing a list where I want to indicate that you have reached the final step (third step out of three…

unicode character-encoding fonts

asked Oct 09 '16 at 19:04

d-b

215

8

votes

1 answer

Is the BOM optional for UTF-16 and UTF-32?

I used to think that the BOM is optional for UTF-8, but mandatory for UTF-16 and UTF-32. But then I have read the following (in this article): Let's look just at the ones that Notepad supports. 8-bit ANSI (of which 7-bit ASCII is a subset). These…

unicode character-encoding

asked Apr 28 '18 at 05:11

user9002947

249

8

votes

2 answers

How relevant is UTF-7 when it comes to parsing emails?

I recently implemented incoming emails for an application and boy, did I open the gates of hell? Since then every other day an email arrives that makes the app fail in a different way. One of those things is emails encoded as UTF-7. Most emails come…

ruby character-encoding text-encoding

asked Sep 06 '12 at 16:25

Pablo Fernandez

313
