What are the commonly confused encodings that may result in identical test data?

Question

I'm fixing code that uses ASCIIEncoding in some places and UTF-8 encoding in others.

Because our test data contains only ASCII characters, all of our unit tests passed. However, I want to raise awareness of encodings that produce identical results on typical test data and therefore may not be fully tested.
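A minimal Python sketch of the problem described above. The helper `encode_both` is hypothetical, and `errors="replace"` is an assumption chosen to roughly mimic the default behaviour of .NET's ASCIIEncoding, which substitutes `?` for characters it cannot represent instead of throwing. With ASCII-only input both encoders emit the same bytes, so the tests cannot distinguish them; one accented character is enough to expose the difference.

```python
def encode_both(text: str) -> tuple[bytes, bytes]:
    """Encode text as ASCII (non-ASCII replaced by '?') and as UTF-8."""
    return text.encode("ascii", errors="replace"), text.encode("utf-8")


# Pure-ASCII test data: both encoders emit identical bytes,
# so a test built from this string cannot tell them apart.
ascii_bytes, utf8_bytes = encode_both("Hello, world 123")
assert ascii_bytes == utf8_bytes

# A single non-ASCII character exposes the difference.
ascii_bytes, utf8_bytes = encode_both("caf\u00e9")  # "café"
print(ascii_bytes)  # b'caf?'
print(utf8_bytes)   # b'caf\xc3\xa9'
```

Adding at least one non-ASCII string to the test data is the cheapest way to catch this kind of mix-up.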

I don't want to limit this to UTF-8 vs. ASCII, since I suspect similar issues in code that handles ASN.1 fields and in other code that works with Base64.
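One Base64 pair that can hide in tests is the standard alphabet versus the URL-safe alphabet (both defined in RFC 4648). They differ only in the characters for indices 62 and 63 (`+`/`/` versus `-`/`_`), so any input whose encoding never reaches those indices produces identical output. The sketch below uses Python's `base64.b64encode` and `base64.urlsafe_b64encode`; the helper `both_alphabets` is hypothetical and only illustrates the point.

```python
import base64


def both_alphabets(data: bytes) -> tuple[bytes, bytes]:
    """Encode data with the standard and the URL-safe Base64 alphabets."""
    return base64.b64encode(data), base64.urlsafe_b64encode(data)


# Typical text test data never hits indices 62/63: outputs are identical.
std, url = both_alphabets(b"hello")
assert std == url  # both are b'aGVsbG8='

# Bytes that map to indices 62 and 63 reveal the difference.
std, url = both_alphabets(b"\xfb\xff")
print(std)  # b'+/8='
print(url)  # b'-_8='
```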

So, what are the commonly confused encodings that may result in identical test data?

score 3 · Answer 1 · answered Jul 01 '12 at 10:52


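Windows code pages (such as windows-1252) and other "extended ASCII" encodings are virtually guaranteed to throw you a curveball: they agree with ASCII for bytes 0x00 to 0x7F but assign different characters to bytes 0x80 to 0xFF.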
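A small Python sketch of that curveball, decoding the same bytes with three single-byte code pages. The choice of `latin-1`, `cp1252` and `cp437` and the helper `decode_all` are assumptions for illustration. ASCII-only bytes decode identically everywhere; bytes 0x93/0x94 are curly quotes in windows-1252, C1 control characters in ISO-8859-1, and accented letters in the old DOS code page 437.

```python
def decode_all(data: bytes) -> dict[str, str]:
    """Decode the same bytes under several single-byte 'extended ASCII' code pages."""
    return {codec: data.decode(codec) for codec in ("latin-1", "cp1252", "cp437")}


# ASCII-only bytes: every code page yields the same string.
print(decode_all(b"Hello"))

# Upper-half bytes: each code page yields a different string.
for codec, text in decode_all(b"\x93quoted\x94").items():
    print(codec, repr(text))
```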


DeadMG


score 0 · Answer 2 · answered Jul 02 '12 at 22:23

For MIME, the following headers produce identical message bodies when the text contains only ASCII characters:

Content-type: text/plain; charset=us-ascii (Plain text)
Content-type: text/plain; charset=UTF-8 (a superset of ASCII)
Content-type: text/plain; charset="ISO-8859-2" (another superset of ASCII)
Content-type: text/enriched; charset="windows-1252" (if the text uses no enriched formatting commands)
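A Python sketch of why these charsets are easy to confuse, using the charset names from the headers above as codec names (Python accepts them as aliases). The helpers `encode_body` and `decode_body` are hypothetical, and the sketch ignores the text/enriched markup rules entirely. An ASCII-only body is byte-for-byte identical under all four charsets, while a single upper-half byte such as 0xA5 is rejected by us-ascii and UTF-8 and decodes to different letters under ISO-8859-2 and windows-1252.

```python
from typing import Optional

# Charset names taken from the Content-type headers.
CHARSETS = ("us-ascii", "utf-8", "iso-8859-2", "windows-1252")


def encode_body(text: str) -> dict[str, bytes]:
    """Encode a message body under each charset."""
    return {cs: text.encode(cs) for cs in CHARSETS}


def decode_body(data: bytes) -> dict[str, Optional[str]]:
    """Decode raw body bytes under each charset; None marks a decoding error."""
    result: dict[str, Optional[str]] = {}
    for cs in CHARSETS:
        try:
            result[cs] = data.decode(cs)
        except UnicodeDecodeError:
            result[cs] = None
    return result


# An ASCII-only body is identical under every charset.
assert len(set(encode_body("Meeting at 10:00").values())) == 1

# The single byte 0xA5 tells them apart.
print(decode_body(b"\xa5"))
# {'us-ascii': None, 'utf-8': None, 'iso-8859-2': 'Ľ', 'windows-1252': '¥'}
```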

