Why should C++ uint8_t data not be printable?

Question

On this C++-related GitHub page, the writer said:

Note that the value_type of those two containers is uint8_t which is not a printable character, make sure to cast it to int before you print.

Why should this be so?

For numbers below 128 decimal, the sign bit is zero.
On an ASCII-based system, a character such as "A" should be stored as decimal 65 = binary 0100 0001 in both signed and unsigned 8-bit integer types.

(Or, in a 16-bit signed int, as 0000 0000 0100 0001.)

Why should uint8_t need casting to int to print?

Note: For characters with values above 127 decimal, casting to int will ensure that the 8th bit is not confused with the sign bit, but for either 7-bit or 8-bit ASCII stored in a uint8_t, this does not seem to me to be an issue.

score 53 · Answer 1 · answered Dec 02 '24 at 04:38

To add to the other answers with some more background about why this is an issue:

In C++, char, signed char, and unsigned char are three distinct types. char may be signed or unsigned, but that’s mostly irrelevant; char is not the same as either signed char or unsigned char. char is supposed to be used for character data (whatever that may mean; it gets complicated). signed char and unsigned char are supposed to be byte-sized integers, signed and unsigned respectively. (And std::byte is supposed to mean an actual byte just as a bag-of-bits, rather than a (integer) number.) Yes, it’s all numbers under the hood, but numbers mean different things in computing.
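A minimal sketch of that distinctness, assuming nothing beyond the standard library: the three overloads below can coexist only because the compiler sees three different types. The function name `kind` and the check function are hypothetical, chosen just for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// char is a type of its own, whichever signedness the platform gives it.
static_assert(!std::is_same<char, signed char>::value, "char != signed char");
static_assert(!std::is_same<char, unsigned char>::value, "char != unsigned char");

// Three overloads are legal only because the three types are distinct.
std::string kind(char)          { return "char"; }
std::string kind(signed char)   { return "signed char"; }
std::string kind(unsigned char) { return "unsigned char"; }

// Hypothetical self-check documenting which overload each argument selects.
void check_kinds() {
    assert(kind('A') == "char");
    assert(kind(static_cast<signed char>(65)) == "signed char");
    assert(kind(static_cast<unsigned char>(65)) == "unsigned char");
}
```

On a platform where std::uint8_t is an alias for unsigned char, passing a std::uint8_t to `kind` would select the third overload.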

As noted in the other answers, some character values are not printable. So some values of char are not printable. However since signed char and unsigned char are supposed to represent numeric values, there should be no values of signed char or unsigned char that are unprintable.

However…

The fixed-width integer types in C++ are defined to be type aliases (“typedefs”)… not actual types. That means that std::int8_t is just an alias for some other type. And what type would it be an alias for? Well, the best choice on most platforms—often the only practical choice—is signed char.

Similarly, the best choice for the underlying type of std::uint8_t is unsigned char.

Now, unfortunately, IOStreams was specified to treat signed char and unsigned char just like char. In my opinion, this was a mistake. The upshot of that is that doing std::cout << std::int8_t{65} is exactly the same as doing std::cout << char{65}… which—even though std::int8_t is supposed to be a number, and should print 65—will probably print A on an ASCII or UTF-8 system.

(And it gets even worse, because std::int8_t doesn’t have to be aliased to signed char. It could be aliased to some other, platform-specific type. And that type could be treated as a number when using IOStreams. So you don’t even know whether std::cout << std::int8_t{??} will print a character or a number on other platforms. It’s a mess, which is why I say treating signed char and unsigned char the same as char when doing output was a mistake.)

This has been fixed in the modern formatting library. std::int8_t is still (probably) signed char, but the modern formatting library is smart enough to realize that signed char and unsigned char are not char, and it treats them appropriately as numbers. See for yourself.

One last thing: while it is not technically wrong to cast std::int8_t to int (and std::uint8_t to unsigned int), it’s a little verbose and unnecessary. It’s much simpler to just use the + prefix:

std::cout << std::int8_t{65};   // prints a character (probably 'A')
std::cout << +std::int8_t{65};  // always prints a number (65)

All the + is doing there is promoting the signed char (which std::int8_t really is) to int, which then prints normally as a number.

Doing it this way is optional for std::int8_t or signed char, because manually casting to int is always correct for those types. But if you ever want to print the numeric value of a char, using the + is the only correct way to do it. That is because you don’t know if char is signed or not, so you don’t know whether it is correct to cast it to a signed char or unsigned char. The + always does the right thing.
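As a rough sketch of the promotion described above: the helper below (the name `numeric_text` is made up for this example) applies unary + before streaming, so the stream never sees a character type. It writes to a std::ostringstream rather than std::cout only so the result can be checked.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <type_traits>

// Unary + applies the integral promotions; an 8-bit integer always fits in int.
static_assert(std::is_same<decltype(+std::int8_t{65}), int>::value,
              "+int8_t yields int");

// Hypothetical helper: format the numeric value of a narrow integer type.
template <typename T>
std::string numeric_text(T value) {
    std::ostringstream out;
    out << +value;  // promoted first, so the stream formats a number
    return out.str();
}

// Hypothetical self-check documenting the expected text.
void check_numeric_text() {
    assert(numeric_text(std::int8_t{65}) == "65");
    assert(numeric_text(std::uint8_t{200}) == "200");
    assert(numeric_text(static_cast<signed char>(-3)) == "-3");
}
```

The same helper accepts a plain char; on an ASCII system `numeric_text('A')` should give "65", with no need to know whether char is signed.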

candied_orange · Answer 2 · 2024-12-03T15:53:42.233

Why should uint8_t need casting to int to print?

So that cout will treat it as an int, not a char. See "uint8_t can't be printed with cout".

You've focused on the sign bit, which corresponds to the bit that 7-bit ASCII leaves unused, and overlooked that some low ASCII values map to "unprintable" control characters.

ASCII code - fastbitlab.com

That first column is all "unprintable" control characters.

Unprintable because, rather than representing symbols to print, these values are attempts to control a terminal. cout may or may not be pointing at a terminal. Terminals will ding when you send them a bell. Printers won't.*

So it's not truly that you can't print this way, it's that you'd better be sure what you mean.

This line:

Note that the value_type of those two containers is uint8_t which is not a printable character, make sure to cast it to int before you print.

Is a little deceptive. It's not that the type is or isn't a printable character. It's that cout may surprise you by treating a type with int in its name as a char.

This line (which was next)

std::cout << static_cast<int>(v.front()) << "\n";

doesn't change the values. It just changes how they're interpreted and represented.

If you're wondering why I keep putting "unprintable" in scare quotes it's because that unprintable status is entirely context dependent. For example, in Notepad++ you can turn on Show All Characters and instead of hiding these unprintable characters it will print them (even null) on the screen using special symbols.

This does nothing to change the value of the data. It's a choice of how to present it.

* It can be argued that some printers will ding when you send them a bell. I argue the more a printer does terminal things the less it's a printer and the more it's a terminal.

score 8 · Answer 3 · answered Dec 01 '24 at 16:10

Given the context of the page mentioned, which is about vectors of 2-bit and 4-bit values, I can only guess that by "printing" they mean sending the value to an output stream (like std::cout) and getting the decimal representation of the number back.

This does not work with the type uint8_t, because it is (on practically every platform) an alias for unsigned char, and the output streams interpret the bit pattern of an unsigned char as a character rather than as a number.

This means that if you use unsigned char or uint8_t to store numbers and you pass them without conversion to an output stream with the expectation to see numbers printed, you are in for a rude awakening. That is what the page tries to warn you about.
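To make the warning concrete, here is a small sketch with a std::vector<std::uint8_t> standing in for the page's containers. The function names `raw_text` and `number_text` are invented for this example, and the output is collected in a string instead of being sent to std::cout.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Streams each element unchanged, so operator<< picks the character overload
// whenever std::uint8_t is an alias for unsigned char.
std::string raw_text(const std::vector<std::uint8_t>& values) {
    std::ostringstream out;
    for (std::uint8_t v : values) out << v << ' ';
    return out.str();
}

// Converts each element to int first, so the stream formats a decimal number.
// The stored values are untouched; only their presentation changes.
std::string number_text(const std::vector<std::uint8_t>& values) {
    std::ostringstream out;
    for (std::uint8_t v : values) out << static_cast<int>(v) << ' ';
    return out.str();
}

// Hypothetical self-check; note the trailing space added after each element.
void check_number_text() {
    assert(number_text({3, 65, 200}) == "3 65 200 ");
}
```

With the same input, `raw_text` would typically emit a control character, the letter A, and a byte outside ASCII, which is the surprise the page is warning about.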

gnasher729 · Answer 4 · 2024-12-02T10:05:33.563

If c is an unsigned char with a value of ‘A’ then printing c to std::cout prints the letter ‘A’. Printing c cast to int prints “65”.

uint8_t has 256 different values. Printing each of them cast to int will print something from "0" to "255". Trying to print each as a character will try to print many unprintable characters. Printable characters print a single character. Printing unprintable characters can print anything. Usually 32 to 126 are printable. On Windows, more characters from 160 to 255, and possibly many but not all from 128 to 159, may be printable.
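The 32 to 126 range can be checked with std::isprint. This is only a sketch: it assumes the program is still in the default "C" locale, and `count_printable` is a name made up here.

```cpp
#include <cassert>
#include <cctype>

// Counts how many of the 256 possible uint8_t values std::isprint accepts.
int count_printable() {
    int count = 0;
    // Loop with an int: a std::uint8_t counter would wrap from 255 back to 0
    // and the loop would never end.
    for (int v = 0; v <= 255; ++v) {
        if (std::isprint(v)) ++count;
    }
    return count;
}

// Hypothetical self-check: in the "C" locale only 32..126 are printable.
void check_count_printable() {
    assert(count_printable() == 95);
}
```

After a call to std::setlocale with some other locale, the count may differ, which matches the point about Windows code pages above.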
