
I have a very simple question that has baffled me for a long time. I deal with networks and databases, so much of the data I handle consists of 32-bit and 64-bit counters (unsigned) and 32-bit and 64-bit identification IDs (which also have no meaningful mapping for a sign). I practically never deal with any real-world quantity that could be expressed as a negative number.

My co-workers and I routinely use unsigned types like uint32_t and uint64_t for these values, and because this comes up so often, we also use them for array indexes and other common integer uses.

At the same time, various coding guides I have read (e.g. Google's) discourage the use of unsigned integer types, and as far as I know neither Java nor Scala has unsigned integer types.

So I cannot figure out what the right thing to do is: using signed values in our environment would be very inconvenient, while at the same time coding guides insist on doing exactly that.

zzz777

7 Answers


There are two schools of thought on this, and neither will ever agree.

The first argues that some concepts are inherently unsigned - such as array indexes. It makes no sense to use signed numbers for those, as it may lead to errors. It can also impose unnecessary limits on things: an array that uses signed 32-bit indexes can only access 2 billion entries, while switching to unsigned 32-bit indexes allows 4 billion entries.

The second argues that in any program that uses unsigned numbers, sooner or later you will end up doing mixed signed-unsigned arithmetic. This can give strange and unexpected results: converting a large unsigned value to signed gives a negative number, and conversely converting a negative number to unsigned gives a large positive one. This can be a big source of errors.

Simon B

First of all, the Google C++ coding guideline is not a very good one to follow: it shuns things like exceptions and Boost, which are staples of modern C++. Secondly, just because a certain guideline works for company X doesn't mean it will be the right fit for you. I would continue using unsigned types, since you have a genuine need for them.

A decent rule of thumb for C++ is: prefer int unless you have a good reason to use something else.

bstamour

The other answers lack real-world examples, so I will add one that illustrates one of the reasons why I (personally) try to avoid unsigned types.

Consider using standard size_t as an array index:

for (size_t i = 0; i < n; ++i)
    // do something here;

OK, perfectly normal. Now suppose we decide to reverse the direction of the loop for some reason:

for (size_t i = n - 1; i >= 0; --i)
    // do something here;

And now it does not work: when i reaches 0 and is decremented, the unsigned counter wraps around to a huge value, so the condition i >= 0 is always true (and if n is 0, the loop starts out of bounds). If we had used int as the iterator, there would have been no problem. I've seen this error twice in the past two years; once it happened in production and was hard to debug.

Another reason for me is the annoying warnings, which make you write something like this every time:

int n = 123;  // for some reason n is signed
...
for (size_t i = 0; i < size_t(n); ++i)

These are minor things, but they add up. I feel like the code is cleaner if only signed integers are used everywhere.

Edit: Sure, the examples look dumb, but I have seen people make this mistake. If there's such an easy way to avoid it, why not use it?

When I compile the following piece of code with VS2015 or GCC, I see no warnings with the default warning settings (not even with -Wall for GCC). You have to ask for -Wextra to get a warning about this in GCC. This is one of the reasons you should always compile with -Wall and -Wextra (and use a static analyzer), but in many real-life projects people don't do that.

#include <vector>
#include <iostream>


void unsignedTest()
{
    std::vector<int> v{ 1, 2 };

    // Fine: i is signed, so the loop terminates once i reaches -1.
    for (int i = v.size() - 1; i >= 0; --i)
        std::cout << v[i] << std::endl;

    // Bug: i is unsigned, so i >= 0 is always true; when i reaches 0,
    // --i wraps around to SIZE_MAX and v[i] reads out of bounds.
    for (size_t i = v.size() - 1; i >= 0; --i)
        std::cout << v[i] << std::endl;
}

int main()
{
    unsignedTest();
    return 0;
}
for (size_t i = v.size() - 1; i >= 0; --i)
   std::cout << v[i] << std::endl;

The problem here is that the loop is written in a way that leads to the erroneous behavior. The construction is the one beginners are taught for signed types (which is fine and correct there), but it simply does not fit unsigned values. That cannot serve as an argument against using unsigned types, though; the task is simply to get the loop right. It can easily be fixed to work reliably with unsigned types, like so:

for (size_t i = v.size(); i-- > 0; )
    std::cout << v[i] << std::endl;

This change simply swaps the order of the comparison and the decrement, and is in my opinion the most effective, unobtrusive, clean, and shortest way to handle unsigned counters in backward loops. You would do the very same thing (intuitively) when using a while loop:

size_t i = v.size();
while (i > 0)
{
    --i;
    std::cout << v[i] << std::endl;
}

No underflow can occur, the case of an empty container is covered implicitly (just as in the well-known signed-counter variant), and the body of the loop can stay unchanged compared to a signed counter or a forward loop. You just have to get accustomed to the at first somewhat strange-looking loop construct; after you've seen it a dozen times, there's nothing unintelligible about it anymore.

I would be happy if beginner courses showed the correct loop not only for signed but also for unsigned types. That would avoid a couple of errors that should, IMHO, be blamed on the unwitting developers rather than on the unsigned type.

HTH

Don Pedro

Unsigned integers are there for a reason.

Consider, for example, handling data as individual bytes, e.g. in a network packet or a file buffer. You may occasionally encounter such beasts as 24-bit integers. They are easily assembled by bit-shifting three 8-bit unsigned integers; not so easy with 8-bit signed integers.

Or think about algorithms that use character lookup tables. If a character is an 8-bit unsigned integer, you can index a lookup table directly by the character value. But what do you do if the programming language doesn't support unsigned integers? You would end up with negative indexes into the array. Well, I suppose you could use something like charval + 128, but that's just ugly.

Many file formats, in fact, use unsigned integers and if the application programming language doesn't support unsigned integers, that could be a problem.

Then consider TCP sequence numbers. If you write any TCP processing code, you will definitely want to use unsigned integers.

Sometimes, efficiency matters so much that you really need that extra bit of unsigned integers. Consider for example IoT devices that are shipped in millions. Lots of programming resources can then be justified to be spent on micro-optimizations.

I would argue that the usual justifications for avoiding unsigned integer types (mixed-sign arithmetic, mixed-sign comparisons) can be overcome by enabling the proper compiler warnings. Such warnings are usually not enabled by default; see e.g. -Wextra, or separately -Wsign-compare (auto-enabled in C by -Wextra, though I don't think it's auto-enabled in C++) and -Wsign-conversion.

Nevertheless, if in doubt, use a signed type. Many times, it is a choice that works well. And do enable those compiler warnings!

juhist

There are many cases where integers don't actually represent numbers but rather, for example, a bit mask or an ID: basically, cases where adding 1 to the integer doesn't have any meaningful result. In those cases, use unsigned.

There are many cases where you do arithmetic with integers. In those cases, use signed integers to avoid misbehaviour around zero. See the plenty of examples with loops, where running a loop down to zero either requires very unintuitive code or is broken by the use of unsigned numbers. There is the argument "but indices are never negative" - sure, but differences of indices, for example, are negative.

In the very rare case where indices exceed 2^31 but not 2^32, you don't use unsigned integers, you use 64 bit integers.

Finally, a nice trap: in a loop for (i = 0; i < n; ++i) a[i] ..., if i is an unsigned 32-bit integer and memory uses addresses wider than 32 bits, the compiler cannot optimise the access to a[i] into a simple pointer increment, because at i = 2^32 - 1 the counter wraps around. This holds even when n never gets that large, since the compiler must assume it could. Using signed integers avoids this.

gnasher729

Finally, I found a really good answer here: "Secure Programming Cookbook" by J. Viega and M. Messier (http://shop.oreilly.com/product/9780596003944.do)

Security issues with signed integers:

  1. If a function requires a positive parameter, it is easy to forget to check the lower bound of the range.
  2. Unintuitive bit patterns resulting from conversions of negative integers between sizes.
  3. Unintuitive bit patterns produced by right-shifting a negative integer.

Update:

A. There are problems with signed<->unsigned conversions, so it is not advisable to mix the two.

B. If you are using unsigned integers, it is easy to check for overflow.

zzz777