28

I see and work with a lot of software, written by a fairly large group of people, and a LOT of the time I see integer type declarations that are wrong. The two examples I see most often: declaring a regular signed integer when the value can never be negative, and declaring the integer as a full 32-bit word when something much smaller would do the trick. I wonder if the second has to do with the compiler aligning words to the nearest 32 bits, but I'm not sure whether that's true in most cases.

When you create a number, do you usually create it with its size in mind, or do you just use whatever the default "int" is?

edit - Voted to reopen, as I don't think the answers adequately deal with languages that aren't C/C++, and the "duplicates" are all C/C++ based. They fail to address strongly typed languages such as Ada, where there cannot be bugs due to mismatched types...it will either not compile, or, if the mismatch can't be caught at compile time, it will throw an exception. I purposely left out naming C/C++ specifically, because other languages treat different integers much differently, even though most of the answers seem to be based around how C/C++ compilers act.

prelic
  • 886

8 Answers

60

Do you see the same thing?

Yes, the overwhelming majority of declared whole numbers are int.

Why?

  1. Native ints are the size your processor does math with*. Making them smaller doesn't gain you any performance (in the general case). Making them larger means they maybe (depending on your processor) can't be worked on atomically, leading to potential concurrency bugs.
  2. 2 billion and change is big enough to ignore overflow issues for most scenarios. Smaller types mean more work to address them, and lots more work if you guess wrong and you need to refactor to a bigger type.
  3. It's a pain to deal with conversion when you've got all kinds of numeric types. Libraries use ints. Clients use ints. Servers use ints. Interoperability becomes more challenging, because serialization often assumes ints - if your contracts are mismatched, suddenly there are subtle bugs that crop up when they serialize an int and you deserialize a uint.
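
For instance, here is a minimal sketch of the mismatch described in point 3 (the "network" here is just a memcpy on one machine; the variable names are made up): a value serialized as a signed int and deserialized as an unsigned one silently changes meaning.

#include <stdint.h>
#include <string.h>
#include <stdio.h>

int main(void) {
    int32_t sent = -1;                 /* server serializes a signed int  */
    uint8_t wire[4];
    memcpy(wire, &sent, sizeof sent);  /* raw bytes go over the "network" */

    uint32_t received;                 /* client deserializes as unsigned */
    memcpy(&received, wire, sizeof received);

    printf("%u\n", (unsigned)received); /* prints 4294967295, not -1      */
    return 0;
}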

In short, there's not a lot to gain, and some non-trivial downsides. And frankly, I'd rather spend my time thinking about the real problems when I'm coding - not what type of number to use.

*- these days, most personal computers are 64 bit capable, but mobile devices are dicier.

Telastyn
  • 110,259
19

Regarding size, you are operating under the mistaken impression that "smaller is better", which is simply not true.

Even if we completely ignore issues like programmer time or propensity for error, smaller integer types can still have the following disadvantages.

Smaller types = more work

Processors don't work on arbitrarily sized data; they do operations in registers of specific sizes. Trying to do arithmetic with less precision than the registers hold can easily require extra work.

For example, if a C program does arithmetic in uint8_t (an unsigned 8-bit integer type whose overflow is specified to be reduction modulo 256), then unless the processor has specialized instructions for that case, the generated code has to follow every arithmetic operation with a mask by 0xff, except where the compiler can outright prove the mask is unnecessary.
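
As a small illustration (the values here are arbitrary): in C the narrow operands are first promoted to int, and the result is only reduced modulo 256 when it is stored back into the narrow type, which is exactly the truncation the compiler has to emit.

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t a = 200;
    uint8_t b = 100;

    /* a and b are promoted to int, so a + b is 300; storing it back
       into a uint8_t reduces it modulo 256, i.e. 300 & 0xff == 44.   */
    uint8_t sum = (uint8_t)(a + b);

    printf("%d\n", sum);   /* prints 44 */
    return 0;
}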

Smaller types = inefficient memory

Memory is not uniform. On many processors, accessing memory at addresses that are multiples of 4 bytes (or more!) is much more efficient than accessing memory at other addresses.

You may think that using a 1-byte field rather than a 4-byte field is helping you, but the reality may be that it is hurting you, because the resulting misaligned memory accesses are slower than they need to be.

Of course, compilers know all about this, and in many places will insert the needed wasted space to make things faster:

struct this_struct_is_64_bits_not_40_bits
{
    uint32_t x;   /* 4 bytes                                   */
    uint8_t  y;   /* 1 byte                                    */
                  /* + 3 bytes of padding added for alignment  */
};

Signed integers = more optimization opportunities

A peculiarity of C and C++ is that signed integer overflow is undefined behavior, which allows the compiler to make optimizations without regard to what effect the optimization might have in the case of overflow.

Optimization guides often outright recommend the use of signed integers in many places for exactly this reason. For example, from the CUDA Best Practices Guide:

Note: Medium Priority: Use signed integers rather than unsigned integers as loop counters.
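
Here is a hedged sketch in the spirit of the strided-loop example such guides use (whether the compiler actually exploits this depends on the compiler and target; the function names are made up):

/* With a signed counter, stride * i overflowing is undefined, so the
   compiler is free to strength-reduce the multiplication into a
   running pointer increment. */
void gather_signed(float *out, const float *in, int stride, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = in[stride * i];
}

/* With an unsigned counter, stride * i wrapping modulo 2^32 is well
   defined behaviour that the compiler must preserve, which can block
   that same strength reduction on a 64-bit target. */
void gather_unsigned(float *out, const float *in,
                     unsigned stride, unsigned n) {
    for (unsigned i = 0; i < n; ++i)
        out[i] = in[stride * i];
}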

18

Using a signed 32-bit int "just works" in all of these cases:

  • Loops
  • Integer arithmetic
  • Array indexing and sizing
  • Enumeration values
  • Size of objects in memory (most reasonably sized things)
  • Image dimensions (reasonably-sized images)

Yes, not all of these uses require signedness or 32 bits of data, but the compatibility of a signed 32-bit int with most use cases makes it an easy choice. Picking any other integer type requires thought that most people don't want to spend. And with the memory available today, we can afford the luxury of wasting a few bytes here and there. Standardizing on a common integer type makes everybody's life a bit easier, and since most libraries default to signed 32-bit integers, choosing other integer types would be a hassle from a casting/converting standpoint.

Samuel
  • 9,237
10

There are still many millions, or billions, of embedded processing devices out there where the "default" integer is 16 bits or 8 bits (a few are even smaller), and where the assumption that a signed integer is big enough is simply not valid. (I work with them all the time.)

If you are dealing with any form of communications protocol, you should be thinking about:

  • Sizes (8, 16, 32, 64 bits, others)
  • Signed/Unsigned
  • Endianness
  • Packing/Alignment

So while I see people just using int all over the place, in my field of work we have specific rules against it (MISRA), deliberately design our communications protocols, types and data stores with these pitfalls in mind, and have to reject such code before it gets into production.
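
As a sketch of what that looks like in practice (the message layout and field names here are hypothetical), a wire format gets explicit widths and signedness, and multi-byte fields are serialized byte by byte so that host endianness and struct padding never leak into the protocol:

#include <stdint.h>

/* Hypothetical message header: every field has an explicit width
   and signedness. */
typedef struct {
    uint8_t  version;       /* 0..255                             */
    uint8_t  flags;
    uint16_t payload_len;   /* bytes, 0..65535                    */
    int32_t  temperature;   /* hundredths of a degree, may be < 0 */
} sensor_header_t;

/* Serialize explicitly in big-endian order instead of memcpy'ing
   the struct, so packing and endianness are under our control.   */
void encode_header(uint8_t out[8], const sensor_header_t *h)
{
    out[0] = h->version;
    out[1] = h->flags;
    out[2] = (uint8_t)(h->payload_len >> 8);
    out[3] = (uint8_t)(h->payload_len & 0xFFu);
    out[4] = (uint8_t)((uint32_t)h->temperature >> 24);
    out[5] = (uint8_t)((uint32_t)h->temperature >> 16);
    out[6] = (uint8_t)((uint32_t)h->temperature >> 8);
    out[7] = (uint8_t)((uint32_t)h->temperature & 0xFFu);
}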

Steve Barnes
  • 5,330
5

I would like to post an answer which goes in the opposite direction to most of the others. I argue that using int for everything is not good, at least in C or C++.

  1. int does not carry much semantic meaning. In a strictly typed language you should convey as much meaning as possible with your types. So if your variable represents a value that can never meaningfully be negative, why not convey that by using unsigned int?
  2. Similar to the above, even more precise types than int and unsigned int are available: in C, the size of an object should be a size_t, a pointer difference should be a ptrdiff_t, and so on (see the sketch after this list). They are all ultimately mapped to ordinary integer types by the compiler, but they convey additional, useful information.
  3. Precise types can allow some architecture-specific optimisation (e.g. uint_fast32_t in C).
  4. Smaller types can make better use of the hardware: a register or vector lane that holds one 64-bit value can hold two 32-bit values, so with vector (SIMD) instructions an operation can process twice as many 32-bit integers at once. If 32-bit integers are enough for you, that can effectively double the throughput of heavy numeric code. (I cannot find a written citation for this, but iirc the point was made by Andrei Alexandrescu in a CppCon talk, which would make it a fairly authoritative source.)
  5. If you use a 32-bit unsigned integer instead of a 64-bit signed integer, for a value which can only ever be positive, you have effectively halved the memory required to hold that value. It might not matter much in the grand scheme of things, given how cheap RAM is nowadays on most platforms, but it can make a real difference if it means twice as much of your data fits in the L1 cache, for example!
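
Here is a minimal sketch of points 1-3 (the function names are made up for illustration): let the type say what the value means.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* An object size is a size_t, never a plain int.                    */
size_t payload_size(const char *msg) {
    return strlen(msg) + 1;
}

/* A pointer difference is a ptrdiff_t, which may legitimately be
   negative.                                                          */
ptrdiff_t offset_of(const int *base, const int *elem) {
    return elem - base;
}

/* A count that can never be negative, where speed matters more than
   an exact width, can be uint_fast32_t.                              */
uint_fast32_t count_nonzero(const int *a, size_t n) {
    uint_fast32_t count = 0;
    for (size_t i = 0; i < n; ++i)
        if (a[i] != 0)
            ++count;
    return count;
}
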
1

There are several reasons why it's usually slightly simpler to use signed numbers in C during calculations. But these are just recommendations which apply to calculations and loops in C-like languages, not to designing your abstract data types, communication protocols, or anything where storage is concerned.

  1. In 99% of cases, your variables will operate much closer to zero than to the MAX_INT value, and in those cases using a signed int often makes it simpler to ensure correctness (a corrected unsigned variant is sketched at the end of this answer):

    // if i is unsigned this will loop forever
    // due to underflow to (unsigned)-1
    while (--i >= 0)
    { /* do something */ }
    
  2. Integer promotion in C is a rule which tries to promote all smaller-than-int operands into a (signed) int, if they fit. This means that your smaller unsigned variables (uint8_t or uint16_t) will be treated as an int during operations:

    uint8_t x = 1;
    uint8_t y = 2;
    
    // this may produce a conversion warning (e.g. with
    // -Wconversion), because the result of `x + y` is an
    // `int`, and you're placing it into a `uint8_t` without
    // explicitly casting:
    
    uint8_t result = x + y;
    

    At the same time, by using smaller types you probably haven't gained anything in terms of performance, because compilers usually choose int to match the word size of the target architecture, so the CPU's registers won't care if you hand them anything smaller.

Obviously, this doesn't mean you should waste space on 32-bit ints in struct fields if all you need is a uint8_t.
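
To make point 1 concrete, here is a small sketch (the function names are made up) of the same backwards loop written both ways; the signed version reads naturally, while the unsigned one has to avoid the always-true `i >= 0` test:

#include <stddef.h>

/* Signed index: the obvious downward loop is correct.             */
void zero_backwards_signed(double *a, int n) {
    for (int i = n - 1; i >= 0; --i)
        a[i] = 0.0;
}

/* Unsigned index (e.g. size_t): the condition must be rewritten,
   because i >= 0 is always true for an unsigned type.             */
void zero_backwards_unsigned(double *a, size_t n) {
    for (size_t i = n; i-- > 0; )
        a[i] = 0.0;
}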

vgru
  • 613
0

Regarding the signed/unsigned thing: remember that unsigned arithmetic has completely different semantics from signed arithmetic. Unsigned arithmetic is mod 2^n (where n is the number of bits of your unsigned type). However, such arithmetic is often not what you want, and it is usually better to treat an overflow as an error.
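
A tiny illustration of that difference in C (the concrete values are arbitrary):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t u = 0;
    u -= 1;                        /* well defined: wraps mod 2^32 */
    printf("%u\n", (unsigned)u);   /* prints 4294967295            */

    int32_t s = INT32_MAX;
    /* s + 1 would be signed overflow: undefined behaviour in C and
       C++, so it has to be checked and treated as an error rather
       than relied upon.                                            */
    (void)s;
    return 0;
}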

As far as C++ is concerned, note also that there is some regret in the standards committee about having used unsigned types all over the standard library. See this video at 9:50, 42:40, 1:02:50.

sigy
  • 716
0

Signed vs. unsigned and 16-bit vs. 32-bit are only a few of the ways of specifying exact boundaries for integer variables.

C has no way to specify such boundaries the way Ada does:

subtype My_Index is Integer range 2 .. 7;

In C, int, short, char, long and unsigned are only a convenient way of optimizing storage size. They are not intended to carry strict semantics.
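
The closest you can get in C is a hand-rolled check (the typedef and helper below are purely illustrative):

#include <assert.h>

/* There is no language support: My_Index is still just an int, and
   the range 2..7 is only enforced where we remember to check it.   */
typedef int My_Index;

My_Index make_my_index(int value) {
    assert(value >= 2 && value <= 7);   /* runtime check, not a type */
    return value;
}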

mouviciel
  • 15,491