30

I've been reviewing C programming and there are just a couple things bothering me.

Let's take this code for example:

int myArray[5] = {1, 2, 2147483648, 4, 5};
int* ptr = myArray;
int i;
for(i=0; i<5; i++, ptr++)
    printf("\n Element %d holds %d at address %p", i, myArray[i], (void *)ptr);

I know that an int can hold a maximum value of positive 2,147,483,647. So by going one over that, does it "spill over" to the next memory address which causes element 2 to appear as "-2147483648" at that address? But then that doesn't really make sense because in the output it still says that the next address holds the value 4, then 5. If the number had spilled over to the next address then wouldn't that change the value stored at that address?

I vaguely remember from programming in MIPS Assembly and watching the addresses change values during the program step by step that values assigned to those addresses would change.

Unless I am remembering incorrectly then here is another question: If the number assigned to a specific address is bigger than the type (like in myArray[2]) then does it not affect the values stored at the subsequent address?

Example: We have int myNum = 4 billion at address 0x10010000. Of course myNum can't store 4 billion so it appears as some negative number at that address. Despite not being able to store this large number, it has no effect on the value stored at the subsequent address of 0x10010004. Correct?

The memory addresses just have enough space to hold certain sizes of numbers/characters, and if the size goes over the limit then it will be represented differently (like trying to store 4 billion into the int but it will appear as a negative number) and so it has no effect on the numbers/characters stored at the next address.

Sorry if I went overboard. I've been having a major brain fart all day from this.

stumpy
  • 417

5 Answers

48

No, it does not. In C, a variable occupies a fixed region of memory. If you are working on a system with 4-byte ints, and you set an int variable to 2,147,483,647 and then add 1, the variable will usually contain -2147483648 (on most systems; strictly speaking, signed overflow is undefined behavior). Either way, no other memory locations will be modified.

In essence, the compiler will typically warn you if you assign a constant that is too big for the type. If you force the conversion with a cast, the value will be truncated.

Looked at in a bitwise way, if the type can only store 8 bits, and you try to force the value 1010101010101 into it with a cast, you will end up with the bottom 8 bits, or 01010101.

In your example, regardless of what you do to myArray[2], myArray[3] will contain 4. There is no "spill over". If you try to put something that is more than 4 bytes into a 4-byte int, it will just lop off everything on the high end, leaving the bottom 4 bytes. On most systems, storing 2147483648 this way results in -2147483648.

From a practical standpoint, you want to just make sure this never, ever happens. These sorts of overflows often result in hard-to-solve defects. In other words, if you think there is any chance at all your values will be in the billions, don't use int.

24

Signed integer overflow is undefined behavior. If this happens, your program is invalid. The compiler is not required to check this for you, so it may generate an executable that appears to do something reasonable, but there is no guarantee that it will.

However, unsigned integer overflow is well-defined. It will wrap modulo UINT_MAX+1. The memory not occupied by your variable will not be affected.

See also https://stackoverflow.com/q/18195715/951890

Vaughn Cato
  • 1,069
14

So, there are two things here:

  • the language level: what are the semantics of C
  • the machine level: what are the semantics of the assembly/CPU you use

At the language level:

In C:

  • overflow and underflow are defined as modulo arithmetic for unsigned integers, thus their value "loops"
  • overflow and underflow are Undefined Behavior for signed integers, thus anything can happen

For those who want an example of that "anything", I've seen:

for (int i = 0; i >= 0; i++) {
    ...
}

turn into:

for (int i = 0; true; i++) {
    ...
}

and yes, this is a legitimate transformation.

It means that there are indeed potential risks of overwriting memory on overflow due to some weird compiler transformation.

Note: on Clang or gcc use -fsanitize=undefined in Debug to activate the Undefined Behavior Sanitizer which will abort on underflow/overflow of signed integers.

Or it means that you can overwrite memory by using the result of the operation to index (unchecked) into an array. This is unfortunately far more likely in the absence of underflow/overflow detection.

Note: on Clang or gcc use -fsanitize=address in Debug to activate the Address Sanitizer which will abort on out-of-bounds access.


At the machine level:

It really depends upon the assembly instructions and CPU you use:

  • on x86, ADD will wrap using two's complement on overflow/underflow, and set the OF (Overflow Flag)
  • on the future Mill CPU, there will be 4 different overflow modes for Add:
    • Modulo: two's complement modulo
    • Trap: a trap is generated, halting computation
    • Saturate: the value sticks at min on underflow or max on overflow
    • Double Width: the result is generated in a double-width register

Note that whether things happen in registers or memory, in neither case does the CPU overwrite memory on overflow.

Matthieu M.
  • 15,214
4

To further @StevenBurnap's answer, the reason this happens is because of how computers work at machine-level.

Your array is stored in memory (e.g. in RAM). When an arithmetic operation is performed, the value in memory is copied into the input registers of the circuit that performs the arithmetic (the ALU: Arithmetic Logic Unit). The operation is then carried out on the data in the input registers, producing a result in the output register. This result is then copied back to the correct address in memory, leaving other areas of memory untouched.

Pharap
  • 594
4

First (assuming the C99 standard), you may want to include the <stdint.h> standard header and use some of the types defined there, notably int32_t, which is exactly a 32-bit signed integer, or uint64_t, which is exactly a 64-bit unsigned integer, and so on. You might want to use types like int_fast16_t for performance reasons.

Read the other answers explaining that unsigned arithmetic never spills (or overflows) into adjacent memory locations. Beware of undefined behavior on signed overflow.

Then, if you need to compute exactly with huge integer numbers (e.g. you want to compute the factorial of 1000 with all its 2568 decimal digits), you want bigints, a.k.a. arbitrary-precision numbers (or bignums). Algorithms for efficient bigint arithmetic are very clever and usually require specialized machine instructions (e.g. some add-word-with-carry instruction, if your processor has one). Hence I strongly recommend, in that case, using some existing bigint library like GMPlib.