13

I can't figure out why microprocessor systems implement unsigned numbers. My guess is that the cost is roughly double the number of conditional branches, since greater than, less than, etc., need a different algorithm for unsigned than for signed. Still, are there any algorithms for which unsigned numbers are a significant advantage?

Part of my question is why they need to be in the instruction set rather than being supported by a compiler.

jtw
  • 155

9 Answers

40

Unsigned numbers are one interpretation of a sequence of bits. It is also the simplest one, and the interpretation most used internally by the CPU, because addresses and opcodes are simply bits. Memory and stack addressing and arithmetic are the foundations of microprocessor, well, processing. Moving up the abstraction pyramid, another frequent interpretation of bits is as a character (ASCII, Unicode, EBCDIC). Then there are other interpretations such as IEEE floating point, RGBA for graphics, and so on. None of these are simple signed numbers (IEEE floating point is not simple, and arithmetic on it is quite complicated).

Also, with unsigned arithmetic it is pretty straightforward (if not maximally efficient) to implement the others. The converse is not true.
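To make the "interpretation" point concrete, here is a small C sketch (the bit pattern is arbitrary, chosen for illustration) that reads the same 32 bits as an unsigned integer, a signed integer, and an IEEE float:

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    int main(void) {
        uint32_t bits = 0xC3480000u;          /* one arbitrary 32-bit pattern */

        uint32_t as_unsigned = bits;              /* 3276046336   */
        int32_t  as_signed   = (int32_t)bits;     /* -1018920960  */
        float    as_float;                        /* IEEE-754: -200.0 */
        memcpy(&as_float, &bits, sizeof as_float);

        printf("%u %d %f\n", as_unsigned, as_signed, as_float);
        return 0;
    }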

Kilian Foth
  • 110,899
Kristian H
  • 1,281
20

The bulk of the hardware cost for comparison operations is the subtraction. The output of the subtraction used by comparison is essentially three bits of state:

  • whether all the bits are zero (i.e. the equal condition),
  • the sign bit of the result
  • the carry (borrow) bit of the subtraction (i.e. a 33rd, high-order bit on a 32-bit computer)

With the proper combination of testing these three bits after the subtraction operation, we can determine all the signed relational operations, as well as all the unsigned relational operations (these bits are also how overflow is detected, signed vs. unsigned). The same basic ALU hardware can be shared to implement all these comparisons (not to mention the subtraction instruction), until the final checking of those three bits of state, which differs as per the relational comparison desired. So, it is not a lot of extra hardware.
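As a rough sketch of that sharing (the helper names here are invented for illustration, not from any real ISA; the overflow bit mentioned parenthetically is computed as well), one subtraction can produce all the state needed for both signed and unsigned relational tests:

    #include <stdint.h>
    #include <stdbool.h>

    /* One subtraction yields all the state needed for every relational test. */
    typedef struct { bool zero, sign, carry, overflow; } Flags;

    Flags sub_flags(uint32_t a, uint32_t b) {
        uint32_t r = a - b;
        Flags f;
        f.zero     = (r == 0);                         /* equality          */
        f.sign     = (r >> 31) & 1;                    /* sign bit          */
        f.carry    = (a < b);                          /* borrow (33rd bit) */
        f.overflow = (((a ^ b) & (a ^ r)) >> 31) & 1;  /* signed overflow   */
        return f;
    }

    /* Unsigned a < b: the subtraction borrowed. */
    bool ult(uint32_t a, uint32_t b) { return sub_flags(a, b).carry; }

    /* Signed a < b: sign flag differs from overflow flag. */
    bool slt(uint32_t a, uint32_t b) {
        Flags f = sub_flags(a, b);
        return f.sign != f.overflow;
    }

All the other relations (greater, less-or-equal, and their unsigned counterparts) are just different Boolean combinations of the same few bits.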

The only real cost is the need to encode additional modes of comparison in the instruction set architecture, which may marginally decrease instruction density. Still, it is pretty normal for hardware to have a lot of instructions that aren't used by any given language.

Erik Eidt
  • 34,819
15

Because if you need to count something that is always >= 0, you unnecessarily cut your counting space in half by using signed integers.

Consider the auto-incremented INT primary key you might be putting on your database tables. If you use a signed integer there, your table can hold HALF as many records as it could for the same field size, with NO benefit.

Or the octets of an RGBA color. We don't want to awkwardly start counting this naturally positive quantity at a negative number: a signed octet would either break the mental model or halve our space. An unsigned integer not only matches the concept, it provides twice the resolution.
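For instance, a minimal sketch of such a color (the struct and field names are illustrative):

    #include <stdint.h>

    /* Each channel naturally spans 0..255; a signed 8-bit field would
       either waste half the range or allow meaningless negative values. */
    struct rgba {
        uint8_t r, g, b, a;
    };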

From the hardware perspective, unsigned integers are simple. They're probably the easiest bit structure to perform math on. And, no doubt, we could simplify the hardware by simulating integer types (or even floating point!) in a compiler. So, why are both unsigned and signed integers implemented in hardware?

Well ... performance!

It's more efficient to implement signed integers in hardware than in software. The hardware can be instructed to perform math on either type of integer in a single instruction. And that's very good, because the hardware smashes bits together more or less in parallel. If you try to simulate that in software, the integer type you choose to "simulate" will undoubtedly require many instructions per operation and be noticeably slower.
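As one taste of what such simulation looks like, here is a known trick for emulating signed comparison on hardware that only compares unsigned values: bias both operands by flipping their sign bits. This is just an illustrative C sketch, and it covers only comparison, before multiplication, widening, or shifts are dealt with:

    #include <stdint.h>
    #include <stdbool.h>

    /* Flipping the sign bit maps INT32_MIN..INT32_MAX onto 0..UINT32_MAX
       in an order-preserving way, so one unsigned compare then suffices. */
    bool signed_less(uint32_t a, uint32_t b) {
        return (a ^ 0x80000000u) < (b ^ 0x80000000u);
    }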

svidgen
  • 15,252
10

Your question consists of two parts:

  1. What is the purpose of unsigned integers?

  2. Are unsigned integers worth the trouble?

1. What is the purpose of unsigned integers?

Unsigned numbers, quite simply, represent a class of quantities for which negative values are meaningless. Sure, you might say that the answer to "how many apples do I have?" could be negative if you owe some apples to someone, but what about "how much memory do I have?" You cannot have a negative amount of memory. So, unsigned integers are very suitable for representing such quantities, and they have the benefit of being able to represent twice the range of positive values that signed integers can. For example, the maximum value you can represent with a 16-bit signed integer is 32767, while with a 16-bit unsigned integer it is 65535.
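A quick check of those two limits in C:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        printf("%d %d\n", INT16_MAX, UINT16_MAX);  /* 32767 65535 */
        return 0;
    }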

2. Are unsigned integers worth the trouble?

Unsigned integers do not really represent any trouble, so, yes, they are worth it. You see, they do not require an extra set of "algorithms"; the circuitry required to implement them is a subset of the circuitry required for implementing signed integers.

A CPU does not have one multiplier for signed integers and a different multiplier for unsigned ones; it has just one multiplier, which works in a slightly different way depending on the nature of the operation. Supporting signed multiplication requires a tiny bit more circuitry than unsigned, but since it needs to be supported anyway, unsigned multiplication comes practically for free, it is included in the package.

As for addition and subtraction, there is no difference in the circuitry at all. If you read up on the so-called two's complement representation of integers you will find that it is so cleverly designed that these operations can be performed in exactly the same way, regardless of the nature of the integers.
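For example, this little C program adds the same pair of bit patterns under both interpretations; the resulting bits are identical, which is why a single adder circuit suffices:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint16_t ua = 0xFFFB, ub = 0x0007;            /* unsigned: 65531 + 7 */
        int16_t  sa = -5,     sb = 7;                 /* signed: same bits   */

        uint16_t usum = (uint16_t)(ua + ub);          /* wraps to 0x0002     */
        int16_t  ssum = (int16_t)(sa + sb);           /* 2, also 0x0002      */

        printf("%04X %04X\n", usum, (uint16_t)ssum);  /* identical patterns  */
        return 0;
    }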

Comparison also works the same way, since it is nothing but subtract-and-discard-the-result. The only difference is in the conditional branch (jump) instructions, which work by looking at different flags of the CPU set by the preceding (comparison) instruction. In this answer: https://stackoverflow.com/a/9617990/773113 you can find an explanation of how they work on the Intel x86 architecture. What happens is that whether a conditional jump instruction is signed or unsigned depends on which flags it examines.

Mike Nakis
  • 32,803
8

Microprocessors are inherently unsigned. The signed numbers are the thing that's implemented, not the other way around.

Computers can and do work fine without signed numbers, but it is us humans who need negative numbers, hence signedness was invented.

Pieter B
  • 13,310
3

Because they give you one more bit of easily available storage, and you don't have to worry about negative numbers. There's not much more to it than that.

Now if you need an example of where you would need this extra bit, there are plenty to be found if you look.

My favorite example comes from bitboards in chess engines. There are 64 squares on a chess board, so a 64-bit unsigned integer provides perfect storage for a variety of algorithms revolving around move generation. Considering that you need binary operations (as well as shift operations!), it is easy to see why it is easier not to have to worry about what special things happen when the MSB is set. It can be done with a signed long, but it is a lot easier to use unsigned.
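As a minimal sketch (the function name and square mapping are just one common convention: bit 0 = a1, bit 63 = h8), here is a typical move-generation step:

    #include <stdint.h>

    typedef uint64_t Bitboard;

    /* Single pawn pushes for white: shift every pawn one rank "north".
       With an unsigned type the shift just drops bits off the top;
       there are no sign-bit semantics to reason about. */
    Bitboard white_pawn_pushes(Bitboard pawns, Bitboard empty) {
        return (pawns << 8) & empty;
    }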

riwalk
  • 7,690
3

Coming from a pure maths background, I'll offer a slightly more mathematical take for anybody interested.

If we start with 8-bit signed and unsigned integers, what we have is basically the integers modulo 256, as far as addition and multiplication are concerned, provided two's complement is used to represent negative integers (and this is how every modern processor does it).

Where things differ is in two places. One is comparison operations. In a sense, the integers modulo 256 are best thought of as a circle of numbers (like the integers modulo 12 on an old-fashioned analogue clock face). To make numerical comparisons (is x < y?) meaningful, we need to decide which numbers are less than which others. From the mathematician's point of view, we want to embed the integers modulo 256 into the set of all integers somehow.

Mapping the 8-bit integer whose binary representation is all zeros to the integer 0 is the obvious thing to do. We can then proceed to map others so that '0+1' (the result of zeroing a register, say ax, and then incrementing it by one, via 'inc ax') goes to the integer 1, and so on. We can do the same with -1, for example mapping '0-1' to the integer -1, and '0-1-1' to the integer -2. We must ensure that this embedding is a function, so we cannot map a single 8-bit integer to two different integers. As such, if we map all the numbers into the set of integers, 0 will be there, along with some integers less than 0 and some greater than 0. There are essentially 256 ways to do this with an 8-bit integer (according to what minimum you want, from 0 down to -255). Then you can define 'x < y' in terms of '0 < y - x'.

There are two common use cases for which hardware support is sensible: one with all nonzero integers being greater than 0, and one with an approximately 50/50 split around 0. All other possibilities are easily emulated by translating numbers via an extra add or subtract around operations, and the need for this is so rare that I can't think of an explicit example in modern software (since you can just work with a larger width, say 16 bits).

The other issue is that of mapping an 8-bit integer into the space of 16-bit integers. Does -1 go to -1? This is what you want if 0xFF is meant to represent -1. In this case, sign-extending is the sensible thing to do, so that 0xFF goes to 0xFFFF. On the other hand, if 0xFF was meant to represent 255, then you want it mapped to 255, hence to 0x00FF rather than 0xFFFF.

This is the difference between the 'logical shift' and 'arithmetic shift' operations as well.
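Both choices in a small C sketch (note that right-shifting a negative value is implementation-defined in ISO C, though mainstream compilers perform the arithmetic shift shown):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint8_t byte = 0xFF;

        uint16_t zext = byte;              /* 0x00FF: 0xFF meant 255 */
        int16_t  sext = (int8_t)byte;      /* 0xFFFF: 0xFF meant -1  */

        /* Shifting makes the same distinction when halving: */
        uint8_t lsr = byte >> 1;           /* logical shift: 0x7F (127)  */
        int8_t  asr = (int8_t)byte >> 1;   /* arithmetic shift: -1 (0xFF) */

        printf("%04X %04X %02X %02X\n",
               zext, (uint16_t)sext, lsr, (uint8_t)asr);
        return 0;
    }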

Ultimately, however, it comes down to the fact that ints in software are not integers but binary representations, and only some integers can be represented. When designing hardware, choices have to be made as to which operations to do natively. Since with two's complement the addition and multiplication operations are identical, it makes sense to represent negative integers this way. After that it is only a matter of the operations which depend on which integers your binary representations are meant to represent.

John Allsup
  • 131
2

Let's examine the implementation cost of adding unsigned integers to a CPU design that already has signed integers.

A typical CPU needs the following arithmetic instructions:

  • ADD (which adds two values and sets a flag if the operation overflows)
  • SUB (which subtracts one value from another and sets various flags -- we'll discuss these below)
  • CMP (which is essentially 'SUB and discard the result, only keep the flags')
  • LSH (left shift, set a flag on overflow)
  • RSH (right shift, set a flag if a 1 is shifted out)
  • Variants of all of the above instructions that handle carrying/borrowing from the flags, thus letting you chain the instructions together conveniently to operate on types larger than the CPU registers (see the sketch after this list)
  • MUL (multiply, set flags, etc -- not universally available)
  • DIV (divide, set flags, etc -- a lot of CPU architectures lack this)
  • Move from a smaller integer type (e.g. 16-bit) to a larger one (e.g. 32-bit). For signed integers, this is usually called MOVSX (move with sign extend).
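
As a sketch of the carry-chaining idea mentioned in the list above, here is a 64-bit addition built from two 32-bit halves in C, doing by hand what an add-with-carry instruction does (the type and function names are illustrative):

    #include <stdint.h>

    typedef struct { uint32_t lo, hi; } u64pair;

    u64pair add64(u64pair a, u64pair b) {
        u64pair r;
        r.lo = a.lo + b.lo;
        uint32_t carry = (r.lo < a.lo);  /* unsigned wraparound reveals carry */
        r.hi = a.hi + b.hi + carry;      /* the "add with carry" step         */
        return r;
    }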

It also needs logical instructions:

  • Branch on zero
  • Branch on greater
  • Branch on less
  • Branch on overflow
  • Negated versions of all of the above

To perform the above branches on signed integer comparisons, the easiest way is to have the SUB instruction set the following flags:

  • Zero. Set if the subtraction resulted in a value of zero.
  • Overflow. Set if the subtraction overflowed the signed range (i.e. the borrow into the most significant bit differed from the borrow out of it).
  • Sign. Set to the result's sign bit.

Then the arithmetic branches are implemented as follows:

  • Branch on zero: if zero flag is set
  • Branch on less: if the sign flag is different to the overflow flag
  • Branch on greater: if the sign flag is equal to the overflow flag, and the zero flag is clear.

The negations of these should follow obviously from how those are implemented.

So your existing design already implements all of these for signed integers. Now let's consider what we need to do to add unsigned integers:

  • ADD -- the implementation of ADD is identical.
  • SUB -- we need to add an extra flag: the carry flag is set when a value is borrowed from beyond the register's most significant bit.
  • CMP -- doesn't change
  • LSH -- doesn't change
  • RSH -- the right shift for signed values retains the value of the most significant bit. For unsigned values, we should instead set it to zero.
  • MUL -- if your output size is the same as the input size, no special handling is required. (x86 does have special handling, but only because it outputs into a register pair; note that this facility is actually quite rarely used, so it would be a more obvious candidate to leave out of a processor than unsigned types.)
  • DIV -- unlike a truncated multiply, signed and unsigned division give different results for the same bit patterns (compare 0xFFFFFFFE / 2 under each reading), so where divide exists at all, an unsigned variant of it is needed.
  • Move from smaller type to larger type -- need to add MOVZX, move with zero extend. Note that MOVZX is extremely simple to implement.
  • Branch on zero -- unchanged
  • Branch on less -- jumps when carry flag set.
  • Branch on greater -- jumps if carry flag and zero both clear.

Note that in each case the modifications are very simple, and can be implemented by gating a small section of circuitry on or off, or by adding a new flag that can be driven by a value the instruction's implementation has to calculate anyway.

Therefore, the cost of adding unsigned instructions is very small. As to why it should be done, note that memory addresses (and offsets in arrays) are inherently unsigned values. As programs spend a lot of time manipulating memory addresses, having a type that handles them correctly makes the programs easier to write.

2

Unsigned numbers exist largely to handle situations where one needs a wrapping algebraic ring (for a 16-bit unsigned type, the ring of integers modulo 65536). Take a value, add any amount less than the modulus, and subtracting the original value recovers the amount that was added. As a real-world example, if a utility meter reads 9995 at the start of a month and one uses 23 units, the meter will read 0018 at the end of the month. When using an algebraic-ring type, there's no need to do anything special to deal with overflow: subtracting 9995 from 0018 yields 0023, precisely the number of units that were used.
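The binary analogue of the meter, sketched in C with made-up readings: a 16-bit counter wraps past 65535, yet subtraction still recovers the true difference.

    #include <stdint.h>

    int main(void) {
        uint16_t start = 65530;
        uint16_t end   = (uint16_t)(start + 23);  /* wraps to 17          */
        uint16_t used  = (uint16_t)(end - start); /* 23: the wrap cancels */
        return (used == 23) ? 0 : 1;
    }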

On the PDP-11, the machine for which C was first implemented, there were no unsigned integer types, but signed types could be used for modular arithmetic that wrapped between 32767 and -32768 rather than between 65535 and 0. The integer instructions on some other platforms did not wrap things so cleanly, however; rather than requiring implementations to emulate the two's-complement integers of the PDP-11, the language instead added unsigned types, which mostly had to behave as algebraic rings, and allowed signed integer types to behave in other ways on overflow.

In the early days of C, there were many quantities which could exceed 32767 (the common INT_MAX) but not 65535 (the common UINT_MAX). It thus became common to use unsigned types to hold such quantities (e.g. size_t). Unfortunately, there is nothing in the language to distinguish types that should behave like numbers with an extra bit of positive range from types that should behave like algebraic rings. Instead, the language makes types smaller than "int" behave like numbers while full-sized types behave like algebraic rings. Consequently, calling a function like:

    /* a and b are promoted before the multiply: to int where int is wider
       than 16 bits, otherwise to unsigned int -- hence the cases below. */
    uint32_t mul(uint16_t a, uint16_t b) { return a*b; }

with (65535, 65535) will have one defined behavior on systems where int is 16 bits (i.e. it returns 1), a different defined behavior where int is 33 bits or larger (it returns 0xFFFE0001), and Undefined Behavior on systems where int is anywhere in between [note that gcc will usually yield arithmetically correct results when the product is between INT_MAX+1u and UINT_MAX, but will sometimes generate code for the above function that fails with such values!]. Not very helpful.

Still, the lack of types which behave consistently like numbers or consistently like an algebraic ring does not change the fact that algebraic ring types are almost indispensable for some kinds of programming.

supercat
  • 8,629