34

How was it decided that if you have an array/struct or anything similar in a programming language, it should be zero-based? Wouldn't it have been easier if it were 1-based? After all, when we are taught to count, we start with one.

RHPT
  • 991

9 Answers

34

All good answers. A good part of my "career" was spent in Fortran, where all arrays are 1-based. It's OK if you're writing math algorithms over vectors and matrices, where indices naturally go 1 .. N.

But as soon as you start trying to do computer-science type algorithms, where you have a big array and you are working on pieces of it, as in binary search, or heap sort, or if it is a memory array and you are writing memory allocation and freeing algorithms, or starting to act like parts of it are actually multidimensional arrays that you have to calculate indices in, that 1-based stuff gets to be a real source of confusion.

For example, if you have a 1-dimensional array A, and you want to treat it as a 2-dimensional NxM array, where I and J are the index variables, in C you just say:

A[ I + N*J ]

but in Fortran you say

A( (I-1) + N*(J-1) + 1 )
       or
A( I + N*(J-1) )

If it was 3-dimensional, you had to do

A( I + N*(J-1) + N*M*(K-1) )

(That's if it was column-major order, as opposed to row-major order which is more common in C.)
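To make the comparison concrete, here is a minimal C sketch of the two-dimensional case above. The function names `flat_index0` and `flat_pos1` are made up for illustration. It assumes column-major layout with N rows, as in the formulas above, and shows that the 1-based formula is the 0-based one with the shifts added back in.

```c
#include <stddef.h>

/* Zero-based, column-major: offset of element (i, j) in a grid with n rows,
   where 0 <= i < n. This is the C expression A[ I + N*J ]. */
size_t flat_index0(size_t i, size_t j, size_t n)
{
    return i + n * j;
}

/* One-based, column-major: position of element (i, j), where 1 <= i <= n
   and j >= 1. This is the Fortran expression A( I + N*(J-1) ). */
size_t flat_pos1(size_t i, size_t j, size_t n)
{
    return i + n * (j - 1);
}
```

For any valid element, `flat_pos1(i + 1, j + 1, n)` equals `flat_index0(i, j, n) + 1`, which is exactly the bookkeeping the zero-based form lets you skip.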

What I learned to do in Fortran, when doing string manipulation algorithms, was never to think of an index I as being the position of an element in an array. Rather, I would think of a "distance" N as being the number of elements coming before the element of interest. In other words, always think in terms of "number of elements" rather than "index of element". That enabled me to work within what was an unnatural indexing scheme.

Mike Dunlavey
  • 12,905
32

Think of an array index just as an offset from the start.

Adam Lear
  • 32,069
mouviciel
  • 15,491
23

Dijkstra answered this very clearly in http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html - though Pascal programmers didn't agree.

btilly
  • 18,340
11

In mathematics for centuries the subscript of a series has been chosen for convenience and meaning. For example in a polynomial, the coefficients are usually labelled a0, a1, a2, a3, etc. because the zero represents the power of the corresponding term. In computer science, the subscript represents the offset relative to the beginning of the array.
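As a small illustration of the polynomial case, here is a sketch in C where the array index is the power of the term, so `a[k]` is the coefficient of x^k. The function name `poly_eval` is hypothetical, and Horner's rule is just one convenient way to do the evaluation.

```c
#include <stddef.h>

/* Evaluate a[0] + a[1]*x + ... + a[n-1]*x^(n-1) using Horner's rule.
   The index k of each coefficient is the power of x it multiplies. */
double poly_eval(const double *a, size_t n, double x)
{
    double result = 0.0;
    /* Walk from the highest power down to the constant term a[0]. */
    for (size_t k = n; k-- > 0; )
        result = result * x + a[k];
    return result;
}
```

With coefficients `{1.0, 2.0, 3.0}` and x = 2 this computes 1 + 2*2 + 3*4.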

9

The difference is that it's not a human counting, it's the computer. It is easier for computers to treat 0 as the first item, as the index is usually just an offset from a memory location. It's logical to start at 0000, then 0001, then 0010. If you started at 1, you would either lose an available index (acting as if 0000 isn't valid) or need the awkward step of making sure the compiler always decrements the index by one before using it.

Plus, it isn't that hard to pick up once your first programming class tells you this is the way things work.

6

I see two reasons:

The low-level one. If you start with 0, then the element at index i lives at the array's pointer plus i (&a[i] and a + i are the same memory address). This is very convenient.

At a slightly higher level of abstraction: very often you will have to use the modulo function on indices. For non-negative values, modulo n always returns values from 0 to n-1, so it's also more convenient when indices start with 0.
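Both reasons can be sketched in a few lines of C. The helper names `element_at`, `next_slot0` and `next_slot1` are invented for this example; the 1-based variant is only there to show the extra shift it needs.

```c
#include <stddef.h>

/* In C, a[i] is defined as *(a + i), so the index is literally an offset. */
int element_at(const int *a, size_t i)
{
    return *(a + i);
}

/* Zero-based wraparound: successor of slot i in a ring of n slots,
   where 0 <= i < n. The modulo result is already a valid index. */
size_t next_slot0(size_t i, size_t n)
{
    return (i + 1) % n;
}

/* One-based wraparound: successor of slot i, where 1 <= i <= n.
   The result of % has to be shifted back into the range 1..n. */
size_t next_slot1(size_t i, size_t n)
{
    return i % n + 1;
}
```

In a ring of 3 slots, the last zero-based slot (2) wraps to 0 with a plain modulo, while the last one-based slot (3) wraps to 1 only because of the added `+ 1`.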

vartec
  • 20,846
5

From a modern standpoint (i.e. interpreted languages or recently developed languages such as C#), this is likely because developers, and thus language designers, learned to program in languages such as C that use zero-indexed arrays. As to why those languages use a zero index, it comes down to how the array is stored. In C, array[1] is the same as dereferencing the pointer array + 1, i.e. *(array + 1). Wikipedia also has a bit more on the subject, but that is one reason in a nutshell.

rjzii
  • 11,304
5

As has been pointed out, not all languages use 0-based indexing. In Ada, for example, you can define your indexing basis freely. The String type, for instance, uses 1-based indexing.

The benefit of this is that you can define the indexing to best match the intended usage. In some cases it might make sense to define the array bounds as, for example, -1..+1.

For example a symmetric 5x5 kernel for convolution could be defined as

type KernelT is array (-2..+2, -2..+2) of Real;

Similarly, enumerated types can be used directly for array indexing.

Schedler
  • 519
3

The simple answer is that zero is simpler when one tries to calculate the actual address in memory. If an array starts at location 12345 and you wish to access element 678, then adding the two gives the location that needs to be read. Note that this assumes the array holds bytes and, of course, has more than 678 elements. If you used 1-based indexing, then an additional subtraction of 1 would be required. This quickly becomes a pain, and all things considered, starting at 1 does not really buy anything.
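The arithmetic above can be written out as a short C sketch. The function names `address0` and `address1` are hypothetical, and the `elem_size` parameter is an added generalization; pass 1 for the byte array in the example.

```c
#include <stdint.h>

/* Zero-based: address of element `index` in an array starting at `base`. */
uintptr_t address0(uintptr_t base, uintptr_t index, uintptr_t elem_size)
{
    return base + index * elem_size;
}

/* One-based: the same lookup needs an extra subtraction (index >= 1). */
uintptr_t address1(uintptr_t base, uintptr_t index, uintptr_t elem_size)
{
    return base + (index - 1) * elem_size;
}
```

With the numbers from the answer, `address0(12345, 678, 1)` is 12345 + 678 = 13023, while the 1-based form reads 13022 for the same index because of the extra subtraction.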

mP01
  • 291