[Tutor] 'Common' mistake ... for other newbies [counting starts at zero?]

Mon Apr 12 18:50:48 EDT 2004

On Mon, 12 Apr 2004, denis wrote:

> Are there people here who think that such a so-called "mistake" is not a
> mistake? That David programmed the way he (and we humans) should, and
> that the error lies in the language, not in his brain? Why are indexes
> based on 0 instead of 1?

Hi Denis,

It's not a mistake, but it is one of those points of confusion:

    http://c2.com/cgi/wiki?ZeroAndOneBasedIndexes

Edsger Dijkstra wrote a memo about this (also linked from the above url):

    http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF

So this is a deliberate design choice: it's not arbitrary that indices
start at zero.

There are some other nice features about zero indexing.  Let's say that we
have a sample of words:

###
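>>> import random
>>> def sample(L, n=50):
...     """Returns a random sampling of 'n' elements out of L."""
...     # Tag each element with a random key, then sort by that key.
...     L2 = [(random.random(), x) for x in L]
...     L2.sort()
...     # The first n elements of the shuffled list form the sample.
...     return [x for (r, x) in L2[:n]]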
...
>>> words = sample([w.strip() for w in open('/usr/share/dict/words')], 20)
>>> words
['acquired', 'Rocco', 'alternatively', 'acculturates', 'asphyxia',
 'parcel', 'establish', 'antagonizing', 'unreliable', 'triangular',
 'silt', 'inflatable', 'regards', 'nugget', 'seeker', 'eigenvalue',
 'antiquarian', 'seducing', 'immerse', 'plastered']
###

and want to break these words down into a few groups.  There are at least
two direct ways of doing this, both based on simple arithmetic: the
division and the remainder operations.

###
>>> for i in range(len(words)):
...     print(words[i], i // 5)
...
acquired 0
Rocco 0
alternatively 0
acculturates 0
asphyxia 0
parcel 1
establish 1
antagonizing 1
unreliable 1
triangular 1
silt 2
inflatable 2
regards 2
nugget 2
seeker 2
eigenvalue 3
antiquarian 3
seducing 3
immerse 3
plastered 3
>>>
>>>
>>> for i in range(len(words)):
...     print(words[i], i % 4)
...
acquired 0
Rocco 1
alternatively 2
acculturates 3
asphyxia 0
parcel 1
establish 2
antagonizing 3
unreliable 0
triangular 1
silt 2
inflatable 3
regards 0
nugget 1
seeker 2
eigenvalue 3
antiquarian 0
seducing 1
immerse 2
plastered 3
###

That is, we can use the division operation on the indices to say that:

    'acquired', 'Rocco', 'alternatively', 'acculturates', and 'asphyxia'

are all part of a single group.  Or we can use the remainder operation,
and say that:

    'acquired', 'asphyxia', 'unreliable', 'regards' and 'antiquarian'

are all in the same group.  Grouping this way is simple, and quite easy to
get right.
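
If we want to collect the groups rather than just print the group numbers,
something like the following should work.  This is only a sketch, assuming
Python 3; the helper names 'group_by_division' and 'group_by_remainder'
are made up for illustration, and the word list is copied from the session
above.

```python
# The 20 sample words from the session above.
words = ['acquired', 'Rocco', 'alternatively', 'acculturates', 'asphyxia',
         'parcel', 'establish', 'antagonizing', 'unreliable', 'triangular',
         'silt', 'inflatable', 'regards', 'nugget', 'seeker', 'eigenvalue',
         'antiquarian', 'seducing', 'immerse', 'plastered']


def group_by_division(items, size):
    """Put consecutive runs of 'size' items together, keyed by index // size."""
    groups = {}
    for i, item in enumerate(items):      # enumerate counts from zero
        groups.setdefault(i // size, []).append(item)
    return groups


def group_by_remainder(items, count):
    """Deal items round-robin into 'count' groups, keyed by index % count."""
    groups = {}
    for i, item in enumerate(items):
        groups.setdefault(i % count, []).append(item)
    return groups


blocks = group_by_division(words, 5)
stripes = group_by_remainder(words, 4)
print(blocks[0])    # the first five words
print(stripes[0])   # every fourth word, starting with the first
```

Because the indices start at zero, neither helper needs any correction
term: the quotient and the remainder are usable as group numbers directly.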

But if list indices started at one instead of zero, then the process above
would involve adding or subtracting 1's to make the math work out.  That
makes it a lot easier for newcomers (as well as experienced programmers!)
to introduce off-by-one errors.
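
To make that concrete, here is a small sketch (again assuming Python 3) of
what the group-of-five calculation might look like if positions were
counted from one.  The function names are hypothetical, and the "naive"
version is included only to show the kind of slip that is easy to make.

```python
GROUP_SIZE = 5


def group_zero_based(i):
    """Group number for a position counted from 0 (groups also from 0)."""
    return i // GROUP_SIZE


def group_one_based(k):
    """Group number for a position counted from 1 (groups also from 1).

    We have to shift down to zero, divide, and then shift back up.
    """
    return (k - 1) // GROUP_SIZE + 1


def group_one_based_naive(k):
    """The tempting version that forgets both adjustments."""
    return k // GROUP_SIZE


zero_based = [group_zero_based(i) for i in range(10)]
one_based = [group_one_based(k) for k in range(1, 11)]
naive = [group_one_based_naive(k) for k in range(1, 11)]

print(zero_based)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(one_based)   # [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
print(naive)       # [0, 0, 0, 0, 1, 1, 1, 1, 1, 2]
```

The naive version puts the fifth element into the wrong group and spills
the tenth element into a group of its own, which is exactly the sort of
off-by-one error that is hard to spot by eye.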

This isn't to say that 0-based indexing is always easier to deal with than
1-based indexing.  But in the majority of cases that we deal with, 0-based
indexing seems to be a big win.

Hope this helps!