[Edu-sig] counts, indexes, slices

Sat Apr 17 15:40:59 EDT 2004

Hello,

New to the list, I join you because I feel deeply concerned with readability
or clearness in programming. I guess this may be a central question for you,
now for me what's good for newbies, or for a teacher, is good for trained
and experimented programmers coding, debugging and maintaining applications.
For that reason, even if i'm an amateur, I try to find out how to design
better programming forms (structure, syntax & lexicon). Well, actually I
don't esteem and can hardly use obscure sets of rules ; and even when
speaking of the best ones (like imo python or objective pascal), their worse
aspects really ennoy me : I dream of mixing them -- together with other
ideas of mine -- for the best.
There seems to be here a large debate upon the subject(s) of index base and
intervals in python & programming languages in general, which is also a
point on which I had a lot of reflexions. I'll try an wide overview
commenting what I read about all of that mess. This has unfortunately to be
pretty long -- and does not pretend to objectivity at all. At the end of the
message is a copy of the most argumented text I could find, opposed to my
own opinion.
<please be tolerant about faulty expression : english is a foreign language
for me>

-1- counting & pointing things

The literature about index base (0 or 1) often shows confusion between
ordinal and cardinal numbers. Natural numbers may be used for pointing an
element in an ordered set (1st, 2nd... Nth), answering the question "which
one ?", or for counting the number of elements ((0), 1, 2... N) in any set,
be it ordered or not, then answering the question "how many ?".
That confusion may partially comme from the fact that we still 'primitively'
nearly only interpret data through the filter of the (cardinal) number.
Another point is that computer science is massively and before all english
speaking. Yet english does not easily distinguish ordinals (french "numéro",
german "Nummer") and cardinals (german "Zahl", french "nombre") : both are
called numbers, like in 'the number of bits in a byte' and 'the bit number
2'.
One may imagine that, in a parallel world, programming languages commonly
include a data type called 'subscript', "position' 'Nummer' or 'numero',
used to address one thing among others.

-2- counting on 0, pointing from ?

>From the counting point of view, we meet the mathematical issue "is zero a
natural number ?", a rather endless question. Even if one may answer 'no',
when taking into account historical, sociological and linguistic
observations, the fact is we need it in programming : where people say "no
apple in the basket", "nobody here" or "neither time nore money left", the
time is still far when computer languages will be able to process such
expressions ; they use the "0" token instead.
[But the programs should take care of this, and I personly have few
consideratrion for programmers to lazy or respectless of users to type
simple instructions like (in pseudocode) :
test nbrWords:
    case 0 : screen.print 'no word found matching the search criterium'
    case 1 : screen.print 'one word found matching the search criterium'
    case else : screen.print nbrWords, 'words found matching the search
criterium'
...so that the end user has to read such gibberish
'0 word(s) found matching the search criterium']
Anyway, all of this deals with the counting purpose of natural numbers. If
there was a "counter" or "numbering" data type in a language, this type
should obviously include zero. Yet, what about an "index" or "position" or
"subscript" type ?

-3- indexes and ranges

When talking about pointing things in a set, another confusion may be made
by not distinguishing indexing and slicing fonctions. The underlying models
of these operations need not be analogue -- and as a fact they aren't in
python.
The offset model is generally used to explain C and python's 0-based indexes
:
|  0  |  1  |  2  |     <-- offset indexes (from the sequence's start)
| 1st | 2nd | 3rd |     <-- elements : myList[1] is the second one
On the other hand, python's syntax for ranges and slicing may either be
defined like a 0-based half-open interval (that is, excluding the last
index) or described with the distance metaphor :
0     1     2     3     <-- distance indexes (idem)
| 1st | 2nd | 3rd |     <-- elements : myList[1:2] returns a list holding
only the second one
[Well, the fact that offset and distance are similar notions used in this
case to distinguish two opposite models adds a bit of confusion, no ?]
As a consequence of this difference, an argument pro or contra one of this
models cannot be transposed to the other one. For instance, an argument pro
or contra 0-based indexes won't necessarily be appropriate by slicing. The
point is then : "Should'nt both syntaxes be based on the same model ?".

-4- naturalness and learning

"It's natural (to me)" is often invoked as an argument. Well, of course,
there may be much diversity between people and cultures. Nevertheless, one
can doubt when reading "I have always naturally counted from 0, so python's
indexing mode fits me". To my opinion, even a addict C-programmer asking for
his way and beeing answered "the second street on your left" would *not*
turn in the third one. Similarly, if this poor (in both meanings) programmer
has to do a cleaning job in a hotel (to survive in our modern world), and is
expected to clean up rooms number 3 to 5, he probably won't clean only the
fourth and the fifth.
For teaching / learning purpose, it may appear a both efficient and relevant
criterium of naturalness to wonder how difficult and/or complicated the
explanation model of a particuliar syntax form is. We just watched the
models for 0-based indexing and half-open ranges. Now simply imagine you
have to learn or teaching a language whose syntax reads (pseudo-code) :
myList[3]          returns the third element
myList[2 to 4]     returns second, third and fourth elements
What would the explanations look like ? What explanations, if any ? Let's
concede there may be people to whom this syntax is not self-evident, but the
model needed to make it clear, even by self-learning, should not be so hard
to build. Or what ?

-5- tricky intervals and reality

There are two common technical arguments for half-open intervals. My opinion
is that they are no good things but rather tricks we should avoid using.
[see the text at the end of this message]
One presents as an advantage that the difference between both border indexes
is the length of the slice : days[19:21] 's cardinal is actually 21-19=2
days. Very true. Now let's watch the real world and say I will attend a
python training course next month, from the 19th to the 21th. Well, 21-19=2,
but the course lasts three days, and if I need to stop working for it, my
boss will substract 3*8 hours from my wages.
A syntax like days[19 to 21], a closed interval including both borders and
then meaning 3 days, meets the actual reality : it's better.
The second advantage exposed by half-open intervals' advocates is, the
glueing of two subslices with the same middle index builds the whole
sequence :
myList[0:p] + myList[p:n] == myList[0:n]
Very true again. And again that syntax does not reflect reality, where this
would lead to a double element p. For instance, imagine you're copying your
friends' phone number on a beautiful new phone book, when the phone rings
just as you're writing Jane's number ; you let three rings pass, the time
for controlling the number, then answer. After the call, you get back to
your task with Jason's number. In other words :
myList[1 to p] + myList[p+1 to n] == myList[1:n]
Another example : if you're moving, carrying cases into a truck, and have a
bier pause. As a well organised person, you labeled the cases with flashy
numbers referring to al list of content. Say that you stopped for a bier
after case number p, you won't be able to resume with that same case, as it
already is in the truck ! Again :
myList[1 to p] + myList[p+1 to n] == myList[1:n]    matches the reality
In both cases (length of the sequence, glueing of subsequences) python's
syntax transforms the reality it's expected to modelize, I would say that
its superficial advantages hide a trick or a trap. While the actual
situation is properly reflected by an alternative syntax like the above
pseudocode, even if it seems slightly more complicated at first look,
because of the '+1' operation. If ever people find difficult to understand
that the length of sequence is not the difference of its border indexes, or
that you must not repeat a medium index when glueing subsets (what we maybe
all have experimented), then these problems are natural problems which
should be obvious in the code.
Now consider the case where we have to glue subsets repeating an
intermediate element p :
myList[0:p+1] + myList[p:n]
     vs
myList[1 to p] + myList[p to n]

-6- the analogy criterium

There may be a quality criteria saying : "a good program (language) looks
like what it does -- or modelizes." Thus a good language should lead the
programmer to writing instructions which present, so to say, a analogy whith
what they describe.
This is relevant for the above examples, as for many programming structures.
I personly miss 'until' clauses or branching test ('select case') in python
for this reason. Nested if...elif...if or while 1...break structures seem
artificial to me ; they bring up worse analogies to what they actually do.
>From that point of view, I'd like more visual programming environments.
Python's identation is for me a great step for this reason. Guido imposed
that 'against winds and tides' (as frenchies say) : I find that courageous
and right.

denis

***********************************
here follows a argumentation for half-open intervals
and 0-based indexes, both about slicing (not simple indexing)
to be read carefully
source of original text :
http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF
***********************************

EWD831-0

Edsger W. Dijkstra
Why numbering should start at zero

To denote the subsequence of natural numbers 2, 3, ..., 12 without the
pernicious three dots, four conventions are open to us

a)   2 <= i < 13

b)   1 < i = 12

c)   2 <= i = 12

d)   1 < i < 13

Are there reasons to prefer one convention to the other? Yes, there are. The
observation that conventions a) and b) have the advantage that the
difference between the bounds as mentioned equals the length of the
subsequence is valid. So is the observation that, as a consequence, in
either convention two subsequences are adjacent means that the upper bound
of the one equals the lower bound of the other. Valid as these observations
are, they don't enable us to choose between a) and b); so let us start
afresh.

There is a smallest natural number. Exclusion of the lower bound --as in b)
and d)-- forces for a subsequence starting at the smallest natural number
the lower bound as mentioned into the realm of the unnatural numbers. That
is ugly, so for the lower bound we prefer the = as in a) and c). Consider
now the subsequences starting at the smallest natural number: inclusion of
the upper bound would then force the latter to be unnatural by the time the
sequence has shrunk to the empty one. That is ugly, so for the upper bound
we prefer < as

----------------------------------------------------------------------------
----

EWD831-1

in a) and d). We conclude that convention a) is to be preferred.

Remark  The programming language Mesa, developed at Xerox PARC, has special
notations for intervals of integers in all four conventions. Extensive
experience with Mesa has shown that the use of the other three conventions
has been a constant source of clumsiness and mistakes, and on account of
that experience Mesa programmers are now strongly advised not to use the
latter three available features. I mention this experimental evidence --for
what it is worth-- because some people feel uncomfortable with conclusions
that have not been confirmed in practise. (End of Remark.)

*                *
*

When dealing with a sequence of length N, the elements of which we wish to
distinguish by subscript, the next vexing question is what subscript value
to assign to its starting element. Adhering to convention a) yields, when
starting with subscript 1, the subscript range 1 = i < N+1; starting with 0,
however, gives the nicer range 0 = i < N. So let us let our ordinals start
at zero: an element's ordinal (subscript) equals the number of elements
preceding it in the sequence. And the moral of the story is that we had
better regard --after all those centuries!-- zero as a most natural number.

Remark  Many programming languages have been designed without due attention
to this detail. In FORTRAN subscripts always start at 1; in AL-

----------------------------------------------------------------------------
----

EWD831-2

GOL 60 and in PASCAL, convention c) has been adopted; the more recent SASL
has fallen back on the FORTRAN convention: a sequence in SASL is at the same
time a function on the positive integers. Pity! (End of Remark.)

*                *
*

The above has been triggered by a recent incident, when, in an emotional
outburst, one of my mathematical colleagues at the University --not a
computing scientist-- accused a number of younger computing scientists of
"pedantry" because --as they do by habit-- they started numbering at zero.
He took consciously adopting the most sensible convention as a provocation.
(Also the "End of ..." convention is viewed of as provocative; but the
convention is useful: I know of a student who almost failed at an
examination by the tacit assumption that the questions ended at the bottom
of the first page.) I think Antony Jay is right when he states: "In
corporate religions as in others, the heretic must be cast out not because
of the probability that he is wrong but because of the possibility that he
is right."

 Plataanstraat 5
5671 AL NUENEN
The Netherlands
  11 August 1982
prof.dr. Edsger W. Dijkstra
Burroughs Research Fellow

----------------------------------------------------------------------------
----

Transcriber: Kevin Hely.
Last revised on Thu, 19 Jun 2003.