inclusive-lower-bound, exclusive-upper-bound (was Re: Range Operation pre-PEP)

Alex Martelli aleaxit at yahoo.com
Fri May 11 09:25:45 EDT 2001


"Andrew Maizels" <andrew at one.net.au> wrote in message
news:3AFB0DB9.BDAE54A5 at one.net.au...
> Alex Martelli wrote:
>
> > I disagree.  Inclusive-lower-bound, exclusive-upper-bound
> > is a VERY important and useful idiom, which we should
    ...
> I can see where consistency is important, but why does Python do the
> inclusive-lower-bound, exclusive-upper-bound thing?

The inclusive-lower, exclusive-upper idiom, also called
the "half-open range" idiom, is intrinsically simpler
than closed-range alternatives.  Which is why it has
"taken over" in so many languages and frameworks: its
simplicity has it all over the _apparent_ (surface!)
"convenience" of closed-range.

As an example of where this idiom has "taken over": in
C, it is specified that, for an array x of N elements,
while there is no element x[N] it IS correct to _take
the address_ of x[N].  This must be allowed by any
conformant implementation, and is included specifically
for the purpose of letting the programmer easily specify
an exclusive upper bound and use half-open ranges.

The C++ standard library takes this further, of course,
since just about _every_ standard algorithm consistently
takes as arguments inclusive-start and exclusive-end
iterators.  But it IS just an extension of what Koenig
explains in his book "C Traps and Pitfalls", about how
always using half-open ranges helps avoid off-by-one
errors that otherwise tend to plague programs.


> From my point of view, there's one correct behaviour for range(1,5), and
> it's NOT [1, 2, 3, 4]!

It is a common phenomenon for human beings to consider
"correct", "natural", or "right", something that is not
optimal from the viewpoint of underlying mathematical
reality.  Fortunately, human beings are much more
flexible than maths tend to be:-).  My favourite example
is how, for millennia, the concept of "zero" as a number
was hotly rejected by the keenest thinkers -- Parmenides
and Zeno made a particular career of this ("Non-being
is not, and cannot be!"), but it was a widespread idea.
As a result, no decent way to represent numbers and do
operations with them was available -- addition was high
mathematics, only wizards could multiply large numbers.

Then, some Indian thinker (maybe under Buddhist influence,
given the likely timeframe) came up with this weird "zero"
idea -- the Arabs took it westwards -- Italian merchants
took it from the Arabs, added double-entry bookkeeping --
and modern arithmetic and accounting were born.  A few
centuries later, we typically accept "zero" as a pretty
natural concept (traces of "number" as meaning "more than
zero" remain, but they're more or less like fossil traces,
in natural language, in how we number our years, &c).

This is not completely an aside... the ease of denoting
EMPTY ranges is part of what makes a half-open interval
simpler and handier as a general concept.  range(x,x) is
a nice way to denote empty ranges, more than range(x,x-1)
would be if range was closed rather than half-open.

More generally, range(x,y) has max(y-x, 0) items -- a
VERY nice property, also easily expressed as y-x if y>=x.
If range was closed, the number of items would be y-x+1,
and that '+1' is the spoiler... cause of deucedly many
off-by-one errors.

Similarly, range(x,y)+range(y,z) = range(x,z), ANOTHER
very nice property.  Having such simple axioms means the
behavior of 'range' can easily be symbolically expanded,
and therefore also mentally understood, in more complex
cases.  Programs are easier to understand and prove
correct, by having fewer '+1' and '-1' strewn around,
when half-open ranges are the usage norm.


One approach to help a programmer get used to half
open ranges: with half-open ranges, we only need ONE
basic concept, that of _FIRST_ index of some sequence.
When we say we process range(x,y), we say that x is
the first-index of what we process, and y is the
first-index of what we do NOT process.  With closed
ranges, we would need TWO basic concepts, that of
first AND that of _LAST_.  One basic concept is easier
to handle (mentally, symbolically, or in whatever way)
than two of them.

Say you want "the first N elements starting from the
x-th one".  Isn't range(x,x+N) an easier way to
express this than range(x,x+N-1)?  Or say that L
is the length of the sequence and you want the _last_
N elements -- range(L-N,L) is, again, an easy and
natural way to get their indices, isn't it?

Say we get a list of strings that are to be catenated
into one big string: as well as catenating them, we
also want to record the places in the big string where
the strings can be located, so we can still recover
the original small-strings within the big one.  A very
natural approach:

def catWithTracking(manystrings):
    start_at = [0]
    tot_len_so_far = 0
    for astring in manystrings:
        tot_len_so_far += len(astring)
        start_at.append(tot_len_so_far)
    return ''.join(manystrings), start_at

Nice, right?  Not a +1 nor a -1 anywhere, and just
one list of indices being computed and returned.  OK,
now, how do we get one small string back from the
big one and the list of indices?

def recoverOne(i, bigstring, indices):
    return bigstring[indices[i]:indices[i+1]]

this is the slice-approach of course -- but then it
IS nice that slice and range() behave just the same
way, isn't it?  Both half-open ranges... of course.
If we need the INDICES inside the big string of the
i-th original small string, it will be just as we
do for slicing it: range(indices[i],indices[i+1]).
No irregularity, everything smooth as pie.

Now, of course, we can make even more arguments for
half-open ranges in _slicing_ specifically.  But
they _are_ just "more of the same" wrt the general
arguments for half-open ranges as a general idiom.


Try designing your own libraries so that, whenever
a start and a stop index are specified, the half
open range approach is used... see if both the use
and implementation thereof doesn't profit...


Alex






More information about the Python-list mailing list