[Python-3000] range() issues

Wed Apr 30 04:36:28 CEST 2008

I propose to remove the support for indexing; it is a carryover from
before Python 2.2 when there was no .next() method.
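Before the iterator protocol existed, a for loop worked by calling __getitem__ with 0, 1, 2, ... until IndexError was raised, and Python 3 still honours that fallback. A minimal sketch of that legacy protocol, using a hypothetical LegacySequence class that defines no __iter__ or __next__:

```python
class LegacySequence:
    """Hypothetical class that defines only __getitem__ (no __iter__/__next__)."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, i):
        # The legacy protocol calls __getitem__(0), __getitem__(1), ...
        # and stops iterating when IndexError is raised.
        if not 0 <= i < self.n:
            raise IndexError(i)
        return i * i


# iter() falls back to the __getitem__ protocol when __iter__ is missing.
squares = list(LegacySequence(4))
print(squares)  # [0, 1, 4, 9]
```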

There are good reasons for having range() return an Iterable and not
an Iterator; e.g.

R = range(N)
for i in R:
  for j in R:
    ...  # loop body elided

so here I propose to keep the status quo.
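The nested-loop case can be checked directly: a range is re-iterable because each for loop calls iter() on it afresh, while a single iterator is shared by both loops and is exhausted by the first inner pass. all_pairs is a hypothetical helper written only for this illustration:

```python
def all_pairs(R):
    """Collect (i, j) pairs from a nested loop over the same object R."""
    pairs = []
    for i in R:
        for j in R:
            pairs.append((i, j))
    return pairs


N = 3
print(len(all_pairs(range(N))))   # 9: the range is iterated afresh each time
print(all_pairs(iter(range(N))))  # [(0, 1), (0, 2)]: one iterator, exhausted early
```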

Let's also fix __len__() so that it returns sys.{maxint,maxsize} when
the result doesn't fit in a Py_ssize_t.
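A minimal sketch of the proposed rule: compute the exact count with Python's unbounded integers, then cap it at sys.maxsize. clamped_range_len is a hypothetical helper, not part of any real API:

```python
import sys


def clamped_range_len(start, stop, step=1):
    """Hypothetical helper: number of values in range(start, stop, step),
    capped at sys.maxsize as proposed for __len__."""
    if step == 0:
        raise ValueError("step must not be zero")
    if step > 0:
        # ceil((stop - start) / step), or 0 for an empty range
        n = max(0, (stop - start + step - 1) // step)
    else:
        # same count for a descending range
        n = max(0, (start - stop - step - 1) // -step)
    return min(n, sys.maxsize)


print(clamped_range_len(0, 10, 3))                 # 4
print(clamped_range_len(0, 2**100) == sys.maxsize)  # True
```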

I am worried that the debates about repr()/hash()/eq() are similarly
stuck in vicious circles; I'll have to think about how to untie those
knots, but they're unrelated to the sequence/iterator/iterable debate.

--Guido

On Tue, Apr 29, 2008 at 7:16 PM, Alexander Belopolsky
<alexander.belopolsky at gmail.com> wrote:
> On Tue, Apr 29, 2008 at 7:48 PM, Guido van Rossum <guido at python.org> wrote:
>  ..
>
>  >  Let's just stop the discussion here and kill all proposals to add
>  >  indexing/slicing etc. Sorry, Alexander, but there just isn't anyone
>  >  besides you in favor, and nobody has brought up a convincing use case.
>  >
>
>  That's fair, but let me wrap up by rehashing the current state of affairs.
>
>  1. Both 2.x xrange and 3.x range support indexing.  A comment in the py3k
>  branch says "range(...)[x] is necessary for:  seq[:] = range(...),"
>  but this is apparently wrong:
>
>  >>> x = []
>  >>> x[:] = iter([1,2,3])
>  >>> x
>  [1, 2, 3]
>
>  2. In 3.x, ranges longer than sys.maxsize are allowed, but cannot be
>  indexed even with small indexes, for example, range(2**100)[0] raises
>  an OverflowError.  There is little justification for this behavior.  A
>  3-line patch can fix the situation for small indexes and Amaury
>  demonstrated [1] that with some effort arbitrary indexes can be
>  supported.
>
>  [1] http://bugs.python.org/file10109/anyrange.patch
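Indexing only needs the length and the value start + i*step. If both are computed with Python's unbounded integers rather than a C Py_ssize_t, neither a huge range nor a huge index overflows. A sketch under that assumption; range_getitem is a hypothetical helper and is not taken from the patch [1]:

```python
def range_getitem(start, stop, step, i):
    """Hypothetical helper: range(start, stop, step)[i] computed with
    unbounded ints, so large ranges and large indexes do not overflow."""
    if step == 0:
        raise ValueError("step must not be zero")
    # Number of values in the range (0 if empty).
    if step > 0:
        n = max(0, (stop - start + step - 1) // step)
    else:
        n = max(0, (start - stop - step - 1) // -step)
    if i < 0:
        i += n  # negative indexes count from the end
    if not 0 <= i < n:
        raise IndexError("range index out of range")
    return start + i * step


print(range_getitem(0, 2**100, 1, 0))   # 0
print(range_getitem(10, 0, -3, 3))      # 1
```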
>
>  3. There is an ongoing debate [2] on how comparison and hashing should
>  be implemented for range objects.
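One possible rule for that debate, stated here only as an assumption and not as anything settled in this thread: treat two ranges as equal when they yield the same values, and derive the hash from the same normalised key so that equal ranges hash equally. range_key, ranges_equal and range_hash are hypothetical names; the sketch relies on the start/step attributes of modern Python 3 range objects:

```python
def range_key(r):
    """Hypothetical normalisation: ranges yielding the same values map to
    the same key, so eq and hash can both be defined on the key."""
    n = len(r)
    if n == 0:
        return (0, None, None)      # every empty range is equal
    if n == 1:
        return (1, r.start, None)   # with one element the step is irrelevant
    return (n, r.start, r.step)     # (len, first, step) fixes all values


def ranges_equal(a, b):
    return range_key(a) == range_key(b)


def range_hash(r):
    return hash(range_key(r))


print(ranges_equal(range(0, 3, 2), range(0, 4, 2)))  # True: both yield 0, 2
```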
>
>  My point is that the current implementation of 3.x is neither here nor
>  there.  It is not simple: it does not even do what its documentation
>  says:
>
>  >>> print(range.__doc__)
>  range([start,] stop[, step]) -> range object
>
>  Returns an iterator that generates the numbers in the range on demand.
>  >>> range(10).__next__()
>  Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
>  AttributeError: 'range' object has no attribute '__next__'
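The docstring describes the iterator rather than the range object itself: in Python 3 the __next__ method lives on the separate object that iter() returns. A short check in the same interactive spirit:

```python
r = range(10)

# The range object is iterable, but it is not itself an iterator...
print(hasattr(r, "__next__"))   # False

# ...while iter() returns a separate iterator object that has __next__.
it = iter(r)
print(hasattr(it, "__next__"))  # True
print(next(it), next(it))       # 0 1
```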
>
>  It supports some sequence methods (len and subscripting), but not
>  others (__contains__ and slicing).
>
>  My use case for making range a Sequence is as follows.  I frequently
>  deal with data organized in column-oriented tables.  These tables
>  often need a column that represents the row number.  A range object
>  would allow an efficient representation of such a column, but having
>  such a virtual column in the table would mean that generic sequence
>  manipulation functions will not work on some columns.
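The column use case can be sketched with the collections.abc.Sequence ABC, which supplies the remaining sequence methods once __len__ and __getitem__ exist. RowNumberColumn is a hypothetical name, and a real table library would define its own column interface:

```python
from collections.abc import Sequence


class RowNumberColumn(Sequence):
    """Hypothetical virtual column of row numbers 0..n-1 that stores only n.

    Defining __len__ and __getitem__ lets the Sequence ABC supply
    __contains__, __iter__, __reversed__, index() and count().
    """

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, i):
        if isinstance(i, slice):
            # Materialise slices as a list to keep the sketch short.
            return [self[k] for k in range(*i.indices(self._n))]
        if i < 0:
            i += self._n
        if not 0 <= i < self._n:
            raise IndexError("row index out of range")
        return i


rows = RowNumberColumn(5)
print(list(rows), rows[-1], rows[1:4], 3 in rows, rows.index(2))
# [0, 1, 2, 3, 4] 4 [1, 2, 3] True 2
```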
>
>  This is not a strong itch, though.  While virtualizing row number
>  column using range() is an attractive solution, in practice memory
>  savings compared to numpy's arange() (or array('i', range(..))) are
>  not that significant.  However, if slicing support is axed based on
>  complexity considerations, I don't see how supporting indexing can be
>  justified.  Moreover, since indexing and slicing can reuse the same
>  start + i*step computation, the incremental code complexity of slicing
>  support is small, so for me the two go hand in hand.  For these
>  reasons, I believe that either of the following alternatives is better
>  than the status quo:
>
>  1. Make range(..) return a Sequence.
>
>  2. Make range(..) return an Iterator.  (While I prefer #1, there are
>  several advantages of this proposal: in the common list(range(..)) and
>  for i in range(..) cases, creation of an intermediate object will go
>  away; we will stop debating what hash(range(..)) should return [2];
>  and finally we will not need to change the docstring :-).)
>
>  [2] http://bugs.python.org/issue2603
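The point that slicing can reuse the indexing arithmetic can be sketched as follows: a slice of a range is again a range whose first value is start + lo*step and whose step is step * slice_step. slice_range is a hypothetical helper; it relies on the range.start/range.step attributes of modern Python 3 and on len(r), so it assumes the length fits in a Py_ssize_t:

```python
def slice_range(r, s):
    """Hypothetical helper: return r[s] as a new range using only
    start + i*step arithmetic (no element is materialised)."""
    lo, hi, st = s.indices(len(r))        # normalise the slice against len(r)
    return range(r.start + lo * r.step,   # value at index lo
                 r.start + hi * r.step,   # value at index hi (exclusive bound)
                 r.step * st)             # slice step scales the range step


print(list(slice_range(range(3, 50, 4), slice(2, 8))))  # [11, 15, 19, 23, 27, 31]
```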
>
>
>
>  >  __len__ will always be problematic when there are more values than can
>  >  be counted in a signed C long; maybe we should do what the Java
>  >  collections package does: for once, Java chooses practicality over
>  >  purity, and simply states that if the length doesn't fit, the largest
>  >  number that does fit is returned (i.e. for us that would be
>  >  sys.maxsize in 3.0, sys.maxint in 2.x).
>
>  This is another simple way to fix the range(2**100)[0] buglet.
>

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)