[Python-3000] range() issues
Guido van Rossum
guido at python.org
Wed Apr 30 04:36:28 CEST 2008
I propose to remove the support for indexing; it is a carryover from
before Python 2.2 when there was no .next() method.
There are good reasons for having range() return an Iterable and not
an Iterator; e.g.
    R = range(N)
    for i in R:
        for j in R:
            ....
so here I propose to keep the status quo.
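
To make the difference concrete, here is a small sketch comparing the nested loop over a range object with the same loop over a one-shot iterator. The helper name nested_pairs is made up for illustration and is not part of any proposal.

```python
def nested_pairs(iterable):
    """Collect (i, j) pairs from two nested loops over the same object."""
    # Each "for" clause calls iter() on the object it loops over.
    return [(i, j) for i in iterable for j in iterable]

N = 3

# An Iterable such as range hands out a fresh iterator per loop,
# so the inner loop restarts for every value of i.
pairs_from_range = nested_pairs(range(N))

# An Iterator returns itself from iter(), so both loops drain the
# same stream and the inner loop uses up the remaining values.
pairs_from_iter = nested_pairs(iter(range(N)))

print(len(pairs_from_range))  # N * N pairs
print(pairs_from_iter)        # only the pairs reached before exhaustion
```

With an Iterator the inner loop consumes everything left after the first value of i, which is why the nested-loop idiom depends on range() returning an Iterable.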
Let's also fix __len__() so that it returns sys.{maxint,maxsize} when
the result doesn't fit in a Py_ssize_t.
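
A rough sketch of that clamping rule in pure Python, under stated assumptions: true_range_len and clamped_len are hypothetical helper names, and this only illustrates the proposal, not the behavior of any particular Python release.

```python
import sys

def true_range_len(start, stop, step=1):
    """Exact number of items in range(start, stop, step), unbounded."""
    if step == 0:
        raise ValueError("step must not be zero")
    if step > 0:
        n = (stop - start + step - 1) // step
    else:
        n = (start - stop - step - 1) // (-step)
    return max(n, 0)

def clamped_len(start, stop, step=1):
    """Proposed rule: report sys.maxsize when the true length is larger."""
    return min(true_range_len(start, stop, step), sys.maxsize)

print(clamped_len(0, 10, 3))            # small ranges are unaffected
print(clamped_len(0, 2**100) == sys.maxsize)
```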
I am worried that the debates about repr()/hash()/eq() are similarly
stuck in vicious circles; I'll have to think about how to untie those
knots, but they're unrelated to the sequence/iterator/iterable debate.
--Guido
On Tue, Apr 29, 2008 at 7:16 PM, Alexander Belopolsky
<alexander.belopolsky at gmail.com> wrote:
> On Tue, Apr 29, 2008 at 7:48 PM, Guido van Rossum <guido at python.org> wrote:
> ..
>
> > Let's just stop the discussion here and kill all proposals to add
> > indexing/slicing etc. Sorry, Alexander, but there just isn't anyone
> > besides you in favor, and nobody has brought up a convincing use case.
> >
>
> That's fair, but let me wrap up by rehashing the current state of affairs.
>
> 1. Both 2.x xrange and 3.x range support indexing. A comment in py3k
> branch says "range(...)[x] is necessary for: seq[:] = range(...),"
> but this is apparently wrong:
>
> >>> x = []
> >>> x[:] = iter([1,2,3])
> >>> x
> [1, 2, 3]
>
> 2. In 3.x, ranges longer than sys.maxsize are allowed, but cannot be
> indexed even with small indexes, for example, range(2**100)[0] raises
> an OverflowError. There is little justification for this behavior. A
> 3-line patch can fix the situation for small indexes and Amaury
> demonstrated [1] that with some effort arbitrary indexes can be
> supported.
>
> [1] http://bugs.python.org/file10109/anyrange.patch
>
> 3. There is an ongoing debate [2] on how comparison and hashing should
> be implemented for range objects.
>
> My point is that the current implementation in 3.x is neither here nor
> there. It is not simple: it does not even do what its documentation
> says:
>
> >>> print(range.__doc__)
> range([start,] stop[, step]) -> range object
>
> Returns an iterator that generates the numbers in the range on demand.
> >>> range(10).__next__()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'range' object has no attribute '__next__'
>
> It supports some sequence methods (len and subscripting), but not
> others (__contains__ and slicing).
>
> My use case for making range a Sequence is as follows. I frequently
> deal with data organized in column oriented tables. These tables
> often need a column that represents the row number. A range object
> would allow an efficient representation of such a column, but having
> such a virtual column in the table would mean that generic sequence
> manipulation functions will not work on some columns.
>
> This is not a strong itch, though. While virtualizing the row number
> column using range() is an attractive solution, in practice the memory
> savings compared to numpy's arange() (or array('i', range(..))) are
> not that significant. However, if slicing support is axed based on
> complexity considerations, I don't see how supporting indexing can be
> justified. Moreover, since indexing and slicing can reuse the same
> start + i*step computation, the incremental code complexity of slicing
> support is small, so for me the two go hand in hand. For these
> reasons, I believe that either of the following alternatives is better
> than the status quo:
>
> 1. Make range(..) return a Sequence.
>
> 2. Make range(..) return an Iterator. (While I prefer #1, there are
> several advantages of this proposal: in the common list(range(..)) and
> for i in range(..) cases, creation of an intermediate object will go
> away; we will stop debating what hash(range(..)) should return [2];
> and finally we will not need to change the docstring :-).)
>
> [2] http://bugs.python.org/issue2603
>
>
>
> > __len__ will always be problematic when there are more values than can
> > be counted in a signed C long; maybe we should do what the Java
> > collections package does: for once, Java chooses practicality over
> > purity, and simply states that if the length doesn't fit, the largest
> > number that does fit is returned (i.e. for us that would be
> > sys.maxsize in 3.0, sys.maxint in 2.x).
>
> This is another simple way to fix the range(2**100)[0] buglet.
>
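
For reference, the start + i*step computation mentioned in the quoted message can be sketched in pure Python so that it works for ranges of any length. The names range_len and range_item are hypothetical; this is not the code from the patch cited as [1], only an illustration of the arithmetic.

```python
def range_len(start, stop, step):
    """Exact number of items in range(start, stop, step), unbounded."""
    if step == 0:
        raise ValueError("step must not be zero")
    if step > 0:
        n = (stop - start + step - 1) // step
    else:
        n = (start - stop - step - 1) // (-step)
    return max(n, 0)

def range_item(start, stop, step, i):
    """Item i of range(start, stop, step), computed as start + i*step."""
    n = range_len(start, stop, step)
    if i < 0:
        i += n  # negative indexes count from the end
    if not 0 <= i < n:
        raise IndexError("range index out of range")
    return start + i * step

# A small index into a very long range needs no special casing here,
# because Python ints are unbounded.
print(range_item(0, 2**100, 1, 0))
```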
--
--Guido van Rossum (home page: http://www.python.org/~guido/)