[Python-Dev] More int/long integration issues

Chad Netzer cnetzer@mail.arc.nasa.gov
13 Mar 2003 12:25:17 -0800


On Thu, 2003-03-13 at 08:42, Aahz wrote:
> On Thu, Mar 13, 2003, David Abrahams wrote:

> > Now that we have a kind of long/int integration, maybe it makes sense
> > to update xrange()?  Or is that really a 2.4 feature?
> 
> IIRC, it was decided that doing that wouldn't make sense until the
> standard sequences (lists/tuples) can support more than 2**31 items.

I'm working on a patch that allows both range() and xrange() to work
with large (PyLong) values.  Currently, with my patch, the length of a
range() result is still limited to a C long (memory would run out
first anyway), while xrange() could conceptually support longer
sequences, although indexing them is still limited to C int indices.
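
To illustrate the xrange() side conceptually: a lazy sequence only needs
to hold start, stop, and step, so nothing about iteration itself
requires the values to fit in a C long.  A minimal sketch, not the
patch (lazy_range is a hypothetical name, and the code is written to
run on a current Python, where integers are already unbounded):

```python
from itertools import islice

def lazy_range(start, stop, step=1):
    """Yield start, start + step, ... without building a list.

    Hypothetical helper: it supports iteration only, not indexing or len().
    """
    if step == 0:
        raise ValueError("step must not be zero")
    value = start
    # Walk toward stop in the direction given by the sign of step.
    while (step > 0 and value < stop) or (step < 0 and value > stop):
        yield value
        value += step

# Only the items actually consumed are ever computed.
first_three = list(islice(lazy_range(10**30, 10**31), 3))
```

Indexing and len() are deliberately left out of the sketch, since those
are the parts that stay tied to C integer sizes.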

I noticed the need for at least supporting long values when I found
some bugs in code that did things like:

a = 1/1e-5
range( a-20, a)

or

a = 1/1e-6
b = 1/1e-5
c = 1/1e-4
range(a, b, c)

Now, this example is hardcoded, but in graphing software, or other
numerical work, the actual values come from the data set.  All of a
sudden, you could be dealing with very small numbers (say, because you
want to examine error values), and you get:

a = 1/1e-21
b = 1/1e-20
c = 1/1e-19
range(a, b, c)

And your piece of code now fails.  By the comments I've seen, this
failure tends to come as a big surprise (people are simply expecting
range to be able to work with PyLong values, over short lengths).

Also, someone who is working with large files (> C long on his machine)
claimed to be having problems with xrange() failing (although, if he is
indexing the xrange object, my patch can't help anyway).

I've seen enough people asking in the newsgroups about this behavior (at
least four in the past 5 months or so), and I've submitted some
application patches to make things work for these cases (i.e., by
explicitly subtracting out the large common base of each parameter, and
adding it back in after the list is generated), so I decided to make a
patch to change the range behavior.
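
The application-level workaround described above can be sketched
roughly as follows.  This is an illustration, not one of the submitted
patches: offset_range is a hypothetical name, and the code is written
to run on a current Python, where range() accepts arbitrary integers,
so it only shows the shape of the trick.

```python
def offset_range(start, stop, step=1):
    """Return the values of range(start, stop, step) as a list.

    The large common base (start) is subtracted out, so range() itself
    only ever sees small offsets; the base is added back afterwards.
    """
    if step == 0:
        raise ValueError("step must not be zero")
    base = start
    # Range over the small offsets 0 .. (stop - base).
    offsets = range(0, stop - base, step)
    # Add the large base back in after the offsets are generated.
    return [base + offset for offset in offsets]

big = 10**21
last_twenty = offset_range(big - 20, big)
```

This only works when the span (stop - start) and the step are small,
which is exactly the "large values, short length" case in question.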

Fixing range was relatively easy, and could be done with no performance
penalty (the code to handle long ranges is only invoked after the
existing code path fails; the common case is unaltered).  Fixing
xrange() is trickier, and I'm opting to maintain backwards compatibility
as much as possible.

In any case, I should have the patch ready to submit within the next
week or so (just a few hours more work is needed, for testing and
cleanup).

Then the argument about whether it should ever be included can begin in
earnest.  But I have seen enough examples of people being surprised that
ranges of long values fail (where the range length is well within the
addressable limit, but the range values must be PyLongs) that I think at
least range() should be fixed.  And if range() is fixed, then sadly,
xrange() should be fixed as well (IMO).

BTW, I'm all for deprecating xrange() with all deliberate speed.  Doing
so would only make updating range behavior easier.

Chad