
I was recently surprised by:

    Python 2.3a2+ (#1, Feb 24 2003, 15:02:10)
    [GCC 3.2 20020927 (prerelease)] on cygwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> xrange(2 ** 32)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    OverflowError: long int too large to convert to int

Now that we have a kind of long/int integration, maybe it makes sense to update xrange()? Or is that really a 2.4 feature?

-- Dave Abrahams
   Boost Consulting
   www.boost-consulting.com
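[For anyone bitten by this before a fix lands, a pure-Python generator can stand in for xrange() over arbitrarily large values. This is only a sketch; the name long_xrange is invented here, and unlike the real xrange object it supports neither len() nor indexing:

    def long_xrange(start, stop=None, step=1):
        # Lazily yield values like xrange(), but accept Python longs.
        if stop is None:
            start, stop = 0, start
        if step == 0:
            raise ValueError("long_xrange() arg 3 must not be zero")
        i = start
        while (step > 0 and i < stop) or (step < 0 and i > stop):
            yield i
            i += step

Used where the session above failed:

    >>> for i in long_xrange(2 ** 32, 2 ** 32 + 3):
    ...     print i
    4294967296
    4294967297
    4294967298
]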

On Thu, Mar 13, 2003, David Abrahams wrote:
IIRC, it was decided that doing that wouldn't make sense until the standard sequences (lists/tuples) can support more than 2**31 items.

-- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/
   Register for PyCon now! http://www.python.org/pycon/reg.html

On Thu, 2003-03-13 at 08:42, Aahz wrote:
> On Thu, Mar 13, 2003, David Abrahams wrote:
I'm working on a patch that allows both range() and xrange() to work with large (PyLong) values. Currently, with my patch, the length of a range is still limited to a C long (due to memory issues anyway), and xrange() could conceptually support longer sequences, although indexing them is still limited to C int indices.

I noticed the need for at least supporting long values when I found some bugs in code that did things like:

    a = 1/1e-5
    range(a - 20, a)

or:

    a = 1/1e-6
    b = 1/1e-5
    c = 1/1e-4
    range(a, b, c)

Now, this example is hardcoded, but in graphing software or other numerical work the actual values come from the data set. All of a sudden you could be dealing with very small numbers (say, because you want to examine error values), and you get:

    a = 1/1e-21
    b = 1/1e-20
    c = 1/1e-19
    range(a, b, c)

And your piece of code now fails. Judging by the comments I've seen, this failure tends to come as a big surprise (people simply expect range() to be able to work with PyLong values over short lengths). Also, someone who is working with large files (> C long on his machine) claimed to be having problems with xrange() failing (although, if he is indexing the xrange object, my patch can't help anyway).

I've seen enough people asking in the newsgroups about this behavior (at least four in the past five months or so), and I've submitted some application patches to make things work for these cases (i.e. by explicitly subtracting out the large common base of each parameter and adding it back in after the list is generated; a sketch of this trick follows below), so I decided to make a patch to change the range() behavior.

Fixing range() was relatively easy, and could be done with no performance penalty (the code to handle long ranges is only invoked after the existing code path fails; the common case is unaltered). Fixing xrange() is trickier, and I'm opting to maintain backwards compatibility as much as possible. In any case, I should have the patch ready to submit within the next week or so (just a few hours more work is needed, for testing and cleanup). Then the argument about whether it should ever be included can begin in earnest.

But I have seen enough examples of people being surprised that ranges of long values fail (where the range length is well within the addressable limit, but the range values must be PyLongs) that I think at least range() should be fixed. And if range() is fixed, then sadly, xrange() should be fixed as well (IMO).

BTW, I'm all for deprecating xrange() with all deliberate speed. Doing so would only make updating the range() behavior easier.

Chad
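[The application-level trick Chad describes can be made concrete. A minimal sketch, assuming the span and step fit in a C int; the helper name long_range is invented here, not taken from his patch:

    def long_range(start, stop, step=1):
        # Shift the interval down so range() only ever sees small ints,
        # then shift each result back up by the common base.
        offset = start
        return [offset + i for i in range(0, int(stop - start), int(step))]

    >>> long_range(10 ** 21, 10 ** 21 + 3)
    [1000000000000000000000L, 1000000000000000000001L, 1000000000000000000002L]
]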

> I'm working on a patch that allows both range() and xrange() to work with large (PyLong) values.
I'm not interested in doing this for xrange(). As I said, xrange() is a crutch and should not be given features that make it hard to kill. For range(), sure, upload to SF.
This should be a TypeError. I'm sorry it isn't. range() is only defined for ints, and unfortunately if you pass it a float it truncates rather than failing.
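[The truncation is easy to reproduce on a 2.x interpreter of this era; shown as an illustration (current Python 3's range() instead raises the TypeError Guido wishes for here):

    >>> range(1.5, 4.5)   # floats silently truncated toward zero
    [1, 2, 3]
]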
Ditto. (BTW why don't you write this as 1e6, 1e5, 1e4???)
But 1/1e-21 is not a long. It's a float. You're flirting with disaster here.
That's a totally different problem. Indeed you can't use xrange() with values > sys.maxint. But it should be easy to recode this without xrange.
Yes.
> And if range() is fixed, then sadly, xrange() should be fixed as well (IMO).
No.
> BTW, I'm all for deprecating xrange() with all deliberate speed. Doing so would only make updating the range() behavior easier.
It can't be deprecated until we have an alternative. That will have to wait until Python 2.4. I fought its addition to the language long and hard, but the arguments from PBP (Practicality Beats Purity) were too strong.

--Guido van Rossum (home page: http://www.python.org/~guido/)

On Thu, 2003-03-13 at 18:53, Guido van Rossum wrote:
Yeah. I can easily make it do this, BTW (i.e. keep it backwards compatible for smaller floats, but disallow it when dealing with PyLong-sized floats). With large floats, the implicit conversion to PyLong gets even less sensible, due to granularity issues.
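[The granularity issue is easy to demonstrate: above 2**53 a C double can no longer represent every integer, so the "long" you get from a large float is only one of many nearby candidates (values from a 2.x session):

    >>> a = 1e20
    >>> a + 1 == a          # adjacent integers collapse at this magnitude
    True
    >>> long(1e23)          # the nearest double to 10**23 is not 10**23
    99999999999999991611392L
]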
Just emphasizing that the coders may not even have expected to be dealing with such "large" values, but they got them anyway because they were plotting very "small" values (and the plotting operation did the inversion). A bad choice of example, I guess.

Okay, I decided to go look at the specific code I was talking about. It essentially did stuff like:

    large_float = 1e20
    a = long(math.ceil(large_float))
    b = a + 10
    range(a, b)

So it actually wasn't submitting floats to range(), but was expecting it to work on long values (within the limits of memory). Again, it is easy to fix these uses, but we agree that in principle it should work... I've heard from others doing number theory work who hoped or expected it to work as well. (Typically, they wanted to use HUGE step sizes, for example.)

In any case, I'll get the patch submitted fairly soon, for range(). I need to update the tests.
> But 1/1e-21 is not a long. It's a float. You're flirting with disaster here.
Yep. I agree.
Alright. That makes things (fairly) easy. :)
I'm also coding an irange() for consideration in the itertools module, at least as an (explicit) replacement for the iteration usage (although that may not be necessary if you actually do the lazy-list-in-"for"-loops change). If people need the indexing and length operations too, I can only suggest a pure-Python implementation (which could return an irange() iterator when needed). Is that a dead-end idea, or a starter?

Chad Netzer
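[One possible shape for that pure-Python implementation is sketched below. This is a hypothetical illustration of the suggestion, not code from the patch; it iterates lazily over longs and supports len() and simple indexing:

    class irange(object):
        # Hypothetical lazy range for longs, with len() and indexing.
        def __init__(self, start, stop=None, step=1):
            if stop is None:
                start, stop = 0, start
            if step == 0:
                raise ValueError("irange() step must not be zero")
            self.start, self.stop, self.step = start, stop, step

        def __len__(self):
            # Item count; note that len() itself is still capped at a
            # C int by the interpreter, the same limit discussed above.
            if self.step > 0:
                span = self.stop - self.start
            else:
                span = self.start - self.stop
            if span <= 0:
                return 0
            return (span + abs(self.step) - 1) // abs(self.step)

        def __getitem__(self, i):
            # Non-negative indices only, for brevity.
            if not 0 <= i < len(self):
                raise IndexError("irange index out of range")
            return self.start + i * self.step

        def __iter__(self):
            i = self.start
            while (self.step > 0 and i < self.stop) or \
                  (self.step < 0 and i > self.stop):
                yield i
                i += self.step

For example, irange(0, 2**100, 2**99)[1] gives 2**99 without ever materializing a list.]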

As long as they wanted to use longs, that's fair. E.g. now that we're trying to get rid of the difference between ints and longs, something like range(0, 2**100, 2**99) should really just work (and it better give us [0, 2**99] :-).
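[For what it's worth, this is exactly the behavior of today's Python 3, where int and long have been unified and range() has absorbed xrange()'s lazy role:

    >>> list(range(0, 2**100, 2**99)) == [0, 2**99]
    True
]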
> In any case, I'll get the patch submitted fairly soon, for range(). I need to update the tests.
Thanks. I had hoped to release beta1 before PyCon, but that's not realistic; I'll work on it soon after.
That's something for Raymond H.

--Guido van Rossum (home page: http://www.python.org/~guido/)

> Now that we have a kind of long/int integration, maybe it makes sense to update xrange()? Or is that really a 2.4 feature?
IMO, xrange() must die. As a compromise to practicality, it should lose functionality, not gain any.

--Guido van Rossum (home page: http://www.python.org/~guido/)

participants (5)

- Aahz
- Chad Netzer
- David Abrahams
- Guido van Rossum
- Neil Schemenauer