Hi
I would like to propose an addition of an "irange" function to itertools. This addition could reduce testing effort when developing applications, in which large integers show up.
Both, xrange (Python 2.x) and range (Python 3.x) have limited support for large integer step values, for example:
Python 3.1.3 (r313:86834, Nov 28 2010, 10:01:07) [GCC 4.4.5] on linux2 Type "help", "copyright", "credits" or "license" for more information.
range(1010000, 1010000+101000, 10900)[5] Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: Python int too large to convert to C ssize_t
The code below is untested and for clarification only. It has been taken and modified from [issue7721] http://bugs.python.org/issue7721
With irange, no OverflowError is thrown:
from itertools import islice from irange import irange def nth(iterable, n, default=None): ... "Returns the nth item or a default value" ... return next(islice(iterable, n, None), default) ... nth(irange(1010000, 1010000+101000, 10900), 5) 100000000000000 ...
## Code snippet (untested)
from itertools import count, takewhile
def irange(start, stop=None, step=1): """Range for long integers
Usage: irange([start], stop, [step])
Parameters
----------
start: Integer
stop: Integer
step: Integer, defaults to 1
"""
if start is None:
raise TypeError("range() integer argument expected, got NoneType")
if stop is None:
stop = start
start = 0
if step is None:
step = 1
if step > 0:
if stop < start:
return (_ for _ in [])
cond = lambda x: x < stop
elif step < 0:
if stop > start:
return (_ for _ in [])
cond = lambda x: x > stop
else:
raise ValueError("irange() step argument must not be zero")
return takewhile(cond, (start + i * step for i in count()))
## End code snippet
Does such an addition make sense in your eyes?
Regards
Martin
On Fri, Jan 7, 2011 at 10:24 AM, Martin Manns mmanns@gmx.net wrote:
Hi
I would like to propose an addition of an "irange" function to itertools. This addition could reduce testing effort when developing applications, in which large integers show up.
Both, xrange (Python 2.x) and range (Python 3.x) have limited support for large integer step values, for example:
Python 3.1.3 (r313:86834, Nov 28 2010, 10:01:07) [GCC 4.4.5] on linux2 Type "help", "copyright", "credits" or "license" for more information.
range(1010000, 1010000+101000, 10900)[5] Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: Python int too large to convert to C ssize_t
This example strikes me as a bug in range (specifically, in range_subscript in Objects/rangeobject.c).
Does such an addition make sense in your eyes?
Wouldn't it be better to fix 'range' to behave as expected?
Mark
On Mon, Jan 10, 2011 at 6:27 PM, Mark Dickinson dickinsm@gmail.com wrote:
Python 3.1.3 (r313:86834, Nov 28 2010, 10:01:07) [GCC 4.4.5] on linux2 Type "help", "copyright", "credits" or "license" for more information.
range(1010000, 1010000+101000, 10900)[5] Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: Python int too large to convert to C ssize_t
Note that the problem isn't actually the step value - it's the overall length of the resulting sequence.
If you make the sequence shorter, it works (at least in 3.2, I didn't check earlier versions):
x = range(1010000, 1010000+(500*10900), 10900) len(x) 500 x[5]
<snip really big number>This example strikes me as a bug in range (specifically, in range_subscript in Objects/rangeobject.c).
The main issue is actually in range_item rather than range_subscript - we invoke range_len() there to simplify the bounds checking logic. To remove this limitation, the C arithmetic and comparison operations in that function need to be replaced with their PyLong equivalent, similar to what has been done for compute_range_length().
There's a related bug where range_subscript doesn't support indices greater than sys.maxsize - given an indexing helper function that can handle a range length that doesn't fit in sys.maxsize, it would be easy to call that unconditionally rather than indirectly via range_item, fixing that problem as well.
Does such an addition make sense in your eyes?
Wouldn't it be better to fix 'range' to behave as expected?
Agreed. It isn't a deliberate design limitation - it's just a consequence of the fact that converting from C integer programming to PyLong programming is a PITA, so it has been a process of progressive upgrades in range's support for values that don't fit in sys.maxsize.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
-1 for any proposal that adds anything differentiating int/long.
-Zac
On Mon, Jan 10, 2011 at 7:52 PM, Nick Coghlan ncoghlan@gmail.com wrote:
On Mon, Jan 10, 2011 at 6:27 PM, Mark Dickinson dickinsm@gmail.com wrote:
Python 3.1.3 (r313:86834, Nov 28 2010, 10:01:07) [GCC 4.4.5] on linux2 Type "help", "copyright", "credits" or "license" for more information.
range(1010000, 1010000+101000, 10900)[5] Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: Python int too large to convert to C ssize_t
Note that the problem isn't actually the step value - it's the overall length of the resulting sequence.
If you make the sequence shorter, it works (at least in 3.2, I didn't check earlier versions):
x = range(1010000, 1010000+(500*10900), 10900) len(x) 500 x[5]
<snip really big number>This example strikes me as a bug in range (specifically, in range_subscript in Objects/rangeobject.c).
The main issue is actually in range_item rather than range_subscript - we invoke range_len() there to simplify the bounds checking logic. To remove this limitation, the C arithmetic and comparison operations in that function need to be replaced with their PyLong equivalent, similar to what has been done for compute_range_length().
There's a related bug where range_subscript doesn't support indices greater than sys.maxsize - given an indexing helper function that can handle a range length that doesn't fit in sys.maxsize, it would be easy to call that unconditionally rather than indirectly via range_item, fixing that problem as well.
Does such an addition make sense in your eyes?
Wouldn't it be better to fix 'range' to behave as expected?
Agreed. It isn't a deliberate design limitation - it's just a consequence of the fact that converting from C integer programming to PyLong programming is a PITA, so it has been a process of progressive upgrades in range's support for values that don't fit in sys.maxsize.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
On Tue, Jan 11, 2011 at 12:55 AM, Zac Burns zac256@gmail.com wrote:
-1 for any proposal that adds anything differentiating int/long.
It isn't about adding anything - the signature of the length slots at the C level already uses a Py_ssize_t, so any time you get the length of a container, you're limited to values that will fit in that size. This is fine for real containers, as you will run out of memory long before the container length overflows and throws an exception. It is an issue for a virtual container like range() though - because it doesn't actually create the whole range, it can be created with a length that exceeds what Py_ssize_t can handle. That's fine, until you run into one of the operations that directly or indirectly invokes len() on the object.
Currently, indexing a range is such an operation (which is why it fails). While it's a fairly straightforward (albeit somewhat tedious) change to fix range_subscript and range_item to correctly handle cases where the index and/or the length exceed sys.maxsize, it still requires someone to actually create the issue on the tracker and then propose a patch to fix it.
It's even theoretically possible to upgrade the __len__ protocol to support lengths that exceed Py_ssize_t, but that's a much more ambitious (PEP scale) project.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Tue, Jan 11, 2011 at 1:23 AM, Nick Coghlan ncoghlan@gmail.com wrote:
Currently, indexing a range is such an operation (which is why it fails). While it's a fairly straightforward (albeit somewhat tedious) change to fix range_subscript and range_item to correctly handle cases where the index and/or the length exceed sys.maxsize, it still requires someone to actually create the issue on the tracker and then propose a patch to fix it.
As anticipated, a reasonably straightforward but somewhat tedious change to make: http://bugs.python.org/issue10889 :)
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Wed, Jan 12, 2011 at 4:27 AM, Nick Coghlan ncoghlan@gmail.com wrote:
On Tue, Jan 11, 2011 at 1:23 AM, Nick Coghlan ncoghlan@gmail.com wrote:
Currently, indexing a range is such an operation (which is why it fails). While it's a fairly straightforward (albeit somewhat tedious) change to fix range_subscript and range_item to correctly handle cases where the index and/or the length exceed sys.maxsize, it still requires someone to actually create the issue on the tracker and then propose a patch to fix it.
As anticipated, a reasonably straightforward but somewhat tedious change to make: http://bugs.python.org/issue10889 :)
This change has been committed to SVN, and will be available with 3.2rc1 later this week.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Mon, 10 Jan 2011 21:52:36 +1000 Nick Coghlan ncoghlan@gmail.com wrote:
Wouldn't it be better to fix 'range' to behave as expected?
Agreed. It isn't a deliberate design limitation - it's just a consequence of the fact that converting from C integer programming to PyLong programming is a PITA, so it has been a process of progressive upgrades in range's support for values that don't fit in sys.maxsize.
Nick:
So the limitations is not a deliberate design choice. Looking at the tracker, a fix would probably be covered by issue2690.
I see that you have provided a patch there on Dec. 3. However, this patch either does not address the problem or it has not been committed to Py3k, yet. I checked Py3k with an svn dump and it shows the same behavior as Python 3.1.
original message by Nick Coghlan in issue2690:
"I brought the patch up to date for the Py3k branch, but realised just before checking it in that it may run afoul of the language moratorium (since it alters the behaviour of builtin range objects)."
Does the patch address the issue or is it a more complicated problem?
If the former is the case then could issue2690 be re-opened and the patch committed?
However, if the latter is the case then I still would like to propose at least adding a snippet to the itertools docs because fixing the issue properly could take its time.
Cheers
Martin
Martin Manns, 10.01.2011 18:01:
original message by Nick Coghlan in issue2690:
"I brought the patch up to date for the Py3k branch, but realised just before checking it in that it may run afoul of the language moratorium (since it alters the behaviour of builtin range objects)."
The language moratorium does not apply here because the described behaviour is a limitation that is specific to the CPython implementation, not the language.
Stefan
On Tue, Jan 11, 2011 at 3:04 AM, Stefan Behnel stefan_ml@behnel.de wrote:
Martin Manns, 10.01.2011 18:01: >
original message by Nick Coghlan in issue2690:
"I brought the patch up to date for the Py3k branch, but realised just before checking it in that it may run afoul of the language moratorium (since it alters the behaviour of builtin range objects)."
The language moratorium does not apply here because the described behaviour is a limitation that is specific to the CPython implementation, not the language.
The patch that message referred to was subsequently checked in, it just didn't solve the bug mentioned in this thread :)
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Tue, Jan 11, 2011 at 3:01 AM, Martin Manns mmanns@gmx.net wrote:
original message by Nick Coghlan in issue2690:
"I brought the patch up to date for the Py3k branch, but realised just before checking it in that it may run afoul of the language moratorium (since it alters the behaviour of builtin range objects)."
Does the patch address the issue or is it a more complicated problem?
It's a separate problem. Last paragraph of the message you quoted:
Note that I also fixed the patch so that OverflowError occurs only when encountering an affected operation (primarily indexing and retrieval of the length). If you don't do any of those things, you can make your ranges as large as you like. (The indexing could fairly easily be fixed to eliminate the overflow errors - I just didn't do it in this patch, since it is a separate problem).
However, if the latter is the case then I still would like to propose at least adding a snippet to the itertools docs because fixing the issue properly could take its time.
It's a separate issue. In the meantime, a recipe on the Python Cookbook would probably be the most appropriate way to handle it.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia