Add irange with large integer step support to itertools

Hi I would like to propose an addition of an "irange" function to itertools. This addition could reduce testing effort when developing applications, in which large integers show up. Both, xrange (Python 2.x) and range (Python 3.x) have limited support for large integer step values, for example: Python 3.1.3 (r313:86834, Nov 28 2010, 10:01:07) [GCC 4.4.5] on linux2 Type "help", "copyright", "credits" or "license" for more information.
The code below is untested and for clarification only. It has been taken and modified from [issue7721] http://bugs.python.org/issue7721 With irange, no OverflowError is thrown:
## Code snippet (untested) from itertools import count, takewhile def irange(start, stop=None, step=1): """Range for long integers Usage: irange([start], stop, [step]) Parameters ---------- start: Integer stop: Integer step: Integer, defaults to 1 """ if start is None: raise TypeError("range() integer argument expected, got NoneType") if stop is None: stop = start start = 0 if step is None: step = 1 if step > 0: if stop < start: return (_ for _ in []) cond = lambda x: x < stop elif step < 0: if stop > start: return (_ for _ in []) cond = lambda x: x > stop else: raise ValueError("irange() step argument must not be zero") return takewhile(cond, (start + i * step for i in count())) ## End code snippet Does such an addition make sense in your eyes? Regards Martin

On Fri, Jan 7, 2011 at 10:24 AM, Martin Manns <mmanns@gmx.net> wrote:
This example strikes me as a bug in range (specifically, in range_subscript in Objects/rangeobject.c).
Does such an addition make sense in your eyes?
Wouldn't it be better to fix 'range' to behave as expected? Mark

On Mon, Jan 10, 2011 at 6:27 PM, Mark Dickinson <dickinsm@gmail.com> wrote:
Note that the problem isn't actually the step value - it's the overall length of the resulting sequence. If you make the sequence shorter, it works (at least in 3.2, I didn't check earlier versions):
This example strikes me as a bug in range (specifically, in range_subscript in Objects/rangeobject.c).
The main issue is actually in range_item rather than range_subscript - we invoke range_len() there to simplify the bounds checking logic. To remove this limitation, the C arithmetic and comparison operations in that function need to be replaced with their PyLong equivalent, similar to what has been done for compute_range_length(). There's a related bug where range_subscript doesn't support *indices* greater than sys.maxsize - given an indexing helper function that can handle a range length that doesn't fit in sys.maxsize, it would be easy to call that unconditionally rather than indirectly via range_item, fixing that problem as well.
Does such an addition make sense in your eyes?
Wouldn't it be better to fix 'range' to behave as expected?
Agreed. It isn't a deliberate design limitation - it's just a consequence of the fact that converting from C integer programming to PyLong programming is a PITA, so it has been a process of progressive upgrades in range's support for values that don't fit in sys.maxsize. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Jan 11, 2011 at 12:55 AM, Zac Burns <zac256@gmail.com> wrote:
-1 for any proposal that adds anything differentiating int/long.
It isn't about adding anything - the signature of the length slots at the C level already uses a Py_ssize_t, so any time you get the length of a container, you're limited to values that will fit in that size. This is fine for real containers, as you will run out of memory long before the container length overflows and throws an exception. It *is* an issue for a virtual container like range() though - because it doesn't actually *create* the whole range, it can be created with a length that exceeds what Py_ssize_t can handle. That's fine, until you run into one of the operations that directly or indirectly invokes len() on the object. Currently, indexing a range is such an operation (which is why it fails). While it's a fairly straightforward (albeit somewhat tedious) change to fix range_subscript and range_item to correctly handle cases where the index and/or the length exceed sys.maxsize, it still requires someone to actually create the issue on the tracker and then propose a patch to fix it. It's even theoretically possible to upgrade the __len__ protocol to support lengths that exceed Py_ssize_t, but that's a much more ambitious (PEP scale) project. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Jan 11, 2011 at 1:23 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
As anticipated, a reasonably straightforward but somewhat tedious change to make: http://bugs.python.org/issue10889 :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, 10 Jan 2011 21:52:36 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
Nick: So the limitations is not a deliberate design choice. Looking at the tracker, a fix would probably be covered by issue2690. I see that you have provided a patch there on Dec. 3. However, this patch either does not address the problem or it has not been committed to Py3k, yet. I checked Py3k with an svn dump and it shows the same behavior as Python 3.1. original message by Nick Coghlan in issue2690:
Does the patch address the issue or is it a more complicated problem? If the former is the case then could issue2690 be re-opened and the patch committed? However, if the latter is the case then I still would like to propose at least adding a snippet to the itertools docs because fixing the issue properly could take its time. Cheers Martin

On Tue, Jan 11, 2011 at 3:01 AM, Martin Manns <mmanns@gmx.net> wrote:
It's a separate problem. Last paragraph of the message you quoted:
Note that I also fixed the patch so that OverflowError occurs only when encountering an affected operation (primarily indexing and retrieval of the length). If you don't do any of those things, you can make your ranges as large as you like. (The indexing could fairly easily be fixed to eliminate the overflow errors - I just didn't do it in this patch, since it is a separate problem).
It's a separate issue. In the meantime, a recipe on the Python Cookbook would probably be the most appropriate way to handle it. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Jan 7, 2011 at 10:24 AM, Martin Manns <mmanns@gmx.net> wrote:
This example strikes me as a bug in range (specifically, in range_subscript in Objects/rangeobject.c).
Does such an addition make sense in your eyes?
Wouldn't it be better to fix 'range' to behave as expected? Mark

On Mon, Jan 10, 2011 at 6:27 PM, Mark Dickinson <dickinsm@gmail.com> wrote:
Note that the problem isn't actually the step value - it's the overall length of the resulting sequence. If you make the sequence shorter, it works (at least in 3.2, I didn't check earlier versions):
This example strikes me as a bug in range (specifically, in range_subscript in Objects/rangeobject.c).
The main issue is actually in range_item rather than range_subscript - we invoke range_len() there to simplify the bounds checking logic. To remove this limitation, the C arithmetic and comparison operations in that function need to be replaced with their PyLong equivalent, similar to what has been done for compute_range_length(). There's a related bug where range_subscript doesn't support *indices* greater than sys.maxsize - given an indexing helper function that can handle a range length that doesn't fit in sys.maxsize, it would be easy to call that unconditionally rather than indirectly via range_item, fixing that problem as well.
Does such an addition make sense in your eyes?
Wouldn't it be better to fix 'range' to behave as expected?
Agreed. It isn't a deliberate design limitation - it's just a consequence of the fact that converting from C integer programming to PyLong programming is a PITA, so it has been a process of progressive upgrades in range's support for values that don't fit in sys.maxsize. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Jan 11, 2011 at 12:55 AM, Zac Burns <zac256@gmail.com> wrote:
-1 for any proposal that adds anything differentiating int/long.
It isn't about adding anything - the signature of the length slots at the C level already uses a Py_ssize_t, so any time you get the length of a container, you're limited to values that will fit in that size. This is fine for real containers, as you will run out of memory long before the container length overflows and throws an exception. It *is* an issue for a virtual container like range() though - because it doesn't actually *create* the whole range, it can be created with a length that exceeds what Py_ssize_t can handle. That's fine, until you run into one of the operations that directly or indirectly invokes len() on the object. Currently, indexing a range is such an operation (which is why it fails). While it's a fairly straightforward (albeit somewhat tedious) change to fix range_subscript and range_item to correctly handle cases where the index and/or the length exceed sys.maxsize, it still requires someone to actually create the issue on the tracker and then propose a patch to fix it. It's even theoretically possible to upgrade the __len__ protocol to support lengths that exceed Py_ssize_t, but that's a much more ambitious (PEP scale) project. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Jan 11, 2011 at 1:23 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
As anticipated, a reasonably straightforward but somewhat tedious change to make: http://bugs.python.org/issue10889 :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, 10 Jan 2011 21:52:36 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
Nick: So the limitations is not a deliberate design choice. Looking at the tracker, a fix would probably be covered by issue2690. I see that you have provided a patch there on Dec. 3. However, this patch either does not address the problem or it has not been committed to Py3k, yet. I checked Py3k with an svn dump and it shows the same behavior as Python 3.1. original message by Nick Coghlan in issue2690:
Does the patch address the issue or is it a more complicated problem? If the former is the case then could issue2690 be re-opened and the patch committed? However, if the latter is the case then I still would like to propose at least adding a snippet to the itertools docs because fixing the issue properly could take its time. Cheers Martin

On Tue, Jan 11, 2011 at 3:01 AM, Martin Manns <mmanns@gmx.net> wrote:
It's a separate problem. Last paragraph of the message you quoted:
Note that I also fixed the patch so that OverflowError occurs only when encountering an affected operation (primarily indexing and retrieval of the length). If you don't do any of those things, you can make your ranges as large as you like. (The indexing could fairly easily be fixed to eliminate the overflow errors - I just didn't do it in this patch, since it is a separate problem).
It's a separate issue. In the meantime, a recipe on the Python Cookbook would probably be the most appropriate way to handle it. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
participants (5)
-
Mark Dickinson
-
Martin Manns
-
Nick Coghlan
-
Stefan Behnel
-
Zac Burns