I've been working on enhancing xrange and there are a bunch of issues to consider. I've got pretty much complete implementations in both C and Python. Currently xrange is 2 objects: range and the iter. These only work on C longs. Here's what I propose:

2.6:
 * Add deprecation warning if a float is passed to xrange (currently silently truncates)
 * Allow int, long, float, or obj.__index__
 * Implement xrange in Python
 * Implement iterator over C longs (or Py_ssize_t) in C
 * Implement iterator over Python longs in Python (* may lose __length_hint__)
 * Capture the values on construction, so mutating objects wouldn't affect xrange

The idea is to expand xrange's capabilities so that it can replace range in 3k.

I've profiled various combinations. Here are the various results normalized doing xrange(0, 1e6, 1).

Run on all integer (32-bit) values for start, step, end:

  C xrange and iter:           1
  Py xrange w/C iter:          1
  Py xrange w/Py iter (gen):   5-8
  Py xrange w/Py iter (class): ~30

So having xrange in Python is the same speed as if xrange is written in C. The important part is that the iterator needs to be written in C for speed. If we use a generator, something like:

    while value < end:
        yield value
        value += step

the result is ~5 times slower in a release build and 8 times slower in a debug build than with an iterator implemented in C. Using a generator means that there is no __length_hint__. If we go with a full class that has a __length_hint__, the result was ~32 times slower in a debug build. The Python impl is about 1/10th the size of the C impl, though it is lacking some comments.

Run on Python longs the result is somewhat interesting: the Python-based iterator is faster. There's probably a bug in the C version, but given that there is a lot of object allocation, I wouldn't expect the C version to ever be much faster than a similar Python version. Plus the Python version is trivial (same as above) for ints or longs. The C version for longs is quite a bit of code.
Run on all Python longs (still 0..1e6, but sys.maxint..(sys.maxint + 1e6) is the same):

  C xrange and iter:           1.4
  Py xrange w/C iter:          not tested
  Py xrange w/Py iter (gen):   1
  Py xrange w/Py iter (class): 4

Caveats:
 * The generator version above doesn't support floats. We could easily support floats with a different calculation that would be slightly more expensive, but not have accumulated error.
 * By using the generator version, __length_hint__ gets dropped. This means that converting the iterator into a sequence could be slightly more costly as we have to increase the allocation. This would only happen if any of start, step, end weren't an int.
 * With a Python implementation there is a little bit of bootstrapping that is necessary to get the iter implemented in C into the xrange object implemented in Python.
 * Since xrange is implemented in Python, it can be changed.
 * The Python code is much easier to understand than the C code (there is at least one bug in the current C version where -sys.maxint - 1 isn't always displayed properly).

Hopefully this is all understandable. If I left anything out, Thomas will remind me.

n
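The design Neal describes can be sketched in a few lines of Python. This is a hypothetical illustration (class name and details are made up, not his actual patch), using the generator-based iterator from his snippet; it runs on any modern Python, where on 2.x it would stand in for xrange:

```python
class XRange(object):
    """Sketch of a pure-Python xrange over arbitrary integers."""

    def __init__(self, start, stop=None, step=1):
        if stop is None:                 # xrange(n) == xrange(0, n)
            start, stop = 0, start
        if step == 0:
            raise ValueError("step must not be zero")
        # Values are captured here, so later mutation of the caller's
        # objects cannot affect this range.
        self.start, self.stop, self.step = start, stop, step

    def __len__(self):
        span = (self.stop - self.start if self.step > 0
                else self.start - self.stop)
        if span <= 0:
            return 0
        return (span + abs(self.step) - 1) // abs(self.step)

    def __iter__(self):
        # Generator-based iterator: the variant Neal measured at ~5x
        # slower than a C iterator for small ints, but competitive for
        # Python longs.
        value, stop, step = self.start, self.stop, self.step
        if step > 0:
            while value < stop:
                yield value
                value += step
        else:
            while value > stop:
                yield value
                value += step
```

Because the values are plain integers, the same trivial code handles both machine-sized ints and arbitrary-precision longs, which is exactly the property Neal points out.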
On 8/24/06, Neal Norwitz wrote:
> I've been working on enhancing xrange and there are a bunch of issues to consider. I've got pretty much complete implementations in both C and Python. Currently xrange is 2 objects: range and the iter. These only work on C longs. Here's what I propose:
>
> 2.6:
>  * Add deprecation warning if a float is passed to xrange (currently silently truncates)
>  * Allow int, long, float, or obj.__index__
float? I thought the first bullet says no float?
> * Implement xrange in python
Since xrange is used in performance critical apps that may be a bad idea. Or maybe only if the args aren't all ints?
> * Implement iterator over C longs (or Py_ssize_t) in C
> * Implement iterator over Python longs in Python (* may lose __length_hint__)
> * Capture the values on construction, so mutating objects wouldn't affect xrange
Right. So capture them as Python int or long only.
> The idea is to expand xrange's capabilities so that it can replace range in 3k.
>
> I've profiled various combinations. Here are the various results normalized doing xrange(0, 1e6, 1):
>
> Run on all integer (32-bit) values for start, step, end:
>   C xrange and iter:           1
>   Py xrange w/C iter:          1
>   Py xrange w/Py iter (gen):   5-8
>   Py xrange w/Py iter (class): ~30
>
> So having xrange in python is the same speed as if xrange is written in C.
I'm not sure I believe this benchmark -- to measure the cost of xrange itself (as opposed to the cost of iterating over the iterator) you should test xrange(0) or xrange(1).
> The important part is that the iterator needs to be written in C for speed. If we use a generator, something like:
>
>     while value < end:
>         yield value
>         value += step
>
> The result is ~5 times slower in a release build and 8 times slower in a debug build than with an iterator implemented in C.
Nobody cares about the speed of the debug build. :-)
> Using a generator means that there is no __length_hint__. If we go with a full class that has a __length_hint__ the result was ~32 times slower in a debug build.
I don't mind not having the length hint for longs. Doesn't current xrange also support a faster version of reversed?
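The trade-off Guido alludes to: a generator cannot carry a `__length_hint__`, while a class-based iterator can, letting `list()` preallocate. A sketch of the slower class variant from Neal's table (names hypothetical):

```python
class XRangeIterator(object):
    # Class-based iterator: measurably slower than a generator in
    # CPython, but unlike a generator it can expose __length_hint__
    # so list() and friends can preallocate (hypothetical sketch).
    def __init__(self, start, stop, step):
        self._value, self._stop, self._step = start, stop, step

    def __iter__(self):
        return self

    def __next__(self):
        done = (self._value >= self._stop if self._step > 0
                else self._value <= self._stop)
        if done:
            raise StopIteration
        value = self._value
        self._value += self._step
        return value

    next = __next__  # Python 2 spelling

    def __length_hint__(self):
        span = (self._stop - self._value if self._step > 0
                else self._value - self._stop)
        return max(0, (span + abs(self._step) - 1) // abs(self._step))
```

On Python 3, `operator.length_hint(it)` reports the remaining count; the C xrange's fast `reversed()` support works the same way, via a `__reversed__` hook returning an iterator that walks from the other end.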
> The Python impl is about 1/10th the size of the C impl, though is lacking some comments.
>
> Run on Python longs the result is somewhat interesting. The Python based iterator is faster. There's probably a bug in the C version, but given that there is a lot of object allocation, I wouldn't expect the C version to ever be much faster than a similar Python version. Plus the Python version is trivial (same as above) for ints or longs. The C version for longs is quite a bit of code.
If the longs are large enough, the addition is going to dominate the cost, yes.
> Run on all Python longs (still 0..1e6, but sys.maxint..(sys.maxint + 1e6) is the same):
>   C xrange and iter:           1.4
>   Py xrange w/C iter:          not tested
>   Py xrange w/Py iter (gen):   1
>   Py xrange w/Py iter (class): 4
>
> Caveats:
> * The generator version above doesn't support floats. We could easily support floats with a different calculation that would be slightly more expensive, but not have accumulated error.
Is there a good use case for supporting float? The problem with floats is that even apart from accumulated error, it's still ambiguous. E.g. will xrange(1.1, 1.9, 0.1) include the end point or not? That would depend on the details of decimal-to-binary conversion.
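The endpoint ambiguity is easy to demonstrate: 0.1 has no exact binary representation, so repeated addition drifts, and even the drift-free multiplication form `start + i*step` leaves the endpoint question to how each decimal literal rounds to binary (illustrative snippet, not part of any proposed patch):

```python
# Repeated addition accumulates rounding error in the running value.
value, end, step = 1.1, 1.9, 0.1
accumulated = []
while value < end:
    accumulated.append(value)
    value += step

# The multiplication form avoids *accumulated* error, but whether the
# nominal endpoint 1.9 is included still depends on how 1.1, 0.1 and
# 1.9 each convert from decimal to binary.
eighth = 1.1 + 8 * 0.1
print(len(accumulated), eighth, eighth < 1.9)
```

Neither computation lands exactly on 1.9, so any float xrange has to pick an arbitrary answer to "is the end point in or out?"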
> * By using the generator version, __length_hint__ gets dropped. This means that converting the iterator into a sequence could be slightly more costly as we have to increase the allocation. This would only happen if any of start, step, end weren't an int.
Fine with me -- you can't do that at all at the moment. :-)
> * With a Python implementation there is a little bit of bootstrapping that is necessary to get the iter implemented in C into the xrange object implemented in Python.
Long-term, I'd rather see it implemented all in C. Short term, the Python implementation is great to experiment.
> * Since xrange is implemented in Python, it can be changed.
> * The Python code is much easier to understand than the C code (there is at least one bug in the current C version where -sys.maxint - 1 isn't always displayed properly).
>
> Hopefully this is all understandable. If I left anything out, Thomas will remind me.
I'm about to head out to Google now...

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
On 8/24/06, Guido van Rossum wrote:
> On 8/24/06, Neal Norwitz wrote:
>> I've been working on enhancing xrange and there are a bunch of issues to consider. I've got pretty much complete implementations in both C and Python. Currently xrange is 2 objects: range and the iter. These only work on C longs. Here's what I propose:
>> 2.6:
>>  * Add deprecation warning if a float is passed to xrange (currently silently truncates)
>>  * Allow int, long, float, or obj.__index__
>
> float? I thought the first bullet says no float?
No, the bullet says 'add warning' :) xrange() currently accepts floats, because it uses one of the integer getargs formats:

>>> xrange(1.2, 2.5, 1.9999)
xrange(1, 2)
>>  * Implement xrange in python
>
> Since xrange is used in performance critical apps that may be a bad idea. Or maybe only if the args aren't all ints?
Is the cost of *calling* xrange() really a big issue? I don't think Neal measured this, but he could. I'd imagine performance-critical apps call xrange() once, then use that to iterate.
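Guido's earlier suggestion, timing xrange(0) to isolate construction from iteration, is straightforward with timeit. Shown here with `range` so it runs on Python 3; on 2.x one would substitute `xrange`:

```python
import timeit

# Construction cost alone: an empty range is never iterated, so this
# times only argument parsing and object creation.
construct = timeit.timeit("range(0)", number=200000)

# Construction plus a short loop -- closer to real-life usage, where
# loops tend to be short.
loop = timeit.timeit("for i in range(100): pass", number=200000)

print("construct: %.4fs  construct+loop: %.4fs" % (construct, loop))
```

If the construction figure is a small fraction of the loop figure, a Python-level wrapper around a C iterator costs little in the common case of calling once and iterating many times.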
>> Caveats:
>>  * The generator version above doesn't support floats. We could easily support floats with a different calculation that would be slightly more expensive, but not have accumulated error.
>
> Is there a good use case for supporting float? The problem with floats is that even apart from accumulated error, it's still ambiguous. E.g. will xrange(1.1, 1.9, 0.1) include the end point or not? That would depend on the details of decimal-to-binary conversion.
Supporting floats is definitely problematic. It would be nice if xrange() supported arbitrary numeric types, though, like decimals. That would quench the thirst people seem to have for float-ish xranges.
>>  * With a Python implementation there is a little bit of bootstrapping that is necessary to get the iter implemented in C into the xrange object implemented in Python.
>
> Long-term, I'd rather see it implemented all in C. Short term, the Python implementation is great to experiment.
Why, other than performance? It's a lot simpler and much easier to get right in Python, which is quite good for maintenance, too.
--
Thomas Wouters
Neal Norwitz wrote:
> I've profiled various combinations. Here are the various results normalized doing xrange(0, 1e6, 1):
>
> Run on all integer (32-bit) values for start, step, end:
>   C xrange and iter:  1
>   Py xrange w/C iter: 1
in real life, loops are a lot shorter than that. if you take that into account, you don't even have to run the benchmark to realize that calling a Python function and checking the arguments before calling a C function takes more time than calling a C function. even if you skip the "check the arguments" part, you take a 5% hit:
timeit -s"def myxrange(a,xrange=xrange): return xrange(a)" "for i in myxrange(100): pass"
100000 loops, best of 3: 5.28 usec per loop

timeit "for i in xrange(100): pass"
100000 loops, best of 3: 4.98 usec per loop

timeit -s"def myxrange(a,b=None,c=None,xrange=xrange): return xrange(a,b,c)" "for i in myxrange(0,100,1): pass"
100000 loops, best of 3: 5.58 usec per loop

timeit "for i in xrange(0,100,1): pass"
100000 loops, best of 3: 5.27 usec per loop
I doubt adding more code to the myxrange function will speed it up... </F>
participants (4):
 - Fredrik Lundh
 - Guido van Rossum
 - Neal Norwitz
 - Thomas Wouters