I've been working on enhancing xrange and there are a bunch of issues to consider. I've got pretty much complete implementations in both C and Python. Currently xrange is 2 objects: range and the iter. These only work on C longs. Here's what I propose:

2.6:
 * Add deprecation warning if a float is passed to xrange (currently silently truncates)
 * Allow int, long, float, or obj.__index__
 * Implement xrange in Python
 * Implement iterator over C longs (or Py_ssize_t) in C
 * Implement iterator over Python longs in Python (* may lose __length_hint__)
 * Capture the values on construction, so mutating objects wouldn't affect xrange

The idea is to expand xrange's capabilities so that it can replace range in 3k.

I've profiled various combinations. Here are the various results normalized doing xrange(0, 1e6, 1).

Run on all integer (32-bit) values for start, step, end:

  C xrange and iter:           1
  Py xrange w/C iter:          1
  Py xrange w/Py iter (gen):   5-8
  Py xrange w/Py iter (class): ~30

So having xrange in Python is the same speed as if xrange is written in C. The important part is that the iterator needs to be written in C for speed. If we use a generator, something like:

    while value < end:
        yield value
        value += step

the result is ~5 times slower in a release build and 8 times slower in a debug build than with an iterator implemented in C. Using a generator means that there is no __length_hint__. If we go with a full class that has a __length_hint__, the result was ~32 times slower in a debug build. The Python impl is about 1/10th the size of the C impl, though it is lacking some comments.

Run on Python longs the result is somewhat interesting: the Python-based iterator is faster. There's probably a bug in the C version, but given that there is a lot of object allocation, I wouldn't expect the C version to ever be much faster than a similar Python version. Plus the Python version is trivial (same as above) for ints or longs. The C version for longs is quite a bit of code.
Run on all Python longs (still 0..1e6, but sys.maxint..(sys.maxint + 1e6) is the same):

  C xrange and iter:           1.4
  Py xrange w/C iter:          not tested
  Py xrange w/Py iter (gen):   1
  Py xrange w/Py iter (class): 4

Caveats:
 * The generator version above doesn't support floats. We could easily support floats with a different calculation that would be slightly more expensive, but not have accumulated error.
 * By using the generator version, __length_hint__ gets dropped. This means that converting the iterator into a sequence could be slightly more costly as we have to increase the allocation. This would only happen if any of start, step, end weren't an int.
 * With a Python implementation there is a little bit of bootstrapping that is necessary to get the iter implemented in C into the xrange object implemented in Python.
 * Since xrange is implemented in Python, it can be changed.
 * The Python code is much easier to understand than the C code (there is at least one bug in the current C version where -sys.maxint - 1 isn't always displayed properly).

Hopefully this is all understandable. If I left anything out, Thomas will remind me.

n
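The design Neal describes can be sketched in a few lines of Python. This is a hypothetical illustration (class name and details are made up, not his actual patch), using the generator-based iterator from his snippet; it runs on any modern Python, where on 2.x it would stand in for xrange:

```python
class XRange(object):
    """Sketch of a pure-Python xrange over arbitrary integers."""

    def __init__(self, start, stop=None, step=1):
        if stop is None:                 # xrange(n) == xrange(0, n)
            start, stop = 0, start
        if step == 0:
            raise ValueError("step must not be zero")
        # Values are captured here, so later mutation of the caller's
        # objects cannot affect this range.
        self.start, self.stop, self.step = start, stop, step

    def __len__(self):
        span = (self.stop - self.start if self.step > 0
                else self.start - self.stop)
        if span <= 0:
            return 0
        return (span + abs(self.step) - 1) // abs(self.step)

    def __iter__(self):
        # Generator-based iterator: the variant Neal measured at ~5x
        # slower than a C iterator for small ints, but competitive for
        # Python longs.
        value, stop, step = self.start, self.stop, self.step
        if step > 0:
            while value < stop:
                yield value
                value += step
        else:
            while value > stop:
                yield value
                value += step
```

Because the values are plain integers, the same trivial code handles both machine-sized ints and arbitrary-precision longs, which is exactly the property Neal points out.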
On 8/24/06, Neal Norwitz wrote:
> I've been working on enhancing xrange and there are a bunch of issues to consider. I've got pretty much complete implementations in both C and Python. Currently xrange is 2 objects: range and the iter. These only work on C longs. Here's what I propose:
>
> 2.6:
>  * Add deprecation warning if a float is passed to xrange (currently silently truncates)
>  * Allow int, long, float, or obj.__index__
float? I thought the first bullet says no float?
> * Implement xrange in python
Since xrange is used in performance critical apps that may be a bad idea. Or maybe only if the args aren't all ints?
> * Implement iterator over C longs (or Py_ssize_t) in C
> * Implement iterator over Python longs in Python (* may lose __length_hint__)
> * Capture the values on construction, so mutating objects wouldn't affect xrange
Right. So capture them as Python int or long only.
> The idea is to expand xrange's capabilities so that it can replace range in 3k.
>
> I've profiled various combinations. Here are the various results normalized doing xrange(0, 1e6, 1):
>
> Run on all integer (32-bit) values for start, step, end:
>   C xrange and iter:           1
>   Py xrange w/C iter:          1
>   Py xrange w/Py iter (gen):   5-8
>   Py xrange w/Py iter (class): ~30
>
> So having xrange in python is the same speed as if xrange is written in C.
I'm not sure I believe this benchmark -- to measure the cost of xrange itself (as opposed to the cost of iterating over the iterator) you should test xrange(0) or xrange(1).
> The important part is that the iterator needs to be written in C for speed. If we use a generator, something like:
>
>     while value < end:
>         yield value
>         value += step
>
> The result is ~5 times slower in a release build and 8 times slower in a debug build than with an iterator implemented in C.
Nobody cares about the speed of the debug build. :-)
> Using a generator means that there is no __length_hint__. If we go with a full class that has a __length_hint__ the result was ~32 times slower in a debug build.
I don't mind not having the length hint for longs. Doesn't current xrange also support a faster version of reversed?
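The trade-off Guido alludes to: a generator cannot carry a `__length_hint__`, while a class-based iterator can, letting `list()` preallocate. A sketch of the slower class variant from Neal's table (names hypothetical):

```python
class XRangeIterator(object):
    # Class-based iterator: measurably slower than a generator in
    # CPython, but unlike a generator it can expose __length_hint__
    # so list() and friends can preallocate (hypothetical sketch).
    def __init__(self, start, stop, step):
        self._value, self._stop, self._step = start, stop, step

    def __iter__(self):
        return self

    def __next__(self):
        done = (self._value >= self._stop if self._step > 0
                else self._value <= self._stop)
        if done:
            raise StopIteration
        value = self._value
        self._value += self._step
        return value

    next = __next__  # Python 2 spelling

    def __length_hint__(self):
        span = (self._stop - self._value if self._step > 0
                else self._value - self._stop)
        return max(0, (span + abs(self._step) - 1) // abs(self._step))
```

On Python 3, `operator.length_hint(it)` reports the remaining count; the C xrange's fast `reversed()` support works the same way, via a `__reversed__` hook returning an iterator that walks from the other end.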
> The Python impl is about 1/10th the size of the C impl, though is lacking some comments.
>
> Run on Python longs the result is somewhat interesting. The Python based iterator is faster. There's probably a bug in the C version, but given that there is a lot of object allocation, I wouldn't expect the C version to ever be much faster than a similar Python version. Plus the Python version is trivial (same as above) for ints or longs. The C version for longs is quite a bit of code.
If the longs are large enough, the addition is going to dominate the cost, yes.
> Run on all Python longs (still 0..1e6, but sys.maxint..(sys.maxint + 1e6) is the same):
>   C xrange and iter:           1.4
>   Py xrange w/C iter:          not tested
>   Py xrange w/Py iter (gen):   1
>   Py xrange w/Py iter (class): 4
>
> Caveats:
> * The generator version above doesn't support floats. We could easily support floats with a different calculation that would be slightly more expensive, but not have accumulated error.
Is there a good use case for supporting float? The problem with floats is that even apart from accumulated error, it's still ambiguous. E.g. will xrange(1.1, 1.9, 0.1) include the end point or not? That would depend on the details of decimal-to-binary conversion.
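The endpoint ambiguity is easy to demonstrate: 0.1 has no exact binary representation, so repeated addition drifts, and even the drift-free multiplication form `start + i*step` leaves the endpoint question to how each decimal literal rounds to binary (illustrative snippet, not part of any proposed patch):

```python
# Repeated addition accumulates rounding error in the running value.
value, end, step = 1.1, 1.9, 0.1
accumulated = []
while value < end:
    accumulated.append(value)
    value += step

# The multiplication form avoids *accumulated* error, but whether the
# nominal endpoint 1.9 is included still depends on how 1.1, 0.1 and
# 1.9 each convert from decimal to binary.
eighth = 1.1 + 8 * 0.1
print(len(accumulated), eighth, eighth < 1.9)
```

Neither computation lands exactly on 1.9, so any float xrange has to pick an arbitrary answer to "is the end point in or out?"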
> * By using the generator version, __length_hint__ gets dropped. This means that converting the iterator into a sequence could be slightly more costly as we have to increase the allocation. This would only happen if any of start, step, end weren't an int.
Fine with me -- you can't do that at all at the moment. :-)
> * With a Python implementation there is a little bit of bootstrapping that is necessary to get the iter implemented in C into the xrange object implemented in Python.
Long-term, I'd rather see it implemented all in C. Short term, the Python implementation is great to experiment.
> * Since xrange is implemented in Python, it can be changed.
> * The Python code is much easier to understand than the C code (there is at least one bug in the current C version where -sys.maxint - 1 isn't always displayed properly).
>
> Hopefully this is all understandable. If I left anything out, Thomas will remind me.
I'm about to head out to Google now...

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
On 8/24/06, Guido van Rossum wrote:
> On 8/24/06, Neal Norwitz wrote:
>> I've been working on enhancing xrange and there are a bunch of issues to consider. I've got pretty much complete implementations in both C and Python. Currently xrange is 2 objects: range and the iter. These only work on C longs. Here's what I propose:
>> 2.6:
>>  * Add deprecation warning if a float is passed to xrange (currently silently truncates)
>>  * Allow int, long, float, or obj.__index__
>
> float? I thought the first bullet says no float?
No, the bullet says 'add warning' :) xrange() currently accepts floats, because it uses one of the integer getargs formats:

>>> xrange(1.2, 2.5, 1.9999)
xrange(1, 2)
>>  * Implement xrange in python
>
> Since xrange is used in performance critical apps that may be a bad idea. Or maybe only if the args aren't all ints?
Is the cost of *calling* xrange() really a big issue? I don't think Neal measured this, but he could. I'd imagine performance-critical apps call xrange() once, then use that to iterate.
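Guido's earlier suggestion, timing xrange(0) to isolate construction from iteration, is straightforward with timeit. Shown here with `range` so it runs on Python 3; on 2.x one would substitute `xrange`:

```python
import timeit

# Construction cost alone: an empty range is never iterated, so this
# times only argument parsing and object creation.
construct = timeit.timeit("range(0)", number=200000)

# Construction plus a short loop -- closer to real-life usage, where
# loops tend to be short.
loop = timeit.timeit("for i in range(100): pass", number=200000)

print("construct: %.4fs  construct+loop: %.4fs" % (construct, loop))
```

If the construction figure is a small fraction of the loop figure, a Python-level wrapper around a C iterator costs little in the common case of calling once and iterating many times.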
>> Caveats:
>>  * The generator version above doesn't support floats. We could easily support floats with a different calculation that would be slightly more expensive, but not have accumulated error.
>
> Is there a good use case for supporting float? The problem with floats is that even apart from accumulated error, it's still ambiguous. E.g. will xrange(1.1, 1.9, 0.1) include the end point or not? That would depend on the details of decimal-to-binary conversion.
Supporting floats is definitely problematic. It would be nice if xrange() supported arbitrary numeric types, though, like decimals. That would quench the thirst people seem to have for float-ish xranges.
>>  * With a Python implementation there is a little bit of bootstrapping that is necessary to get the iter implemented in C into the xrange object implemented in Python.
>
> Long-term, I'd rather see it implemented all in C. Short term, the Python implementation is great to experiment.
Why, other than performance? It's a lot simpler and much easier to get right in Python, which is quite good for maintenance, too.
--
Thomas Wouters
Neal Norwitz wrote:
> I've profiled various combinations. Here are the various results normalized doing xrange(0, 1e6, 1):
>
> Run on all integer (32-bit) values for start, step, end:
>   C xrange and iter:  1
>   Py xrange w/C iter: 1
in real life, loops are a lot shorter than that. if you take that into account, you don't even have to run the benchmark to realize that calling a Python function and checking the arguments before calling a C function takes more time than calling a C function. even if you skip the "check the arguments" part, you take a 5% hit:
timeit -s"def myxrange(a,xrange=xrange): return xrange(a)" "for i in myxrange(100): pass"
100000 loops, best of 3: 5.28 usec per loop

timeit "for i in xrange(100): pass"
100000 loops, best of 3: 4.98 usec per loop

timeit -s"def myxrange(a,b=None,c=None,xrange=xrange): return xrange(a,b,c)" "for i in myxrange(0,100,1): pass"
100000 loops, best of 3: 5.58 usec per loop

timeit "for i in xrange(0,100,1): pass"
100000 loops, best of 3: 5.27 usec per loop
I doubt adding more code to the myxrange function will speed it up... </F>
participants (4):
 - Fredrik Lundh
 - Guido van Rossum
 - Neal Norwitz
 - Thomas Wouters