[Python-3000] optimizing [x]range
Stargaming
stargaming at gmail.com
Thu Aug 2 17:03:42 CEST 2007
On Sat, 28 Jul 2007 17:06:50 +0200, tomer filiba wrote:
> currently, testing for "x in xrange(y)" is an O(n) operation.
>
> since xrange objects (which would become range in py3k) are not real
> lists, there's no reason that __contains__ be an O(n). it can easily be
> made into an O(1) operation. here's a demo code (it should be trivial to
> implement this in CPython)
>
>
> class xxrange(object):
> def __init__(self, *args):
> if len(args) == 1:
> self.start, self.stop, self.step = (0, args[0], 1)
> elif len(args) == 2:
> self.start, self.stop, self.step = (args[0], args[1], 1)
> elif len(args) == 3:
> self.start, self.stop, self.step = args
> else:
> raise TypeError("invalid number of args")
>
> def __iter__(self):
> i = self.start
> while i < self.stop:
> yield i
> i += self.step
>
> def __contains__(self, num):
> if num < self.start or num > self.stop:
> return False
> return (num - self.start) % self.step == 0
>
>
> print list(xxrange(7)) # [0, 1, 2, 3, 4, 5, 6] print
> list(xxrange(0, 7, 2)) # [0, 2, 4, 6] print list(xxrange(1, 7, 2))
> # [1, 3, 5] print 98 in xxrange(100) # True print 98 in
> xxrange(0, 100, 2) # True print 99 in xxrange(0, 100, 2) # False
> print 98 in xxrange(1, 100, 2) # False print 99 in xxrange(1, 100, 2)
> # True
>
>
>
> -tomer
I gave the implementation a try. I cannot guarantee that it follows every
guideline for the CPython core but it works, it's fast and passes all
tests::
>>> 98 in xrange(0, 100, 2)
True
>>> 99 in xrange(0, 100, 2)
False
>>> 98 in xrange(1, 100, 2)
False
>>> 99 in xrange(1, 100, 2)
True
Note: test_xrange wasn't really helpful with validating xrange's
functionality. No tests for negative steps, at least it didn't warn me
while it didn't work. ;)
It is basically the algorithm you provided, with a fix for negative steps.
The patch is based on the latest trunk/ checkout, Python 2.6. I don't
think this is a problem if nobody else made any effort towards making
xrange more sequence-like in the Python 3000 branch. The C source might
require some tab/space cleanup.
Speed comparison
================
$ ./python -V
Python 2.6a0
$ python -V
Python 2.5.1
./python is my local build, with the patch applied.
$ ./python -mtimeit "0 in xrange(100)"
1000000 loops, best of 3: 0.641 usec per loop
$ python -mtimeit "0 in xrange(100)"
1000000 loops, best of 3: 0.717 usec per loop
Well, with a lot of ignorance, this is still the same.
$ ./python -mtimeit "99 in xrange(100)"
1000000 loops, best of 3: 0.638 usec per loop
$ python -mtimeit "99 in xrange(100)"
100000 loops, best of 3: 6.17 usec per loop
Notice the difference in the magnitude of loops!
$ ./python -mtimeit -s "from sys import maxint" "maxint in xrange(maxint)"
1000000 loops, best of 3: 0.622 usec per loop
$ python -mtimeit -s "from sys import maxint" "maxint in xrange(maxint)"
Still waiting.. ;)
Index: Objects/rangeobject.c
===================================================================
--- Objects/rangeobject.c (revision 56666)
+++ Objects/rangeobject.c (working copy)
@@ -129,12 +129,31 @@
return rtn;
}
+static int
+range_contains(rangeobject *r, PyIntObject *key) {
+ if (PyInt_Check(key)) {
+ int keyval = key->ob_ival;
+ int start = r->start;
+ int step = r->step;
+ int end = start + (r->len * step);
+
+ if ((step < 0 && keyval <= start && keyval > end) \
+ || (step > 0 && keyval >= start && keyval < end)) {
+ return ((keyval - start) % step) == 0;
+ }
+ }
+ return 0;
+}
+
static PySequenceMethods range_as_sequence = {
(lenfunc)range_length, /* sq_length */
0, /* sq_concat */
0, /* sq_repeat */
(ssizeargfunc)range_item, /* sq_item */
0, /* sq_slice */
+ 0, /* sq_ass_item */
+ 0, /* sq_ass_slice */
+ (objobjproc)range_contains, /* sq_contains */
};
static PyObject * range_iter(PyObject *seq);
Test suite
==========
OK
288 tests OK.
CAUTION: stdout isn't compared in verbose mode:
a test that passes in verbose mode may fail without it.
1 test failed:
test_nis # due to verbosity, I guess
38 tests skipped:
test_aepack test_al test_applesingle test_bsddb test_bsddb185
test_bsddb3 test_bz2 test_cd test_cl test_codecmaps_cn
test_codecmaps_hk test_codecmaps_jp test_codecmaps_kr
test_codecmaps_tw test_curses test_dbm test_gdbm test_gl
test_imageop test_imgfile test_linuxaudiodev test_macostools
test_normalization test_ossaudiodev test_pep277 test_plistlib
test_scriptpackages test_socketserver test_sqlite test_startfile
test_sunaudiodev test_tcl test_timeout test_urllib2net
test_urllibnet test_winreg test_winsound test_zipfile64
5 skips unexpected on linux2:
test_tcl test_dbm test_bz2 test_gdbm test_bsddb
Should I submit the patch to the SF patch manager as well?
Regards,
Stargaming
More information about the Python-3000
mailing list