[issue10227] Improve performance of MemoryView slicing

Thu Feb 3 13:34:06 CET 2011

Stefan Behnel <scoder at users.sourceforge.net> added the comment:

Here are some real micro benchmarks (note that the pybench benchmarks actually do lots of other stuff besides slicing):

base line (slice object pre-built in the setup code):

$ ./python -m timeit -s 'l = list(range(100)); s=slice(None)' 'l[s]'
1000000 loops, best of 3: 0.464 usec per loop
$ ./python -m timeit -s 'l = list(range(10)); s=slice(None)' 'l[s]'
10000000 loops, best of 3: 0.149 usec per loop
$ ./python -m timeit -s 'l = list(range(10)); s=slice(None,1)' 'l[s]'
10000000 loops, best of 3: 0.135 usec per loop

patched:

$ ./python -m timeit -s 'l = list(range(100))' 'l[:1]'
10000000 loops, best of 3: 0.158 usec per loop
$ ./python -m timeit -s 'l = list(range(100))' 'l[:]'
1000000 loops, best of 3: 0.49 usec per loop
$ ./python -m timeit -s 'l = list(range(100))' 'l[1:]'
1000000 loops, best of 3: 0.487 usec per loop
$ ./python -m timeit -s 'l = list(range(100))' 'l[1:3]'
10000000 loops, best of 3: 0.184 usec per loop

$ ./python -m timeit -s 'l = list(range(10))' 'l[:]'
10000000 loops, best of 3: 0.185 usec per loop
$ ./python -m timeit -s 'l = list(range(10))' 'l[1:]'
10000000 loops, best of 3: 0.181 usec per loop

original:

$ ./python -m timeit -s 'l = list(range(100))' 'l[:1]'
10000000 loops, best of 3: 0.171 usec per loop
$ ./python -m timeit -s 'l = list(range(100))' 'l[:]'
1000000 loops, best of 3: 0.499 usec per loop
$ ./python -m timeit -s 'l = list(range(100))' 'l[1:]'
1000000 loops, best of 3: 0.509 usec per loop
$ ./python -m timeit -s 'l = list(range(100))' 'l[1:3]'
10000000 loops, best of 3: 0.198 usec per loop

$ ./python -m timeit -s 'l = list(range(10))' 'l[:]'
10000000 loops, best of 3: 0.188 usec per loop
$ ./python -m timeit -s 'l = list(range(10))' 'l[1:]'
1000000 loops, best of 3: 0.196 usec per loop

So the maximum gain from the patch seems to be about 8% for very short slices (fewer than 10 items), and it drops off quickly for longer slices, where the cost of copying clearly dominates. There's still some 2% for 100 items, though.
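
For anyone who wants to check the percentages, here is a small sketch that recomputes the relative gain from the patched and original timings quoted above. The dictionary layout and the relative_gain() helper are only illustrative names, not part of the patch; the numbers are copied from the runs.

```python
# Timings in usec per loop, copied from the runs above as (patched, original).
timings = {
    "range(100), l[:1]": (0.158, 0.171),
    "range(100), l[:]": (0.49, 0.499),
    "range(100), l[1:]": (0.487, 0.509),
    "range(100), l[1:3]": (0.184, 0.198),
    "range(10), l[:]": (0.185, 0.188),
    "range(10), l[1:]": (0.181, 0.196),
}


def relative_gain(patched, original):
    """Fraction of the original time that the patch saves (hypothetical helper)."""
    return (original - patched) / original


gains = {name: relative_gain(p, o) for name, (p, o) in timings.items()}

for name, gain in gains.items():
    # Print each case with its gain as a percentage.
    print("{:20s} {:6.1%}".format(name, gain))
```

The largest values come out just under 8%, and the full copy of 100 items lands at roughly 2%.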

I find it interesting that the base line, which uses a pre-built slice object, is way below the other timings. That makes me think it's actually worth caching constant slice instances, as CPython already does for tuples. Cython also caches both now. I would expect that constant slices like [:], [1:] or [:-1] are extremely common. As you can see above, caching them could speed up slicing by up to 30% for short lists, and still some 7% for a list of length 100.
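
To reproduce the comparison without the command line, something like the following should do. It is only a rough sketch using the standard timeit module: time_slicing() is a made-up helper name, the loop count is an arbitrary choice, and the absolute numbers will depend on the machine and the interpreter version.

```python
import timeit


def time_slicing(setup, stmt, number=200000, repeat=3):
    """Return the best per-loop time in usec (hypothetical helper)."""
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
    return best / number * 1e6


# A pre-built slice object selects exactly the same items as the literal
# slice syntax, so the two timed statements below do the same work.
l = list(range(10))
assert l[slice(None)] == l[:]
assert l[slice(1, None)] == l[1:]

# Slice syntax in the timed statement, as in the patched/original runs.
literal = time_slicing("l = list(range(10))", "l[:]")

# Slice object created once in the setup code, as in the base line runs.
prebuilt = time_slicing("l = list(range(10)); s = slice(None)", "l[s]")

print("literal  l[:]: {:.3f} usec per loop".format(literal))
print("prebuilt l[s]: {:.3f} usec per loop".format(prebuilt))
```

The gap between the two numbers is roughly what creating the slice object on every iteration costs, which is the part that caching constant slices would save.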

Stefan

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10227>
_______________________________________