Real-world Python code 700 times slower than C
Tim Hochberg
tim.hochberg at ieee.org
Sat Jan 5 12:04:19 EST 2002
By cheating a bit I came up with a version using Numeric that runs more than
50x faster than the original Python version. I didn't compare with the C
version directly, but this seems like it should put things back in the
10-15x slower range. I'm cheating because I use an array of floats where
the C code uses doubles, but single precision seems sufficient for a color
picker. Not sure if this is useful to anyone, but it was sort of
entertaining.
-tim
def Ramp1(result, size, start, end):
    step = (end-start)/(size-1)
    for i in xrange(size):
        result[i] = start + step*i

import Numeric as np

_counter = np.arange(100).astype('f')

def Ramp2(result, size, start, end):
    global _counter
    try:
        result[:] = _counter[:size]
    except ValueError:
        _counter = np.arange(len(result)).astype('f')
        result[:] = _counter[:size]
    result *= (end-start)/(size-1)
    result += start

def main():
    import time
    array = [0.0]*10000
    array = np.array(array, 'f', savespace=1)
    size, start, end = 10000, 0.0, 1.0
    r = range(1000)
    t0 = time.clock()
    for i in r:
        pass
    t1 = time.clock()
    for i in r:
        Ramp1(array, size, start, end)
    t2 = time.clock()
    for i in r:
        Ramp2(array, size, start, end)
    t3 = time.clock()
    print (t2 - t1) - (t1 - t0),
    print (t3 - t2) - (t1 - t0),

main()
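For anyone trying this today without the old Numeric module, here is a
minimal sketch of the same caching trick using its successor, numpy. The
function and variable names below are mine, not from the post; the idea is
identical: reuse a cached arange and do the scale and shift in place.

```python
import numpy as np

# Cached 0, 1, 2, ... ramp; regrown on demand, as in Ramp2 above.
_counter = np.arange(100, dtype=np.float32)

def ramp(result, size, start, end):
    """Fill result[:size] with `size` evenly spaced values in [start, end]."""
    global _counter
    if len(_counter) < size:
        _counter = np.arange(size, dtype=np.float32)
    result[:size] = _counter[:size]
    # In-place scale and shift: two vectorized passes, no Python loop.
    result *= (end - start) / (size - 1)
    result += start

out = np.empty(10000, dtype=np.float32)
ramp(out, 10000, 0.0, 1.0)
```

The in-place `*=` and `+=` avoid allocating temporaries, which is most of
the point of caching the counter array in the first place.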
"Brent Burley" <brent.burley at disney.com> wrote in message
news:e2942cb3.0201041647.20271a23 at posting.google.com...
> I often use a "10x" rule of thumb for comparing Python to C, but I
> recently hit one real-world case where Python is almost 700 times
> slower than C! We just rewrote the routine in C and moved on, but
> this has interesting implications for Python optimization efforts.
>
> python
> ------
> def Ramp(result, size, start, end):
>     step = (end-start)/(size-1)
>     for i in xrange(size):
>         result[i] = start + step*i
>
> def main():
>     array = [0]*10000
>     for i in xrange(100):
>         Ramp(array, 10000, 0.0, 1.0)
>
> main()
>
>
> c version
> ---------
> void Ramp(double* result, int size, double start, double end)
> {
>     double step = (end-start)/(size-1);
>     int i;
>     for (i = 0; i < size; i++)
>         *result++ = start + step*i;
> }
>
> void main()
> {
>     double array[10000];
>     int i;
>     for (i = 0; i < 100000; i++)
>         Ramp(array, 10000, 0.0, 1.0);
> }
>
> We use a Ramp function similar to this to generate rgb swatches for a
> color picker. There are many, possibly large color swatches that are
> updated on every mouse event and performance was unacceptable in the
> pure Python version. There are also 2d and circular swatches that
> would be even worse if coded in Python.
>
> The Python version runs in 7.7 seconds. The C version runs in 11.3
> seconds, but loops 1000 times as many iterations. The ratio is therefore
> 7.7*1000/11.3 = 681 (or 68100%).
>
> As expected, 99.9% of the time is spent in eval_code2, the main
> interpreter loop. Within the loop, the profile is:
>
> --- General loop overhead ---
> switch(opcode) .66 sec
> HAS_ARG .48 sec
> tstate->ticker .42 sec
> NEXTARG .30 sec
> NEXTOP .15 sec
>
> --- Individual opcodes ---
> FOR_LOOP: 1.59 sec
> BINARY_MULTIPLY: 1.02 sec
> LOAD_FAST: .99 sec
> BINARY_ADD: .96 sec
> STORE_SUBSCR: .78 sec
> STORE_FAST: .15 sec
> SET_LINENO: .09 sec
> JUMP_ABSOLUTE: .06 sec
>
> For comparison, consider that the entire equivalent C program runs in
> .01 sec (when you equalize the number of iterations). That means that
> just running the switch(opcode) statement takes 66 times as long as
> all the C code.
>
> All the proposals I've seen for Python optimization are aimed at
> general speedups. That's fine, but a 50% (or even 90%) speedup won't
> help much when your code is 500 times slower than it needs to be. I
> don't think that even JIT native code compilation will help much in
> this case because of Python's dynamic nature.
>
> I like the approach that the Perl Inline module takes where you can
> put C code directly inline with your Perl code and the Inline module
> compiles and caches the C code automatically. However the fact that
> it's C (with all of its safety and portability problems) and the fact
> that it relies on a C compiler to be properly installed and accessible
> make this approach unappealing for general use.
>
> What I really want is something spiritually equivalent to a portable
> inline assembly language with python-ish syntax that generates really
> fast native code and seamlessly integrates with Python. I can dream,
> can't I?
>
> --
>
> As an aside, there's another interesting bottleneck we hit in our
> production code. We're reading a lookup table from a text file (for
> doing image display color correction) that consists of 64K lines with
> 3 integers on each line. The python code looks something like:
>
> rArray = []
> gArray = []
> bArray = []
> for line in open(lutPath).xreadlines():
>     entry = split(line)
>     rArray.append(int(entry[0]))
>     gArray.append(int(entry[1]))
>     bArray.append(int(entry[2]))
>
> There are all kinds of ways to optimize this a little bit, but there
> doesn't seem to be a way to make it acceptably fast.
> map(int,open(path).read().split()) gets you pretty close, but
> deinterleaving is still slow. The C version ended up being several
> hundred times faster.
>
> Brent Burley
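For what it's worth, one way to speed up the deinterleaving step Brent
mentions is extended slicing with a stride of 3, which works on Numeric
arrays (and, in later Pythons, on plain lists). A small sketch, with an
in-memory string standing in for the contents of the LUT file:

```python
# Parse the whole file in one pass, then deinterleave with extended
# slices (stride 3). `data` stands in for open(lutPath).read() here.
data = "10 20 30\n40 50 60\n70 80 90\n"
values = list(map(int, data.split()))
rArray = values[0::3]
gArray = values[1::3]
bArray = values[2::3]
```

With a Numeric array one could equivalently reshape the flat data to
(-1, 3) and take the three columns.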