Real-world Python code 700 times slower than C

Thu Jan 3 16:18:26 EST 2002

Brent Burley wrote:

> I often use a "10x" rule of thumb for comparing Python to C, but I
> recently hit one real-world case where Python is almost 700 times
> slower than C!  We just rewrote the routine in C and moved on, but
> this has interesting implications for Python optimization efforts.
> 
> python
> ------
> def Ramp(result, size, start, end):
>     step = (end-start)/(size-1)
>     for i in xrange(size):
>         result[i] = start + step*i
> 
> def main():
>     array = [0]*10000
>     for i in xrange(100):
>         Ramp(array, 10000, 0.0, 1.0)
> 

A quick rewrite in Numeric gives me about a 5x speedup, but there's still a 
nasty bottleneck: the malloc() call implicit in every call to RampNum:

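from Numeric import arange

def RampNum(result, size, start, end):
    step = (end-start)/(size-1)
    # the right-hand side builds a new temporary array, which is then copied
    result[:] = arange(size)*step + start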

There's no easy way (that I know of) to do the operation in place in Numeric, 
which is a very annoying limitation. Numeric will always compute a new array 
for the right-hand side, with the associated allocation.
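
For comparison, here is a minimal sketch of what an allocation-free version 
could look like. It assumes NumPy (Numeric's successor) rather than the Numeric 
package discussed above, and the names ramp_inplace and index are made up for 
illustration. The idea is to build the 0..size-1 index array once, then use the 
'out' argument of the ufuncs so each call writes straight into the existing 
result buffer:

```python
import numpy as np  # assumption: NumPy, not the Numeric package from this post

def ramp_inplace(result, index, start, end):
    """Fill the preallocated float array `result` with a linear ramp."""
    step = (end - start) / (len(index) - 1)
    # result = index * step, written directly into result (no temporary array)
    np.multiply(index, step, out=result)
    # result = result + start, again written in place
    np.add(result, start, out=result)

size = 10000
index = np.arange(size, dtype=np.float64)   # built once, reused by every call
result = np.empty(size, dtype=np.float64)   # allocated once, reused by every call
for _ in range(100):
    ramp_inplace(result, index, 0.0, 1.0)
```

The price is keeping the extra index array around between calls, which is 
only worth it if the ramp size stays the same from one call to the next.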

> 
> I like the approach that the Perl Inline module takes where you can
> put C code directly inline with your Perl code and the Inline module
> compiles and caches the C code automatically.  However the fact that
> it's C (with all of its safety and portability problems) and the fact
> that it relies on a C compiler to be properly installed and accessible
> make this approach unappealing for general use.

This still has those same limitations, but take a look at weave:

http://www.scipy.org/site_content/weave

> 
> As an aside, there's another interesting bottleneck we hit in our
> production code.  We're reading a lookup table from a text file (for
> doing image display color correction) that consists of 64K lines with
> 3 integers on each line.  The python code looks something like:
> 
> rArray = []
> gArray = []
> bArray = []
> for line in open(lutPath).xreadlines():
>     entry = split(line)
>     rArray.append(int(entry[0]))
>     gArray.append(int(entry[1]))
>     bArray.append(int(entry[2]))
> 

For this problem, if you can change the format of your LUT files to a binary 
one (rgbrgbrgb..... in raw binary), a simple 'fromstring()' call (from 
Numeric) will give you near-C speed. If the file has to stay in text format, 
I don't see an easy way around the split() calls, which get expensive.

If binary files aren't an option, the following:

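from Numeric import array

def read_lutNum(lutPath):
    # read the whole file at once and convert every token to an int
    lut = array(map(int,open(lutPath).read().split()))
    # reshape the flat array into rows of (r, g, b)
    lut.shape = (lut.shape[0]/3,3)
    red   = lut[:,0]
    green = lut[:,1]
    blue  = lut[:,2]
    return red,green,blue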

is about 3x faster than your code. But as long as all that text processing 
has to be done on the file, you'll get much worse performance in Python 
than in raw C.

If this kind of reading is something that you do a lot, it may be worth 
writing the LUTs in binary, and you could even use the raw 64Kx3 Numeric 
arrays for simplicity.
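
As a rough sketch of the binary route, here is one way it could look using 
only the standard library 'array' module instead of Numeric's fromstring(). 
The function names and the file layout are my own assumptions: interleaved 
r,g,b values stored as unsigned 16-bit integers in native byte order, so the 
file is not portable between machines of different endianness, and your 
values would have to fit in 16 bits.

```python
import array
import os
import tempfile

def write_lut_binary(path, entries):
    """Write (r, g, b) triples as interleaved unsigned 16-bit integers."""
    flat = array.array("H")  # assumption: every value fits in 16 bits
    for r, g, b in entries:
        flat.extend((r, g, b))
    with open(path, "wb") as f:
        flat.tofile(f)

def read_lut_binary(path):
    """Read the interleaved file back and split it into three channels."""
    flat = array.array("H")
    with open(path, "rb") as f:
        flat.frombytes(f.read())  # one bulk conversion, no per-line parsing
    # every third value, starting at offset 0, 1 and 2
    return flat[0::3], flat[1::3], flat[2::3]

# Round trip with a made-up 64K-entry table (not real color-correction data)
entries = [(i, 65535 - i, i // 2) for i in range(65536)]
path = os.path.join(tempfile.mkdtemp(), "lut.bin")
write_lut_binary(path, entries)
red, green, blue = read_lut_binary(path)
```

The text file only needs to be parsed once, when converting it; after that 
every load is a single bulk read.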

Cheers,

f.