Real-world Python code 700 times slower than C
Fernando Pérez
fperez528 at yahoo.com
Thu Jan 3 16:18:26 EST 2002
Brent Burley wrote:
> I often use a "10x" rule of thumb for comparing Python to C, but I
> recently hit one real-world case where Python is almost 700 times
> slower than C! We just rewrote the routine in C and moved on, but
> this has interesting implications for Python optimization efforts.
>
> python
> ------
> def Ramp(result, size, start, end):
>     step = (end-start)/(size-1)
>     for i in xrange(size):
>         result[i] = start + step*i
>
> def main():
>     array = [0]*10000
>     for i in xrange(100):
>         Ramp(array, 10000, 0.0, 1.0)
>
A quick rewrite in Numeric gives me about a 5x speedup, but there's still a
nasty bottleneck: the malloc() call implicit in every call to RampNum:
def RampNum(result, size, start, end):
    step = (end-start)/(size-1)
    result[:] = arange(size)*step + start
There's no easy way (that I know of) to do the operation in place in Numeric,
a very annoying limitation. Unfortunately, Numeric will always compute a new
array on the right-hand side (with the associated allocation).
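For readers on NumPy, the successor to Numeric, here is a minimal sketch of the same ramp written without a temporary array on the right-hand side. It assumes a newer library than the one discussed in this thread; the name ramp_inplace and the preallocated index array are hypothetical. It relies on the out= argument of NumPy's ufuncs, which writes results into an existing buffer:

```python
import numpy as np

def ramp_inplace(result, index, start, end):
    """Fill result with a linear ramp from start to end, reusing its buffer.

    index is a preallocated float array holding 0, 1, ..., size-1.
    """
    step = (end - start) / (result.size - 1)
    np.multiply(index, step, out=result)  # result = index * step, no temporary
    np.add(result, start, out=result)     # result += start, in place
    return result

size = 10000
index = np.arange(size, dtype=float)  # built once, outside the loop
result = np.empty(size)               # allocated once
for _ in range(100):
    ramp_inplace(result, index, 0.0, 1.0)
```

The trade-off is that the index array has to be kept around between calls, so this only pays off when the same size is reused many times, as in the loop above.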
>
> I like the approach that the Perl Inline module takes where you can
> put C code directly inline with your Perl code and the Inline module
> compiles and caches the C code automatically. However the fact that
> it's C (with all of its safety and portability problems) and the fact
> that it relies on a C compiler to be properly installed and accessible
> make this approach unappealing for general use.
That approach still has those limitations, but take a look at weave:
http://www.scipy.org/site_content/weave
>
> As an aside, there's another interesting bottleneck we hit in our
> production code. We're reading a lookup table from a text file (for
> doing image display color correction) that consists of 64K lines with
> 3 integers on each line. The python code looks something like:
>
> rArray = []
> gArray = []
> bArray = []
> for line in open(lutPath).xreadlines():
>     entry = split(line)
>     rArray.append(int(entry[0]))
>     gArray.append(int(entry[1]))
>     bArray.append(int(entry[2]))
>
For this problem, if you can change the format of your LUT files to a binary
one (rgbrgbrgb..... in raw binary), a simple 'fromstring()' call (from
Numeric) will give you near-C speed. If the file is in text format, I don't
see an easy way around the split() calls, which get expensive.
If binary files aren't an option, the following:
def read_lutNum(lutPath):
    lut = array(map(int,open(lutPath).read().split()))
    lut.shape = (lut.shape[0]/3,3)
    red = lut[:,0]
    green = lut[:,1]
    blue = lut[:,2]
    return red,green,blue
is about 3x faster than your code. But until there's a way to avoid all the
text processing on the file, you'll get much worse performance in Python
than in raw C.
If this kind of reading is something that you do a lot, it may be worth
writing the LUTs in binary, and you could even use the raw 64Kx3 Numeric
arrays for simplicity.
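A minimal sketch of the binary-LUT idea, using only the standard library's array module instead of Numeric's fromstring(). The file layout (interleaved r,g,b values stored as unsigned shorts in native byte order, 16 bits on common platforms) and the names write_lut_binary and read_lut_binary are assumptions for illustration, not the format of the original LUT files:

```python
import os
import tempfile
from array import array

def write_lut_binary(path, red, green, blue):
    """Write three equal-length channels as interleaved rgbrgb... values."""
    interleaved = array("H")  # "H" = unsigned short
    for r, g, b in zip(red, green, blue):
        interleaved.extend((r, g, b))
    with open(path, "wb") as f:
        interleaved.tofile(f)

def read_lut_binary(path):
    """Read an interleaved rgbrgb... file back into three channels."""
    data = array("H")
    with open(path, "rb") as f:
        data.frombytes(f.read())
    # Strided slices de-interleave the channels without a Python-level loop.
    return data[0::3], data[1::3], data[2::3]

# Round-trip a small hypothetical table through a temporary file.
red, green, blue = [0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23]
path = os.path.join(tempfile.mkdtemp(), "lut.bin")
write_lut_binary(path, red, green, blue)
r, g, b = read_lut_binary(path)
```

The read side does no per-line text parsing at all, which is where the time goes in the text version. Native byte order keeps the sketch short but makes the files non-portable between machines of different endianness.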
Cheers,
f.