Is it Python or is it C ? (long)

Wed Mar 8 13:19:29 EST 2000

Don Tuttle <tuttledon at hotmail.com> wrote:
> No.  Although I may have made my question a little to open ended, I'm
> looking for a peer review on the concept --"With a little care in the way
> you design your code, most of your code will run as compiled C..."

well, I think John meant that "with a little care, most of the
work will be done by compiled C".

after all, the Python interpreter is written in C, so everything
you do runs as compiled C...

> If there is a way to optimize Python, short of memorizing it's source code
> and/or years of trial and error, someone should have their next book
> project! (I'll buy it!)  And I say this as someone who doesn't have enough
> Python experience to know if this is a true statement or not.

how about a sentence instead of a book?

    the more time spent on the Python level, the more
    chances you have to restructure your code so that
    more *work* is done by compiled C.

to figure out how much time you spend on the Python
level, and where that time is spent, use the profiler.

...

to make this a bit clearer, how about an example?

assume you need a function which calculates a histogram
over a greyscale image.  this function should return a list,
containing the number of pixels in the image having the
same value as the index into that list.

now, the standard way to count things is to use a dictionary.
maybe something like this will work:

def histogram1(image):
    # use a dictionary
    xsize, ysize = image.size
    data = {}
    for y in range(ysize):
        for x in range(xsize):
            v = image.getpixel((x, y))
            data[v] = data.get(v, 0) + 1
    # convert to list
    result = []
    for i in range(256):
        result.append(data.get(i, 0))
    return result

(here, image is a PIL Image object).

running this on a small image works fine, but when the
image gets larger, things go out of hand.  for a 512x512
image, this takes just over 11 seconds on my machine.

looking at the code, it's pretty clear that we might speed
things up by using a list instead of a dictionary.  after all,
an 8-bit grayscale image can only contain pixel values from
0 to 255.  list indexing should be much faster than dictionary
lookups, right?

def histogram2(image):
    # wait, use a list instead
    xsize, ysize = image.size
    result = [0] * 256
    for y in range(ysize):
        for x in range(xsize):
            v = image.getpixel((x, y))
            result[v] = result[v] + 1
    return result

less code.  but it doesn't run much faster: just
under 9 seconds, on my machine.

let's run this under the python profiler, and see if
we can figure out what's going on here:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   17.463   17.463   58.874   58.874 <script>:26(histogram2)
   262144   29.388    0.000   41.412    0.000 Image.py:501(getpixel)

as you can see, most of the time (41 seconds when running
under the profiler) is spent down in the Image module.  note
that we're doing 262144 Python-level calls to getpixel (one
for each pixel, by obvious reasons).

what if we could get rid of some of these?

in fact, PIL contains a "getdata" method that returns a
"flattened" one-dimensional view of the 2D pixel array.
this lets us process all pixels inside a single for-in loop:

def histogram3(image):
    # wait, use a list, but get rid of that getpixel call
    result = [0] * 256
    for v in image.getdata():
        result[v] = result[v] + 1
    return result

running this takes just over one (1) second on my machine.
about 10 times faster than our first attempt. that's pretty
good, don't you think?

lets use the profiler again:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    1.234    1.234    1.234    1.234 <script>:36(histogram3)
        1    0.000    0.000    0.000    0.000 Image.py:487(getdata)

this time, there's only one call down to Image.py -- and that
call doesn't take any time at all! [1]

...

now, assuming that one second is still too much, is there
anything else we can do?  well, that "x=x+1" stuff looks
fishy.  maybe we can count things a bit faster?  in fact,
the list object contains a method called "count" that does
exactly this -- it tells us how many times a given value
is found in a list.

unfortunately, doing something like "im.getdata().count(1)"
doesn't work -- the getdata method returns a sequence
object, but that object doesn't implement the full list inter-
face.

explicitly turning it into a list might help here:

def histogram4(image):
    # wait, use the list.count operator
    data = list(image.getdata())
    result = []
    for i in range(256):
        result.append(data.count(i))
    return result

unfortunately, running this takes over 23 seconds.  more
than twice as slow as the original implementation.  and
some additional tests reveal that 22 of those are spent
in the "count" method.  what's wrong here?  isn't "count"
implemented in C, or what?

(if you haven't realized what's wrong here, why
not spend a minute or so and see if you can figure
that out by yourself)

the problem here is that we've replaced 2x256x256
list accesses with 256x256x256 integer comparisions.

even though they're implemented in C, doing 16 million
calls to the integer object takes a little time...

...

so is one second per image really the best we can do?

nope -- PIL provides a method that does all this for
us, using a tightly written C loop.

def histogram5(image):
    return image.histogram()

this script takes slightly less than 0.05 seconds on
my machine.  that's over 200 times faster than the
first version, and more than 450 times faster than
the slowest one...

...

summing up, the more you do in C, the faster your
program runs.  but C doesn't help if you use a lousy
algorithm... [2]

or in other words, if your Python program doesn't
run  fast enough, use the profiler.  or ask for help
on comp.lang.python!

</F>

<!-- (the eff-bot guide to) the standard python library:
http://www.pythonware.com/people/fredrik/librarybook.htm
-->

1) getdata returns a reference to an already existing
sequence object.  a Python method cannot be much
faster...

2) one early PIL adopter told me they'd rewritten
a C++ program in Python, using PIL, and was rather
surprised to find that the Python code ran nearly
three times faster ;-)