[Image-SIG] PIL speed in single/multi threaded apps

Thu Aug 7 00:38:31 EDT 2003

Hi everyone;

I've been writing a lot of multi-threaded GUI apps that use PIL lately, and
have been looking for ways to improve performance, especially when rapidly
updating on-screen images after color changes.

I decided to try enabling true multi-threading in the PIL core library to
see what effect it had on performance.  Basically, although Python has
multi-threading, the interpreter holds a global lock while extension code
runs, so unless an extension library explicitly "allows" other threads to
run while it's busy doing intensive tasks, only one thread runs at a time.
So, even on a multi-CPU computer (or a Pentium 4 with hyperthreading), you
don't really get the benefit of true multi-threading.
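
To illustrate the point above, here is a minimal sketch.  It doesn't use PIL
at all: time.sleep() is only a stand-in for an extension call that releases
the interpreter lock while it works, and the names blocking_call and
timed_run are made up for this example.  Because the lock is released, four
threads should finish in roughly the time of one call rather than four.

```python
import threading
import time

def blocking_call(seconds):
    # Stand-in (an assumption, not PIL code) for an extension function that
    # releases the interpreter lock while busy; time.sleep() does release it.
    time.sleep(seconds)

def timed_run(thread_count, seconds=0.2):
    """Run blocking_call in thread_count threads; return elapsed wall time."""
    threads = [threading.Thread(target=blocking_call, args=(seconds,))
               for _ in range(thread_count)]
    started = time.time()
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return time.time() - started

if __name__ == "__main__":
    for n in (1, 2, 4):
        print("%d thread(s): %.2f s" % (n, timed_run(n)))
```

An extension call that keeps the lock for its whole duration would instead
take about thread_count times as long, since the calls run one after another.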

So, I modified the geometry.c source file to wrap the intensive tasks within
the resize mechanisms in "Py_BEGIN_ALLOW_THREADS" / "Py_END_ALLOW_THREADS"
blocks to see what would happen.  I then wrote a test script that used
multiple threads to resize images simultaneously, and I counted how many
resizes completed in 30 seconds.  (See the script at the end for details.)

Here are the results, and they're very interesting!  These were done with
the "standard" PIL 1.1.4 binary and my modified version, both on a dual-CPU
Athlon MP-1200 computer with 512MB RAM.

Threads   Original                       Modified
1         805 resizes (max 60% CPU)       920 resizes (max  75% CPU)
2         758 resizes (max 60% CPU)      1185 resizes (max 100% CPU)
3         755 resizes (max 60% CPU)      1205 resizes (max 100% CPU)
4         755 resizes (max 60% CPU)      1194 resizes (max 100% CPU)

I'm not surprised that with multiple threads the speed increased by up to
60% (1205 vs. 755 resizes with three threads).  However, I WAS surprised
that with only one thread there was still a 14% speed improvement (920 vs.
805)!  I'm assuming this is because PIL is allowing the Python core to
perform background tasks simultaneously with imaging tasks.
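
For anyone who wants to check the percentages, the arithmetic can be sketched
like this.  The counts are copied from the results above; the helper name
gain_percent is just for illustration.

```python
# Resize counts from the 30-second runs: threads -> (original, modified)
results = {1: (805, 920), 2: (758, 1185), 3: (755, 1205), 4: (755, 1194)}

def gain_percent(original, modified):
    """Speed increase of modified over original, as a percentage."""
    return (modified - original) * 100.0 / original

# Percentage gain per thread count, rounded to one decimal place.
gains = {}
for thread_count, (original, modified) in results.items():
    gains[thread_count] = round(gain_percent(original, modified), 1)

for thread_count in sorted(gains):
    print("%d thread(s): +%.1f%%" % (thread_count, gains[thread_count]))
```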

I'd like to test this on a single-CPU Pentium 4 system that has
Hyperthreading enabled too, to see if it has similar speed increases.

Question:  how many users would benefit from such speed increases?  I can't
guarantee that all "expensive" PIL tasks would be sped up similarly, but
it's quite likely they would be.  Do many users write multi-threaded PIL
applications or run on multi-CPU/P4 computer systems?

I'm quite willing to put the work into finding the right places in the core
to implement the threading switches, but I would like to see it adopted in
the primary distribution rather than a fork... comments?

Thanks!  Here's the test code I used if you're curious.  It's not ideal, but
it's pretty fair.

Kevin.
###############################
import threading
import Image
import time

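# Worker: the constructor runs the resize loop, so passing the class itself
# as a Thread target executes the loop in that thread.
class t: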
    def __init__(self, counter, lock, rounds, killFlag):
        self.counter = counter
        self.lock = lock
        self.rounds = rounds
        self.killFlag = killFlag
        self.start()

    def start(self):
        im = Image.new("RGB", (2048, 2048))
        for i in range(self.rounds):
            if self.killFlag.isSet():
                break
            x = im.resize((1024,1093))
            self.lock.acquire()
            self.counter.increment()
            self.lock.release()

# Shared counter; callers hold the lock while incrementing.
class c:
    def __init__(self):
        self.counter = 0
    def increment(self):
        self.counter = self.counter + 1

def run(threadCount = 2):
    threads = []
    counter = c()
    lock = threading.Lock()
    killFlag = threading.Event()
    killFlag.clear()

    for Th in range(threadCount):
        threads.append(threading.Thread(target = t,
                                        args = (counter, lock, 10000, killFlag)))

    for Th in threads:
        Th.start()
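    # Report the running total once a second for 30 seconds.
    for q in range(30):
        print("Time: %s\nImages: %s" % (time.time(), counter.counter))
        time.sleep(1)

    # Ask the workers to stop after their current resize.
    killFlag.set()
    print("\n\ndone timing")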

if __name__ == "__main__":
    run()