[pypy-dev] Thinking about the GIL

Ben.Young at sungard.com
Wed Mar 16 11:06:36 CET 2011

It's a nice solution for many problems, but unfortunately not for the areas I'm interested in, where there is a shared, constant in-memory data set of 20-100 GB and 32-64 threads all performing calculations on it. As soon as you copy any data, you blow the memory requirements and kill performance. Of course, I can't claim to be working in a "standard" environment, but for some things nothing but true threading will do.


Note that this is in C#, which scales quite well to 64 threads as long as you don't do any allocation!

From: René Dudfield [mailto:renesd at gmail.com] 
Sent: 16 March 2011 09:52
To: Benjamin Peterson
Cc: Young, Ben; pypy-dev at codespeak.net
Subject: Re: [pypy-dev] Thinking about the GIL



One alternative approach is to run a separate VM in each thread and pass messages between them. It works well, and there is no GIL in each VM. You do need clean code that allows you to have separate VMs in one process. However, it's easier to make your code able to run in separate VMs than to recode it to allow concurrent thread access to all data structures, so this is easier to implement than removing the GIL.

>>> vms = [VM() for i in range(8)]
>>> vms[0].send(["going in"])
>>> vms[0].get()
["coming out!"]
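A minimal runnable sketch of the `send`/`get` interface from the snippet above, assuming a hypothetical `VM` class built only on the standard `threading` and `queue` modules (the `shout` handler is a placeholder). Each worker keeps its own state, and messages are deep-copied at the boundary to imitate separate heaps. With CPython (or PyPy) threads, all workers still run under one GIL, so this illustrates only the message-passing model, not GIL-free execution.

```python
import copy
import queue
import threading


class VM:
    """Hypothetical stand-in for the VM() objects in the snippet.

    Each instance runs its own worker thread. Callers talk to it only through
    send() and get(); messages are deep-copied on the way in and out, so no
    mutable object is shared between caller and worker.
    """

    def __init__(self, handler):
        self._handler = handler          # hypothetical per-VM message handler
        self._inbox = queue.Queue()
        self._outbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            msg = self._inbox.get()
            if msg is None:              # sentinel: stop the worker
                break
            self._outbox.put(copy.deepcopy(self._handler(msg)))

    def send(self, msg):
        self._inbox.put(copy.deepcopy(msg))

    def get(self, timeout=None):
        return self._outbox.get(timeout=timeout)

    def close(self):
        self._inbox.put(None)
        self._thread.join()


def shout(msg):
    # Placeholder handler: upper-case every string in the message.
    return [s.upper() for s in msg]


vms = [VM(shout) for _ in range(8)]
vms[0].send(["going in"])
print(vms[0].get())  # ['GOING IN']
for vm in vms:
    vm.close()
```

Copying at the boundary is the design choice that keeps sharing explicit: the caller can keep mutating its own list after `send()` without affecting what the worker sees.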

I did this with tinypy VMs in a CPython host, and it worked reasonably well. It still used the GIL when I wanted to communicate with the VMs, so only one VM could be talked to at a time. I used it with the SDL "fastevent" event queue, so whatever each VM posted into the queue could be read from CPython. The other communication method that worked well was mmap'd data structures, which means you do not need to serialise the data when sharing messages between threads.

(As a side note: for this particular task it turned out to be less code to just use separate CPython processes communicating via shared mmap buffers. Since the data was all numpy arrays and pygame.Surface objects, it could be shared easily. If it had been pure Python code, it probably would have been a different matter.)
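A small sketch of sharing a fixed-layout buffer without serialisation, assuming an anonymous `mmap` and plain threads for simplicity (the array size `N` and the `fill` helper are hypothetical). A `memoryview` cast to `"d"` gives typed float64 access to the mapped bytes, so writers and readers work on the same memory instead of pickling messages. Between separate processes, a file-backed mapping or `multiprocessing.shared_memory.SharedMemory` would take the place of the anonymous map.

```python
import mmap
import threading

N = 1024  # number of float64 slots (arbitrary size for illustration)

buf = mmap.mmap(-1, N * 8)   # anonymous, writable mapping of N * 8 bytes
raw = memoryview(buf)        # byte-level view of the mapping
view = raw.cast("d")         # the same bytes viewed as C doubles, no copy


def fill(start, stop):
    # Hypothetical writer: each thread fills a disjoint slice, so no lock is needed.
    for i in range(start, stop):
        view[i] = i * 0.5


chunk = N // 4
threads = [threading.Thread(target=fill, args=(k * chunk, (k + 1) * chunk))
           for k in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The reader sees the writers' values directly in the shared buffer.
total = sum(view)
print(total)  # 261888.0

# Release the views before closing, or mmap.close() raises BufferError.
view.release()
raw.release()
buf.close()
```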

Apart from being easier to implement at the VM level, this approach has two other advantages. First, it makes message sharing between threads more explicit. Second, it reduces GC pressure: each VM has its own GC heap, rather than all objects living in one big heap shared by all the threads.

(Another aside: CPython can also run multiple VMs (sub-interpreters) in one process, and that's how some web servers embed Python, with one Python VM per thread.)

