[pypy-dev] Thoughts on multithreading in PyPy

Sun Apr 10 00:15:50 CEST 2011

Saturday 09 April 2011 you wrote:
> Hi Jacob,
> 
> On Sat, Apr 9, 2011 at 10:50 PM, Jacob Hallén <jacob at openend.se> wrote:
> > So, in a second step, we provide for special data types that can be
> > shared between threads. These would typically be allocated in
> > non-movable memory, to avoid the complexity of garbage collection of
> > memory with shared use. You can make simple fifo structures for
> > communication between the threads and complex structures with advanced
> > algorithms for dealing with shared access.
> 
> That's where the real issue is.  You can come up with some reasonable
> API to communicate with other threads, but precisely, they will be
> some API, which means that they will only work in programs written
> specifically to use them.  Designing a new API (at the level of the
> Python language) is something we carefully avoided so far in PyPy; but
> it's possible that this issue is important enough for us to break that
> rule :-)
> 
> What you are describing sounds similar to the multiprocessing module
> in CPython, which achieves the same goal using separated processes
> (and tons and tons of hacks), and requires the program to use a custom
> API.  The advantage of doing it in PyPy rather than CPython is
> probably limited to the fact that it would be easier in PyPy (but
> still some work) to make sure that the multiple threads have no shared
> state.  You still have to design some custom API.

The multiprocessing API contains some classic primitives that could be kept.
Much of the rest seems far too complicated, because it has to deal with
separate processes.
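
As a rough sketch of the kind of classic primitive I mean, here is a FIFO
used to pass work between two threads. It uses only the standard library's
queue and threading modules, not any PyPy-specific API; the names worker, run
and _DONE are made up for illustration.

```python
import queue
import threading

# Hypothetical sentinel marking the end of the stream.
_DONE = object()


def worker(inbox, outbox):
    """Square each item read from inbox until the sentinel arrives."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # tell the consumer we are finished
            return
        outbox.put(item * item)


def run(values):
    """Feed values to a worker thread and collect its results in order."""
    inbox, outbox = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, outbox))
    thread.start()
    for value in values:
        inbox.put(value)
    inbox.put(_DONE)

    results = []
    while True:
        result = outbox.get()
        if result is _DONE:
            break
        results.append(result)
    thread.join()
    return results
```

The two threads share nothing but the two queues, which is the property the
special shared data types would have to guarantee.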

I think you are downplaying the advantages of using PyPy. Apart from being
easier and cleaner to implement, it would use threads instead of processes,
which gives much quicker communication and context switching. It would also
be able to use the JIT, giving much better performance. You could even
provide proxy object spaces to transparently spread load over multiple
physical machines.
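
To illustrate the communication cost, here is a small sketch under some
assumptions: a queue between threads hands over a reference to the same
object, while multiprocessing has to pickle objects to move them between
processes. The pickle round trip below merely stands in for that copy, and
via_thread and via_pickle are hypothetical helper names.

```python
import pickle
import queue
import threading


def via_thread(obj):
    """Send obj to another thread through a FIFO and return what it received."""
    fifo = queue.Queue()
    received = []
    thread = threading.Thread(target=lambda: received.append(fifo.get()))
    thread.start()
    fifo.put(obj)
    thread.join()
    return received[0]


def via_pickle(obj):
    """Serialize and deserialize obj, as crossing a process boundary requires."""
    return pickle.loads(pickle.dumps(obj))


payload = {"data": list(range(1000))}
shared = via_thread(payload)   # the very same object, no copy made
copied = via_pickle(payload)   # an equal but distinct object
```

The thread receives the identical object, whereas the pickled version is a
full copy whose cost grows with the size of the data.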

I don't think we should start work on this right now; I just like exploring
the idea. If people come along wanting to do GIL removal, we can present them
with a plan and set them to work.

> There is also another possible goal with more "pypy-like" goals and
> results, which would be to use some technique to "weave" a solution in
> the interpreter transparently for the Python programmer (so, a
> solution that works without requiring the Python programmer to learn
> another system than threads).  I can imagine a Software Transactional
> Memory solution that would in theory work very nicely, but in practice
> have completely dreadful performance, because it would do large
> amounts of checked memory access for each bytecode.  As far as I know
> it means that that approach does not work, but it may one day, if
> Hardware Transactional Memory really shows up and supports that scale.

While it is a very neat idea, it is still pie in the sky. My idea could be
pie-on-plate, though it hasn't been baked yet.

Jacob