[pypy-dev] Thoughts on multithreading in PyPy

Dima Tisnek dimaqq at gmail.com
Sun Apr 10 05:02:01 CEST 2011


IMO the real zest and usefulness of multithreading lie in complex shared
data structures.

That said, most multithreaded programs share a very small, albeit
critical, subset of their data.
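To make that pattern concrete, here is a minimal Python sketch in which each thread works on private data and only a small, lock-protected dict is shared. The function name `sum_of_squares_parallel` is hypothetical and purely illustrative. Under a GIL (as in CPython and PyPy) the threads do not execute bytecode in parallel, so this shows the data-sharing shape, not a speedup.

```python
import threading


def sum_of_squares_parallel(chunks):
    """Hypothetical example: threads compute privately, share one small dict."""
    # Shared state: a single small dict plus the lock that guards it.
    results = {}
    results_lock = threading.Lock()

    def worker(worker_id, numbers):
        # The bulk of the work touches only data private to this thread.
        total = sum(n * n for n in numbers)
        # Only the final publish step touches shared data, under the lock.
        with results_lock:
            results[worker_id] = total

    threads = [
        threading.Thread(target=worker, args=(i, chunk))
        for i, chunk in enumerate(chunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(sum_of_squares_parallel([[1, 2, 3], [4, 5]]))  # {0: 14, 1: 41}
```

The shared part is tiny, but it is exactly the part that needs synchronization, which is why it is the critical subset.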

What you propose is something akin to CPython multiprocessing or
Stackless, with the addition of parallel execution of independent
tasklets. I think Stackless users would like that, actually.

Now, when you get to the point of introducing some communication
between processes, do you mean to pass only byte streams? Primitive
types? Complex data structures? The last you cannot do, as you have
separate GCs, so you are limited to copyable data structures only,
which gives you no more than multiprocessing (the concept, not the
module) with JSON/pickle messages already does.
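A minimal Python sketch of what "copyable data structures only" means in practice, assuming messages cross the boundary between separate heaps as pickled bytes (multiprocessing's queues likewise pickle the objects they transmit). The helpers `send` and `can_send` are hypothetical illustrations, not part of any existing API. The receiver gets an independent copy, so mutations are never shared, and objects such as locks cannot be sent at all.

```python
import pickle
import threading


def send(message):
    # Hypothetical helper: simulate crossing a boundary between separate
    # heaps/GCs. Only bytes cross, so the receiver ends up with a copy.
    return pickle.loads(pickle.dumps(message))


def can_send(obj):
    # Hypothetical helper: an object can cross only if it can be pickled.
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


original = {"counts": [1, 2, 3]}
received = send(original)
received["counts"].append(4)  # mutates only the receiver's copy
print(original)                    # {'counts': [1, 2, 3]}
print(can_send(threading.Lock()))  # False: locks cannot be pickled
```

This is the limitation described above: shared mutable structures turn into one-way snapshots.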

In short, a clone of multiprocessing would be useful, perhaps once
NumPy evolves to support user-defined PyPy functions well.

It's a niche approach, not a general one, though.

d.

On 9 April 2011 15:15, Jacob Hallén <jacob at openend.se> wrote:
> Saturday 09 April 2011 you wrote:
>> Hi Jacob,
>>
>> On Sat, Apr 9, 2011 at 10:50 PM, Jacob Hallén <jacob at openend.se> wrote:
>> > So, in a second step, we provide for special data types that can be
>> > shared between threads. These would typically be allocated in
>> > non-movable memory, to avoid the complexity of garbage collection of
>> > memory with shared use. You can make simple fifo structures for
>> > communication between the threads and complex structures with advanced
>> > algorithms for dealing with shared access.
>>
>> That's where the real issue is.  You can come up with some reasonable
>> API to communicate with other threads, but precisely, they will be
>> some API, which means that they will only work in programs written
>> specifically to use them.  Designing a new API (at the level of the
>> Python language) is something we carefully avoided so far in PyPy; but
>> it's possible that this issue is important enough for us to break that
>> rule :-)
>>
>> What you are describing sounds similar to the multiprocessing module
>> in CPython, which achieves the same goal using separated processes
>> (and tons and tons of hacks), and requires the program to use a custom
>> API.  The advantage of doing it in PyPy rather than CPython is
>> probably limited to the fact that it would be easier in PyPy (but
>> still some work) to make sure that the multiple threads have no shared
>> state.  You still have to design some custom API.
>
> The multiprocessing API contains some classic primitives that could be kept.
> Lots of the rest seem way too complicated. This is because they are dealing
> with separate processes.
>
> I think you are downplaying the advantages of using PyPy. Apart from being
> easier and cleaner to implement, it would be using threads instead of
> processes, providing for much quicker communication and context switches
> between threads. Then it would be able to use the JIT, providing much better
> performance. You could also provide proxy object spaces to transparently
> spread load over multiple physical machines.
>
> Now, I don't think we should go ahead and start work on this now. I just like
> exploring the idea. If people come along wanting to do GIL removal, we can
> present them with a plan and set them off working.
>
>> There is also another possible approach, with more "pypy-like" goals
>> and results, which would be to use some technique to "weave" a solution in
>> the interpreter transparently for the Python programmer (so, a
>> solution that works without requiring the Python programmer to learn
>> another system than threads).  I can imagine a Software Transactional
>> Memory solution that would in theory work very nicely, but in practice
>> have completely dreadful performance, because it would do large
>> amounts of checked memory access for each bytecode.  As far as I know
>> it means that that approach does not work, but it may one day, if
>> Hardware Transactional Memory really shows up and supports that scale.
>
> While being a very neat idea, it is still pie in the sky. My idea could be
> pie-on-plate, though it hasn't been baked yet.
>
> Jacob
>
> _______________________________________________
> pypy-dev at codespeak.net
> http://codespeak.net/mailman/listinfo/pypy-dev
>
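The "simple fifo structures for communication between the threads" proposed in the quoted message have a familiar shape in ordinary Python threading. Below is a minimal sketch using the standard library's `queue.Queue` as a stand-in; it is not the shared, non-movable data type Jacob proposes, and the names `producer`, `consumer`, `pipeline` and `_DONE` are hypothetical. Under a GIL the two threads interleave rather than run in parallel.

```python
import queue
import threading

_DONE = object()  # hypothetical sentinel marking the end of the stream


def producer(q, items):
    # Push every item into the FIFO, then signal completion.
    for item in items:
        q.put(item)  # blocks while the bounded queue is full
    q.put(_DONE)


def consumer(q, out):
    # Pull items in FIFO order until the sentinel arrives.
    while True:
        item = q.get()
        if item is _DONE:
            break
        out.append(item * 2)


def pipeline(items):
    q = queue.Queue(maxsize=4)  # bounded FIFO shared by both threads
    out = []  # written only by the consumer, read only after join()
    t_prod = threading.Thread(target=producer, args=(q, items))
    t_cons = threading.Thread(target=consumer, args=(q, out))
    t_prod.start()
    t_cons.start()
    t_prod.join()
    t_cons.join()
    return out


print(pipeline([1, 2, 3]))  # [2, 4, 6]
```

With a single producer and a single consumer the FIFO preserves ordering, and the queue is the only object the two threads share.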


