[pypy-dev] pre-emptive micro-threads utilizing shared memory message passing?

Kevin Ar18 kevinar18 at hotmail.com
Tue Jul 27 04:09:16 CEST 2010

Might as well warn you: This is going to be a rather long post.
I'm not sure if this is appropriate to post here or whether it would fit better on another mailing list.  Sorry if this is the wrong place to post about it.

I've looked through the documentation (http://codespeak.net/pypy/dist/pypy/doc/stackless.html) and didn't really see what I was looking for.  I've also investigated several options in the default CPython.

What I'm trying to accomplish:
I am trying to write a particular threading scenario that follows these rules.  It is partly an experiment and partly for actual production code.

1. Hundreds or thousands of micro-threads that are essentially small self-contained programs (not really, but you can think of them that way).
2. No shared state - data is passed from one micro-thread to another, and only one micro-thread has access to the data at a time. (Although the programmer gets the impression there is no shared state, in reality the underlying implementation uses shared memory for speed; the data does not move -- only a reference/pointer to the shared memory is passed around.)
3. The micro-threads can run in parallel on different cpu cores, get moved to a different core, etc....
4. The micro-threads are truly pre-emptive (uses hardware interrupt pre-emption).
5. It is my intention to write my own scheduler that will suspend the micro-threads, start them, control the sharing of data, assign them to different CPU cores etc....  In fact, for my purposes, I MUST write my own scheduler as I have very specific requirements on when they should and should not run.
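To make the model above concrete, here is a minimal cooperative sketch of what I have in mind (the names `Mailbox`, `producer`, etc. are purely illustrative, not an existing API). Note that it only approximates points 1-2 and 5; each micro-thread yields voluntarily, so it gives none of the pre-emption or parallelism from points 3-4:

```python
# Illustrative sketch: generator-based "micro-threads" exchanging data by
# reference through per-thread mailboxes.  Cooperative only -- each
# micro-thread yields control voluntarily, so this provides none of the
# pre-emption or multi-core parallelism described above.
from collections import deque

class Mailbox:
    """Holds references to messages; the data itself is never copied."""
    def __init__(self):
        self._items = deque()
    def put(self, ref):
        self._items.append(ref)
    def get(self):
        return self._items.popleft() if self._items else None

def producer(out):
    for i in range(3):
        out.put({"payload": i})   # hand over a reference, not a copy
        yield                     # yield control back to the scheduler

def consumer(inbox, results):
    while len(results) < 3:
        msg = inbox.get()
        if msg is not None:
            results.append(msg["payload"])
        yield

def run(tasks):
    """Round-robin scheduler: advance each task until all have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
            ready.append(task)
        except StopIteration:
            pass

box, results = Mailbox(), []
run([producer(box), consumer(box, results)])
print(results)  # [0, 1, 2]
```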

Now, I have spent some time trying to find a way to achieve this ... and I can implement a rather poor version using default Python.  However, I don't see any way to implement my ideal version.  Maybe someone here might have some pointers for me.

Shared Memory between parallel processes
Quick Question: Do queues from the multiprocessing module use shared memory?  If the answer is YES, you can just skip this section, because that would solve this particular problem.

(For simplicity, let's assume a quad core CPU)
It is my intent to create 4 threads/processes (one per core) and use the scheduler to assign a micro-thread (of which there may be hundreds) to one of the 4 threads/processes.  However, the micro-threads need to exchange data quickly; to do that I need shared memory -- and that is where I'm having some trouble.
Normally, 4 threads would be the ideal solution -- as they can run in parallel and use shared memory.  However, because of the Python GIL, I can't use threads in this way; thus, I have to use 4 processes, which are not set up to share memory.

Question: How can I share Python Objects between processes USING SHARED MEMORY?  I do not want to have to copy or "pass" data back and forth between processes or have to use a proxy "server" process.  These are both too much of a performance hit for my needs; shared memory is what I need.

The multiprocessing module offers me 4 options: "queues", "pipes", "shared memory map", and a "server process".
"Shared memory map" won't work as it only handles C values and arrays (not Python objects or variables).
"Server Process" sounds like a bad idea.  Am I correct in that this option requires extra processing power and does not even use shared memory?  If so, that would be a very bad choice for me.
The big question then... do "queues" and "pipes" use shared memory, or do they pass data back and forth between processes?  (If they use shared memory, that would be perfect.)
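As an aside, one way to probe this from user code: if the object that comes out of a queue is a copy rather than the object that went in, the queue is serializing the data rather than sharing it.

```python
# multiprocessing queues pickle their payload, so what comes out is an
# equal but distinct copy of what went in -- evidence that data is
# passed (copied), not shared.
import multiprocessing

q = multiprocessing.Queue()
original = {"x": 1}
q.put(original)
received = q.get()
print(received == original, received is original)  # True False
```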

Does PyPy have any other options for me?

True Pre-emptive scheduling?

Is there any way to get pre-emptive micro-threads?  Stackless (the real Stackless, not the one in PyPy) has the ability to suspend them after a certain number of interpreter instructions; however, this is prone to problems because a micro-thread can run much longer than expected.  Ideally, I would like true pre-emptive scheduling using hardware interrupts based on timing or CPU cycles (like the OS does for real threads).

I am currently not aware of any way to achieve this in CPython, PyPy, Unladen Swallow, Stackless, etc....
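The closest approximation I know of in CPython is a timer signal: `signal.setitimer` interrupts the running code after a wall-clock interval.  It is Unix-only, only the main thread receives signals, and the handler still waits for the current bytecode to finish -- so this is still not true hardware pre-emption, just a sketch of the idea:

```python
# Sketch of timer-driven interruption in CPython: SIGALRM fires after a
# wall-clock interval and the handler raises into the running code.
# Unix-only, main thread only, and the handler only runs between
# bytecodes -- so not true hardware pre-emption.
import signal

class TimeSliceExpired(Exception):
    pass

def _on_alarm(signum, frame):
    raise TimeSliceExpired

signal.signal(signal.SIGALRM, _on_alarm)
signal.setitimer(signal.ITIMER_REAL, 0.05)   # 50 ms time slice

preempted = False
try:
    while True:        # a "micro-thread" that never yields voluntarily
        pass
except TimeSliceExpired:
    preempted = True
finally:
    signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer

print(preempted)  # True
```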

Are there detailed docs on why the Python GIL exists?
I don't mean trivial statements like "because of C extensions" or "because the interpreter can't handle it".
It may be that my particular usage would not require the GIL.  However, I won't know this until I understand what threading problems in the Python interpreter the GIL was meant to protect against.  Is there detailed documentation anywhere that covers all the threading issues the GIL was meant to solve?

