Re: [pypy-dev] pre-emptive micro-threads utilizing shared memory message passing?

July 28, 2010

On Tue, Jul 27, 2010 at 20:20, Kevin Ar18 <kevinar18@hotmail.com> wrote:
...
I won't even bother giving individual replies.  It's
going to take me some time to go through all that information on the
GIL, so I guess there's not much of a reply I can give anyway.  :)  Let me explain what this is all about in greater detail.

...
BTW, if there are more links on the GIL, feel free to post.
...
Anonymous memory-mapped regions would work, with a suitable data
abstraction. Or even memory-mapped files, which aren't really all that
different on systems anymore.
I considered that... however, that would mean writing a significant library to convert Python data types to C/machine types and I wasn't looking forward to that prospect... although after some experimenting, maybe I will find that it won't be that big a deal for my particular situation.
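For what it's worth, the conversion layer need not be large if the data has a fixed layout. A minimal sketch, assuming a hypothetical record of (int id, float value, short name string), using only the standard mmap and struct modules; the record layout and the write_record/read_record helpers are made up for illustration:

```python
import mmap
import struct

# Hypothetical record layout: little-endian 32-bit int id, 64-bit float value,
# and a fixed 16-byte UTF-8 name field (struct NUL-pads short names).
RECORD = struct.Struct("<id16s")

def write_record(buf, index, ident, value, name):
    """Pack one Python record into machine types at slot `index`."""
    RECORD.pack_into(buf, index * RECORD.size, ident, value, name.encode("utf-8"))

def read_record(buf, index):
    """Unpack slot `index` back into Python objects."""
    ident, value, raw = RECORD.unpack_from(buf, index * RECORD.size)
    return ident, value, raw.rstrip(b"\0").decode("utf-8")

# An anonymous mapping (fileno -1) large enough for 8 records.
shared = mmap.mmap(-1, 8 * RECORD.size)
write_record(shared, 0, 42, 3.5, "box-a")
print(read_record(shared, 0))  # (42, 3.5, 'box-a')
```

Sharing the mapping with another process would additionally require inheriting it across fork() (or using a named mapping); this sketch stays in one process.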
...
I am attempting to experiment with FBP - Flow Based Programming (http://www.jpaulmorrison.com/fbp/ and book: http://www.jpaulmorrison.com/fbp/book.pdf)  There is something very similar in Python: http://www.kamaelia.org/MiniAxon.html  Also, there are some similarities to Erlang - the share-nothing memory model... and on some very broad levels, there are similarities that can be found in functional languages.
Except for the "visual programming" part, the general idea you
describe stems from CSP (Communicating Sequential Processes) and is
also found at least in the Scala actor library and in Google's Go with
goroutines.
In both languages you can easily pretend that no memory is shared by
avoiding sharing any pointers (unlike in C, even buggy code can't modify
data through a pointer which wasn't shared), and Go recommends
programming this way. One difference is that there this is only a
convention, not something the language enforces.
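To illustrate that it is only a convention, here is a rough Python analogue of two goroutines talking over a channel, using the standard queue.Queue and threading modules (the producer/consumer names and the None end-of-stream sentinel are my own choices). Nothing in Python stops the producer from sending a mutable list and then continuing to modify it; sending immutable tuples is what keeps the memory effectively unshared.

```python
import queue
import threading

def producer(out_ch):
    for i in range(3):
        # Send a fresh immutable tuple; the producer keeps nothing it could
        # mutate afterwards, so nothing is effectively shared.
        out_ch.put(("item", i))
    out_ch.put(None)  # sentinel: end of stream

def consumer(in_ch, results):
    while True:
        msg = in_ch.get()
        if msg is None:
            break
        results.append(msg)

ch = queue.Queue()
results = []
threads = [threading.Thread(target=producer, args=(ch,)),
           threading.Thread(target=consumer, args=(ch, results))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [('item', 0), ('item', 1), ('item', 2)]
```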

For the "visual programming", it looks like a particular case of what
the Eclipse Modeling Framework is doing (they allow you to define
types of diagrams, called metamodels, and a way to convert them to
code, and generate a diagram editor and other support stuff. I'm not
an expert on that).
...
From what you describe, FBP seems to give nothing new, except the
combination of "visual programming" with this idea. Disclaimer: I
did not read the book.
...
Consider p74 and p75 of the FBP book (http://www.jpaulmorrison.com/fbp/book.pdf).  Programs essentially consist of many "black boxes" connected together.  A box receives data, processes it, and then passes it along to another box or to output, or drops/deletes it.  Each box is like a mini-program written in a traditional programming language (like C++ or Python).
The process of connecting the boxes together was actually designed to be programmed visually, as you can see from the examples in the book (I have no idea if it works well, as I am merely starting to experiment with it).
Each box is a self-contained "program," so the only data it has access to comes from 3 parts:
(1) its own internal variables
(2) The "in ports": connections from other boxes that allow the box to receive data to be processed (very similar to the arguments in a function call)
(3) The "out ports": after processing the data, the box sends results to various "out ports" (which, in turn, go to another box's "in port" or to system output).  There is no "return" like in functions... and a box can continually generate many pieces of data on the "out ports", unlike a function, which only generates one return value.
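A box as described in the list above maps naturally onto a Python generator, similar in spirit to Kamaelia's MiniAxon. A minimal sketch, where the Box/Splitter classes and the "in"/"out" port names are hypothetical: the box's local variables are its private state, the ports are plain deques, and one input can produce many outputs with no return value.

```python
from collections import deque

class Box:
    """Hypothetical FBP box: private state plus named in/out ports."""

    def __init__(self):
        self.inports = {"in": deque()}
        self.outports = {"out": deque()}

    def run(self):
        """Generator body; each yield hands control back to a scheduler."""
        raise NotImplementedError

class Splitter(Box):
    """Splits each incoming line into words: one input, many outputs."""

    def run(self):
        inbox = self.inports["in"]
        outbox = self.outports["out"]
        while True:
            while inbox:
                line = inbox.popleft()
                for word in line.split():   # `word` is internal state
                    outbox.append(word)     # emit on the out port; no return
            yield

box = Splitter()
step = box.run()
box.inports["in"].append("flow based programming")
next(step)
print(list(box.outports["out"]))  # ['flow', 'based', 'programming']
```

Connecting two boxes would then just mean making one box's out-port deque the same object as the next box's in-port deque.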
------------------------
At this point, my understanding of the FBP concept is extremely limited.  Unfortunately, the author does not have very detailed documentation on the implementation details.  So, I am going to try exploring the concept on my own and see if I can actually use it in some production code.
Implementation of FBP requires a custom scheduler for several reasons:
(1) A box can only run if it has actual data on its "in port(s)".  Thus, the scheduler would only schedule boxes to run when they can actually process some data.
(2) In theory, it may be possible to end up with hundreds or thousands of these lightweight boxes.  Using heavyweight OS threads or processes for every one is out of the question.
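The first requirement (only schedule boxes that have input) can be sketched as a single-threaded round-robin loop over generators. A sketch under assumptions of my own: each box has exactly one in-port and at most one out-port, and the Box class and the connect()/run() helpers are hypothetical names, not Kamaelia's API:

```python
from collections import deque

class Box:
    """Hypothetical box: one in-port, at most one out-port, a generator body."""
    def __init__(self, body):
        self.inport = deque()
        self.outport = None        # set by connect()
        self.body = body(self)     # generator object; not started yet

def connect(src, dst):
    """An out-port is simply the downstream box's in-port."""
    src.outport = dst.inport

def run(boxes):
    """Step only boxes with data waiting; stop when none are runnable."""
    while True:
        ready = [b for b in boxes if b.inport]
        if not ready:
            return
        for b in ready:
            next(b.body)           # run one step of this box

results = []

def doubler(box):
    while True:
        box.outport.append(2 * box.inport.popleft())
        yield

def sink(box):
    while True:
        results.append(box.inport.popleft())
        yield

a = Box(doubler)
b = Box(sink)
connect(a, b)
a.inport.extend([1, 2, 3])
run([a, b])
print(results)  # [2, 4, 6]
```

Because each box only yields back to the loop, switching between boxes is an ordinary Python call rather than an OS context switch, which is what makes thousands of boxes plausible.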
The Kamaelia website describes a simplistic single-threaded way to write a scheduler in Python that would work for the FBP concept (even though they had never heard of FBP when they designed Kamaelia).  Based on that, it seems like writing a simple scheduler would be rather easy:
...
In a perfect world, here's what I might do:
* Assume a quad-core CPU
(1) Spawn 1 process
(2) Spawn 4 threads & assign each thread to only 1 core -- in other words, don't let the OS handle moving threads around to different cores
(3) Inside each thread, have a mini scheduler that switches back and forth between the many micro-threads (or "boxes") -- note that the OS should not handle any of the switching between micro-threads/boxes, as it does it all wrong (and is too heavyweight) for this situation.
(4) Using a shared memory queue, each of the 4 schedulers can get the next box to run... or add more boxes to the schedule queue.
Most of this is usual or standard, even if some implementations may not
set thread-CPU affinity, possibly because their authors don't know about
the syscalls to do it, e.g. sched_setaffinity on Linux. IIRC, this was not
mentioned in the paper I read about the Scala actor library.
Look for 'N:M threading library' (without quotes) on Google.
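For reference, Python exposes that syscall as os.sched_setaffinity (on some Unix platforms only; it is absent on macOS and Windows). A sketch of each worker pinning itself to one core, assuming Linux, where pid 0 in sched_setaffinity(2) means the calling thread rather than the whole process; pin_current_thread is a hypothetical helper:

```python
import os
import threading

def pin_current_thread(core):
    """Pin the calling thread to `core`; return its new CPU mask, or None.

    Assumes Linux semantics: pid 0 refers to the calling thread, so only this
    thread is pinned, not the whole process.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None                      # not available on this platform
    try:
        os.sched_setaffinity(0, {core})
    except OSError:
        return None                      # e.g. core not allowed by a cpuset
    return os.sched_getaffinity(0)

def worker(core, masks):
    masks[core] = pin_current_thread(core)
    # ... this core's micro-thread scheduler would run here ...

# Use up to 4 cores the process is actually allowed to run on.
if hasattr(os, "sched_getaffinity"):
    cores = sorted(os.sched_getaffinity(0))[:4]
else:
    cores = [0]
masks = {}
threads = [threading.Thread(target=worker, args=(c, masks)) for c in cores]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(masks)
```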
...
(5) Each box has access to its "in ports" and "out ports" only -- and nothing else.  These can be implemented as shared memory for speed.
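Steps (2) to (4) above amount to an N:M scheduler: a few OS threads sharing one run queue of micro-threads. A minimal sketch with the standard threading and queue modules, where micro-threads are generators and a task is re-queued after each step; the worker/counter names and the None shutdown sentinel are assumptions. Under a GIL (as in CPython or PyPy) these workers would not execute Python code in parallel, so this shows the structure rather than the speedup:

```python
import queue
import threading

NUM_WORKERS = 4                  # assumption: one OS thread per core (quad-core)
run_queue = queue.Queue()        # the shared queue of runnable micro-threads

def worker():
    """One OS thread: repeatedly take a micro-thread and run one step of it."""
    while True:
        task = run_queue.get()
        if task is None:         # shutdown sentinel
            run_queue.task_done()
            return
        try:
            next(task)           # run the micro-thread up to its next yield
        except StopIteration:
            pass                 # finished: do not reschedule it
        else:
            run_queue.put(task)  # still alive: back onto the shared queue
        finally:
            run_queue.task_done()

def counter(name, n, log):
    """A trivial micro-thread that records its progress."""
    for i in range(n):
        log.append((name, i))
        yield                    # voluntary switch point

log = []
for name in "abc":
    run_queue.put(counter(name, 3, log))

workers = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()
run_queue.join()                 # every micro-thread has run to completion
for _ in workers:
    run_queue.put(None)          # one sentinel per worker
for w in workers:
    w.join()
print(len(log))  # 9
```

Since a generator object is in the queue at most once, no two workers can step the same micro-thread at the same time.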
...
Some notes:
Garbage Collection - I noticed that one of the issues mentioned about the GIL was garbage collection.  Within the FBP concept, this MIGHT be easily solved: (a) only 1 running piece of code (1 box) can access a piece of data at a time, so there are no worries about whether there are dangling pointers to the var/object somewhere, etc.
...
(b) data must be manually "dropped" inside a box to get rid of it; thus, there is no need to go checking for data that is not used anymore.
A "piece of data" can point to other objects, and the pointer can be
modified. So you need GC anyway: having that, requiring data to be
dropped explicitly seems just an annoyance (there might be deeper
reasons, however).
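The point about pointers can be shown concretely. A small sketch with a hypothetical Packet class: a box "drops" its reference, but another packet still points at the same object, so freeing it on the explicit drop would leave a dangling pointer; only once the last pointer is modified does it become garbage. weakref is used purely to observe whether the object is still alive:

```python
import gc
import weakref

class Packet:
    """Hypothetical information packet that may point at another packet."""
    def __init__(self, payload, ref=None):
        self.payload = payload
        self.ref = ref

inner = Packet("inner")
outer = Packet("outer", ref=inner)   # outer points at inner
probe = weakref.ref(inner)           # observes inner without keeping it alive

del inner                            # a box "drops" its own reference...
gc.collect()
print(probe() is not None)           # True: outer still reaches it

outer.ref = None                     # ...but the pointer can be modified later
gc.collect()
print(probe() is None)               # True: now nothing refers to it
```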
...
Threading protection - In theory, there are significantly fewer threading issues, since: (a) only one box can control/access data at a time; (b) the only place where there is contention is when you push/pop from the in/out ports... and that is trivial to protect against.
Agreed.
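For instance, a connection between two boxes needs just one lock around push and pop. A sketch using threading.Condition; the Port class and its capacity parameter are hypothetical, and blocking the sender when the connection is full is an assumption of mine, not something stated above:

```python
import threading
from collections import deque

class Port:
    """Hypothetical bounded connection between two boxes.

    The buffer is the only shared state, so the one lock inside the
    Condition, held during push/pop, is all the synchronization needed.
    """
    def __init__(self, capacity=16):
        self._items = deque()
        self._capacity = capacity
        self._cond = threading.Condition()

    def push(self, item):
        with self._cond:
            while len(self._items) >= self._capacity:
                self._cond.wait()        # downstream is full: wait for room
            self._items.append(item)
            self._cond.notify_all()

    def pop(self):
        with self._cond:
            while not self._items:
                self._cond.wait()        # nothing to process yet
            item = self._items.popleft()
            self._cond.notify_all()
            return item

port = Port(capacity=2)
received = []

def downstream():
    for _ in range(5):
        received.append(port.pop())

t = threading.Thread(target=downstream)
t.start()
for i in range(5):
    port.push(i)
t.join()
print(received)  # [0, 1, 2, 3, 4]
```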
-- 
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/