[pypy-dev] pypy-dev Digest, Vol 360, Issue 13

Thu Jul 29 18:56:52 CEST 2010

Hi Kevin:

Message: 1
Date: Tue, 27 Jul 2010 14:20:10 -0400
From: Kevin Ar18 <kevinar18 at hotmail.com>
Subject: Re: [pypy-dev] pre-emptive micro-threads utilizing shared
    memory message passing?
To: <pypy-dev at codespeak.net>
Message-ID: <SNT110-W23BAAD6B986917DCB074DFAAA70 at phx.gbl>
Content-Type: text/plain; charset="iso-8859-1"

>I am attempting to experiment with FBP - Flow Based Programming >(http://www.jpaulmorrison.com/fbp/ and book: http://www.jpaulmorrison.com>/fbp/book.pdf)? There is something very similar in Python: >http://www.kamaelia.org/MiniAxon.html? Also, there are some similarities >to Erlang - the share nothing memory model... and on some very broad >levels, there are similarities that can be found in functional languages.

I just came back from EuroPython. A lot of discussion on concurrency....

Well functional languages (like Erlang), variables tend to be immutable. This is a bonus in a concurrent system - makes it easier to reason about the system - and helps to avoid various race conditions. As for the shared memory. I think there is a difference between whether things are shared at the application programmer level, or under the hood controlled by the system. Programmers tend to beare bad at the former. 

>http://www.kamaelia.org/MiniAxon.html

I took a quick look. Maybe I am biased but Stackless Python gives you most of that. Also tasklets and channels can do everything a generator can and more (a generator is more specialised than a coroutine). Also it is easy to mimic asynchrony with a CSP style messaging system where microthreads and channels are cheap. A line from the book "Actors: A Model of Concurrent Computation in Distributed Systems" by Gul A. Agha comes to mind: "synchrony is mere buffered asynchrony."

>The process of connecting the boxes together was actually designed to be >programmed visually, as you can see from the examples in the book (I have >no idea if it works well, as I am merely starting to experiment with it).

What bought me to Stackless Python and PyPy was work concerning WS-BPEL. Allegedly, WS-BPEL/XLang/WSFL (Web-Services Flow Language) are based on formalisms like pi calculus.

Since I don't own a multi-core machine and I am not doing CPU intense stuff, I never really cared. However I have been doing things where I needed to impose logical orderings upon processes (i.e., process C can only run after process A and B are finished). My initial native uses of Stackless (easy to do in anything system based on CSP), resulted in deadlocking the system. So I found understanding deadlock to be very important.

>Each box, being a self contained "program," the only data it has access >to is 3 parts:

>Implementation of FBP requires a custom scheduler for several reasons:
>(1) A box can only run if it has actual data on the "in port(s)"? Thus, >the scheduler would only schedule boxes to run when they can actually >process some data.

Stackless Python already works like this. No custom scheduler needed. I would recommend you read Rob Pike's paper "The Implementation of Newsqueak" or some of the Cardilli papers to understand how CSP constructs with channels work. And if you need to customize schedulers - you have two routes 1) Use pre-existing classes and API 2) Experiment with PyPy's stackless.py

>(2) In theory, it may be possible to end up with hundreds or thousands of >these light weight boxes.? Using heavy-weight OS threads or processes for every one is out of the question.

Stackless Python.

>In a perfect world, here's what I might do:
* Assume a quad core cpu
>(1) Spawn 1 process
>(2) Spawn 4 threads & assign each thread to only 1 core -- in other >words, don't let the OS handle moving threads around to different cores
>(3) Inside each thread, have a mini scheduler that switches back and >forth between the many micro-threads (or "boxes") -- note that the OS >should not handle any of the switching between micro-threads/boxes as it >does it all wrong (and to heavyweight) for this situation.
>(4) Using a shared memory queue, each of the 4 schedulers can get the >next box to run... or add more boxes to the schedule queue.

My advice: get stuff properly working under a single threaded model first so you understand the machinery. That said, I think Carlos Eduardo de Paula a few years ago played with adapting Stackless for multi-processing.

Second piece of advice: start looking at how Go does things. Stackless Python and Go share a common ancestor. However Go does much more on the multi-core front.

Cheers,
Andrew