[Python-Dev] Pythonic concurrency
ms at cerenity.org
Fri Oct 7 23:02:38 CEST 2005
[ Possibly overlengthy reply. However, given the multiple sets of cans of
worms opened, perhaps unavoidable. ]
On Friday 07 October 2005 07:25, Josiah Carlson wrote:
> One thing I notice is absent from the Kamaelia page is benchmarks.
That's largely for one simple reason: we haven't done any yet.
At least not anything I'd call a benchmark. "There's lies, damn lies,
statistics and then there's benchmarks."
//Theoretically//, I suspect the system /could/ perform as well as
traditional single-threaded (and multi-threaded/multi-process) approaches
to concurrent problems. This is based on the recognition of two things:
  * Event systems (often implementing state machine type behaviour, though
    not always) often have intermediate buffers between states &
    operations. Some systems divide a problem into multiple reactors and
    stages and have communication between them, though this can sometimes
    be hidden. All we've done is make this much more explicit.
  * Event systems (and state machine based approaches) can often be used
    to effectively say "I want to stop and wait here, come back to me later"
    or simply "I'm doing something processor intensive, but I'm being nice
    and letting something else have a go". The use of generators here simply
    makes that particular behaviour more explicit. This is a nice bonus of
    the generator-based approach.
[neither is a negative really, just different. The first bullet has implicit
buffers in the system, the latter has a more implicit state machine
in the system. ICBVW here of course.]
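The "come back to me later" behaviour in the second bullet can be sketched in
a few lines (this is a simplified illustration, not Kamaelia's actual Axon
API): each yield hands control back to a trivial round-robin scheduler, so
cooperation is explicit in the code.

```python
def counter(name, limit):
    """A tiny cooperative task: do one unit of work per resumption."""
    for i in range(limit):
        # Be nice and let something else have a go after each step.
        yield f"{name}: {i}"

def scheduler(tasks):
    """Round-robin over generator-based tasks until all finish."""
    results = []
    while tasks:
        task = tasks.pop(0)
        try:
            results.append(next(task))
            tasks.append(task)          # re-queue: round robin
        except StopIteration:
            pass                        # task finished, drop it
    return results

print(scheduler([counter("a", 2), counter("b", 2)]))
# Interleaved execution: ['a: 0', 'b: 0', 'a: 1', 'b: 1']
```

The scheduler never preempts anything; a task that fails to yield starves the
others, which is exactly the trade-off event/state-machine systems make too.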
However, COULD is not is, and whilst I say "in theory", I am painfully aware
that theory and practice often have a big gulf between them.
Also, I'm certain that at present our performance is nowhere near optimal.
We've focussed on trying to find what works from a few perspectives rather
than performance (one possible definition of correctness here, but certainly
not the only one). Along the way we've made compromises in favour of clarity
as to what's going on, rather than performance.
For example, one area we know we can optimise is the handling of
message delivery. The mini-axon tutorial represents delivery between
active components as being performed by an independent party - a
postman. This is precisely what happens in the current system.
That can be optimised, for example, by collapsing outboxes into inboxes
(ie removing one of the lists when a linkage is made and changing the
reference), at which point you have a single intermediate buffer (much
like an event/state system communicating between subsystems). We haven't
done this yet; whilst it would partly simplify things, it makes other
areas more complex, and seems like premature optimisation.
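The delivery optimisation above can be sketched as follows (hypothetical
class and function names, not the real Axon code): the unoptimised path has
a postman copying messages between two lists, while the collapsed linkage
makes both ends share one list, so "delivery" costs nothing.

```python
class Component:
    """Toy component with one outbox list and one inbox list."""
    def __init__(self):
        self.outbox = []
        self.inbox = []

def postman_deliver(sender, receiver):
    # Unoptimised: an independent party copies messages across.
    while sender.outbox:
        receiver.inbox.append(sender.outbox.pop(0))

def collapse_linkage(sender, receiver):
    # Optimised: the sender's outbox *becomes* the receiver's inbox,
    # leaving a single intermediate buffer between the two components.
    sender.outbox = receiver.inbox

a, b = Component(), Component()
collapse_linkage(a, b)
a.outbox.append("hello")     # appending to the shared list...
print(b.inbox)               # ...is already "delivered": ['hello']
```

The complexity this adds is exactly the kind hinted at above: once lists are
shared, unlinking or re-linking components means carefully swapping
references back, which is why it can wait.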
However I have performed an //informal comparison// between the use of a
Kamaelia type approach and a traditional approach not using any framework at
all for implementing a trivial game. (Cats bouncing around the screen
scaling, rotating, etc, controlled by a user) The reason I say Kamaelia-type
approach is because it was a mini-axon based experiment using collapsed
outboxes to inboxes (as above).
The measure I used was simply framerate. This is a fair real-world value with
a real use - if it drops too low, the system is simply unusable. I measured the
framerate before transforming the simplistic game to work well in the
framework, and after transforming it. The differences were:
* 5% drop in performance/framerate
* The ability to reuse much of the code in other systems and environments.
From that perspective it seems acceptable (for now). This *isn't*, as you would
probably say, a rigorous or trustable benchmark, but it was a useful "smoke
test", if you like, of the approach.
From a *pragmatic* perspective, currently the system is fast enough for simple
games (say one or two hundred, maybe more, sprites active at once),
for interactive applications, video players, realtime audio mixing and a
variety of other things, so currently we're leaving that aside.
Also from an even more pragmatic perspective, I would say if you're after
performance and throughput then I'd say use Twisted, since it's a proven
technology.
**If** our stuff turns out to be useful, we'd like to find a way of making our
stuff available inside twisted -- if they'd like it (*) -- since we're not
the least bit interested in competing with anyone :-) So far *we're* finding
it useful, which is all I'd personally claim, and I hope that it's useful to
others.
(*) The all too brief conversation I had with Tommi Virtanen at Europython
suggested that he at least thought the pipeline/graphline idea was
worth taking - so I'd like to do that at some point, even if it
sidelines our work to date.
Once we've validated the model though (which I expect to take some time,
you only learn if it's validated by building things IMO), then we'll look at
optimisation. (if the model is validated :-)
All that said, I'm open to suggestion as to what sort of benchmark you'd like
to see. I'm more interested in benchmarks that actually mean something rather
than those that simply say "X is better than Y", though.
Summarising then: no benchmarks, yet. If you're after speed, I'm certain
you can find that elsewhere. If you're after an easy way of dealing with a
concurrent problem, that's where we're starting from, and then optimising.
We're very open to suggestions for improvement on both usability/learnability
and on keeping doors open / opening doors to performance though. I'd hate to
have to rewrite everything in another language later simply due to poor
performance.
[ Network controlled Networked Audio Mixing Matrix ]
> Very neat. How much data? What kind of throughput? What kinds of
For the test system we tested with 3 raw PCM audio data streams. That's
3 x 44.1KHz, 16 bit stereo - which is around 4.2Mbit/s of data from the
network being processed realtime and output back to the network at
1.4Mbit/s. So, not huge numbers, but not insignificant amounts of data
either. I suppose one thing I can take more time with now is to look at
the specific latency of the mixer. It didn't *appear* to be large however.
(there appeared to be similar latency in the system with or without the
mixing matrix in the path.)
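As a quick sanity check of the bandwidth figures quoted above, the arithmetic
for one raw PCM stream at 44.1KHz, 16 bit, stereo, and for three such streams
inbound, works out as follows:

```python
# Bandwidth of raw PCM audio: sample_rate * bits_per_sample * channels.
rate_hz, bits, channels = 44_100, 16, 2
one_stream = rate_hz * bits * channels          # bits per second

print(one_stream / 1e6)       # ≈ 1.41 Mbit/s: the single mixed output stream
print(3 * one_stream / 1e6)   # ≈ 4.23 Mbit/s: the three inbound streams
```

Which matches the "around 4.2Mbit/s in, 1.4Mbit/s out" figures above.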
[[The aim of the rapid prototyping session was to see what could be done
rather than to measure the results. The total time taken for coding the
mixing matrix was 2.5 days. About 1/2 day spent on finding an issue we had
with network resends regarding non-blocking sockets. A day with me totally
misunderstanding how mixing raw audio byte streams works. The backplane
was written during that 3 day time period. The control protocol for
switching on/off mixes and querying the system though was ~1.5 hours
from start to finish, including testing. To experiment with what dataflow
architecture might work, I knocked up a command line controlled dynamic
graph viewer (add nodes, link nodes, delete nodes) in about 5 minutes and
then experimented with what the system would look like if done naively. The
backplane idea became clear as useful here because we wanted to allow
multiple mixers. ]]
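The backplane idea mentioned above can be sketched as a simple named bus
(hypothetical, heavily simplified - the real Kamaelia backplane is a
component, not a bare class): publishers push messages in, and any number of
subscribers, e.g. multiple mixers, each receive every message.

```python
class Backplane:
    """Toy many-to-many bus: every subscriber sees every message."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self):
        inbox = []                     # a fresh inbox per subscriber
        self.subscribers.append(inbox)
        return inbox

    def publish(self, msg):
        for inbox in self.subscribers:
            inbox.append(msg)

audio = Backplane()
mixer1, mixer2 = audio.subscribe(), audio.subscribe()
audio.publish(b"\x00\x01")                 # one chunk reaches both mixers
print(mixer1 == mixer2 == [b"\x00\x01"])   # True
```

The point of the abstraction is that sources need not know how many mixers
exist; adding a second mixer is just another subscribe call.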
A more interesting effect we found was dealing with mouse movement in pygame
where we found that *huge* numbers of messages being sent one at a time and
processed one at a time (with a yield after each) became a huge bottleneck.
It made more sense to batch the events and pass them to client surfaces.
(If that makes no sense we allow pygame components to act as if they have
control of the display by giving them a surface from a pygame display service.
This acts essentially as a simplistic window manager. That means pygame events
need to be passed through quickly and cleanly.)
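The batching fix can be sketched like this (hypothetical names, not the real
pygame display service code): rather than yielding once per mouse event, the
dispatcher drains everything pending and hands each client surface a single
list per cycle.

```python
def dispatch_batched(pending_events, clients):
    """Deliver all pending events to each client as one message."""
    if not pending_events:
        return
    batch = pending_events[:]       # one list, not N separate messages
    del pending_events[:]           # drain the queue in place
    for client in clients:
        client.append(batch)        # each client gets the whole batch

# e.g. a burst of mouse-motion events in one cycle:
events = [("MOUSEMOTION", (x, x)) for x in range(1000)]
client_a, client_b = [], []
dispatch_batched(events, [client_a, client_b])
print(len(client_a), len(client_a[0]))   # 1 message carrying 1000 events
```

One message per cycle per client means the scheduling overhead is paid once
per burst rather than once per event, which is where the bottleneck was.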
The reason I like using pygame for these things is because a) it's relatively
raw and fast b) games are another often /naturally/ concurrent system. Also
it normally allows other senses beyond reading numbers/graphs to kick in when
evaluating changes: "that looks better/worse", "there's something wrong here".
> I have two recent posts about the performance and features of a (hacked
> together) tuple space system
Great :-) I'll have a dig around.
> The only thing that it is missing is a prioritization mechanism (fifo,
> numeric priority, etc.), which would get us a job scheduling kernel. Not
> bad for a "message passing"/"tuple space"/"IPC" library.
Sounds interesting. I'll try and find some time to have a look and have a
play. FWIW, we're also missing a prioritisation mechanism right now. Though
currently I have Simon Wittber's latest release of Nanothreads on my stack of
to look at. I do have a soft spot for Linda type approaches though :-)
Michael Sparks, Senior R&D Engineer, Digital Media Group
Michael.Sparks at rd.bbc.co.uk, http://kamaelia.sourceforge.net/
British Broadcasting Corporation, Research and Development
Kingswood Warren, Surrey KT20 6NP
This e-mail may contain personal views which are not the views of the BBC.