[Python-Dev] Pythonic concurrency

Josiah Carlson jcarlson at uci.edu
Sat Oct 8 05:05:20 CEST 2005


Michael Sparks <ms at cerenity.org> wrote:
> [ Possibly overlengthy reply. However, given multiple sets of cans of
>   worms... ]
> On Friday 07 October 2005 07:25, Josiah Carlson wrote:
> > One thing I notice is absent from the Kamaelia page is benchmarks.
> 
> That's largely for one simple reason: we haven't done any yet. 

Perfectly reasonable.  If you ever do, I'd be happy to know!


> At least not anything I'd call a benchmark. "There's lies, damn lies,
> statistics and then there's benchmarks."

Indeed.  But it does let people get an idea of whether a system could
handle their workload.


> The measure I used was simply framerate. This is a fair, real-world value
> with a real use: if it drops too low, the system is simply unusable. I
> measured the framerate before transforming the simplistic game to work
> well in the framework, and after transforming it. The differences were:
>    * 5% drop in performance/framerate
>    * The ability to reuse much of the code in other systems and environments.

Single process?  Multi-process single machine?  Multiprocess multiple
machine?


> Also, from an even more pragmatic perspective: if you're after performance
> and throughput, then I'd say use Twisted, since it's a proven technology.

I'm just curious.  I keep my fingers away from Twisted as a matter of
personal taste (I'm sure it's great, but it's not for me).


> All that said, I'm open to suggestion as to what sort of benchmark you'd like
> to see. I'm more interested in benchmarks that actually mean something rather
> than ones that just say X is better than Y, though.

I wouldn't dream of saying that X was better or worse than Y, unless one
was obvious crap (since it works for you, and you've gotten new users to
use it successfully, that is obviously not the case).

There are five benchmarks that I think would be interesting to see (a
rough sketch of the first one follows the list):
1. Send ~500 bytes of data round-trip from process A to process B and
back on the same machine as fast as you can (simulates synchronous
message passing and discovers transfer latencies) a few (tens of)
thousands of times (A doesn't send message i until it has received
message i-1 back from B).

2. Increase the number of processes that round trip with B.  A quick
chart of #senders vs. messages/second would be far more than adequate.

3. Have process B send ~500 byte messages to many listening processes
via whatever is the fastest method (direct connections, multiple
subscriptions to a 'channel', etc.).  Knowing #listeners vs.
messages/second would be cool.

4. Send blocks of data from process A to process B (any size you want).
B immediately discards the data, but you pay attention to how much
data/second B receives (a dual processor machine with proper processor
affinities would be fine here).

5. Start increasing the number of processes that send data to B.  A
quick chart of #senders vs. total bytes/second would be far more than
adequate.
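
To make the first benchmark concrete, here's the rough sort of harness
I have in mind.  It uses plain sockets and os.fork (so Unix only); all
the names are made up for illustration and come from neither Kamaelia
nor my tuple space:

    import os, socket, time

    N = 10000              # number of round trips
    PAYLOAD = b'x' * 500   # ~500 byte message

    def recv_exact(conn, n):
        # recv() may return short reads, so loop until n bytes arrive
        buf = b''
        while len(buf) < n:
            chunk = conn.recv(n - len(buf))
            if not chunk:
                raise EOFError('peer closed connection')
            buf += chunk
        return buf

    listener = socket.socket()
    listener.bind(('127.0.0.1', 0))
    listener.listen(1)

    if os.fork() == 0:                  # child: process B, a pure echo
        conn, _ = listener.accept()
        for _ in range(N):
            conn.sendall(recv_exact(conn, len(PAYLOAD)))
        os._exit(0)

    sock = socket.create_connection(listener.getsockname())
    start = time.time()
    for _ in range(N):                  # A sends message i only after
        sock.sendall(PAYLOAD)           # message i-1 has come back
        recv_exact(sock, len(PAYLOAD))
    elapsed = time.time() - start
    print('%d round trips in %.2fs: %.1f usec each, %d msgs/sec'
          % (N, elapsed, 1e6 * elapsed / N, N / elapsed))

Benchmarks 2, 3 and 5 are the same skeleton with more forked senders,
and benchmark 4 simply drops the echo and counts bytes received per
second instead.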


I'm just offering the above as example benchmarks (you certainly don't
need to do them to satisfy me, but I'll be doing those when my tuple
space implementation is closer to being done). They are certainly not
exhaustive, but they do offer a method by which one can measure
latencies, message volume throughput, data volume throughput, and
ability to handle many senders and/or recipients.

> [ Network controlled Networked Audio Mixing Matrix ]
> > Very neat.  How much data?  What kind of throughput?  What kinds of
> > latencies?
> 
> For the test system we tested with 3 raw PCM audio data streams. That's
> 3 x 44.1kHz, 16 bit stereo - which is around 4.2Mbit/s of data from the
> network being processed in real time and output back to the network at
> 1.4Mbit/s. So, not huge numbers, but not insignificant amounts of data
> either. I suppose one thing I can take more time with now is to look at
> the specific latency of the mixer. It didn't *appear* to be large, however
> (there appeared to be similar latency in the system with or without the
> mixer).

530 kbytes/second in, 176 kbytes/second out.  Not bad (I imagine you are
using a C library/extension of some sort to do the mixing... perhaps
numarray, Numeric, ...).  How large are the blocks of data that you are
shuffling around at one time?  1, 5, 10, 50, 150 kbytes?
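
(The arithmetic behind those figures, as a trivial back-of-the-envelope
check:)

    streams = 3
    sample_rate = 44100       # samples/second
    sample_bits = 16          # bits/sample
    channels = 2              # stereo

    bits_in = streams * sample_rate * sample_bits * channels
    bits_out = bits_in // streams        # one mixed stream going back out
    print(bits_in / 8 / 1000.0)          # ~529 kbytes/second in
    print(bits_out / 8 / 1000.0)         # ~176 kbytes/second out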

> A more interesting effect we found was dealing with mouse movement in pygame,
> where *huge* numbers of messages being sent one at a time and processed
> one at a time (with yields after each) became a huge bottleneck.

I can imagine.
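
One way around that kind of bottleneck is to coalesce events so a
component yields once per batch rather than once per message.  A
hypothetical pygame-flavoured sketch (this is not Kamaelia code, and it
assumes pygame.init() and a display are already set up):

    import pygame

    def mouse_batcher():
        # A generator 'component': drain the whole motion-event queue
        # and yield one batch per scheduler slot, instead of one yield
        # per individual event.
        while True:
            moves = [e.pos for e in pygame.event.get(pygame.MOUSEMOTION)]
            yield moves or None   # one message carrying many positions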

> The reason I like using pygame for these things is because a) it's relatively
> raw and fast, and b) games are another often /naturally/ concurrent system.
> It also normally allows senses beyond reading numbers/graphs to kick in when
> evaluating changes: "that looks better/worse", "there's something wrong
> there".

Indeed.  I should get my fingers into PyGame, but haven't yet due to
other responsibilities.


> > I have two recent posts about the performance and features of a (hacked
> > together) tuple space system 
> 
> Great :-) I'll have a dig around.

Make that 3.


> > The only thing that it is missing is a prioritization mechanism (fifo,
> > numeric priority, etc.), which would get us a job scheduling kernel. Not
> > bad  for a "message passing"/"tuple space"/"IPC" library.   
> 
> Sounds interesting. I'll try and find some time to have a look and have a
> play. FWIW, we're also missing a prioritisation mechanism right now. Though
> currently I have Simon Wittber's latest release of Nanothreads on my stack
> of things to look at. I do have a soft spot for Linda type approaches though :-)

I've not yet released anything.  The version I'm working on essentially
indexes tuples in a set of specialized structures to make certain kinds
of matching fast (insertions and removals are also fast), with a
particular kind of queue at each 'leaf' (if one were to look at the
structure as a tree).

Those queues also support listeners that want to be notified about one
or many tuples matching a pattern, with the matching tuple either being
consumed by a single listener, broadcast to all listeners, etc.

When there are no listeners, but someone just wants one tuple, fetches
can be prioritized by fifo, numeric priority, lifo, or whatever other
useful semantics I get around to putting in there, for whatever set of
tuples matches.
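
To give a rough feel for the shape (a toy, emphatically not the real
thing -- the real structures are more specialized, and listeners are
left out entirely): tuples get indexed by cheap features such as length
and first element, with a fifo queue at each leaf and None acting as a
wildcard in patterns:

    from collections import defaultdict, deque

    class ToyTupleSpace:
        def __init__(self):
            # (length, first element) -> fifo queue of matching tuples
            self.index = defaultdict(deque)

        def put(self, tup):
            self.index[(len(tup), tup[0])].append(tup)

        def get(self, pattern):
            # Consume and return the oldest tuple matching the pattern,
            # where None in the pattern matches anything.
            if pattern[0] is not None:    # indexed fast path
                keys = [(len(pattern), pattern[0])]
            else:                         # wildcard head: scan by length
                keys = [k for k in self.index if k[0] == len(pattern)]
            for key in keys:
                queue = self.index[key]
                for i, tup in enumerate(queue):
                    if all(p is None or p == t
                           for p, t in zip(pattern, tup)):
                        del queue[i]
                        return tup
            return None

    space = ToyTupleSpace()
    space.put(('job', 2, 'mix'))
    space.put(('job', 1, 'render'))
    print(space.get(('job', None, None)))   # ('job', 2, 'mix'), fifo order

Swapping the deque at the leaf for a heap or a stack is what would give
the numeric priority or lifo fetch semantics mentioned above.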


 - Josiah


