Re: [Python-Dev] Pythonic concurrency - cooperative MT
On 10/1/05, Antoine wrote:
like this with their "deferred objects", no? I figure they would need to do something like this too. I will have to check.)
A Deferred object is just the abstraction of a callback - or, rather, two callbacks: one for success and one for failure. Twisted is architected around an event loop, which calls your code back when a registered event happens (for example when an operation is finished, or when some data arrives on the wire). Compared to generators, it is a different way of expressing cooperative multi-threading.
So, the question is: in Twisted, if I want to defer on an operation that is going to block (say I'm making a database query that I expect will take a long time), and want to yield ("defer") so that other events can be processed while the query executes, how do I do that? As far as I remember, the Twisted docs I read a long time ago did not provide a solution for that.
On 10/3/05, Martin Blais wrote:
On 10/1/05, Antoine wrote:
like this with their "deferred objects", no? I figure they would need to do something like this too. I will have to check.)
A Deferred object is just the abstraction of a callback - or, rather, two callbacks: one for success and one for failure. Twisted is architected around an event loop, which calls your code back when a registered event happens (for example when an operation is finished, or when some data arrives on the wire). Compared to generators, it is a different way of expressing cooperative multi-threading.
So, the question is: in Twisted, if I want to defer on an operation that is going to block (say I'm making a database query that I expect will take a long time), and want to yield ("defer") so that other events can be processed while the query executes, how do I do that? As far as I remember, the Twisted docs I read a long time ago did not provide a solution for that.
Deferreds don't make blocking code non-blocking; they're just a way to make it nicer to write non-blocking code. There are utilities in Twisted for wrapping a blocking function call in a thread and having the result returned in a Deferred, though (see deferToThread). There is also a lightweight and complete wrapper for DB-API2 database modules in twisted.enterprise.adbapi, which does the threading interaction for you. So, since this then exposes a non-blocking API, you can do stuff like d = pool.runQuery('SELECT User_ID FROM Users') d.addCallback(gotDBData) d2 = ldapfoo.getUser('bob') d2.addCallback(gotLDAPData) And both the database call and the ldap request will be worked on concurrently. -- Twisted | Christopher Armstrong: International Man of Twistery Radix | -- http://radix.twistedmatrix.com | Release Manager, Twisted Project \\\V/// | -- http://twistedmatrix.com |o O| | w----v----w-+
On 10/2/05, Christopher Armstrong wrote:
On 10/3/05, Martin Blais wrote:
On 10/1/05, Antoine wrote:
like this with their "deferred objects", no? I figure they would need to do something like this too. I will have to check.)
A Deferred object is just the abstraction of a callback - or, rather, two callbacks: one for success and one for failure. Twisted is architected around an event loop, which calls your code back when a registered event happens (for example when an operation is finished, or when some data arrives on the wire). Compared to generators, it is a different way of expressing cooperative multi-threading.
So, the question is: in Twisted, if I want to defer on an operation that is going to block (say I'm making a database query that I expect will take a long time), and want to yield ("defer") so that other events can be processed while the query executes, how do I do that? As far as I remember, the Twisted docs I read a long time ago did not provide a solution for that.
Deferreds don't make blocking code non-blocking; they're just a way to make it nicer to write non-blocking code. There are utilities in Twisted for wrapping a blocking function call in a thread and having the result returned in a Deferred, though (see deferToThread). There is also a lightweight and complete wrapper for DB-API2 database modules in twisted.enterprise.adbapi, which does the threading interaction for you.
So, since this then exposes a non-blocking API, you can do stuff like
    d = pool.runQuery('SELECT User_ID FROM Users')
    d.addCallback(gotDBData)

    d2 = ldapfoo.getUser('bob')
    d2.addCallback(gotLDAPData)
And both the database call and the ldap request will be worked on concurrently.
Very nice! However, if you're using a thread to do just that, it's only using part of what threads were designed for: it's really just using the kernel's low-level knowledge about resource access, and when resources become ready, to wait on the resource, since you're not going to run much actual code in the thread itself (apart from setting up the blocking call and returning its value).

Now, if we had something in the language that allows us to do something like that (make the most important potentially blocking calls asynchronously), we could implement a more complete scheduler that could really leverage generators to create a more interesting concurrency solution with less overhead. For example, imagine that some class of generators are used as tasks, like we were discussing before. When you would call the special yield_read() call (a variation on e.g. the os.read() call), there is an implicit yield that allows other generators which are ready to run until the data is available, without the overhead of:

1. context switching to the helper threads and back;
2. synchronization for communication with the helper threads (I assume threads would not be created dynamically, for efficiency. I imagine there is a pool of helpers waiting to do the async call jobs, and communication with them to dispatch the call jobs does not come for free, i.e. locking).

We really don't need threads at all to do that (at least for the common blocking calls), just some low-level support for building a scheduler. Using threads for that has a cost; it is more or less a kludge in that context (but we have nothing better for now).

cheers,
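To make that concrete, here is a minimal sketch of such a scheduler (purely illustrative: the yield_read-style waiting is expressed as tasks yielding ('read', fd); all names are invented for this example and nothing here is an existing API):

    import os
    import select

    def scheduler(tasks):
        runnable = list(tasks)
        waiting = {}                      # fd -> task blocked reading that fd
        while runnable or waiting:
            while runnable:
                task = runnable.pop(0)
                try:
                    what, fd = task.next()    # a task yields ('read', fd)
                except StopIteration:
                    continue                  # task finished
                waiting[fd] = task
            if waiting:
                ready, _, _ = select.select(list(waiting), [], [])
                for fd in ready:
                    runnable.append(waiting.pop(fd))

    def cat(fd):
        # Example task: the yield is the implicit "let others run" point.
        while 1:
            yield ('read', fd)
            data = os.read(fd, 4096)
            if not data:
                break

No helper threads and no locking: the select() call is the only place the scheduler waits.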
Jeremy Jones published a blog entry discussing some of the ideas we've talked about here: http://www.oreillynet.com/pub/wlg/8002 Although I hope our conversation isn't done, as he suggests! At some point when more ideas have been thrown about (and TIJ4 is done) I hope to summarize what we've talked about in an article.

Bruce Eckel    http://www.BruceEckel.com    mailto:BruceEckel-Python3234@mailblocks.com
Contains electronic books: "Thinking in Java 3e" & "Thinking in C++ 2e"
Web log: http://www.artima.com/weblogs/index.jsp?blogger=beckel
Subscribe to my newsletter: http://www.mindview.net/Newsletter
My schedule can be found at: http://www.mindview.net/Calendar
Just to add another 2 cents.... http://www.erights.org/talks/promises/paper/tgc05.pdf

---
Paolo Invernizzi

Bruce Eckel wrote:
Jeremy Jones published a blog entry discussing some of the ideas we've talked about here: http://www.oreillynet.com/pub/wlg/8002 Although I hope our conversation isn't done, as he suggests!
At some point when more ideas have been thrown about (and TIJ4 is done) I hope to summarize what we've talked about in an article.
Hi Bruce,

On Thursday 06 October 2005 18:12, Bruce Eckel wrote:
Although I hope our conversation isn't done, as he suggests! ... At some point when more ideas have been thrown about (and TIJ4 is done) I hope to summarize what we've talked about in an article.
I don't know if you saw my previous post[1] to python-dev on this topic, but Kamaelia is specifically aimed at making concurrency simple and easy to use. Initially we were focussed on using scheduled generators for co-operative CSP-style (but with buffers) concurrency.

[1] http://tinyurl.com/dfnah, http://tinyurl.com/e4jfq

We've tested the system so far on 2 relatively inexperienced programmers (as well as experienced ones, but the more interesting group is novices). The one who hadn't done much programming at all (a little bit of VB, pre-university) actually fared better IMO. This is probably because concurrency became part of his standard toolbox of approaches.

I've placed the slides I've produced for Euro OSCON on Kamaelia here:
* http://cerenity.org/KamaeliaEuroOSCON2005.pdf

The corrected URL for the whitepaper based on work now 6 months old (we've come quite a way since then!) is here:
* http://www.bbc.co.uk/rd/pubs/whp/whp113.shtml

Consider a simple server for sending text (generated by a user typing into the server) to multiple clients connecting to the server. This is a naturally concurrent problem in various ways (user interaction, splitting, listening for connections, serving connections, etc). Why is that interesting to us? It's effectively a microcosm of how subtitling works. (I work at the BBC)

In Kamaelia this looks like this:

=== start ===
class ConsoleReader(threadedcomponent):
    def run(self):
        while 1:
            line = raw_input(">>> ")
            line = line + "\n"
            self.outqueues["outbox"].put(line)

Backplane("subtitles").activate()

pipeline(
    ConsoleReader(),
    publishTo("subtitles"),
).activate()

def subtitles_protocol():
    return subscribeTo("subtitles")

SimpleServer(subtitles_protocol, 5000).run()
=== end ===

The ConsoleReader is threaded to allow the use of the naive way of reading from the input, whereas the server, backplane (a named splitter component in practice), pipelines, publishing, subscribing, splitting, etc. are all single-threaded co-operative concurrency.

A possible client for this text service might be:

pipeline(
    TCPClient("subtitles.rd.bbc.co.uk", 5000),
    Ticker(),
).run()

(Though that would be a bit bare, even if it does use pygame :)

The entire system is based around communicating generators, but we also have threads for blocking operations. (Though the entire network subsystem is non-blocking.)

What I'd be interested in is hearing how our system doesn't match the goals of the hypothetical concurrency system you'd like to see (if it doesn't). The main reason I'm interested in hearing this is because the goals you listed are ones we want to achieve. If you don't think our system matches them (we don't have process migration as yet, so that's one area), I'd be interested in hearing what areas you think are deficient.

However, the way we're beginning to refer to the project is to refer to just the component aspect rather than concurrency, for one simple reason: we're getting to the stage where we can ignore /most/ concurrency issues (not all).

If you have any time for feedback, it'd be appreciated. If you don't, I hope it's useful food for thought!

Best Regards,

Michael
--
Michael Sparks, Senior R&D Engineer, Digital Media Group
Michael.Sparks@rd.bbc.co.uk, http://kamaelia.sourceforge.net/
British Broadcasting Corporation, Research and Development
Kingswood Warren, Surrey KT20 6NP
This e-mail may contain personal views which are not the views of the BBC.
This does look quite fascinating, and I know there's a lot of really interesting work going on at the BBC now -- looks like some really pioneering stuff going on with respect to TV show distribution over the internet, new compression formats, etc. So yes indeed, this is quite high on my list to research. Looks like people there have been doing some interesting work. Right now I'm just trying to cast a net, so that people can put in ideas, for when the Java book is done and I can spend more time on it.

Thursday, October 6, 2005, 1:54:56 PM, Michael Sparks wrote:
Hi Bruce,
On Thursday 06 October 2005 18:12, Bruce Eckel wrote:
Although I hope our conversation isn't done, as he suggests! ... At some point when more ideas have been thrown about (and TIJ4 is done) I hope to summarize what we've talked about in an article.
I don't know if you saw my previous post[1] to python-dev on this topic, but Kamaelia is specifically aimed at making concurrency simple and easy to use. Initially we were focussed on using scheduled generators for co-operative CSP-style (but with buffers) concurrency. [1] http://tinyurl.com/dfnah, http://tinyurl.com/e4jfq
We've tested the system so far on 2 relatively inexperienced programmers (as well as experienced, but the more interesting group is novices). The one who hadn't done much programming at all (a little bit of VB, pre-university) actually fared better IMO. This is probably because concurrency became part of his standard toolbox of approaches.
I've placed the slides I've produced for Euro OSCON on Kamaelia here: * http://cerenity.org/KamaeliaEuroOSCON2005.pdf
The corrected URL for the whitepaper based on work now 6 months old (we've come quite a way since then!) is here: * http://www.bbc.co.uk/rd/pubs/whp/whp113.shtml
Consider a simple server for sending text (generated by a user typing into the server) to multiple clients connecting to a server. This is a naturally concurrent problem in various ways (user interaction, splitting, listening for connections, serving connections, etc). Why is that interesting to us? It's effectively a microcosm of how subtitling works. (I work at the BBC)
In Kamaelia this looks like this:
=== start ===
class ConsoleReader(threadedcomponent):
    def run(self):
        while 1:
            line = raw_input(">>> ")
            line = line + "\n"
            self.outqueues["outbox"].put(line)

Backplane("subtitles").activate()

pipeline(
    ConsoleReader(),
    publishTo("subtitles"),
).activate()

def subtitles_protocol():
    return subscribeTo("subtitles")

SimpleServer(subtitles_protocol, 5000).run()
=== end ===
The ConsoleReader is threaded to allow the use of the naive way of reading from the input, whereas the server, backplane (a named splitter component in practice), pipelines, publishing, subscribing, splitting, etc. are all single-threaded co-operative concurrency.
A possible client for this text service might be:
pipeline(
    TCPClient("subtitles.rd.bbc.co.uk", 5000),
    Ticker(),
).run()
(Though that would be a bit bare, even if it does use pygame :)
The entire system is based around communicating generators, but we also have threads for blocking operations. (Though the entire network subsystem is non-blocking)
What I'd be interested in is hearing how our system doesn't match the goals of the hypothetical concurrency system you'd like to see (if it doesn't). The main reason I'm interested in hearing this is because the goals you listed are ones we want to achieve. If you don't think our system matches them (we don't have process migration as yet, so that's one area), I'd be interested in hearing what areas you think are deficient.
However, the way we're beginning to refer to the project is to refer to just the component aspect rather than concurrency, for one simple reason: we're getting to the stage where we can ignore /most/ concurrency issues (not all).
If you have any time for feedback, it'd be appreciated. If you don't, I hope it's useful food for thought!
Best Regards,
Michael
Bruce Eckel    http://www.BruceEckel.com    mailto:BruceEckel-Python3234@mailblocks.com
Contains electronic books: "Thinking in Java 3e" & "Thinking in C++ 2e"
Web log: http://www.artima.com/weblogs/index.jsp?blogger=beckel
Subscribe to my newsletter: http://www.mindview.net/Newsletter
My schedule can be found at: http://www.mindview.net/Calendar
On Thursday 06 October 2005 21:06, Bruce Eckel wrote:
So yes indeed, this is quite high on my list to research. Looks like people there have been doing some interesting work.
Right now I'm just trying to cast a net, so that people can put in ideas, for when the Java book is done and I can spend more time on it.
Thanks for your kind words. Hopefully it's of use! :-)

Michael.
Michael Sparks wrote:
What I'd be interested in is hearing how our system doesn't match the goals of the hypothetical concurrency system you'd like to see (if it doesn't). The main reason I'm interested in hearing this is because the goals you listed are ones we want to achieve. If you don't think our system matches them (we don't have process migration as yet, so that's one area), I'd be interested in hearing what areas you think are deficient.
I've not used the system you have worked on, so perhaps this is easy, but the vast majority of concurrency issues can be described as fitting into one or more of the following task distribution categories:

1. one to many (one producer, many consumers) without duplication (no consumer receives the same data; essentially a distributed queue)
2. one to many (one producer, many consumers) with duplication (the producer broadcasts to all consumers)
3. many to one (many producers, one consumer)
4. many to many (many producers, many consumers) without duplication (no consumer receives the same data; essentially a distributed queue)
5. many to many (many producers, many consumers) with duplication (all producers broadcast to all consumers)
6. one to one without duplication

MPI, for example, handles all the above cases with minor work, and tuple space systems such as Linda can support all of the above with a bit of work in cases 2 and 5.

If Kamaelia is able to handle all of the above mechanisms in both a blocking and non-blocking fashion, then I would guess it has the basic requirements for most concurrent applications. If, however, it is not able to easily handle all of the above mechanisms, or has issues with blocking and/or non-blocking semantics on the producer and/or consumer end, then it is likely that it will have difficulty gaining traction in certain applications where the unsupported mechanism is common and/or necessary.

One nice thing about the message queue style (which it seems as though Kamaelia implements) is that it guarantees that a listener won't receive the same message twice when broadcasting a message to multiple listeners (cases 2 and 5 above), something that is a bit more difficult to guarantee in a tuple space scenario, but which is still possible (which spurs me to add it into my tuple space implementation before it is released). Another nice thing is that subscriptions to a queue seem to be persistent in Kamaelia, which I should also implement.

- Josiah
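For illustration (my sketch, not Josiah's code): cases 1 and 2 map directly onto the standard Queue module; a single shared queue gives distribution without duplication, while broadcasting needs one queue per consumer:

    import threading, Queue     # the Queue module is named 'queue' in Python 3

    # Case 1: one producer, many consumers, without duplication.
    work = Queue.Queue()

    def worker(name):
        while 1:
            item = work.get()
            if item is None:            # sentinel: tell this worker to exit
                break
            print "%s consumed %r" % (name, item)

    workers = [threading.Thread(target=worker, args=("worker-%d" % i,))
               for i in range(3)]
    for w in workers:
        w.start()
    for item in range(10):
        work.put(item)                  # each item reaches exactly one consumer
    for w in workers:
        work.put(None)

    # Case 2: one producer, many consumers, with duplication (broadcast):
    # give each consumer its own queue and put every item into all of them.
    subscribers = [Queue.Queue() for _ in range(3)]

    def broadcast(item):
        for q in subscribers:
            q.put(item)                 # every consumer sees every item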
On Thursday 06 October 2005 23:15, Josiah Carlson wrote: [... 6 specific use cases ...]
If Kamaelia is able to handle all of the above mechanisms in both a blocking and non-blocking fashion, then I would guess it has the basic requirements for most concurrent applications.
It can. I can easily knock up examples for each if required :-)

That said, a more interesting example implemented this week (as part of a rapid prototyping project to look at collaborative community radio) implements a networked audio mixer matrix. That allows multiple sources of audio to be mixed and sent on to multiple destinations; the destinations may receive duplicate mixes of each other, but may also select different mixes. The same system also includes point-to-point communications for network control of the mix.

That application covers (I /think/) 1, 2, 3, 4, and 6 on your list of things as I understand what you mean. 5 is fairly trivial though. (The largest bottleneck in writing it was my personal misunderstanding of how to actually mix 16-bit signed audio :-)

Regarding blocking & non-blocking: links can be marked as synchronous, which forces blocking-style behaviour. Since generally we're using generators, we can't block for real, which is why we throw an exception there. However, threaded components can & do block. The reason for this was that the architecture was inspired by noting the similarities between asynchronous hardware systems/languages and network systems.
into my tuple space implementation before it is released.
I'd be interested in hearing more about that BTW. One thing we've found is that, much as organic systems have a neural system for communications between things (hence Axon :), you also need the equivalent of a hormonal system. In the unix shell world, IMO the environment acts as that for pipelines, and similarly that's why we have an assistant system. (Which has key/value lookup facilities.) It's a less obvious requirement, but a useful one nonetheless, so I don't really see a message passing style as excluding a Linda approach, since they're orthogonal approaches.

Best Regards,

Michael.
--
Michael Sparks, Senior R&D Engineer, Digital Media Group
Michael.Sparks@rd.bbc.co.uk, http://kamaelia.sourceforge.net/
British Broadcasting Corporation, Research and Development
Kingswood Warren, Surrey KT20 6NP
This e-mail may contain personal views which are not the views of the BBC.
Michael Sparks wrote:
On Thursday 06 October 2005 23:15, Josiah Carlson wrote: [... 6 specific use cases ...]
If Kamaelia is able to handle all of the above mechanisms in both a blocking and non-blocking fashion, then I would guess it has the basic requirements for most concurrent applications.
It can. I can easily knock up examples for each if required :-)
That's cool, I trust you. One thing I notice is absent from the Kamaelia page is benchmarks. On the one hand, benchmarks are technically useless, as one can tend to benchmark those things that a system does well and ignore those things that it does poorly (take, for example, how PyLinda's speed test only ever inserts and removes one tuple at a time; try inserting 100k and using wildcards to extract those 100k, and you'll note how poorly it performs. Or database benchmarks, etc.). However, if one's benchmarks provide examples from real use, then it shows that at least someone has gotten some X performance from the system. I'm personally interested in latency and throughput for varying sizes of data being passed through the system.
That said, a more interesting example implemented this week (as part of a rapid prototyping project to look at collaborative community radio) implements a networked audio mixer matrix. That allows multiple sources of audio to be mixed and sent on to multiple destinations; the destinations may receive duplicate mixes of each other, but may also select different mixes. The same system also includes point-to-point communications for network control of the mix.
Very neat. How much data? What kind of throughput? What kinds of latencies?
That application covers (I /think/) 1, 2, 3, 4, and 6 on your list of things as I understand what you mean. 5 is fairly trivial though.
Cool.
Regarding blocking & non-blocking: links can be marked as synchronous, which forces blocking-style behaviour. Since generally we're using generators, we can't block for real, which is why we throw an exception there. However, threaded components can & do block. The reason for this was that the architecture was inspired by noting the similarities between asynchronous hardware systems/languages and network systems.
On the client side, I was lazy and used synchronous/blocking sockets to block on read/write (every client thread gets its own connection, meaning that tuple puts are never sitting in a queue). I've also got server-side timeouts for when you don't want to wait too long for data:

    rslt = tplspace.get(PATTERN, timeout=None)
into my tuple space implementation before it is released.
I'd be interested in hearing more about that BTW. One thing we've found is that, much as organic systems have a neural system for communications between things (hence Axon :), you also need the equivalent of a hormonal system. In the unix shell world, IMO the environment acts as that for pipelines, and similarly that's why we have an assistant system. (Which has key/value lookup facilities.)
I have two recent posts about the performance and features of a (hacked together) tuple space system I worked on (for two afternoons) in my blog. "Feel Lucky" for "Josiah Carlson" in google and you will find it.
It's a less obvious requirement, but a useful one nonetheless, so I don't really see a message passing style as excluding a Linda approach, since they're orthogonal approaches.
Indeed. For me, the idea of being able to toss a tuple into memory somewhere and being able to find it later maps into my mind as: ('name', arg1, ...) -> name(arg1, ...), which is, quite literally, an RPC semantic (which seems a bit more natural to me than subscribing to the 'name' queue). With the ability to send to either single or multiple listeners, you get message passing, broadcast messages, and a standard job/result queueing semantic. The only thing missing is a prioritization mechanism (FIFO, numeric priority, etc.), which would get us a job scheduling kernel. Not bad for a "message passing"/"tuple space"/"IPC" library. (All of the above have direct algorithms for implementation.)

- Josiah
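A minimal sketch of that tuple-to-call reading (illustrative only; 'registry' and 'dispatch' are invented here, not part of Josiah's tuple space implementation):

    registry = {}

    def register(func):
        registry[func.__name__] = func
        return func

    @register
    def add(a, b):
        return a + b

    def dispatch(tup):
        name, args = tup[0], tup[1:]
        return registry[name](*args)    # ('add', 2, 3) -> add(2, 3)

    print dispatch(('add', 2, 3))       # prints 5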
Early in this thread there was a comment to the effect that "if you don't know how to use threads, don't use them," which I pointedly avoided responding to because it seemed to me to simply be inflammatory. But Ian Bicking just posted a weblog entry: http://blog.ianbicking.org/concurrency-and-processes.html where he says "threads aren't as hard as they imply" and "An especially poor argument is one that tells me that I'm currently being beaten with a stick, but apparently don't know it."

I always have a problem with this. After many years of studying concurrency on-and-off, I continue to believe that threading is very difficult (indeed, the more I study it, the more difficult I understand it to be). And I admit this. The comments I sometimes get back are to the effect that "threading really isn't that hard." Thus, I am just too dense to get it.

It's hard to know how to answer. I've met enough brilliant people to know that it's just possible that the person posting really does easily grok concurrency issues and thus I must seem irreconcilably thick. This may actually be one of those people for whom threading is obvious (and Ian has always seemed like a smart guy, for example).

But. I do happen to have contact with a lot of people who are at the forefront of the threading world, and *none* of them (many of whom have written the concurrency libraries for Java 5, for example) ever imply that threading is easy. In fact, they generally go out of their way to say that it's insanely difficult. And Java has taken until version 5 to (apparently) get it right, partly by defining a new memory model in order to accurately describe what goes on with threading issues. This same model is being adapted for the next version of C++. This is not stuff that was already out there, that everyone knew about -- this is new stuff.

Also, look at the work that Scott Meyers, Andrei Alexandrescu, et al did on the "Double Checked Locking" idiom, showing that it was broken under threading. That was by no means "trivial and obvious" during all the years that people thought that it worked.

My own experience in discussions with folks who think that threading is transparent usually uncovers, after a few appropriate questions, that said person doesn't actually understand the depth of the issues involved. A common story is someone who has written a few programs and convinced themselves that these programs work (the "it works for me" proof of correctness). Thus, concurrency must be easy.

I know about this because I have learned the hard way throughout many years, over and over again. Every time I've thought that I understood concurrency, something new has popped up and shown me a whole new aspect of things that I have heretofore missed. Then I start thinking "OK, now I finally understand concurrency."

One example: when I was rewriting the threading chapter for the 3rd (previous) edition of Thinking in Java, I decided to get a dual-processor machine so I could really test things. This way, I discovered that the behavior of a program on a single-processor machine could be dramatically different than the same program on a multiprocessor machine. That seems obvious, now, but at the time I thought I was writing pretty reasonable code. In addition, it turns out that some things in Java concurrency were broken (even the people who were creating thread support in the language weren't getting it right) so that threw in extra monkey wrenches.
And when you start studying the new memory model, which takes into account instruction reordering and cache coherency issues, you realize that it's mind-numbingly far from trivial. Or maybe not, for those who think it's easy. But my experience is that the people who really do understand concurrency never suggest that it's easy.

Bruce Eckel    http://www.BruceEckel.com    mailto:BruceEckel-Python3234@mailblocks.com
Contains electronic books: "Thinking in Java 3e" & "Thinking in C++ 2e"
Web log: http://www.artima.com/weblogs/index.jsp?blogger=beckel
Subscribe to my newsletter: http://www.mindview.net/Newsletter
My schedule can be found at: http://www.mindview.net/Calendar
On Fri, Oct 07, 2005, Bruce Eckel wrote:
I always have a problem with this. After many years of studying concurrency on-and-off, I continue to believe that threading is very difficult (indeed, the more I study it, the more difficult I understand it to be). And I admit this. The comments I sometimes get back are to the effect that "threading really isn't that hard." Thus, I am just too dense to get it.
What I generally say is that threading isn't too hard if you stick with some fairly simple idioms and tools -- and make absolutely certain to follow some rules about sharing data. But it's certainly true that threading (and concurrency in general) is mind-numbingly complex.

--
Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/
"If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." --Red Adair
At 10:47 AM 10/7/2005 -0600, Bruce Eckel wrote:
Also, look at the work that Scott Meyers, Andrei Alexandrescu, et al did on the "Double Checked Locking" idiom, showing that it was broken under threading. That was by no means "trivial and obvious" during all the years that people thought that it worked.
One of the nice things about the GIL is that it means double-checked locking *does* work in Python. :)
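A minimal sketch of what that looks like (illustrative; the point is that in CPython the GIL makes the unlocked first read of the reference safe, which is exactly the guarantee the C++/Java analyses showed was missing there):

    import threading

    _resource = None
    _lock = threading.Lock()

    def _expensive_setup():
        return object()        # stand-in for the real initialisation

    def get_resource():
        global _resource
        # First check without the lock: under the GIL, reading the
        # reference is atomic, so at worst we fall through to the lock.
        if _resource is None:
            _lock.acquire()
            try:
                if _resource is None:      # second, locked check
                    _resource = _expensive_setup()
            finally:
                _lock.release()
        return _resource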
My own experience in discussions with folks who think that threading is transparent usually uncovers, after a few appropriate questions, that said person doesn't actually understand the depth of the issues involved. A common story is someone who has written a few programs and convinced themselves that these programs work (the "it works for me" proof of correctness). Thus, concurrency must be easy.
I know about this because I have learned the hard way throughout many years, over and over again. Every time I've thought that I understood concurrency, something new has popped up and shown me a whole new aspect of things that I have heretofore missed. Then I start thinking "OK, now I finally understand concurrency."
The day when I knew, beyond all shadow of a doubt, that the people who say threading is easy are full of it, is when I wrote an event-driven co-operative multitasking system in Python and managed to create a race condition in *single-threaded code*.

Of course, due to its nature, a race condition in an event-driven system is at least reproducible given the same sequence of events, and it's fixable using "turns" (as described in a paper posted here yesterday). With threads, it's not anything like reproducible, because pre-emptive threading is non-deterministic.

What the GIL-ranters don't get is that the GIL actually gives you just enough determinism to be able to write threaded programs that don't crash, and that maybe will even work if you treat every point of interaction between threads as a minefield and program with appropriate care. So, if threads are "easy" in Python compared to other languages, it's *because of* the GIL, not in spite of it.
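For illustration (my reconstruction of the shape of such a bug, not Phillip's actual code): the usual form of a single-threaded race is a read-modify-write split across a point where the event loop can run another callback:

    # Two cooperative tasks both read the balance before either writes it
    # back, because a switch point (the yield) sits in the middle.
    balance = 100

    def withdraw(amount):
        global balance
        current = balance          # read
        yield None                 # cooperative switch point (e.g. pending I/O)
        balance = current - amount # write back a stale value

    tasks = [withdraw(60), withdraw(60)]
    for t in tasks:
        t.next()                   # both tasks read balance == 100
    for t in tasks:
        try:
            t.next()
        except StopIteration:
            pass
    print balance                  # 40, not -20: one withdrawal was lost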
Phillip J. Eby wrote:
What the GIL-ranters don't get is that the GIL actually gives you just enough determinism to be able to write threaded programs that don't crash, and that maybe will even work if you treat every point of interaction between threads as a minefield and program with appropriate care. So, if threads are "easy" in Python compared to other langauges, it's *because of* the GIL, not in spite of it.
Three cheers for the GIL! For the record, since I was quoted at the beginning of this subthread, *I* don't think threads are easy. But among all ways to handle concurrency, I just don't think they are so bad. And unlike many alternatives, they are relatively easy to get started with, and you can do a lot of work in a threaded system without knowing anything about threads. Of course, threads aren't the only way to accomplish that, just one of the easiest. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
Ian Bicking wrote:
What the GIL-ranters don't get is that the GIL actually gives you just enough determinism to be able to write threaded programs that don't crash,
The GIL no doubt helps, but your threads can still get preempted between bytecodes, so I can't see it making much difference at the Python thought-level.

I'm wondering whether Python threads should be non-preemptive by default. Preemptive threading is massive overkill for many applications. You don't need it, for example, if you just want to use threads to structure your program, overlap processing with I/O, etc. Preemptive threading would still be there as an option to turn on when you really need it.

Or perhaps there could be a priority system, with a thread only able to be preempted by a thread of higher priority. If you ignore priorities, all your threads default to the same priority, so there's no preemption. If you want a thread that can preempt others, you give it a higher priority.

--
Greg Ewing, Computer Science Dept, University of Canterbury,
Christchurch, New Zealand
greg.ewing@canterbury.ac.nz
"A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc."
On 10/10/05, Greg Ewing wrote:
I'm wondering whether Python threads should be non-preemptive by default. Preemptive threading is massive overkill for many applications. You don't need it, for example, if you just want to use threads to structure your program, overlap processing with I/O, etc.
I recall using a non-preemptive system in the past; in Amoeba, to be precise. Initially it worked great. But as we added more powerful APIs to the library, we started to run into bugs that were just as if you had preemptive scheduling: it wouldn't always be predictable whether a call into the library would need to do I/O or not (it might use some sort of cache) so it would sometimes allow other threads to run and sometimes not. Or a change to the library would change this behavior (making a call that didn't use to block into sometimes-blocking). Given the tendency of Python developers to build layers of abstractions I don't think it will help much.
Preemptive threading would still be there as an option to turn on when you really need it.
Or perhaps there could be a priority system, with a thread only able to be preempted by a thread of higher priority. If you ignore priorities, all your threads default to the same priority, so there's no preemption. If you want a thread that can preempt others, you give it a higher priority.
If you ask me, priorities are worse than the problem they are trying to solve.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
On 10/11/05, Guido van Rossum wrote:
I recall using a non-preemptive system in the past; in Amoeba, to be precise.
Initially it worked great.
But as we added more powerful APIs to the library, we started to run into bugs that were just as if you had preemptive scheduling: it wouldn't always be predictable whether a call into the library would need to do I/O or not (it might use some sort of cache) so it would sometimes allow other threads to run and sometimes not. Or a change to the library would change this behavior (making a call that didn't use to block into sometimes-blocking).
I'm going to be giving a talk at OSDC (in Melbourne) this year about concurrency systems, and I'm going to talk a lot about the subtleties between these various non-preemptive (let's call them cooperative :) systems. I advocate a system that gives you really straightforward-looking code, but still requires you to annotate the fact that context switches can occur on every frame where they might occur (i.e., with a yield). I've given examples before of my new 2.5-yield + Twisted Deferred code here, but to recap it just means that you have to do:

    def foo():
        x = yield getPage()
        return "Yay"

when you want to download a web page, and the caller of 'foo' would *also* need to do something like "yay = yield foo()". I think this is a very worthwhile tradeoff for those obsessed with "natural" code.

--
Christopher Armstrong: International Man of Twistery
http://radix.twistedmatrix.com
Release Manager, Twisted Project
http://twistedmatrix.com
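A minimal sketch of the driver that makes this style work (my illustration of the idea, not Twisted's actual scheduling code; it assumes the yielded object has a Deferred-style addCallback):

    def drive(gen, result=None):
        # Resume the generator with the last result; when it yields another
        # Deferred-like object, schedule ourselves as its callback.
        try:
            d = gen.send(result)       # needs Python 2.5's generator send()
        except StopIteration:
            return
        d.addCallback(lambda value: drive(gen, value))

So "x = yield getPage()" works because drive() sends the page back in as the value of the yield expression once the Deferred fires.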
Guido writes:
Given the tendency of Python developers to build layers of abstractions I don't think [non-preemptive threads] will help much.
I think that's right, although I think adding priorities to Python's existing preemptive threads might be useful for real-time programmers (yes, as machines continue to get faster, people are writing real-time software on top of VMs).

IMO, if one understands the issues of simultaneous memory access by multiple threads, and understands condition variables (and their underlying concept of mutexes), threads are pretty easy to use. Getting into the habit of always writing thread-safe code is a good idea, too. It would be nice if some of these programming environments (IDLE, Emacs, Eclipse, Visual Studio) provided better support for analysis of threading issues in programs. I'd love to have the Interlisp thread inspector for Python.

I sympathize with Bruce's Java experience, though. Java's original threading design is one of the many misfeatures of that somewhat horrible language (along with lack of multiple inheritance, hybrid types, omission of unsigned integers, static typing, etc.). Synchronized methods are a weird way of presenting mutexes, IMO. Java's condition variables don't (didn't? has this been fixed?) quite work. The emphasis on portability and the resulting notions of red/green threading packages at the beginning didn't help either. Read Allen Holub's book. And Doug Lea's book. I understand much of this has been addressed with a new package in Java 1.5.

Bill
Java's condition variables don't (didn't? has this been fixed?) quite work. The emphasis on portability and the resulting notions of red/green threading packages at the beginning didn't help either. Read Allen Holub's book. And Doug Lea's book. I understand much of this has been addressed with a new package in Java 1.5.
Not only are there significant new library components in java.util.concurrent in J2SE5, but perhaps more important is the new memory model that deals with issues that are (especially) revealed in multiprocessor environments. The new memory model represents new work in the computer science field; apparently the original paper is written by Ph.D.s and is a bit too theoretical for the normal person to follow. But the smart threading guys studied this and came up with the new Java memory model -- so that volatile, for example, which didn't work quite right before, does now. This is part of J2SE5, and this work is being incorporated into the upcoming C++0x.

Java concurrency is certainly one of the bad examples of language design. Apparently, they grabbed stuff from C++ (mostly the volatile keyword) and combined it with what they knew about pthreads, and decided that being able to declare a method as synchronized made the whole thing object-oriented. But you can see how ill-thought-out the design was, because in later versions of Java some fundamental methods: stop(), suspend(), resume() and destroy(), were deprecated because ... oops, we didn't really think those out very well. And then finally, with J2SE5, it *appears* that all the kinks have been fixed, but only with some really smart folks like Doug Lea, Brian Goetz, and that gang working long and hard on all these issues and (we hope) figuring them all out.

I think threading *can* be much simpler, and I *want* it to be that way in Python. But that can only happen if the right model is chosen, and that model is not pthreads. People migrate to pthreads if they already understand it, and so it might seem "simple" to them because of that. But I think we need something that supports an object-oriented approach to concurrency that doesn't prevent beginners from using it safely.

Bruce Eckel
Bruce Eckel writes:
Not only are there significant new library components in java.util.concurrent in J2SE5, but perhaps more important is the new memory model that deals with issues that are (especially) revealed in multiprocessor environments. The new memory model represents new work in the computer science field; apparently the original paper is written by Ph.D.s and is a bit too theoretical for the normal person to follow. But the smart threading guys studied this and came up with the new Java memory model -- so that volatile, for example, which didn't work quite right before, does now. This is part of J2SE5, and this work is being incorporated into the upcoming C++0x.
Do you have a link that explains this sort of thing for the layman?

Cheers,
mwh

--
When physicists speak of a TOE, they don't really mean a theory of *everything*. Taken literally, "Everything" covers a lot of ground, including biology, art, decoherence and the best way to barbecue ribs. -- John Baez, sci.physics.research
I don't know of anything that exists. There is an upcoming book that may help: Java Concurrency in Practice, by Brian Goetz, Tim Peierls, Joshua Bloch, Joseph Bowbeer, David Holmes, and Doug Lea (Addison-Wesley 2006). I have had assistance from some of the authors, but don't know if it introduces the concepts from the research paper. Estimated publication is February.

However, you might get something from Scott Meyers' analysis of the concurrency issues surrounding the double-checked locking algorithm: http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf

Thursday, October 13, 2005, 8:36:21 AM, Michael Hudson wrote:
Bruce Eckel writes:
Not only are there significant new library components in java.util.concurrent in J2SE5, but perhaps more important is the new memory model that deals with issues that are (especially) revealed in multiprocessor environments. The new memory model represents new work in the computer science field; apparently the original paper is written by Ph.D.s and is a bit too theoretical for the normal person to follow. But the smart threading guys studied this and came up with the new Java memory model -- so that volatile, for example, which didn't work quite right before, does now. This is part of J2SE5, and this work is being incorporated into the upcoming C++0x.
Do you have a link that explains this sort of thing for the layman?
Cheers, mwh
Bruce Eckel    http://www.BruceEckel.com    mailto:BruceEckel-Python3234@mailblocks.com
Contains electronic books: "Thinking in Java 3e" & "Thinking in C++ 2e"
Web log: http://www.artima.com/weblogs/index.jsp?blogger=beckel
Subscribe to my newsletter: http://www.mindview.net/Newsletter
My schedule can be found at: http://www.mindview.net/Calendar
Bruce Eckel wrote:
But. I do happen to have contact with a lot of people who are at the forefront of the threading world, and *none* of them (many of whom have written the concurrency libraries for Java 5, for example) ever imply that threading is easy. In fact, they generally go out of their way to say that it's insanely difficult.
What's insanely difficult is really locking, and locking is driven by concurrency in general, not just threads. It's hard to reason about locks. There are only general rules about how to apply locking correctly, efficiently, and without deadlocks. Personally, to be absolutely certain I've applied locks correctly, I have to think for hours. Even then, it's hard to express my conclusions, so it's hard to be sure future maintainers will keep the locking correct.

Java uses locks very liberally, which is to be expected of a language that provides locking using a keyword. This forces Java programmers to deal with the burden of locking everywhere. It also forces the developers of the language and its core libraries to make locking extremely fast yet safe. Java threads would be easy if there wasn't so much locking going on.

Zope, OTOH, is far more conservative with locks. There is some code that dispatches HTTP requests to a worker thread, and other code that reads and writes an object database, but most Zope code isn't aware of concurrency. Thus locking is hardly an issue in Zope, and as a result, threading is quite easy in Zope.

Recently, I've been simulating high concurrency on a PostgreSQL database, and I've discovered that the way you reason about row and table locks is very similar to the way you reason about locking among threads. The big difference is the consequence of incorrect locking: in PostgreSQL, using the serializable mode, incorrect locking generally only leads to aborted transactions; while in Python and most programming languages, incorrect locking instantly causes corruption and chaos. That's what hurts developers. I want a concurrency model in Python that acknowledges the need for locking while punishing incorrect locking with an exception rather than corruption. *That* would be cool, IMHO.

Shane
On Fri, 2005-10-07 at 14:42, Shane Hathaway wrote:
What's insanely difficult is really locking, and locking is driven by concurrency in general, not just threads. It's hard to reason about locks.
I think that's a very interesting observation! I have not built a tremendous number of concurrent apps, but even the dumb locking that Mailman does (which is not a great model of granularity ;) has burned many bch's (brain cell hours) to get right. Where I have used more concurrency, I generally try to structure my apps into the one-producer-many-independent-consumers architecture that was outlined in a previous message. In that case, if you can narrow your touch points to the Queue module for example, then yeah, threading is easy. A gaggle of independent workers isn't that hard to get right in Python. -Barry
Hi, (my 2 cents, probably not very constructive)
Recently, I've been simulating high concurrency on a PostgreSQL database, and I've discovered that the way you reason about row and table locks is very similar to the way you reason about locking among threads. The big difference is the consequence of incorrect locking: in PostgreSQL, using the serializable mode, incorrect locking generally only leads to aborted transactions; while in Python and most programming languages, incorrect locking instantly causes corruption and chaos. That's what hurts developers. I want a concurrency model in Python that acknowledges the need for locking while punishing incorrect locking with an exception rather than corruption. *That* would be cool, IMHO.
A relational database has a very strict and regular data model. Also, it has transactions. This makes it easy to precisely define concurrency at the engine level. To apply the same thing to Python you would at least need:

1. a way to define a subset of the current bag of reachable objects which has to stay consistent w.r.t. transactions that are applied to it (of course, you would have several such subsets in any non-trivial application)
2. a way to start and end a transaction on a bag of objects (begin / commit / rollback)
3. a precise definition of the semantics of "consistency" here: for example, only one thread could modify a bag of objects at any given time, and other threads would continue to see the frozen, stable version of that bag until the next version is committed by the writing thread

For 1), a helpful paradigm would be to define an object as being the "root" of a bag; all its properties would automatically and recursively (or not?) belong to this bag. One has to be careful that no property "leaks" and makes the bag become the set of all reachable Python objects (one could provide a means to say that a specific property must not be transitively put in the bag). Then, use my_object.begin_transaction() and my_object.commit_transaction().

The implementation of 3) does not look very obvious ;-S

Regards

Antoine.
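A toy sketch of what semantics 1-3 could look like (entirely illustrative and mine, not an existing library: copy-on-write snapshots behind a single-writer lock, with no conflict detection at all):

    import copy
    import threading

    class Bag(object):
        """Readers always see the last committed snapshot; one writer at
        a time works on a deep copy and publishes it atomically."""
        def __init__(self, root):
            self._committed = root
            self._working = None
            self._lock = threading.Lock()    # enforces a single writer

        def begin_transaction(self):
            self._lock.acquire()
            self._working = copy.deepcopy(self._committed)
            return self._working

        def commit_transaction(self):
            self._committed = self._working  # atomic rebind under the GIL
            self._working = None
            self._lock.release()

        def rollback_transaction(self):
            self._working = None
            self._lock.release()

        def read(self):
            return self._committed           # the frozen, stable version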
Antoine Pitrou wrote:
A relational database has a very strict and regular data model. Also, it has transactions. This makes it easy to precisely define concurrency at the engine level.
To apply the same thing to Python you would at least need:
1. a way to define a subset of the current bag of reachable objects which has to stay consistent w.r.t. transactions that are applied to it (of course, you would have several such subsets in any non-trivial application)
2. a way to start and end a transaction on a bag of objects (begin / commit / rollback)
3. a precise definition of the semantics of "consistency" here: for example, only one thread could modify a bag of objects at any given time, and other threads would continue to see the frozen, stable version of that bag until the next version is committed by the writing thread
For 1), a helpful paradigm would be to define an object as being the "root" of a bag, and all its properties would automatically and recursively (or not ?) belong to this bag. One has to be careful that no property "leaks" and makes the bag become the set of all reachable Python objects (one could provide a means to say that a specific property must not be transitively put in the bag). Then, use my_object.begin_transaction() and my_object.commit_transaction().
The implementation of 3) does not look very obvious ;-S
Well, I think you just described ZODB. ;-) I'd be happy to explain how ZODB solves those problems, if you're interested. However, ZODB doesn't provide locking, and that bothers me somewhat. If two threads try to modify an object at the same time, one of the threads will be forced to abort, unless a method has been defined for resolving the conflict. If there are too many writers, ZODB crawls. ZODB's strategy works fine when there aren't many conflicting, concurrent changes, but the complex locking done by relational databases seems to be required for handling a lot of concurrent writers. Shane
Shane Hathaway wrote:
Antoine Pitrou wrote:
A relational database has a very strict and regular data model. Also, it has transactions. This makes it easy to precisely define concurrency at the engine level.
To apply the same thing to Python you would at least need:
1. a way to define a subset of the current bag of reachable objects which has to stay consistent w.r.t. transactions that are applied to it (of course, you would have several such subsets in any non-trivial application)
2. a way to start and end a transaction on a bag of objects (begin / commit / rollback)
3. a precise definition of the semantics of "consistency" here: for example, only one thread could modify a bag of objects at any given time, and other threads would continue to see the frozen, stable version of that bag until the next version is committed by the writing thread
For 1), a helpful paradigm would be to define an object as being the "root" of a bag, and all its properties would automatically and recursively (or not ?) belong to this bag. One has to be careful that no property "leaks" and makes the bag become the set of all reachable Python objects (one could provide a means to say that a specific property must not be transitively put in the bag). Then, use my_object.begin_transaction() and my_object.commit_transaction().
The implementation of 3) does not look very obvious ;-S
Well, I think you just described ZODB. ;-) I'd be happy to explain how ZODB solves those problems, if you're interested.
However, ZODB doesn't provide locking, and that bothers me somewhat. If two threads try to modify an object at the same time, one of the threads will be forced to abort, unless a method has been defined for resolving the conflict. If there are too many writers, ZODB crawls. ZODB's strategy works fine when there aren't many conflicting, concurrent changes, but the complex locking done by relational databases seems to be required for handling a lot of concurrent writers.
I don't think it would be all that hard to use a locking (rather than a time-stamp) strategy for ZODB, although ZEO would make this extra challenging.

In any case, the important thing to agree on here is that transactions provide a useful approach to concurrency control in the case where:

- separate control flows are independent, and
- we need to mediate access to shared resources.

Someone else pointed out essentially the same thing at the beginning of this thread.

Jim
--
Jim Fulton    mailto:jim@zope.com    Python Powered!
CTO    (540) 361-1714    http://www.python.org
Zope Corporation    http://www.zope.com    http://www.zope.org
Antoine Pitrou wrote:
I'd be happy to explain how ZODB solves those problems, if you're interested.
Well, yes, I'm interested :) (I don't know anything about Zope internals though, and I've never even used it.)
Ok. Quoting your list:
To apply the same thing to Python you would at least need:
1. a way to define a subset of the current bag of reachable objects which has to stay consistent w.r.t. transactions that are applied to it (of course, you would have several such subsets in any non-trivial application)
ZODB holds a tree of objects. When you add an attribute to an object managed by ZODB, you're expanding the tree. Consistency comes from several features:

- Each thread has its own lazy copy of the object tree.
- The application doesn't see changes to the object tree except at transaction boundaries.
- The ZODB store keeps old revisions, and the new MVCC feature lets the application see the object system as it was at the beginning of the transaction.
- If you make a change to the object tree that conflicts with a concurrent change, all changes to that copy of the object tree are aborted.
2. a way to start and end a transaction on a bag of objects (begin / commit / rollback)
ZODB includes a transaction module that does just that. In fact, the module is so useful that I think it belongs in the standard library.
3. a precise definition of the semantics of "consistency" here : for example, only one thread could modify a bag of objects at any given time, and other threads would continue to see the frozen, stable version of that bag until the next version is committed by the writing thread
As mentioned above, the key is that ZODB maintains a copy of the objects per thread. A fair amount of RAM is lost that way, but the benefit in simplicity is tremendous.

You also talked about the risk that applications would accidentally pull a lot of objects into the tree just by setting an attribute. That can and does happen, but the most common case is already solved by the pickle machinery: if you pickle something global like a class, the pickle stores the name and location of the class instead of the class itself.

Shane
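For readers who haven't seen it, the pattern Shane describes looks roughly like this (a sketch following the standard ZODB tutorial usage; details may differ between ZODB versions):

    import ZODB, ZODB.FileStorage
    import transaction

    storage = ZODB.FileStorage.FileStorage('Data.fs')
    db = ZODB.DB(storage)
    connection = db.open()            # each thread opens its own connection
    root = connection.root()          # the root of this thread's object tree

    root['visits'] = root.get('visits', 0) + 1
    transaction.commit()              # changes become visible to others here

    root['visits'] = -1
    transaction.abort()               # or throw the working changes away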
Bruce Eckel wrote:
I always have a problem with this. After many years of studying concurrency on-and-off, I continue to believe that threading is very difficult (indeed, the more I study it, the more difficult I understand it to be). And I admit this. The comments I sometimes get back are to the effect that "threading really isn't that hard." Thus, I am just too dense to get it.
The few times I have encountered anyone saying anything resembling "threading is easy", it was because the full sentence went something like "threading is easy if you use message passing and copy-on-send or release-reference-on-send to communicate between threads, and limit the shared data structures to those required to support the messaging infrastructure". And most of the time there was an implied "compared to using semaphores and locks directly, " at the start. Which is obviously a far cry from simply saying "threading is easy". If I encountered anyone who thought it was easy *in general*, then I would fear any threaded code they wrote, because they clearly weren't thinking about the problem hard enough ;)

Cheers,
Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.blogspot.com
On Fri, 2005-10-07 at 23:54, Nick Coghlan wrote: [...]
The few times I have encountered anyone saying anything resembling "threading is easy", it was because the full sentence went something like "threading is easy if you use message passing and copy-on-send or release-reference-on-send to communicate between threads, and limit the shared data structures to those required to support the messaging infrastructure". And most of the time there was an implied "compared to using semaphores and locks directly, " at the start.
LOL! So threading is easy if you restrict inter-thread communication to message passing... and what makes multi-processing hard is your only inter-process communication mechanism is message passing :-)

Sounds like yet another reason to avoid threading and use processes instead... effort spent on threading-based message passing implementations could instead be spent on inter-process messaging.
--
Donovan Baarda
On Monday 10 Oct 2005 15:45, Donovan Baarda wrote:
Sounds like yet another reason to avoid threading and use processes instead... effort spent on threading based message passing implementations could instead be spent on inter-process messaging.
I can't let that pass (even if our threaded component has a couple of warts at the moment).

    # Blocking thread example (uses raw_input) to single threaded pygame
    # display ticker. (The display is rate limited to 8 words per second at
    # most since it was designed for subtitles)
    #
    from Axon.ThreadedComponent import threadedcomponent
    from Kamaelia.Util.PipelineComponent import pipeline
    from Kamaelia.UI.Pygame.Ticker import Ticker

    class ConsoleReader(threadedcomponent):
        def __init__(self, prompt=">>> "):
            super(ConsoleReader, self).__init__()
            self.prompt = prompt

        def run(self):   # implementation wart, should be "main"
            while 1:
                line = raw_input(self.prompt)
                line = line + "\n"
                # implementation wart, should be self.send(line, "outbox")
                self.outqueues["outbox"].put(line)

    pipeline(
        ConsoleReader(),
        Ticker()   # Single threaded pygame based text ticker
    ).run()

There are other ways with other systems to achieve the same goal. Inter-process based messaging can be done in various ways. The API though can look pretty much the same. (There's obviously some implications of crossing process boundaries though, but that's for the system composer to deal with, not the components). Regards, Michael. -- Michael Sparks, Senior R&D Engineer, Digital Media Group Michael.Sparks@rd.bbc.co.uk, http://kamaelia.sourceforge.net/ British Broadcasting Corporation, Research and Development Kingswood Warren, Surrey KT20 6NP This e-mail may contain personal views which are not the views of the BBC.
Donovan Baarda wrote:
On Fri, 2005-10-07 at 23:54, Nick Coghlan wrote: [...]
The few times I have encountered anyone saying anything resembling "threading is easy", it was because the full sentence went something like "threading is easy if you use message passing and copy-on-send or release-reference-on-send to communicate between threads, and limit the shared data structures to those required to support the messaging infrastructure". And most of the time there was an implied "compared to using semaphores and locks directly, " at the start.
LOL! So threading is easy if you restrict inter-thread communication to message passing... and what makes multi-processing hard is your only inter-process communication mechanism is message passing :-)
Sounds like yet another reason to avoid threading and use processes instead... effort spent on threading based message passing implementations could instead be spent on inter-process messaging.
Actually, I think it makes it worth building a decent message-passing paradigm (like, oh, PEP 342) that can then be scaled using backends with four different levels of complexity:
- logical threading (generators)
- physical threading (threading.Thread and Queue.Queue)
- multiple processing
- distributed processing
Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.blogspot.com
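To make the idea concrete, here is a rough sketch (all names invented, nothing from PEP 342 itself) of one consumer running unchanged over the first two backends on that list:

    import Queue, threading

    def consume(receive):
        # The consumer only knows about a 'receive' callable; the backend
        # behind it can change without touching this code.
        results = []
        while True:
            msg = receive()
            if msg is None:
                break
            results.append(msg)
        return results

    # Backend 1: logical threading -- a generator plays the producer.
    def gen_producer(items):
        for item in items:
            yield item
        yield None

    print consume(gen_producer(range(5)).next)   # [0, 1, 2, 3, 4]

    # Backend 2: physical threading -- a Queue fed from a real thread.
    q = Queue.Queue()
    def thread_producer(items):
        for item in items:
            q.put(item)
        q.put(None)

    t = threading.Thread(target=thread_producer, args=(range(5),))
    t.start()
    print consume(q.get)                         # [0, 1, 2, 3, 4]
    t.join()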
On Fri, 7 Oct 2005 18:47:51 +0200, Bruce Eckel wrote (in article <415220344.20051007104751@MailBlocks.com>):
It's hard to know how to answer. I've met enough brilliant people to know that it's just possible that the person posting really does easily grok concurrency issues and thus I must seem irreconcilably thick. This may actually be one of those people for whom threading is obvious (and Ian has always seemed like a smart guy, for example).
I think it depends on which "level" you're talking about: concurrency IS very easy and "natural" at a conceptual level, and it's also quite easy for doing basic stuff ... but it can become very complicated if you introduce different requirements, and/or the system becomes complex, and/or you're going to implement the actual mechanism yourself. That's my limited experience (personally, I really like concurrency ... and to be honest, some people can't really understand the concept at all while others have no problem, so it's a "personal thing" too)
On Fri, 2005-10-07 at 17:47, Bruce Eckel wrote:
Early in this thread there was a comment to the effect that "if you don't know how to use threads, don't use them," which I pointedly avoided responding to because it seemed to me to simply be inflammatory. But Ian Bicking just posted a weblog entry: http://blog.ianbicking.org/concurrency-and-processes.html where he says "threads aren't as hard as they imply" and "An especially poor argument is one that tells me that I'm currently being beaten with a stick, but apparently don't know it."
The problem with threads is at first glance they appear easy, which
seduces many beginning programmers into using them. The hard part is
knowing when and how to lock shared resources... at first glance you
don't even realise you need to do this. So many threaded applications
are broken and don't know it, because this kind of brokenness is nearly
always intermittent and very hard to reproduce and debug.
One common alternative is async polling frameworks like Twisted. These
scare beginners away because, at first glance, they appear hideously
complicated. However, if you take the time to get your head around them,
you get a better feel for all the nasty implications of concurrency, and
end up designing better applications.
This is the reason why, given a choice between an async and a threaded
implementation of an application, I will always choose the async
solution. Not because async is inherently better than threading, but
because the programmer who bothered to grok async is more likely to get
it right.
--
Donovan Baarda
The problem with threads is at first glance they appear easy...
Anyone who thinks that a "glance" is enough to understand something is too far gone to worry about. On the other hand, you might be referring to a putative brokenness of the Python documentation on Python threads. I'm not sure they're broken, though. They just point out the threading that Python provides, for folks who want to use threads. Are they required to provide a full course in threads?
...which seduces many beginning programmers into using them.
Don't worry about this. That's how "beginning programmers" learn.
The hard part is knowing when and how to lock shared resources...
Well, I might say the "careful part".
...at first glance you don't even realise you need to do this.
Again, I'm not sure why you care what "glancers" do and don't realize. You could say the same about most algorithms and data structures. Bill
>> The hard part is knowing when and how to lock shared resources... Bill> Well, I might say the "careful part". With the Mojam middleware stuff I suffered for quite a while with a single-threaded implementation that would hang the entire webserver if a backend query took too long. I realized I needed to do something (threads, asyncore, whatever), but didn't think I understood the issues well enough to do it right. Once I finally bit the bullet and switched to a multithreaded implementation, I didn't have too much trouble. Of course, the application was pretty mature at that point and I understood what objects were shared and needed to be locked. Oh, and I took Aahz's admonition to heart and pretty much stuck to using Queues for all synchronization. It ain't rocket science, but it can be subtle. Skip
Skip,
With the Mojam middleware stuff I suffered quite awhile with a single-threaded implementation that would hang the entire webserver if a backend query took too long. I realized I needed to do something (threads, asyncore, whatever), but didn't think I understood the issues well enough to do it right.
Yes, there's a troublesome meme in the world: "threads are hard". They aren't, really. You just have to know what you're doing. But that meme seems to keep quite capable people from doing things they are well qualified to do.
Once I finally bit the bullet and switched to a multithreaded implementation, I didn't have too much trouble.
Yep.
Of course, the application was pretty mature at that point and I understood what objects were shared and needed to be locked.
Kind of like managing people, isn't it :-?. I've done a lot of middleware myself, of course. ILU was based on a thread-safe C library and worked with Python threads quite well. Lately I've been building UpLib (a threaded Python service) on top of Medusa, which has worked quite well. UpLib handles calls sequentially, but uses threads internally to manage underlying data transformations. Medusa almost but not quite supports per-request threads; I'm wondering if I should just fix that and post a patch. Or would that just be re-creating ZServer, which I admit I haven't figured out how to look at? Bill
Yes, there's a troublesome meme in the world: "threads are hard". They aren't, really. You just have to know what you're doing.
I would say that the troublesome meme is that "threads are easy." I posted an earlier, rather longish message about this. The gist of which was: "when someone says that threads are easy, I have no idea what they mean by it." Perhaps this means "threads in Python are easier than threads in other languages." But I just finished a 150-page chapter on Concurrency in Java which took many months to write, based on a large chapter on Concurrency in C++ which probably took longer to write. I keep in reasonably good touch with some of the threading experts. I can't get any of them to say that it's easy, even though they really do understand the issues and think about it all the time. *Because* of that, they say that it's hard. So alright, I'll take the bait that you've laid down more than once, now. Perhaps you can go beyond saying that "threads really aren't hard" and explain the aspects of them that seem so easy to you. Perhaps you can give a nice clear explanation of cache coherency and memory barriers in multiprocessor machines? Or explain atomicity, volatility and visibility? Or, even better, maybe you can come up with a better concurrency model, which is what I think most of us are looking for in this discussion. Bruce Eckel http://www.BruceEckel.com mailto:BruceEckel-Python3234@mailblocks.com Contains electronic books: "Thinking in Java 3e" & "Thinking in C++ 2e" Web log: http://www.artima.com/weblogs/index.jsp?blogger=beckel Subscribe to my newsletter: http://www.mindview.net/Newsletter My schedule can be found at: http://www.mindview.net/Calendar
Bruce Eckel:
I would say that the troublesome meme is that "threads are easy." I posted an earlier, rather longish message about this. The gist of which was: "when someone says that threads are easy, I have no idea what they mean by it."
I think you are overcomplicating the issue by looking at too many levels at once. The memory model is something that implementers of threading support need to understand. Users of that threading support just need to know that concurrent access to variables is dangerous and that they should use locks to access shared variables or use other forms of packaged inter-thread communication. Double Checked Locking is an optimization (removal of a lock) of an attempt to better modularize code (by automating the helper object creation). I'd either just leave the lock in or if benchmarking revealed an unacceptable performance problem, allocate the helper object before the resource is accessible to more than one thread. For statics, expose an Init method that gets called when the application is in the initial one user thread state.
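In Python terms, the "just leave the lock in" option might look like the following sketch (expensive_construction is a hypothetical stand-in for the costly helper-object setup):

    import threading

    _helper = None
    _helper_lock = threading.Lock()

    def expensive_construction():
        # Hypothetical stand-in for the costly helper-object setup.
        return object()

    def get_helper():
        # "Just leave the lock in": take the lock on every access rather
        # than attempting the double-checked locking optimisation.
        global _helper
        _helper_lock.acquire()
        try:
            if _helper is None:
                _helper = expensive_construction()
            return _helper
        finally:
            _helper_lock.release()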
But I just finished a 150-page chapter on Concurrency in Java which took many months to write, based on a large chapter on Concurrency in C++ which probably took longer to write. I keep in reasonably good touch with some of the threading experts. I can't get any of them to say that it's easy, even though they really do understand the issues and think about it all the time. *Because* of that, they say that it's hard.
Implementing threading is hard. Using threading is not that hard. It's a source of complexity, but so are many aspects of development. I get scared by reentrance in UI code. Neil
Bruce Eckel wrote:
Yes, there's a troublesome meme in the world: "threads are hard". They aren't, really. You just have to know what you're doing.
I would say that the troublesome meme is that "threads are easy." I posted an earlier, rather longish message about this. The gist of which was: "when someone says that threads are easy, I have no idea what they mean by it."
Perhaps this means "threads in Python are easier than threads in other languages."
One key thing is that Python is so dynamic that the compiler can't get too fancy with the order in which it does things. However, Python threading has its own traps for the unwary (mainly related to badly-behaved C extensions, but they're still traps). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.blogspot.com
Bruce Eckel wrote: [Bill Janssen]
Yes, there's a troublesome meme in the world: "threads are hard". They aren't, really. You just have to know what you're doing.
But that begs the question, because there is a significant amount of evidence that when it comes to threads "knowing what you are doing" is hard to the point that people can *think* they do when they demonstrably don't!
I would say that the troublesome meme is that "threads are easy." I posted an earlier, rather longish message about this. The gist of which was: "when someone says that threads are easy, I have no idea what they mean by it."
I would suggest that the truth lies in the middle ground, and would say that "you can get yourself into a lot of trouble using threads without considering the subtleties". It's an area where anything but the most simplistic solutions are almost always wrong at some point.
Perhaps this means "threads in Python are easier than threads in other languages."
But I just finished a 150-page chapter on Concurrency in Java which took many months to write, based on a large chapter on Concurrency in C++ which probably took longer to write. I keep in reasonably good touch with some of the threading experts. I can't get any of them to say that it's easy, even though they really do understand the issues and think about it all the time. *Because* of that, they say that it's hard.
So alright, I'll take the bait that you've laid down more than once, now. Perhaps you can go beyond saying that "threads really aren't hard" and explain the aspects of them that seem so easy to you. Perhaps you can give a nice clear explanation of cache coherency and memory barriers in multiprocessor machines? Or explain atomicity, volatility and visibility? Or, even better, maybe you can come up with a better concurrency model, which is what I think most of us are looking for in this discussion.
The nice thing about Python threads (or rather threading.Thread) is that since each thread is an instance it's *relatively* easy to ensure that a thread restricts itself to manipulating thread-local resources (i.e. instance members). This makes it possible to write algorithms parameterized for the number of "worker threads" where the workers are taking their tasks off a Queue with entries generated by a single producer thread. With care, multiple producers can be used. More complex inter-thread communications are problematic, and arbitrary access to foreign-thread state is a nightmare (although the position has been somewhat alleviated by the introduction of threading.local). Beyond the single-producer many-consumers model there is still plenty of room to shoot yourself in the foot. In the case of threads true sophistication is staying away from the difficult cases, an option which unfortunately isn't always available in the real world. regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC www.holdenweb.com PyCon TX 2006 www.python.org/pycon/
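A minimal sketch of the single-producer/many-consumers model described above, parameterized by the number of worker threads (the squaring "work" is a placeholder):

    import threading, Queue

    def worker(tasks, results):
        # Each worker touches only its own locals plus the two Queues.
        while True:
            item = tasks.get()
            if item is None:                   # sentinel: no more work
                break
            results.put((item, item * item))   # placeholder "work"

    def run_pool(n_workers, items):
        tasks, results = Queue.Queue(), Queue.Queue()
        threads = [threading.Thread(target=worker, args=(tasks, results))
                   for i in range(n_workers)]
        for t in threads:
            t.start()
        for item in items:                     # the single producer
            tasks.put(item)
        for t in threads:                      # one sentinel per worker
            tasks.put(None)
        for t in threads:
            t.join()
        return [results.get() for i in range(len(items))]

    print run_pool(4, range(20))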
Steve Holden wrote:
The nice thing about Python threads (or rather threading.Thread) is that since each thread is an instance it's *relatively* easy to ensure that a thread restricts itself to manipulating thread-local resources (i.e. instance members).
This makes it possible to write algorithms parameterized for the number of "worker threads" where the workers are taking their tasks off a Queue with entries generated by a single producer thread. With care, multiple producers can be used. More complex inter-thread communications are problematic, and arbitrary access to foreign-thread state is a nightmare (although the position has been somewhat alleviated by the introduction of threading.local).
"Somewhat alleviated" and somewhat worsened. I've had half a dozen conversations in the last year about sharing data between threads; in every case, I've had to work quite hard to convince the other person that threading.local is *not* magic pixie thread dust. Each time, they had come to the conclusion that if they had a global variable, they could just stick a reference to it into a threading.local object and instantly have safe, concurrent access to it. Robert Brewer System Architect Amor Ministries fumanchu@amor.org
Robert Brewer wrote:
"Somewhat alleviated" and somewhat worsened. I've had half a dozen conversations in the last year about sharing data between threads; in every case, I've had to work quite hard to convince the other person that threading.local is *not* magic pixie thread dust. Each time, they had come to the conclusion that if they had a global variable, they could just stick a reference to it into a threading.local object and instantly have safe, concurrent access to it.
Ouch. Copy, yes, reference, no. . . Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.blogspot.com
"Robert Brewer"
"Somewhat alleviated" and somewhat worsened. I've had half a dozen conversations in the last year about sharing data between threads; in every case, I've had to work quite hard to convince the other person that threading.local is *not* magic pixie thread dust. Each time, they had come to the conclusion that if they had a global variable, they could just stick a reference to it into a threading.local object and instantly have safe, concurrent access to it.
*boggles* Perhaps there should be an entry in the documentation about this. Here is a proposed modification:

Despite desires and assumptions to the contrary, <b>threading.local is not magic</b>. Placing references to global shared objects into threading.local <b>will not make them magically threadsafe</b>. Only by using threadsafe shared objects (by design with Queue.Queue, or by desire with lock.acquire()/release() placed around object accesses) will you have the potential for doing safe things. - Josiah
On Mon, 2005-10-10 at 18:59, Bill Janssen wrote:
The problem with threads is at first glance they appear easy...
Anyone who thinks that a "glance" is enough to understand something is too far gone to worry about. On the other hand, you might be referring to a putative brokenness of the Python documentation on Python threads. I'm not sure they're broken, though. They just point out the threading that Python provides, for folks who want to use threads. Are they required to provide a full course in threads?
I was speaking in general, not about Python in particular. If anything, Python is one of the simplest and safest platforms for threading (thanks mostly to the GIL). And I find the documentation excellent :-)
...which seduces many beginning programmers into using them.
Don't worry about this. That's how "beginning programmers" learn.
Many other things "beginning programmers" learn break very quickly if
you do them wrong, until you learn to do them right. Threads are tricky
in that they can "mostly work", and it can be a long while before you
realise the code is actually broken.
I don't know how many bits of other people's code I've had to fix that
worked for years until it was run on hardware fast enough to trigger
that nasty race condition :-)
--
Donovan Baarda
[ Possibly overlengthy reply. However, given multiple sets of cans of worms... ] On Friday 07 October 2005 07:25, Josiah Carlson wrote:
One thing I notice is absent from the Kamaelia page is benchmarks.
That's largely for one simple reason: we haven't done any yet. At least not anything I'd call a benchmark. "There's lies, damn lies, statistics and then there's benchmarks."

//Theoretically// I suspect that the system /could/ perform as well as traditional approaches to dealing with concurrent problems single threaded (and multi-thread/process). This is based on the recognition of two things:
* Event systems (often implementing state machine type behaviour, not always though) often have intermediate buffers between states & operations. Some systems divide a problem into multiple reactors and stages and have communication between them, though this can sometimes be hidden. All we've done is make this much more explicit.
* Event systems (and state machine based approaches) can often be used to effectively say "I want to stop and wait here, come back to me later" or simply "I'm doing something processor intensive, but I'm being nice and letting something else have a go". The use of generators here simply makes that particular behaviour more explicit. This is a nice bonus of python.
[neither is a negative really, just different. The first bullet has implicit buffers in the system, the latter has a more implicit state machine in the system. ICBVW here of course.]

However, COULD is not is, and whilst I say "in theory", I am painfully aware that theory and practice often have a big gulf between them. Also, I'm certain that at present our performance is nowhere near optimal. We've focussed on trying to find what works from a few perspectives rather than performance (one possible definition of correctness here, but certainly not the only one). Along the way we've made compromises in favour of clarity as to what's going on, rather than performance.

For example, one area we know we can optimise is the handling of message delivery. The mini-axon tutorial represents delivery between active components as being performed by an independent party - a postman. This is precisely what happens in the current system. That can be optimised for example by collapsing outboxes into inboxes (ie removing one of the lists when a linkage is made and changing the reference), and at that point you have a single intermediate buffer (much like an event/state system communicating between subsystems). We haven't done this yet; whilst it would partly simplify things, it makes other areas more complex, and seems like premature optimisation.

However I have performed an //informal comparison// between the use of a Kamaelia type approach and a traditional approach not using any framework at all for implementing a trivial game. (Cats bouncing around the screen scaling, rotating, etc, controlled by a user.) The reason I say Kamaelia-type approach is because it was a mini-axon based experiment using collapsed outboxes to inboxes (as above). The measure I used was simply framerate. This is a fair real value and has a real use - if it drops too low, the system is simply unusable. I measured the framerate before transforming the simplistic game to work well in the framework, and after transforming it. The differences were:
* 5% drop in performance/framerate
* The ability to reuse much of the code in other systems and environments.
From that perspective it seems acceptable (for now). This *isn't* as you would probably say a rigorous or trustable benchmark, but was a useful "smoke test" if you like of the approach.
From a *pragmatic* perspective, currently the system is fast enough for simple games (say a hundred, two hundred, maybe more, sprites active at once), for interactive applications, video players, realtime audio mixing and a variety of other things, so currently we're leaving that aside. Also from an even more pragmatic perspective, I would say if you're after performance and throughput then I'd say use Twisted, since it's a proven technology.

**If** our stuff turns out to be useful, we'd like to find a way of making our stuff available inside twisted -- if they'd like it (*) -- since we're not the least bit interested in competing with anyone :-) So far *we're* finding it useful, which is all I'd personally claim, and hope that it's useful to others. (*) The all too brief conversation I had with Tommi Virtanen at Europython suggested that he at least thought the pipeline/graphline idea was worth taking - so I'd like to do that at some point, even if it sidelines our work to date. Once we've validated the model though (which I expect to take some time, you only learn if it's validated by building things IMO), then we'll look at optimisation. (if the model is validated :-)

All that said, I'm open to suggestion as to what sort of benchmark you'd like to see. I'm more interested in benchmarks that actually mean something rather than say X is better than Y though. Summarising them, no benchmarks, yet. If you're after speed, I'm certain you can find that elsewhere. If you're after an easy way of dealing with a concurrent problem, that's where we're starting from, and then optimising. We're very open to suggestions for improvement on both usability/learnability and on keeping doors open/open doors to performance though. I'd hate to have to rewrite everything in another language later simply due to poor design decisions. [ Network controlled Networked Audio Mixing Matrix ]
Very neat. How much data? What kind of throughput? What kinds of latencies?
For the test system we tested with 3 raw PCM audio data streams. That's 3 x 44.1kHz, 16 bit stereo - which is around 4.2Mbit/s of data from the network being processed realtime and output back to the network at 1.4Mbit/s. So, not huge numbers, but not insignificant amounts of data either. I suppose one thing I can take more time with now is to look at the specific latency of the mixer. It didn't *appear* to be large however. (there appeared to be similar latency in the system with or without the mixer) [[The aim of the rapid prototyping session was to see what could be done rather than to measure the results. The total time taken for coding the mixing matrix was 2.5 days. About 1/2 day spent on finding an issue we had with network resends regarding non-blocking sockets. A day with me totally misunderstanding how mixing raw audio byte streams works. The backplane was written during that 3 day time period. The control protocol for switching on/off mixes and querying the system though was ~1.5 hours from start to finish, including testing. To experiment with what dataflow architecture might work, I knocked up a command line controlled dynamic graph viewer (add nodes, link nodes, delete nodes) in about 5 minutes and then experimented with what the system would look like if done naively. The backplane idea became clear as useful here because we wanted to allow multiple mixers. ]] A more interesting effect we found was dealing with mouse movement in pygame where we found that *huge* numbers of messages being sent one at a time and processed one at a time (with yields after each) became a huge bottleneck. It made more sense to batch the events and pass them to client surfaces. (If that makes no sense we allow pygame components to act as if they have control of the display by giving them a surface from a pygame display service. This acts essentially as a simplistic window manager. That means pygame events need to be passed through quickly and cleanly.) The reason I like using pygame for these things is because a) it's relatively raw and fast b) games are another often /naturally/ concurrent system. Also it normally allows other senses beyond reading numbers/graphs to kick in when evaluating changes "that looks better/worse", "There's something wrong there".
I have two recent posts about the performance and features of a (hacked together) tuple space system
Great :-) I'll have a dig around.
The only thing that it is missing is a prioritization mechanism (fifo, numeric priority, etc.), which would get us a job scheduling kernel. Not bad for a "message passing"/"tuple space"/"IPC" library.
Sounds interesting. I'll try and find some time to have a look and have a play. FWIW, we're also missing a prioritisation mechanism right now. Though currently I have Simon Wittber's latest release of Nanothreads on my stack of things to look at. I do have a soft spot for Linda type approaches though :-) Best Regards, Michael. -- Michael Sparks, Senior R&D Engineer, Digital Media Group Michael.Sparks@rd.bbc.co.uk, http://kamaelia.sourceforge.net/ British Broadcasting Corporation, Research and Development Kingswood Warren, Surrey KT20 6NP This e-mail may contain personal views which are not the views of the BBC.
//Theoretically// I suspect that the system /could/ perform as well as traditional approaches to dealing with concurrent problems single threaded (and multi-thread/process).
I also think it's important to factor in the possibility of multiprocessors. If Kamaelia (for example) has a very safe and straightforward programming model so that more people are easily able to use it, but it has some performance impact over more complex systems, I think the ease of use issue opens up far greater possibilities if you include multiprocessing -- because if you can easily write concurrent programs in Python, then Python could gain a significant advantage over less agile languages when multiprocessors become common. That is, with multiprocessors, it could be way easier to write a program in Python that also runs way faster than the competition. Yes, of course given enough time they might theoretically be able to write a program that is as fast or faster using their threading mechanism, but it would be so hard by comparison that they'll either never get it done or never be sure if it's reliable. That's what I'm looking for. Bruce Eckel http://www.BruceEckel.com mailto:BruceEckel-Python3234@mailblocks.com Contains electronic books: "Thinking in Java 3e" & "Thinking in C++ 2e" Web log: http://www.artima.com/weblogs/index.jsp?blogger=beckel Subscribe to my newsletter: http://www.mindview.net/Newsletter My schedule can be found at: http://www.mindview.net/Calendar
On Friday 07 October 2005 23:26, Bruce Eckel wrote:
I think the ease of use issue opens up far greater possibilities if you include multiprocessing ... That's what I'm looking for.
In which case that's an area we need to push our work into sooner rather than later. After all, the PS3 and CELL arrive next year. Sun already has some interesting stuff shipping. I'd like to use that kit effectively, and more importantly make using that kit effectively available to colleagues sooner rather than later. That really means multiprocess "now" not later. BTW, I hope it's clear that I'm not saying concurrency is easy per se (noting your previous post ;-) but rather that it /should/ be made as simple as is humanly possible. Thanks! Michael. -- Michael Sparks, Senior R&D Engineer, Digital Media Group Michael.Sparks@rd.bbc.co.uk, http://kamaelia.sourceforge.net/ British Broadcasting Corporation, Research and Development Kingswood Warren, Surrey KT20 6NP This e-mail may contain personal views which are not the views of the BBC.
Michael Sparks
[ Possibly overlengthy reply. However, given multiple sets of cans of worms... ] On Friday 07 October 2005 07:25, Josiah Carlson wrote:
One thing I notice is absent from the Kamaelia page is benchmarks.
That's largely for one simple reason: we haven't done any yet.
Perfectly reasonable. If you ever do, I'd be happy to know!
At least not anything I'd call a benchmark. "There's lies, damn lies, statistics and then there's benchmarks."
Indeed. But it does allow people to get an idea whether a system could handle their workload.
The measure I used was simply framerate. This is a fair real value and has a real use - if it drops too low, the system is simply unusable. I measured the framerate before transforming the simplistic game to work well in the framework, and after transforming it. The differences were:
* 5% drop in performance/framerate
* The ability to reuse much of the code in other systems and environments.
Single process? Multi-process single machine? Multiprocess multiple machine?
Also from an even more pragmatic perspective, I would say if you're after performance and throughput then I'd say use Twisted, since it's a proven technology.
I'm just curious. I keep my fingers away from Twisted as a matter of personal taste (I'm sure its great, but it's not for me).
All that said, I'm open to suggestion as to what sort of benchmark you'd like to see. I'm more interested in benchmarks that actually mean something rather than say X is better than Y though.
I wouldn't dream of saying that X was better or worse than Y, unless one was obvious crap (since it works for you, and you've gotten new users to use it successfully, that is obviously not the case). There are five benchmarks that I think would be interesting to see:
1. Send ~500 bytes of data round-trip from process A to process B and back on the same machine as fast as you can (simulates synchronous message passing and discovers transfer latencies) a few (tens of) thousands of times (A doesn't send message i until it has received message i-1 back from B).
2. Increase the number of processes that round trip with B. A quick chart of #senders vs. messages/second would be far more than adequate.
3. Have process B send ~500 byte messages to many listening processes via whatever is the fastest method (direct connections, multiple subscriptions to a 'channel', etc.). Knowing #listeners vs. messages/second would be cool.
4. Send blocks of data from process A to process B (any size you want). B immediately discards the data, but you pay attention to how much data/second B receives (a dual processor machine with proper processor affinities would be fine here).
5. Start increasing the number of processes that send data to B. A quick chart of #senders vs. total bytes/second would be far more than adequate.
I'm just offering the above as example benchmarks (you certainly don't need to do them to satisfy me, but I'll be doing those when my tuple space implementation is closer to being done). They are certainly not exhaustive, but they do offer a method by which one can measure latencies, message volume throughput, data volume throughput, and ability to handle many senders and/or recipients.
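For what it's worth, benchmark 1 needs nothing beyond the standard library; a rough sketch with plain sockets (the port number, message count and framing are arbitrary choices):

    import socket, time

    MSG = 'x' * 500
    N = 10000

    def serve(port=9999):                  # run this in process B
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind(('127.0.0.1', port))
        s.listen(1)
        conn, addr = s.accept()
        while 1:
            data = conn.recv(500)
            if not data:
                break
            conn.sendall(data)             # echo it straight back

    def time_roundtrips(port=9999):        # run this in process A
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect(('127.0.0.1', port))
        start = time.time()
        for i in xrange(N):
            s.sendall(MSG)                 # don't send i+1 until i returns
            received = 0
            while received < len(MSG):     # TCP may split the echo
                received += len(s.recv(500))
        print '%.0f round trips/second' % (N / (time.time() - start))
        s.close()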
[ Network controlled Networked Audio Mixing Matrix ]
Very neat. How much data? What kind of throughput? What kinds of latencies?
For the test system we tested with 3 raw PCM audio data streams. That's 3 x 44.1kHz, 16 bit stereo - which is around 4.2Mbit/s of data from the network being processed realtime and output back to the network at 1.4Mbit/s. So, not huge numbers, but not insignificant amounts of data either. I suppose one thing I can take more time with now is to look at the specific latency of the mixer. It didn't *appear* to be large however. (there appeared to be similar latency in the system with or without the mixer)
530 Kbytes/second in, 176 Kbytes/second out. Not bad (I imagine you are using a C library/extension of some sort to do the mixing... perhaps numarray, Numeric, ...). How large are the blocks of data that you are shuffling around at one time? 1, 5, 10, 50, 150 kbytes?
A more interesting effect we found was dealing with mouse movement in pygame where we found that *huge* numbers of messages being sent one at a time and processed one at a time (with yields after each) became a huge bottleneck.
I can imagine.
The reason I like using pygame for these things is because a) it's relatively raw and fast b) games are another often /naturally/ concurrent system. Also it normally allows other senses beyond reading numbers/graphs to kick in when evaluating changes "that looks better/worse", "Theres's something wrong there".
Indeed. I'm should get my fingers into PyGame, but haven't yet due to other responsibilities.
I have two recent posts about the performance and features of a (hacked together) tuple space system
Great :-) I'll have a dig around.
Make that 3.
The only thing that it is missing is a prioritization mechanism (fifo, numeric priority, etc.), which would get us a job scheduling kernel. Not bad for a "message passing"/"tuple space"/"IPC" library.
Sounds interesting. I'll try and find some time to have a look and have a play. FWIW, we're also missing a prioritisation mechanism right now. Though currently I have Simon Wittber's latest release of Nanothreads on my stack of things to look at. I do have a soft spot for Linda type approaches though :-)
I've not yet released anything. The version I'm working on essentially indexes tuples in a set of specialized structures to make certain kinds of matching fast (both insertions and removals are also fast), which has a particular kind of queue at the 'leaf' (if one were to look at it as a tree). Those queues also support listeners which want to be notified about one or many tuples which happen to match up with the pattern, resulting in the tuple being consumed by one listener, broadcast to all listeners, etc. In the case of no listeners, but someone who just wants one tuple, one can prioritize tuple fetches based on fifo, numeric priority, lifo, or whatever other useful semantic that I get around to putting in there for whatever set of tuples matches it. - Josiah
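For readers unfamiliar with Linda, a toy sketch of the two core tuple space operations -- emphatically not Josiah's indexed implementation, just the flavour of the API under discussion:

    import threading

    class TupleSpace:
        def __init__(self):
            self.tuples = []
            self.cond = threading.Condition()

        def out(self, tup):
            # Add a tuple to the space and wake any blocked readers.
            self.cond.acquire()
            try:
                self.tuples.append(tup)
                self.cond.notifyAll()
            finally:
                self.cond.release()

        def _matches(self, pattern, tup):
            if len(pattern) != len(tup):
                return False
            for p, t in zip(pattern, tup):
                if p is not None and p != t:   # None acts as a wildcard
                    return False
            return True

        def take(self, pattern):
            # Linda's "in": block until a matching tuple exists, remove it.
            self.cond.acquire()
            try:
                while 1:
                    for tup in self.tuples:
                        if self._matches(pattern, tup):
                            self.tuples.remove(tup)
                            return tup
                    self.cond.wait()
            finally:
                self.cond.release()

    # e.g. space.out(('job', 42)); space.take(('job', None)) -> ('job', 42)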
On Saturday 08 October 2005 04:05, Josiah Carlson wrote: [ simplistic, informal benchmark of a test optimised version of the system, based on bouncing, scaling, rotating sprites around the screen. ]
Single process? Multi-process single machine? Multiprocess multiple machine?
Single process, single CPU, on a not very recent machine (a 600MHz Crusoe based machine or so). That machine wasn't hardware accelerated though, so it was only able to handle several dozen sprites before slowing down. The slowdown was due to the hardware not being able to keep up with pygame's drawing requests, though, rather than the framework.
I'm just offering the above as example benchmarks (you certainly don't need to do them to satisfy me, but I'll be doing those when my tuple space implementation is closer to being done).
I'll note them as things worth doing - they look reasonable and interesting benchmarks. (I can think of a few modifications I might make though. For example, in 3 you say "fastest". I might have that as a 3b; 3a could be "simplest to use/read" or "most likely to pick". Obviously there's a good chance that's not the fastest. (It could be optimised under the hood I suppose, but that wouldn't be the point of the test).)
[ Network controlled Networked Audio Mixing Matrix ] I imagine you are using a C library/extension of some sort to do the mixing...perhaps numarray, Numeric, ...
Nope, just plain old python (I'm now using a 1.6GHz Centrino machine though). My mixing function is particularly naive as well. To me that says more about python than my code. I did consider using pyrex to wrap (or write) an optimised version, but there didn't seem to be any need for it last week (Though for a non-prototype something faster would be nice :). I'll save responding to the linda things until I have a chance to read in detail what you've written. It sounds very promising though - having multiple approaches to different styles of concurrency that work nicely with each other safely is always a positive thing IMO. Thanks for the suggestions and best regards, Michael. -- "Though we are not now that which in days of old moved heaven and earth, that which we are, we are: one equal temper of heroic hearts made weak by time and fate but strong in will to strive, to seek, to find and not to yield" -- "Ulysses", Tennyson
Michael Sparks
On Saturday 08 October 2005 04:05, Josiah Carlson wrote:
I'm just offering the above as example benchmarks (you certainly don't need to do them to satisfy me, but I'll be doing those when my tuple space implementation is closer to being done).
I'll note them as things worth doing - they look reasonable and interesting benchmarks. (I can think of a few modifications I might make though. For example, in 3 you say "fastest". I might have that as a 3b; 3a could be "simplest to use/read" or "most likely to pick". Obviously there's a good chance that's not the fastest. (It could be optimised under the hood I suppose, but that wouldn't be the point of the test).)
Good point. 3a. Use 1024 byte blocks... 3b. Use whatever makes your system perform best (if you have the time to tune it)...
[ Network controlled Networked Audio Mixing Matrix ] I imagine you are using a C library/extension of some sort to do the mixing...perhaps numarray, Numeric, ...
Nope, just plain old python (I'm now using a 1.6GHz Centrino machine though). My mixing function is particularly naive as well. To me that says more about python than my code. I did consider using pyrex to wrap (or write) an optimised version, but there didn't seem to be any need for it last week (Though for a non-prototype something faster would be nice :).
Indeed. A quick array.array('h', ...) implementation is able to run 7-8x real time on 3->1 stream mixing on my 1.3GHz laptop. Maybe Numeric or numarray isn't necessary.
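A naive mixer along the lines being discussed might look like this sketch (sample averaging is one arbitrary choice of mixing rule; this is not the actual Kamaelia code):

    import array

    def mix_blocks(blocks):
        # blocks: a list of equal-length strings of raw 16-bit signed PCM.
        streams = [array.array('h', b) for b in blocks]
        n = len(streams)
        out = array.array('h', [0] * len(streams[0]))
        for i in xrange(len(out)):
            total = 0
            for s in streams:
                total += s[i]
            out[i] = total // n    # average, so the sum stays in 16-bit range
        return out.tostring()      # .tobytes() in modern Python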
I'll save responding the linda things until I have a chance to read in detail what you've written. It sounds very promising though - having multiple approaches to different styles of concurrency that work nicely with each other safely is always a positive thing IMO.
Thanks for the suggestions and best regards,
Thank you for the interesting and informative discussion. - Josiah
participants (25):
- Aahz
- Antoine Pitrou
- Barry Warsaw
- Bill Janssen
- Bruce Eckel
- Christopher Armstrong
- Donovan Baarda
- Greg Ewing
- Guido van Rossum
- Ian Bicking
- Jim Fulton
- Josiah Carlson
- Kalle Anke
- Martin Blais
- Michael Hudson
- Michael Sparks
- Michael Sparks
- Neil Hodgson
- Nick Coghlan
- Paolo Invernizzi
- Phillip J. Eby
- Robert Brewer
- Shane Hathaway
- skip@pobox.com
- Steve Holden