[Python-Dev] Pythonic concurrency
Shane Hathaway
shane at hathawaymix.org
Thu Sep 29 21:31:19 CEST 2005
Bruce Eckel wrote:
> I'd like to restart this discussion; I didn't mean to put forth active
> objects as "the" solution, only that it seems to be one of the better,
> more OO solutions that I've seen so far.
>
> What I'd really like to figure out is the "pythonic" solution for
> concurrency. Guido and I got as far as agreeing that it wasn't
> threads.
I've pondered this problem. Python deals programmers a double whammy
when it comes to threads: not only is threading just as unsafe as it is
in other languages, but the GIL also prevents the threads of a single
process from running on multiple processors at once. Thus there's more
pressure to improve concurrency in Python than there is elsewhere.
I like to use fork(), but fork has its own set of surprises. In
particular, from the programmer's point of view, forking creates a
disassociated copy of every object except open files, whose descriptors
remain shared between the two processes. Also, there's no Pythonic way
for the two processes to communicate once the child has started.
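For illustration, here's roughly what that copy semantics looks like
using nothing but the standard os module (Unix only; the dict is just
an example object):

import os

state = {'count': 0}
pid = os.fork()
if pid == 0:
    # Child: this mutates only the child's disassociated copy.
    state['count'] += 1
    os._exit(0)
os.waitpid(pid, 0)
print(state['count'])  # the parent still sees 0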
It's tempting to create a library around fork() that solves the
communication problem, but the copied objects are still a major source
of bugs. Imagine what would happen if you forked a Zope process with an
open ZODB. If both the parent and child change ZODB objects, ZODB is
likely to corrupt itself, since the processes share file descriptors.
Thus forking can be just as dangerous as threading.
Therefore, I think a better Python concurrency model would be a lot like
the subprocess module, but designed for calling Python code. I can
already think of several ways I would use such a module. Something like
the following would solve problems I've encountered with threads,
forking, and the subprocess module:
import pyprocess
proc = pyprocess.start('mypackage.mymodule', 'myfunc', arg1, arg2=5)
while proc.running():
    pass  # do something else here
res = proc.result()
This code doesn't specify whether the subprocess should continue to
exist after the function completes (or throws an exception). I can
think of two ways to deal with that:
1) Provide two APIs. The first API stops the subprocess upon function
completion. The second API allows the parent to call other functions in
the subprocess, but never more than one function at a time.
2) Always leave subprocesses running, but use a 'with' statement to
guarantee the subprocess will be closed quickly. I prefer this option.
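For concreteness, here is a rough sketch of how such a module might be
built on the existing subprocess and pickle modules. The Process class
and the _CHILD script are hypothetical names of mine, error handling is
omitted, and the 'with' support assumes PEP 343 semantics; this is a
sketch of the mechanism, not a finished design:

import os
import pickle
import signal
import subprocess
import sys

# Code run by the child interpreter: read one pickled call request
# from stdin, perform the call, and pickle the result back to stdout.
_CHILD = '''
import pickle, sys
mod_name, func_name, args, kwargs = pickle.load(sys.stdin)
mod = __import__(mod_name, {}, {}, [func_name])
result = getattr(mod, func_name)(*args, **kwargs)
pickle.dump(result, sys.stdout)
'''

class Process:
    def __init__(self, module, func, *args, **kwargs):
        self._popen = subprocess.Popen(
            [sys.executable, '-c', _CHILD],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
        # The argument list is the inbound message.
        pickle.dump((module, func, args, kwargs), self._popen.stdin)
        self._popen.stdin.close()

    def running(self):
        return self._popen.poll() is None

    def result(self):
        # The return value is the outbound message; blocks until done.
        return pickle.load(self._popen.stdout)

    # Option 2: context manager support, so the subprocess is closed
    # promptly when the block exits, even on an exception.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        if self.running():
            os.kill(self._popen.pid, signal.SIGTERM)

With that in place, option 2 reads naturally:

with Process('mypackage.mymodule', 'myfunc', arg1) as proc:
    res = proc.result()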
I think my suggestion fits most of your objectives.
> 1) It works by default, so that novices can use it without falling
> into the deep well of threading. That is, a program that you write
> using threading is broken by default, and the tool you have to fix it
> is "inspection." I want something that allows me to say "this is a
> task. Go." and have it work without the python programmer having to
> study and understand several tomes on the subject.
Done, IMHO.
> 2) Tasks can be automatically distributed among processors, so it
> solves the problems of (a) making python run faster (b) how to utilize
> multiprocessor systems.
Done. The OS automatically maps subprocesses to other processors.
> 3) Tasks are cheap enough that I can make thousands of them, to solve
> modeling problems (in which I also lump games). This is really a
> solution to a certain type of program complexity -- if I can just
> assign a task to each logical modeling unit, it makes such a system
> much easier to program.
Perhaps the suggested module should have a queue-oriented API. Usage
would look like this:
import pyprocess
queue = pyprocess.ProcessQueue(max_processes=4)
task = queue.put('mypackage.mymodule', 'myfunc', arg1, arg2=5)
Then, you can create as many tasks as you like; parallelism will be
limited to 4 concurrent tasks. A variation of ProcessQueue might manage
the concurrency limit automatically.
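To make that concrete, a minimal ProcessQueue could be layered on the
hypothetical Process class sketched earlier; the polling loop is crude,
but it shows the intended behavior:

import time

class ProcessQueue:
    def __init__(self, max_processes):
        self.max_processes = max_processes
        self.active = []

    def put(self, module, func, *args, **kwargs):
        # Block until a slot is free, then launch the task.
        while True:
            self.active = [p for p in self.active if p.running()]
            if len(self.active) < self.max_processes:
                break
            time.sleep(0.01)
        proc = Process(module, func, *args, **kwargs)
        self.active.append(proc)
        return proc

The automatic variation might pick max_processes from the number of
CPUs, or grow and shrink it based on load.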
> 4) Tasks are "self-guarding," so they prevent other tasks from
> interfering with them. The only way tasks can communicate with each
> other is through some kind of formal mechanism (something queue-ish,
> I'd imagine).
Done. Each subprocess has its own Python namespace. It receives
messages through function calls and sends messages by returning from
functions.
> 5) Deadlock is prevented by default. I suspect livelock could still
> happen; I don't know if it's possible to eliminate that.
No locking is done at all. (That makes me uneasy, though; have I just
moved locking problems to the application developer?)
> 6) It's natural to make an object that is actor-ish. That is, this
> concurrency approach works intuitively with objects.
Anything pickleable is legal.
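For instance, an ordinary instance survives the pickle round trip that
the process boundary implies, as long as its class is importable on
both sides (Counter here is just an illustrative class):

import pickle

class Counter:
    def __init__(self):
        self.n = 0
    def bump(self):
        self.n += 1

blob = pickle.dumps(Counter())  # what would travel down the pipe
clone = pickle.loads(blob)      # what the subprocess would receive
clone.bump()                    # operates on its own private copy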
> 7) Complexity should be eliminated as much as possible. If it requires
> greater limitations on what you can do in exchange for a clear,
> simple, and safe programming model, that sounds pythonic to me. The
> way I see it, if we can't easily use tasks without getting into
> trouble, people won't use them. But if we have a model that allows
> people to (for example) make easy use of multiple processors, they
> will use that approach and the (possible) extra overhead that you pay
> for the simplicity will be absorbed by the extra CPUs.
I think the solution is very simple.
> 8) It should not exclude the possibility of mobile tasks/active
> objects, ideally with something relatively straightforward such as
> Linda-style tuple spaces.
The proposed module could serve as a guide for a very similar module
that sends tasks to other machines.
Shane