[Python-3000] python-safethread project status

Mon Mar 17 20:42:42 CET 2008

On Mon, Mar 17, 2008 at 1:02 PM, Guido van Rossum <guido at python.org> wrote:
>  On Mon, Mar 17, 2008 at 12:56 PM, Adam Olsen <rhamph at gmail.com> wrote:
>  > Guido's asked me to give a quick status report, so here I go..
>  >
>  >  The critical parts of the design and implementation are basically
>  >  done.  I've implemented monitors to contain objects without an
>  >  explicit thread-safe API, which then let me remove the GIL.  I've
>  >  created a synchronous/asynchronous refcounting scheme that reuses our
>  >  existing INCREF/DECREF API, but also minimizes contention when used by
>  >  more than one thread.  The net result is that although there's a
>  >  significant amount of overhead, I can demonstrate scalability to more
>  >  than one thread (I've only been able to test with two cores though.)
>  >
>  >  I've replaced the __del__ API (which resurrected objects) with a
>  >  __finalize__/__finalizeattrs__ API (which doesn't).  Attributes listed
>  >  in __finalizeattrs__ are proxied into a core object, a finalizer
>  >  thread is given a reference to the core, and when the main object is
>  >  deleted the GC asynchronously notifies the finalizer thread so that it
>  >  can call core.__finalize__().  The net result is an API very similar
>  >  to __del__ (you need to list attributes it might use), but it's now
>  >  impossible for the GC to run arbitrary code (I even enforce this).
>  >
>  >  Missing there is generator cleanup.  I don't allow the GC to run a
>  >  generator long enough to raise GeneratorExit, so an alternative will
>  >  be needed.
>  >
>  >  I'm currently working on fleshing out the design.  I recently rewrote
>  >  and reenabled the tracing GC, next up is the automatic deadlock
>  >  detection/breaking.
>  >
>  >  As for merging back into CPython, I could build smaller patches, but
>  >  the design can't be completely separated.  For example, __finalize__
>  >  is called from another thread, so Monitor's @monitormethod should be
>  >  applied.  I don't specifically require Monitor, just that the object
>  >  (and therefore its methods) be shareable, and Monitor is the easiest
>  >  way to provide that.
>
> Thanks! Would you care to give us a hint on how a typical
>  multi-threaded application would be written using this approach? How
>  much pre-existing code do you expect would break?

Since I forgot it in my original post, here's my project site:

http://code.google.com/p/python-safethread/

A key advantage of using Monitors, rather than actors or some other
event-driven scheme, is that you retain the traditional style of
blocking function calls.  Most stdlib modules should remain basically
the same.

The first conflict will be in importing.  To be accessible to another
thread a module object must be shareable, which means everything it
references must also be shareable.  This is enabled on a per-module
basis using "from __future__ import shared_module".  It shouldn't
affect users of that module though, unless they modify its globals or
class attributes, so I expect we can apply it to most of the stdlib.

Module globals and class dicts use a shareddict, which is still
mutable, but requires its keys and values to be shareable.  Once you
stop modifying it, it will switch to an unlocked mode, allowing
uncontended reads.  Writes at this point still work, but the first one
will be expensive, so such behaviour should be discouraged.

The bigger problem is having something like a list in the module
globals or class dict.  list isn't thread-safe and isn't shareable, so
it will be rejected.  If you're just using it as a constant, you can
probably replace it with a tuple (or frozenset?), but if you are
modifying it you'll need to wrap it with a Monitor or the like.
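
To make the rules above concrete, here is a plain-Python sketch of a shareddict-style mapping that refuses a list but accepts a tuple or a lock-protected wrapper. It is not safethread code: SharedDict, LockedList and _shareable are hypothetical names, the shareability rule is simplified, LockedList only stands in for a real Monitor, and the locked/unlocked read modes are not modeled.

```python
import threading


class LockedList:
    """Hypothetical Monitor-style wrapper: every access takes one lock."""

    def __init__(self, items=()):
        self._lock = threading.Lock()
        self._items = list(items)

    def append(self, item):
        with self._lock:
            self._items.append(item)

    def snapshot(self):
        # Return an immutable copy so callers never see the raw list.
        with self._lock:
            return tuple(self._items)


def _shareable(value):
    # Simplified rule: immutable scalars, lock-protected wrappers, and
    # tuples/frozensets whose items are all shareable.
    if isinstance(value, (type(None), bool, int, float, str, bytes, LockedList)):
        return True
    if isinstance(value, (tuple, frozenset)):
        return all(_shareable(item) for item in value)
    return False


class SharedDict(dict):
    """Sketch of a shareddict that refuses non-shareable keys or values.

    Only item assignment is checked here; update() and the constructor
    bypass __setitem__ on dict subclasses.
    """

    def __setitem__(self, key, value):
        for item in (key, value):
            if not _shareable(item):
                raise TypeError(f'{type(item).__name__} is not shareable')
        super().__setitem__(key, value)
```

A module-level constant list would become a tuple, and a list that really is mutated would move behind a wrapper such as LockedList.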

Another big problem is that I've scrapped the existing
thread/threading modules.  Getting their semantics right would require
that I retain the GIL (at least for the main MonitorSpace), they wouldn't
get deadlock detection, I wouldn't support daemon threads, etc.  I can
do it, but only grudgingly. ;)

*****

Now, a simple example of how to spawn threads is given here:
http://code.google.com/p/python-safethread/wiki/Branching

    with branch() as children:
        children.add(func1, 42, name='bob')
        children.add(func2, *args, **kwargs)
        some_io_func()

Basically you use "with branch() as children:" to create a branching
point, then "children.add(func, *args, **kwargs)" to create a child
thread.  It automatically joins the threads when you try to leave the
context.  If one child raises an exception, all others (including the base
thread) are cancelled, encouraging them to stop.  Once they have all
stopped the exceptions are bundled together (using
PyException_SetCause() and possibly MultipleError), then propagated up
to their caller.

If you want to collect their results you can use
"children.addresult(func)" instead, then follow with "data =
children.getresults()" after you leave the context.  This must be done
explicitly to avoid unintentionally keeping results alive in a
long-running server.
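
The branching behaviour described above can be approximated on top of the standard threading module. In this sketch branch, BranchError and the cancelled event are hypothetical stand-ins: safethread cancels threads from the runtime, whereas here cancellation is cooperative (children and the base thread must poll children.cancelled), and the child exceptions are bundled into one BranchError whose __cause__ is the base thread's exception, if any, loosely mirroring PyException_SetCause().

```python
import threading


class BranchError(Exception):
    """Hypothetical bundle of the exceptions raised by child threads."""

    def __init__(self, errors):
        super().__init__(f'{len(errors)} child thread(s) failed')
        self.errors = errors


class branch:
    """Sketch of a branching point built on threading (not safethread)."""

    def __init__(self):
        self.cancelled = threading.Event()  # cooperative cancellation flag
        self._threads = []
        self._errors = []
        self._results = []
        self._lock = threading.Lock()

    def __enter__(self):
        return self

    def _run(self, func, args, kwargs, slot):
        try:
            value = func(*args, **kwargs)
        except BaseException as exc:
            with self._lock:
                self._errors.append(exc)
            self.cancelled.set()  # ask siblings and the base thread to stop
        else:
            if slot is not None:
                self._results[slot] = value

    def _spawn(self, func, args, kwargs, slot):
        thread = threading.Thread(target=self._run,
                                  args=(func, args, kwargs, slot))
        self._threads.append(thread)
        thread.start()

    def add(self, func, *args, **kwargs):
        self._spawn(func, args, kwargs, None)  # result is discarded

    def addresult(self, func, *args, **kwargs):
        slot = len(self._results)
        self._results.append(None)  # reserve a slot in call order
        self._spawn(func, args, kwargs, slot)

    def __exit__(self, exc_type, exc, tb):
        if exc is not None:
            self.cancelled.set()  # the base thread failed: cancel children
        for thread in self._threads:
            thread.join()  # never leave the context with children running
        if self._errors:
            raise BranchError(self._errors) from exc
        return False  # a base-thread exception propagates unchanged

    def getresults(self):
        # Hand the results over and drop our own references to them.
        results, self._results = self._results, []
        return results
```

Usage follows the post: call children.addresult(func, ...) inside the with block, then children.getresults() after it; because getresults() clears the stored list, a long-running caller does not keep old results alive.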

If you're operating only on immutable data and returning an immutable
result then that's sufficient.  However, to build a shared mutable
object you need to turn to Monitors:
http://code.google.com/p/python-safethread/wiki/Monitors

    class Counter(Monitor):
        # implicit @monitormethod for __init__ (and __new__)
        def __init__(self):
            self.count = 0

        @monitormethod
        def tick(self):
            self.count += 1

        @monitormethod
        def value(self):
            return self.count

The basic idea here is that it acquires the Monitor's lock when a
monitormethod is called.  What you don't see in this example is that
it checks if all the arguments are shareable too, as well as the
return value.  You can still write staticmethods or classmethods -
only self.__dict__ is inaccessible if you're not in the Monitor.
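
To make the locking and shareability checks concrete, here is a minimal emulation in standard Python that runs a Counter like the one above. Monitor, monitormethod and _check_shareable are hypothetical re-implementations, not safethread's code: the shareability rule is simplified, __init__ is not implicitly locked, and self.__dict__ is not hidden from code outside the Monitor. The extra add() method exists only to show an argument being checked.

```python
import functools
import threading

_SCALARS = (type(None), bool, int, float, complex, str, bytes)


def _check_shareable(value):
    # Simplified rule: immutable scalars, Monitors, and tuples/frozensets
    # built only from shareable items.
    if isinstance(value, _SCALARS) or isinstance(value, Monitor):
        return
    if isinstance(value, (tuple, frozenset)):
        for item in value:
            _check_shareable(item)
        return
    raise TypeError(f'{type(value).__name__} is not shareable')


class Monitor:
    """Sketch of a Monitor base class: one reentrant lock per instance."""

    def __new__(cls, *args, **kwargs):
        self = super().__new__(cls)
        self._monitor_lock = threading.RLock()
        return self


def monitormethod(method):
    """Hold the Monitor's lock for the call; check arguments and result."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        for value in (*args, *kwargs.values()):
            _check_shareable(value)
        with self._monitor_lock:
            result = method(self, *args, **kwargs)
        _check_shareable(result)
        return result

    return wrapper


class Counter(Monitor):
    # Not locked in this sketch; no other thread can see the object yet.
    def __init__(self):
        self.count = 0

    @monitormethod
    def tick(self):
        self.count += 1

    @monitormethod
    def add(self, amount):
        self.count += amount

    @monitormethod
    def value(self):
        return self.count
```

With this in place, several threads can call tick() concurrently without losing updates, while a call such as counter.add([1]) raises TypeError before the lock is even taken.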

Also not seen is a .wait(func) (to leave the Monitor), the
.enter(func) used internally by monitormethod, or other options to
give explicit control over recursive-locking situations.  Much of that
hasn't been implemented and still needs a final design.  Likewise, I
haven't added conditions yet.
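
Since .wait(func) is still undesigned, the following is only a guess at its intent, written against the standard threading module. It assumes "leaving the Monitor" means releasing the Monitor's lock while func blocks and re-acquiring it afterwards. The class name WaitingMonitor and the enter/wait signatures are hypothetical, and the plain non-recursive Lock deliberately sidesteps the recursive-locking questions that remain open.

```python
import threading


class WaitingMonitor:
    """Hypothetical Monitor with enter() and wait(), guarded by one Lock."""

    def __init__(self):
        # Non-recursive on purpose: a nested enter() would deadlock here.
        self._lock = threading.Lock()

    def enter(self, func, *args):
        # Roughly the role monitormethod plays: run func inside the Monitor.
        with self._lock:
            return func(*args)

    def wait(self, func, *args):
        # Must be called from inside enter(). Leave the Monitor so other
        # threads can use it while func blocks, then re-enter.
        self._lock.release()
        try:
            return func(*args)
        finally:
            self._lock.acquire()


def demo():
    monitor = WaitingMonitor()
    ready = threading.Event()

    def inside():
        # The helper can only set the event once wait() releases the lock;
        # without that release, ready.wait(5) would time out and return False.
        helper = threading.Thread(target=monitor.enter, args=(ready.set,))
        helper.start()
        signalled = monitor.wait(ready.wait, 5)
        helper.join()
        return signalled

    return monitor.enter(inside)
```

A condition variable would presumably sit on the same lock (threading.Condition accepts an existing lock), but that is left out here just as it is in the project.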

-- 
Adam Olsen, aka Rhamphoryncus