another thread on Python threading

Mon Jun 4 16:11:09 EDT 2007

sturlamolden wrote:
> On Jun 4, 3:10 am, Josiah Carlson <josiah.carl... at sbcglobal.net>
> wrote:
>>  From what I understand, the Java runtime uses fine-grained locking on
>> all objects.  You just don't notice it because you don't need to write
>> the acquire()/release() calls.  It is done for you.  (in a similar
>> fashion to Python's GIL acquisition/release when switching threads)
> 
> The problem is CPython's reference counting. Access to reference
> counts must be synchronized.
> Java, IronPython and Jython use another scheme for the garbage
> collector and do not need a GIL.

There was a discussion regarding this on the python-ideas list recently.
You *can* attach a lock to every object, and use fine-grained locking
to handle refcounts.  Alternatively, you can use platform-specific
atomic increments and decrements, or even a secondary 'owner thread'
refcount that only one thread ever updates, so it doesn't need to be locked.
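A toy model of the first and third options may make the difference clearer.  This is Python standing in for what would really be C inside the interpreter; the class names `LockedRefcount` and `OwnerRefcount` and the `hammer` helper are hypothetical, and the owner-thread version is only a sketch that assumes the "owner" is whichever thread created the object.

```python
import threading

class LockedRefcount:
    """Option 1: a lock attached to every object guards its refcount."""

    def __init__(self):
        self.count = 1
        self._lock = threading.Lock()

    def incref(self):
        with self._lock:
            self.count += 1

    def decref(self):
        with self._lock:
            self.count -= 1
            return self.count == 0  # True once the object could be freed


class OwnerRefcount:
    """Option 3 (sketch): the creating thread updates `local` without a lock;
    every other thread goes through `shared`, which is lock-protected."""

    def __init__(self):
        self.owner = threading.get_ident()
        self.local = 1
        self.shared = 0
        self._lock = threading.Lock()

    def incref(self):
        if threading.get_ident() == self.owner:
            self.local += 1  # fast path: only the owner ever writes `local`
        else:
            with self._lock:
                self.shared += 1

    def decref(self):
        if threading.get_ident() == self.owner:
            self.local -= 1
        else:
            with self._lock:
                self.shared -= 1

    def total(self):
        # The true count is local + shared.  This toy reads `local` without
        # coordinating with the owner; a real interpreter would need more care.
        with self._lock:
            return self.local + self.shared


def hammer(ref, n=10_000, workers=4):
    """Have several threads each take and then drop n references."""
    def work():
        for _ in range(n):
            ref.incref()
        for _ in range(n):
            ref.decref()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With `a = LockedRefcount(); hammer(a)`, `a.count` ends back at 1; with `b = OwnerRefcount(); hammer(b)`, the worker threads only ever touch `shared`, so `b.total()` is also 1.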

It turns out that atomic updates are slow, and I wasn't able to get any
sort of productive results using 'owner threads' (the results seemed generally
negative, and it was certainly more work to make happen).  I don't believe
anyone bothered to test fine-grained locking on objects.

However, locking isn't just for refcounts; it's to make sure that thread
A isn't mangling your object while thread B is traversing it.  With
object locking (coarse via the GIL, or fine via object-specific locks),
you get the same guarantees; the problem is that with fine-grained
locking it is about a bazillion times more difficult to verify the absence
of deadlocks than with a GIL-based approach.
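To make the deadlock point concrete: once each object carries its own lock, any operation touching two objects has to take both locks, and two threads doing that in opposite orders can each end up holding one lock while waiting forever for the other.  One common discipline, sketched here with a hypothetical `Account` class, is to always acquire the locks in a single global order (by `id()` in this sketch).  A GIL sidesteps the problem because there is only one lock.

```python
import threading

class Account:
    """Hypothetical object that carries its own fine-grained lock."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    # Take both locks in one fixed global order (by id()), so that
    # transfer(a, b) and transfer(b, a) running concurrently cannot
    # each grab one lock and then wait forever for the other.
    # (Assumes src is not dst: threading.Lock is not reentrant.)
    first, second = sorted((src, dst), key=id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount

def run_demo(rounds=10_000):
    a, b = Account(100), Account(100)

    def shuttle(src, dst):
        for _ in range(rounds):
            transfer(src, dst, 1)

    threads = [threading.Thread(target=shuttle, args=(a, b)),
               threading.Thread(target=shuttle, args=(b, a))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance

print(run_demo())  # (100, 100)
```

Dropping the `sorted` call reintroduces the possibility of deadlock.  Checking that every code path in an interpreter respects such an ordering is exactly the verification burden described above.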

> Changing CPython's garbage collection from reference counting to a
> generational GC will be a major undertaking. There are also pros and
> cons to using reference counts instead of 'modern' garbage collectors.
> For example, unless there are cyclic references, one can always know
> when an object is garbage collected. One also avoids periodic delays
> when garbage is collected, and memory use can be more modest when a
> lot of small temporary objects are being used.
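The "one can always know when an object is garbage collected" point can be observed directly in CPython (other implementations make no such promise).  This sketch uses `weakref.finalize` to record when objects go away; the `Node` class is just for illustration.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.other = None

events = []

# Acyclic object: its refcount drops to zero at `del`, so it is freed right away.
a = Node()
weakref.finalize(a, events.append, "a freed")
del a
after_plain_del = list(events)     # CPython: ['a freed']

# A reference cycle: each refcount stays at 1 after the names are deleted.
gc.disable()                       # stop the cycle collector from running on its own
b, c = Node(), Node()
b.other, c.other = c, b
weakref.finalize(b, events.append, "cycle freed")
del b, c
after_cycle_del = list(events)     # still ['a freed']
gc.collect()                       # the cycle detector finds and frees the pair
gc.enable()
after_collect = list(events)       # ['a freed', 'cycle freed']
```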

It was done a while ago.  The results?  On a single-processor machine,
Python code ran at roughly 1/4 to 1/3 the speed of the original runtime.
When using 4+ processors, there were some gains in threaded code, but
nothing substantial at that point.

> There are a number of different options for exploiting multiple CPUs
> from CPython, including:

My current favorite is the processing package (available from the Python
cheeseshop).  You get much of the same API as threading, only you are
using processes instead.  It works on Windows, OSX, and *nix.
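The processing package was later folded into the standard library as `multiprocessing` (Python 2.6, PEP 371), so a sketch against that module shows the idea.  `Process` takes the same `target`/`args` arguments and has the same `start()`/`join()` methods as `threading.Thread`; the worker function `sum_below` is a made-up CPU-bound example.  The `__main__` guard is needed on platforms that start children by re-importing the main module (such as Windows).

```python
import multiprocessing
import queue
import threading

def sum_below(n):
    """Deliberately CPU-bound work (hypothetical example)."""
    total = 0
    for i in range(n):
        total += i
    return total

def worker(n, out):
    out.put(sum_below(n))

if __name__ == "__main__":
    n = 1_000_000

    # Thread version: the loop runs under the GIL.
    tq = queue.Queue()
    t = threading.Thread(target=worker, args=(n, tq))
    t.start()
    thread_result = tq.get()
    t.join()

    # Process version: same constructor and start()/join() calls,
    # but the work runs in a separate interpreter with its own GIL.
    pq = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(n, pq))
    p.start()
    process_result = pq.get()  # read before join() so the child can flush its queue
    p.join()

    print(thread_result == process_result)  # True
```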

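> from functools import wraps
> from threading import RLock
>
> def synchronized(fun):
>     # One reentrant lock shared by every call to the decorated function
>     rl = RLock()
>     @wraps(fun)  # keep fun's name and docstring on the wrapper
>     def wrapper(*args, **kwargs):
>         with rl:
>             return fun(*args, **kwargs)
>     return wrapper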
> 
> It is not possible to define a 'synchronized' block though, as Python
> does not have Lisp macros :(

Except that you just used the precise mechanism necessary to get a 
synchronized block in Python:

     import threading

     lock = threading.Lock()

     with lock:
         # synchronized block!
         pass

  - Josiah