The future of Python immutability

John Nagle nagle at animats.com
Sun Sep 6 01:06:55 EDT 2009


Steven D'Aprano wrote:
> On Fri, 04 Sep 2009 06:36:59 -0700, Adam Skutt wrote:
> 
>> Nope, preventing mutation of the objects themselves is not enough. You
>> also have to forbid variables from being rebound (pointed at another
>> object).  

    Right.  What's needed for safe concurrency without global locking looks
something like this:

    Object categories:
         Immutable
         Mutable
             Synchronized
             Unsynchronized
                 Owned by a thread.
                 Owned by synchronized object

"Synchronized" objects would be created with something like

	class foo(synchronized) :
	    pass

Only one thread can be active within a synchronized object, as in Java.
So there's implicit locking at entry, unlocking at exit, and temporary
unlocking when the thread is blocked on a lock.
External access to non-function members of synchronized objects has to be
prohibited, since that would allow race conditions.
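
The entry/exit locking described above can be roughly approximated in today's Python. This is a minimal sketch under stated assumptions, not the proposed language feature. The base class `Synchronized`, the `_monitor` attribute, the `_locked` helper and the `Mailbox` example are all hypothetical names. Each plain method defined in a subclass is wrapped so that it runs while holding a per-instance reentrant lock. `threading.Condition.wait_for` is used as one way to get the temporary unlocking while a thread is blocked, similar to Java's `wait()`. The sketch does NOT enforce the ban on external access to non-function members.

```python
import functools
import inspect
import threading


class Synchronized:
    """Hypothetical stand-in for the proposed 'synchronized' base class.

    Every plain method defined in a subclass runs while holding a
    per-instance reentrant lock, so only one thread is active inside
    the object at a time. External attribute access is NOT blocked here.
    """

    def __new__(cls, *args, **kwargs):
        self = super().__new__(cls)
        # Condition over an RLock: methods may call each other (reentrant),
        # and wait()/wait_for() release the lock while the thread is blocked.
        self._monitor = threading.Condition(threading.RLock())
        return self

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Wrap only functions defined directly on this subclass, skipping dunders.
        for name, attr in list(vars(cls).items()):
            if inspect.isfunction(attr) and not name.startswith("__"):
                setattr(cls, name, _locked(attr))


def _locked(method):
    """Hypothetical helper: lock at entry, unlock at exit (even on exceptions)."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._monitor:
            return method(self, *args, **kwargs)
    return wrapper


class Mailbox(Synchronized):
    def __init__(self):
        self.items = []

    def putitem(self, item):
        self.items.append(item)
        self._monitor.notify()          # wake one thread blocked in getitem

    def getitem(self):
        # Blocks until an item is available; the lock is released while waiting.
        self._monitor.wait_for(lambda: self.items)
        return self.items.pop()


box = Mailbox()
producer = threading.Thread(target=box.putitem, args=("hello",))
producer.start()
print(box.getitem())                    # prints hello
producer.join()
```

Putting the lock in `__new__` means it exists before any subclass `__init__` runs. Using a reentrant lock lets one synchronized method call another on the same object without deadlocking.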

Everything else can be handled implicitly, without declarations or
annotation.

Here's the big problem:

     class foo(synchronized) :
         def __init__(self) :
             self.items = []
         def putitem(self,item) :
             self.items.append(item)     # adds item to object's list
         def getitem(self) :
             return self.items.pop()     # removes and returns the last item added

     def test() :
         words = ["hello","there"]       # a mutable object
         sobj = foo()                    # a synchronized object
         sobj.putitem(words)             # add to object
         words[0] = "goodbye"            # ERROR - caller no longer owns words


The concept here is that objects have an "owner", which is either
a thread or some synchronized object.   Locking is at the "owner"
level.  This is simple until "ownership" needs to be transferred.
Can this be made to work in a Pythonic way, without explicit
syntax?

What we want to happen, somehow, is to
transfer the ownership of "words" from the calling thread to the object
in "putitem", and back to the calling thread in "getitem".
How can this be done?

If ownership by a synchronized object has "priority" over ownership
by a thread, it works.  When "putitem" above does the "append",
the instance of "foo" becomes the owner of "item".  In general,
whenever a reference to an object is created from an object owned
by a synchronized object, the referenced object has to change
ownership to that synchronized object.

Once "putitem" has returned, after the ownership change,
it is an error (a "sharing violation"?) for the calling thread to
access the object.  That seems weird, but the alternative is
some explicit way of swapping ownership.
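
The ownership rule can be modelled at run time to show where a sharing violation would surface. This is a minimal sketch, and `Box`, `OwnedList` and `SharingViolation` are hypothetical names. A real implementation would live in the interpreter and cover every mutable type. This one only tracks lists wrapped in `OwnedList`. An object's owner is either a `threading.Thread` or a `Box`. Appending to a box-owned list moves the appended object to the box, and `getitem` hands it back to the calling thread.

```python
import contextlib
import threading


class SharingViolation(RuntimeError):
    """Raised when a thread touches an object it does not own (hypothetical name)."""


class Box:
    """Hypothetical synchronized object: one thread inside at a time."""

    def __init__(self):
        self._lock = threading.RLock()
        self._holder = None                      # thread currently inside, if any
        self.items = OwnedList([], owner=self)   # owned by the box, not by a thread

    @contextlib.contextmanager
    def _inside(self):
        with self._lock:
            previous, self._holder = self._holder, threading.current_thread()
            try:
                yield
            finally:
                self._holder = previous

    def putitem(self, item):
        with self._inside():
            self.items.append(item)              # ownership of item moves to the box

    def getitem(self):
        with self._inside():
            item = self.items.pop()
            if isinstance(item, OwnedList):
                item._owner = threading.current_thread()   # hand it back to the caller
            return item


class OwnedList:
    """A list that remembers its owner: a Thread or a Box."""

    def __init__(self, data, owner=None):
        self._data = list(data)
        self._owner = owner if owner is not None else threading.current_thread()

    def _check(self):
        me = threading.current_thread()
        owner = self._owner
        if owner is me or (isinstance(owner, Box) and owner._holder is me):
            return
        raise SharingViolation(f"{me.name} does not own this object")

    def append(self, obj):
        self._check()
        if isinstance(obj, OwnedList) and isinstance(self._owner, Box):
            obj._check()                         # the caller must own what it hands over
            obj._owner = self._owner             # the box's claim takes priority
        self._data.append(obj)

    def pop(self):
        self._check()
        return self._data.pop()

    def __getitem__(self, index):
        self._check()
        return self._data[index]

    def __setitem__(self, index, value):
        self._check()
        self._data[index] = value


words = OwnedList(["hello", "there"])
box = Box()
box.putitem(words)
try:
    words[0] = "goodbye"                         # the caller gave words away
except SharingViolation as exc:
    print("sharing violation:", exc)
back = box.getitem()
back[0] = "goodbye"                              # allowed: ownership came back
print(back[0])
```

Every access pays for an owner check here. That per-access cost is what the cheaper compile-time checking discussed in the post would aim to reduce.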

What's wrong with this?  It takes two reference counts and a pointer
for each mutable object, which is a memory cost.  Worse, when a
collection of unsynchronized mutable objects is passed in this manner,
all the elements in the collection have to undergo an ownership change.
That's expensive.  (If you're passing trees around, it's better if the
tree root is synchronized.  Then the tree root owns the tree, and
the tree can be passed around or even shared, with locking controlled
by the root.)
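
The following sketch shows why passing a collection is costly and why a synchronized tree root helps. It illustrates the bookkeeping a transfer would need. Owners are kept in a side table keyed by `id()`. This is an assumption for illustration, since the proposal instead adds a pointer to each mutable object. Only `list` and `dict` are treated as mutable containers. The walk stops at instances of a hypothetical `Synchronized` marker class, because such an object owns its own contents. `Tree` and `transfer` are also hypothetical names.

```python
class Synchronized:
    """Hypothetical marker for the 'synchronized' category."""


class Tree(Synchronized):
    """A tree whose synchronized root owns every node below it."""

    def __init__(self, children):
        self.children = children


def transfer(root, new_owner, owners):
    """Give new_owner every unsynchronized list/dict reachable from root.

    owners maps id(obj) -> owner; ids are only meaningful while the
    objects are alive. Returns how many objects changed hands.
    """
    moved = 0
    seen = set()
    stack = [root]
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        if isinstance(obj, Synchronized):
            continue                      # it owns its contents; nothing below moves
        if isinstance(obj, list):
            children = obj
        elif isinstance(obj, dict):
            children = list(obj.values())
        else:
            continue                      # treated as immutable in this sketch
        owners[id(obj)] = new_owner
        moved += 1
        stack.extend(children)
    return moved


owners = {}
plain = [[i, i + 1] for i in range(1000)]
print(transfer(plain, "box", owners))    # 1001: the outer list and all 1000 inner lists
rooted = Tree([[i, i + 1] for i in range(1000)])
print(transfer(rooted, "box", owners))   # 0: the synchronized root keeps its tree
```

The unsynchronized collection costs one ownership change per element. The synchronized root can be handed over, or shared, without touching anything beneath it.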

A compiler smart enough to notice when a variable goes "dead" (is
never used again) can optimize out some of the checking.
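
The sketch below gives a rough illustration of the dead-variable idea. It uses the standard `ast` module to ask whether a name is mentioned again after it is handed to a method such as `putitem`. The function name `used_after_handoff` is hypothetical. It only understands straight-line function bodies, with no loops or branches. Any later mention, including rebinding, conservatively counts as a use. A real compiler would need full liveness analysis over control flow. If the name is never mentioned again, the ownership check on it could be dropped.

```python
import ast


def used_after_handoff(source: str, name: str, method: str) -> bool:
    """Return True if `name` is mentioned after the first statement that
    passes it to `<something>.<method>(...)` in a straight-line function."""
    func = ast.parse(source).body[0]
    handed_off = False
    for stmt in func.body:
        nodes = list(ast.walk(stmt))
        if handed_off:
            if any(isinstance(n, ast.Name) and n.id == name for n in nodes):
                return True               # still live: keep the ownership check
        elif any(
            isinstance(n, ast.Call)
            and isinstance(n.func, ast.Attribute)
            and n.func.attr == method
            and any(isinstance(a, ast.Name) and a.id == name for a in n.args)
            for n in nodes
        ):
            handed_off = True
    return False                          # dead after the handoff: check can go


BODY = '''
def test():
    words = ["hello", "there"]
    sobj = foo()
    sobj.putitem(words)
'''
print(used_after_handoff(BODY, "words", "putitem"))                                # False
print(used_after_handoff(BODY + '    words[0] = "goodbye"\n', "words", "putitem"))  # True
```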

None of this affects single-thread programs at all.  This is needed
purely for safe, efficient concurrency.

It's kind of a pain, but if servers are going to have tens or hundreds
of CPUs in the future, we're going to have to get serious about concurrency.

					John Nagle
