> I claim that there are two alternatives in the face of one thread
> mutating an object and the other observing:
Well, I did consider the possibility of one thread being able to change an object while the others observe, but I have no idea whether that would be too complicated, as you are suggesting.
However, that is not even necessary.  An even more limited form would work fine (at least for me):
 
Two possible modes:
Read/Write from 1 thread:
* ONLY one thread can change and observe (read) the object -- no other threads have access of any kind, or even know of its existence, until you transfer control to another thread (then only the thread you transferred control to has access).
Read-only from all threads (optional):
* Optionally, you could have objects that are in read-only mode, which all threads can observe.
 
To make things easier, maybe special GIL-free threads could be added.  (They would still be OS-level threads, but with special properties in Python.)  These threads could ONLY access data stored in the special object store to which they have read/write privileges; they couldn't touch any object outside that store.  As a result, these special threads would be free of the GIL and could run in parallel.
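Just to make those ownership rules concrete, here is a rough pure-Python sketch of the behaviour I have in mind.  The Owned class is something I made up for illustration, and since it is ordinary Python it obviously does not escape the GIL -- a real version would have to be enforced inside the interpreter:

import threading

class Owned:
    """Wraps a value that exactly one thread may read or write at a time."""
    def __init__(self, value):
        self._value = value
        self._owner = threading.current_thread()

    def _check(self):
        if threading.current_thread() is not self._owner:
            raise RuntimeError("object is owned by another thread")

    def get(self):
        self._check()
        return self._value

    def set(self, value):
        self._check()
        self._value = value

    def transfer(self, thread):
        """Hand the object to another thread; the current owner loses access."""
        self._check()
        self._owner = thread

counter = Owned(0)

def worker():
    counter.set(counter.get() + 1)   # allowed: ownership was transferred to this thread

t = threading.Thread(target=worker)
counter.transfer(t)                  # after this, the main thread can no longer touch counter
t.start()
t.join()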

> Queues already are in a sense your per-object-lock,
> one-thread-mutating, but usually one thread has acquire semantics and
> one has release semantics, and that combination actually works. It's
> when you expect to have a full memory barrier that is the problem.

Now you brought up something interesting: queues.
To be honest, something like queues and pipes would be good enough for my purposes -- if they used shared memory.  Currently, the implementation of queues and pipes in the multiprocessing module seems rather costly, as they use processes and require copying data back and forth.
In particular, what would be useful:
 
* A queue that holds self-contained Python objects (with no pointers/references to other data not in the queue so as to prevent threading issues)
* The queue can be accessed by all special threads simultaneously (in parallel).  You would only need locks around queue operations, but that is pretty easy to do -- unless there is some hidden Interpreter problem that would make this easy task hard.
* Streaming buffers -- like a file buffer or something similar, so you can send data from one thread to another as it comes in (when you don't know when it will end or it may never end).  Only two threads have access: one to put data in, the other to extract it.
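To show what I mean by "locks around queue operations" and the streaming buffer, here is a rough sketch using only stdlib pieces (queue.Queue already does essentially this internally; and of course today's threads still share the GIL, so this only demonstrates the shared-memory/streaming part, not the parallelism):

import collections
import threading

class SharedQueue:
    """An in-memory queue shared by threads, with a lock around every operation."""
    def __init__(self):
        self._items = collections.deque()
        self._lock = threading.Lock()
        self._not_empty = threading.Condition(self._lock)

    def put(self, item):
        with self._lock:
            self._items.append(item)
            self._not_empty.notify()

    def get(self):
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()
            return self._items.popleft()

END = object()          # sentinel marking the end of the stream
stream = SharedQueue()

def producer():
    for chunk in (b"spam", b"eggs", b"ham"):
        stream.put(chunk)            # data is handed over as it arrives, no copying
    stream.put(END)

def consumer():
    while True:
        chunk = stream.get()
        if chunk is END:
            break
        print(chunk)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()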
 
> 0. You can give up consistency and do fine-grained locking, which is
> reasonably fast but error prone, or
> 1. Expect python to handle all of this for you, effectively not making
> a change to the memory model. You could do this with implicit
> per-object locks which might be reasonably fast in the absence of
> contention, but not when several threads are trying to use the object.
>
...
>
> Come to think of it, you might be right Kevin: as long as only one
> thread mutates the object, the mutating thread never /needs/ to
> acquire, as it knows that it has the latest revision.
>
> Have I missed something?
I'm afraid I don't know enough about Python's Interpreter to say much.  The only way to find out would be for me to do some studying on interpreters/compilers and start digging into the codebase -- and I'm not sure how much time I have for that right now. :)
Perhaps the part about only one thread having read & write access changes the situation?
 
One possible implementation might be similar to how POSH does it:
Now, I'm not suggesting this because I know enough to say it is possible; I'm just putting something out there that might work.
Create a special virtual memory address space or lookup table for each thread.  When you assign a read+write object to a thread, it gets added to that thread's virtual address/lookup table.
Optionally, it could be up to the programmer to make sure they don't try to access data from a thread that does not have ownership/control of that object.  If a programmer does try to access it, it would fail, as the memory address would point to nowhere/bad data/etc.
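Again just as an illustration, the per-thread lookup table could be mimicked in ordinary Python roughly like this (the names are made up, and real POSH works on shared memory between processes -- this only imitates the bookkeeping, not the memory mapping):

import threading

class ThreadLocalStore:
    """Each thread can only look up the objects that have been assigned to it."""
    def __init__(self):
        self._tables = {}                        # thread ident -> {name: object}
        self._lock = threading.Lock()

    def assign(self, name, obj):
        """Give the calling thread ownership of obj, removing it from everyone else."""
        with self._lock:
            for table in self._tables.values():  # transfer of control
                table.pop(name, None)
            self._tables.setdefault(threading.get_ident(), {})[name] = obj

    def lookup(self, name):
        """Only the owning thread finds the object; any other thread fails."""
        with self._lock:
            table = self._tables.get(threading.get_ident(), {})
            return table[name]                   # KeyError if this thread does not own it

store = ThreadLocalStore()
store.assign("config", {"mode": "fast"})         # owned by the main thread

def worker():
    try:
        store.lookup("config")                   # fails: this thread has no entry for it
    except KeyError:
        print("no access from the worker thread")

t = threading.Thread(target=worker)
t.start()
t.join()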
 
Of course, there are probably other, better ways to do it that are not as fragile as this... but I don't know whether the limitations of the Python Interpreter and GIL would allow better methods.