[ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL

We have just released a proof-of-concept implementation of a new approach to thread management - "newthreading". It is available for download at https://sourceforge.net/projects/newthreading/ The user's guide is at http://www.animats.com/papers/languages/newthreadingintro.html This is a pure Python implementation of synchronized objects, along with a set of restrictions which make programs race-condition free, even without a Global Interpreter Lock. The basic idea is that classes derived from SynchronizedObject are automatically locked at entry and unlocked at exit. They're also unlocked when a thread blocks within the class. So at no time can two threads be active in such a class at one time. In addition, only "frozen" objects can be passed in and out of synchronized objects. (This is somewhat like the multiprocessing module, where you can only pass objects that can be "pickled". But it's not as restrictive; multiple threads can access the same synchronized object, one at a time. This pure Python implementation is usable, but does not improve performance. It's a proof of concept implementation so that programmers can try out synchronized classes and see what it's like to work within those restrictions. The semantics of Python don't change for single-thread programs. But when the program forks off the first new thread, the rules change, and some of the dynamic features of Python are disabled. Some of the ideas are borrowed from Java, and some are from "safethreading". The point is to come up with a set of liveable restrictions which would allow getting rid of the GIL. This is becoming essential as Unladen Swallow starts to work and the number of processors per machine keeps climbing. This may in time become a Python Enhancement Proposal. We'd like to get some experience with it first. Try it out and report back. The SourceForge forum for the project is the best place to report problems. John Nagle

On 26/06/2010 07:11, John Nagle wrote:
We have just released a proof-of-concept implementation of a new approach to thread management - "newthreading". It is available for download at
https://sourceforge.net/projects/newthreading/
The user's guide is at
http://www.animats.com/papers/languages/newthreadingintro.html
The user guide says: The suggested import is from newthreading import * The import * form is considered bad practise in *general* and should not be recommended unless there is a good reason. This is slightly off-topic for python-dev, although I appreciate that you want feedback with the eventual goal of producing a PEP - however the introduction of free-threading in Python has not been hampered by lack of synchronization primitives but by the difficulty of changing the interpreter without unduly impacting single threaded code. Providing an alternative garbage collection mechanism other than reference counting would be a more interesting first-step as far as I can see, as that removes the locking required around every access to an object (which currently touches the reference count). Introducing free-threading by *changing* the threading semantics (so you can't share non-frozen objects between threads) would not be acceptable. That comment is likely to be based on a misunderstanding of your future intentions though. :-) All the best, Michael Foord
This is a pure Python implementation of synchronized objects, along with a set of restrictions which make programs race-condition free, even without a Global Interpreter Lock. The basic idea is that classes derived from SynchronizedObject are automatically locked at entry and unlocked at exit. They're also unlocked when a thread blocks within the class. So at no time can two threads be active in such a class at one time.
In addition, only "frozen" objects can be passed in and out of synchronized objects. (This is somewhat like the multiprocessing module, where you can only pass objects that can be "pickled". But it's not as restrictive; multiple threads can access the same synchronized object, one at a time.
This pure Python implementation is usable, but does not improve performance. It's a proof of concept implementation so that programmers can try out synchronized classes and see what it's like to work within those restrictions.
The semantics of Python don't change for single-thread programs. But when the program forks off the first new thread, the rules change, and some of the dynamic features of Python are disabled.
Some of the ideas are borrowed from Java, and some are from "safethreading". The point is to come up with a set of liveable restrictions which would allow getting rid of the GIL. This is becoming essential as Unladen Swallow starts to work and the number of processors per machine keeps climbing.
This may in time become a Python Enhancement Proposal. We'd like to get some experience with it first. Try it out and report back. The SourceForge forum for the project is the best place to report problems.
John Nagle _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.u...
-- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (”BOGUS AGREEMENTS”) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.

On Sat, Jun 26, 2010 at 9:29 AM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
On 26/06/2010 07:11, John Nagle wrote:
We have just released a proof-of-concept implementation of a new approach to thread management - "newthreading". It is available for download at
https://sourceforge.net/projects/newthreading/
The user's guide is at
http://www.animats.com/papers/languages/newthreadingintro.html
The user guide says:
The suggested import is
from newthreading import *
The import * form is considered bad practise in *general* and should not be recommended unless there is a good reason. This is slightly off-topic for python-dev, although I appreciate that you want feedback with the eventual goal of producing a PEP - however the introduction of free-threading in Python has not been hampered by lack of synchronization primitives but by the difficulty of changing the interpreter without unduly impacting single threaded code.
I asked John to drop a message here for this project - so feel free to flame me if anyone. This *is* relevant, and I'd guess fairly interesting to the group as a whole. jesse

On Sat, 26 Jun 2010 14:29:24 +0100 Michael Foord <fuzzyman@voidspace.org.uk> wrote:
the introduction of free-threading in Python has not been hampered by lack of synchronization primitives but by the difficulty of changing the interpreter without unduly impacting single threaded code.
Exactly what I think too. cheers Antoine.

On Sat, Jun 26, 2010 at 9:29 AM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
On 26/06/2010 07:11, John Nagle wrote:
We have just released a proof-of-concept implementation of a new approach to thread management - "newthreading". It is available for download at
https://sourceforge.net/projects/newthreading/
The user's guide is at
http://www.animats.com/papers/languages/newthreadingintro.html
The user guide says:
The suggested import is
from newthreading import *
The import * form is considered bad practise in *general* and should not be recommended unless there is a good reason. This is slightly off-topic for python-dev, although I appreciate that you want feedback with the eventual goal of producing a PEP - however the introduction of free-threading in Python has not been hampered by lack of synchronization primitives but by the difficulty of changing the interpreter without unduly impacting single threaded code.
Providing an alternative garbage collection mechanism other than reference counting would be a more interesting first-step as far as I can see, as that removes the locking required around every access to an object (which currently touches the reference count). Introducing free-threading by *changing* the threading semantics (so you can't share non-frozen objects between threads) would not be acceptable. That comment is likely to be based on a misunderstanding of your future intentions though. :-)
All the best,
Michael Foord
I'd also like to point out, that one of the project John cites is Adam Olsen's Safethread work: http://code.google.com/p/python-safethread/ Which, in and of itself is a good read.

On 6/26/2010 7:44 AM, Jesse Noller wrote:
On Sat, Jun 26, 2010 at 9:29 AM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
On 26/06/2010 07:11, John Nagle wrote:
We have just released a proof-of-concept implementation of a new approach to thread management - "newthreading".
....
The import * form is considered bad practise in *general* and should not be recommended unless there is a good reason.
I agree. I just did that to make the examples cleaner.
however the introduction of free-threading in Python has not been hampered by lack of synchronization primitives but by the difficulty of changing the interpreter without unduly impacting single threaded code.
That's what I'm trying to address here.
Providing an alternative garbage collection mechanism other than reference counting would be a more interesting first-step as far as I can see, as that removes the locking required around every access to an object (which currently touches the reference count). Introducing free-threading by *changing* the threading semantics (so you can't share non-frozen objects between threads) would not be acceptable. That comment is likely to be based on a misunderstanding of your future intentions though. :-)
This work comes out of a discussion a few of us had at a restaurant in Palo Alto after a Stanford talk by the group at Facebook which is building a JIT compiler for PHP. We were discussing how to make threading both safe for the average programmer and efficient. Javascript and PHP don't have threads at all; Python has safe threading, but it's slow. C/C++/Java all have race condition problems, of course. The Facebook guy pointed out that you can't redefine a function dynamically in PHP, and they get a performance win in their JIT by exploiting this. I haven't gone into the memory model in enough detail in the technical paper. The memory model I envision for this has three memory zones: 1. Shared fully-immutable objects: primarily strings, numbers, and tuples, all of whose elements are fully immutable. These can be shared without locking, and reclaimed by a concurrent garbage collector like Boehm's. They have no destructors, so finalization is not an issue. 2. Local objects. These are managed as at present, and require no locking. These can either be thread-local, or local to a synchronized object. There are no links between local objects under different "ownership". Whether each thread and object has its own private heap, or whether there's a common heap with locks at the allocator is an implementation decision. 3. Shared mutable objects: mostly synchronized objects, but also immutable objects like tuples which contain references to objects that aren't fully immutable. These are the high-overhead objects, and require locking during reference count updates, or atomic reference count operations if supported by the hardware. The general idea is to minimize the number of objects in this zone. The zone of an object is determined when the object is created, and never changes. This is relatively simple to implement. Tuples (and frozensets, frozendicts, etc.) are normally zone 2 objects. Only "freeze" creates collections in zones 1 and 3. Synchronized objects are always created in zone 3. There are no difficult handoffs, where an object that was previously thread-local now has to be shared and has to acquire locks during the transition. Existing interlinked data structures, like parse trees and GUIs, are by default zone 2 objects, with the same semantics as at present. They can be placed inside a SynchronizedObject if desired, which makes them usable from multiple threads. That's optional; they're thread-local otherwise. The rationale behind "freezing" some of the language semantics when the program goes multi-thread comes from two sources - Adam Olsen's Safethread work, and the acceptance of the multiprocessing module. Olsen tried to retain all the dynamism of the language in a multithreaded environment, but locking all the underlying dictionaries was a boat-anchor on the whole system, and slowed things down so much that he abandoned the project. The Unladen Swallow documentation indicates that early thinking on the project was that Olsen's approach would allow getting rid of the GIL, but later notes indicate that no path to a GIL-free JIT system is currently in development. The multiprocessing module provides semantics similar to threading with "freezing". Data passed between processes is "frozen" by pickling. Processes can't modify each other's code. Restrictive though the multiprocessing module is, it appears to be useful. It is sometimes recommended as the Pythonic approach to multi-core CPUs. This is an indication that "freezing" is not unacceptable to the user community. Most of the real-world use cases for extreme dynamism involve events that happen during startup. Configuration files are read, modules are selectively included, functions are overridden, tables of references to functions are set up, regular expressions are compiled, and the code is brought into the appropriately configured state. Then the worker threads are started and the real work starts. The "newthreading" approach allows all that. After two decades of failed attempts remove the Global Interpreter Lock without making performance worse, it is perhaps time to take a harder look at scaleable threading semantics. John Nagle Animats

fyi - newthreading has been picked up by lwn. http://lwn.net/Articles/393822/#Comments

On Sun, Jun 27, 2010 at 9:33 PM, Gregory P. Smith <greg@krypto.org> wrote:
fyi - newthreading has been picked up by lwn. http://lwn.net/Articles/393822/#Comments
Do you know if any of the commenters is Nagle himself (and if so, which)? The discussion is hard to follow since the context of replies isn't always clear. There also seems to be a bunch of C++ thinking (and some knee-jerk responses by people who aren't actually all that familiar with Python) although I admit I don't have much of an intuition about memory models for fully free threading myself. It's a brave new world... --Guido -- --Guido van Rossum (python.org/~guido)

I'm moving this thread to python-ideas, where it belongs. I've looked at the implementation code (even stepped through it with pdb!), read the sample/test code, and read the two papers on animats.com fairly closely (they have a lot of overlap, and the memory model described below seems copied verbatim from http://www.animats.com/papers/languages/pythonconcurrency.html version 0.8). Some reactions (trying to hide my responses to the details of the code): - First of all, I'm very happy to see radical ideas proposed, even if they are at present unrealistic. We need a big brainstorm to come up with ideas from which an eventual solution to the multicore problem might be chosen. (Jesse Noller's multiprocessing is another; Adam Olsen's work yet another, at a different end of the spectrum.) - The proposed new semantics (frozen objects, memory model, auto-freezing of globals, enforcement of naming conventions) are radically different from Python's current semantics. They will break every 3rd party library in many more ways than Python 3. This is not surprising given the goals of the proposal (and its roots in Adam Olsen's work) but places a huge roadblock for acceptance. I see no choice but to keep trying to come up with a compromise that is more palatable and compatible without throwing away all the advantages. As it now stands, the proposal might as well be a new and different language. - SynchronizedObject looks like a mixture of a Java synchronized class (a non-standard concept in Java but easily understood as a class all whose public methods are synchronized) and a condition variable (which has the same semantics of releasing the lock while waiting but without crawling the stack for other locks to release). It looks like the examples showing off SynchronizedObject could be implemented just as elegantly using a condition variable (and voluntary abstention from using shared mutable objects). - If the goal is to experiment with new control structures, I recommend decoupling them from the memory model and frozen objects, instead relying (as is traditional in Python) on programmer caution to avoid races. This would make it much easier to see how programmers respond to the new control structures. - You could add the freeze() function for voluntary use, and you could even add automatic wrapping of arguments and return values for certain classes using a class decorator or a metaclass, but the performance overhead makes this unlikely to win over many converts. I don't see much use for the "whole program freezing" done by the current prototype -- there are way too many backdoors in Python for the prototype approach to be anywhere near foolproof, and if we want a non-foolproof approach, voluntary constraint (and, in some cases, voluntary, i.e. explicit, wrapping of modules or classes) would work just as well. - For a larger-scale experiment with the new memory model and semantic restrictions (or would it be better to call them syntactic restrictions? -- after all they are about statically detectable properties like naming conventions) I recommend looking at PyPy, which has as one of its explicitly stated project goals easy experimentation with different object models. - I'm sure I've forgotten something, but I wanted to keep my impressions fresh. - Again, John, thanks for taking the time to come up with an implementation of your idea! --Guido On Sat, Jun 26, 2010 at 9:39 AM, John Nagle <nagle@animats.com> wrote:
On 6/26/2010 7:44 AM, Jesse Noller wrote:
On Sat, Jun 26, 2010 at 9:29 AM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
On 26/06/2010 07:11, John Nagle wrote:
We have just released a proof-of-concept implementation of a new approach to thread management - "newthreading".
....
The import * form is considered bad practise in *general* and should not be recommended unless there is a good reason.
I agree. I just did that to make the examples cleaner.
however the introduction of free-threading in Python has not been hampered by lack of synchronization primitives but by the difficulty of changing the interpreter without unduly impacting single threaded code.
That's what I'm trying to address here.
Providing an alternative garbage collection mechanism other than reference counting would be a more interesting first-step as far as I can see, as that removes the locking required around every access to an object (which currently touches the reference count). Introducing free-threading by *changing* the threading semantics (so you can't share non-frozen objects between threads) would not be acceptable. That comment is likely to be based on a misunderstanding of your future intentions though. :-)
This work comes out of a discussion a few of us had at a restaurant in Palo Alto after a Stanford talk by the group at Facebook which is building a JIT compiler for PHP. We were discussing how to make threading both safe for the average programmer and efficient. Javascript and PHP don't have threads at all; Python has safe threading, but it's slow. C/C++/Java all have race condition problems, of course. The Facebook guy pointed out that you can't redefine a function dynamically in PHP, and they get a performance win in their JIT by exploiting this.
I haven't gone into the memory model in enough detail in the technical paper. The memory model I envision for this has three memory zones:
1. Shared fully-immutable objects: primarily strings, numbers, and tuples, all of whose elements are fully immutable. These can be shared without locking, and reclaimed by a concurrent garbage collector like Boehm's. They have no destructors, so finalization is not an issue.
2. Local objects. These are managed as at present, and require no locking. These can either be thread-local, or local to a synchronized object. There are no links between local objects under different "ownership". Whether each thread and object has its own private heap, or whether there's a common heap with locks at the allocator is an implementation decision.
3. Shared mutable objects: mostly synchronized objects, but also immutable objects like tuples which contain references to objects that aren't fully immutable. These are the high-overhead objects, and require locking during reference count updates, or atomic reference count operations if supported by the hardware. The general idea is to minimize the number of objects in this zone.
The zone of an object is determined when the object is created, and never changes. This is relatively simple to implement. Tuples (and frozensets, frozendicts, etc.) are normally zone 2 objects. Only "freeze" creates collections in zones 1 and 3. Synchronized objects are always created in zone 3. There are no difficult handoffs, where an object that was previously thread-local now has to be shared and has to acquire locks during the transition.
Existing interlinked data structures, like parse trees and GUIs, are by default zone 2 objects, with the same semantics as at present. They can be placed inside a SynchronizedObject if desired, which makes them usable from multiple threads. That's optional; they're thread-local otherwise.
The rationale behind "freezing" some of the language semantics when the program goes multi-thread comes from two sources - Adam Olsen's Safethread work, and the acceptance of the multiprocessing module. Olsen tried to retain all the dynamism of the language in a multithreaded environment, but locking all the underlying dictionaries was a boat-anchor on the whole system, and slowed things down so much that he abandoned the project. The Unladen Swallow documentation indicates that early thinking on the project was that Olsen's approach would allow getting rid of the GIL, but later notes indicate that no path to a GIL-free JIT system is currently in development.
The multiprocessing module provides semantics similar to threading with "freezing". Data passed between processes is "frozen" by pickling. Processes can't modify each other's code. Restrictive though the multiprocessing module is, it appears to be useful. It is sometimes recommended as the Pythonic approach to multi-core CPUs. This is an indication that "freezing" is not unacceptable to the user community.
Most of the real-world use cases for extreme dynamism involve events that happen during startup. Configuration files are read, modules are selectively included, functions are overridden, tables of references to functions are set up, regular expressions are compiled, and the code is brought into the appropriately configured state. Then the worker threads are started and the real work starts. The "newthreading" approach allows all that.
After two decades of failed attempts remove the Global Interpreter Lock without making performance worse, it is perhaps time to take a harder look at scaleable threading semantics.
John Nagle Animats
-- --Guido van Rossum (python.org/~guido)
participants (6)
-
Antoine Pitrou
-
Gregory P. Smith
-
Guido van Rossum
-
Jesse Noller
-
John Nagle
-
Michael Foord