Re: [Python-Dev] GIL removal question
Message: 1
Date: Tue, 9 Aug 2011 15:31:47 +0600
From: ???? ????????? <socketpair@gmail.com>
To: python-dev@python.org
Subject: [Python-Dev] GIL removal question
Message-ID: <CAEmTpZGe2J6poDUW3sihHS3LHDdQ3cq5gWqfty_=z5W8R0R3-Q@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Probably I want to reinvent the wheel. I want developers to tell me why we cannot remove the GIL in this way:
1. Remove the GIL completely, along with all its current logic.
2. Add its own RW-locking to all mutable objects (like list or dict).
3. Add RW-locks to every context instance.
4. Use RW-locks when accessing members of object instances.
You're forgetting step 5.
5. Put fine-grain locks around all reference counting operations (or rewrite all of Python's memory management and garbage collection from scratch).
The only reason I see not to do that is the performance of single-threaded applications.
After implementing the aforementioned step 5, you will find that the performance of everything, including the threaded code, will be quite a bit worse. Frankly, this is probably the most significant obstacle to have any kind of GIL-less Python with reasonable performance.
Just as an aside, I recently did some experiments with the fabled patch to remove the GIL from Python 1.4 (mainly for my own historical curiosity). On Linux, the performance isn't just slightly worse, it makes single-threaded code run about 6-7 times slower and threaded code runs even worse. So, basically everything runs like a dog. No GIL though.
Cheers, Dave
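To make the cost of "step 5" concrete, here is a minimal pure-Python sketch. It is only an analogy: CPython does its reference counting in C (the `Py_INCREF` / `Py_DECREF` macros), and the class and function names below are hypothetical, invented for illustration. The idea is simply that a counter which takes a lock on every increment and decrement does far more work per operation than one that does not, even when only one thread ever touches it.

```python
import threading
import timeit


class PlainCounter:
    """Stand-in for an unprotected reference count (hypothetical name)."""

    def __init__(self):
        self.count = 1

    def incref(self):
        self.count += 1

    def decref(self):
        self.count -= 1


class LockedCounter(PlainCounter):
    """Same counter, but every update takes a per-object lock."""

    def __init__(self):
        super().__init__()
        self._lock = threading.Lock()

    def incref(self):
        with self._lock:
            self.count += 1

    def decref(self):
        with self._lock:
            self.count -= 1


def churn(counter, n=50_000):
    """Pair every incref with a decref, as a temporary reference would."""
    for _ in range(n):
        counter.incref()
        counter.decref()
    return counter.count


if __name__ == "__main__":
    # Single-threaded timing only; absolute numbers vary by machine.
    for cls in (PlainCounter, LockedCounter):
        seconds = timeit.timeit(lambda: churn(cls()), number=5)
        print(f"{cls.__name__}: {seconds:.3f}s")
```

The timings say nothing about how a real free-threaded interpreter would behave; they only show why a lock on every count update is not free, even without contention.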
On Wed, Aug 10, 2011 at 9:09 PM, David Beazley <dave@dabeaz.com> wrote:
You're forgetting step 5.
5. Put fine-grain locks around all reference counting operations (or rewrite all of Python's memory management and garbage collection from scratch). ... After implementing the aforementioned step 5, you will find that the performance of everything, including the threaded code, will be quite a bit worse. Frankly, this is probably the most significant obstacle to have any kind of GIL-less Python with reasonable performance.
PyPy would actually make a significantly better basis for this kind of experimentation, since they *don't* use reference counting for their memory management. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Aug 10, 2011, at 6:15 AM, Nick Coghlan wrote:
On Wed, Aug 10, 2011 at 9:09 PM, David Beazley <dave@dabeaz.com> wrote:
You're forgetting step 5.
5. Put fine-grain locks around all reference counting operations (or rewrite all of Python's memory management and garbage collection from scratch). ... After implementing the aforementioned step 5, you will find that the performance of everything, including the threaded code, will be quite a bit worse. Frankly, this is probably the most significant obstacle to have any kind of GIL-less Python with reasonable performance.
PyPy would actually make a significantly better basis for this kind of experimentation, since they *don't* use reference counting for their memory management.
That's an experiment that would be pretty interesting. I think the real question would boil down to what *else* do they have to lock to make everything work. Reference counting is a huge bottleneck for CPython to be sure, but it's definitely not the only issue that has to be addressed in making a free-threaded Python. Cheers, Dave
On Wed, Aug 10, 2011 at 9:32 PM, David Beazley <dave@dabeaz.com> wrote:
On Aug 10, 2011, at 6:15 AM, Nick Coghlan wrote:
PyPy would actually make a significantly better basis for this kind of experimentation, since they *don't* use reference counting for their memory management.
That's an experiment that would be pretty interesting. I think the real question would boil down to what *else* do they have to lock to make everything work. Reference counting is a huge bottleneck for CPython to be sure, but it's definitely not the only issue that has to be addressed in making a free-threaded Python.
Yeah, the problem reduces back to the 4 steps in the original post. Still not trivial, since there's quite a bit of internal interpreter state to protect, but significantly more feasible than dealing with CPython's reference counting. However, you do get additional complexities like the JIT compiler coming into play, so it is really a question that would need to be raised directly with the PyPy dev team. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
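For readers who want to see what step 2 of the original post (an RW-lock on each mutable object) might look like, here is a minimal sketch. The standard library has no readers-writer lock, so `RWLock` below is a hand-rolled assumption built on `threading.Condition`, and `LockedList` is a hypothetical wrapper, not anything from CPython or PyPy.

```python
import threading
from contextlib import contextmanager


class RWLock:
    """Minimal readers-writer lock: many readers or one writer at a time."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    @contextmanager
    def read(self):
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

    @contextmanager
    def write(self):
        with self._cond:
            while self._writing or self._readers:
                self._cond.wait()
            self._writing = True
        try:
            yield
        finally:
            with self._cond:
                self._writing = False
                self._cond.notify_all()


class LockedList:
    """Hypothetical list wrapper that guards every operation with its own lock."""

    def __init__(self):
        self._items = []
        self._lock = RWLock()

    def append(self, item):
        with self._lock.write():
            self._items.append(item)

    def __len__(self):
        with self._lock.read():
            return len(self._items)
```

This version lets a steady stream of readers starve writers, and it does nothing about the interpreter's internal state, which is the part described above as still not trivial.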
On Wed, Aug 10, 2011 at 7:32 AM, David Beazley <dave@dabeaz.com> wrote:
On Aug 10, 2011, at 6:15 AM, Nick Coghlan wrote:
On Wed, Aug 10, 2011 at 9:09 PM, David Beazley <dave@dabeaz.com> wrote:
You're forgetting step 5.
5. Put fine-grain locks around all reference counting operations (or rewrite all of Python's memory management and garbage collection from scratch). ... After implementing the aforementioned step 5, you will find that the performance of everything, including the threaded code, will be quite a bit worse. Frankly, this is probably the most significant obstacle to have any kind of GIL-less Python with reasonable performance.
PyPy would actually make a significantly better basis for this kind of experimentation, since they *don't* use reference counting for their memory management.
That's an experiment that would be pretty interesting. I think the real question would boil down to what *else* do they have to lock to make everything work. Reference counting is a huge bottleneck for CPython to be sure, but it's definitely not the only issue that has to be addressed in making a free-threaded Python.
They have a specific plan, based on Software Transactional Memory: http://morepypy.blogspot.com/2011/06/global-interpreter-lock-or-how-to-kill.... Personally, I'm not holding my breath, because STM in other areas has so far captured many imaginations without bringing practical results (I keep hearing about it as this promising theory that needs more work to implement, sort-of like String Theory in theoretical physics). But I'm also not denying that Armin Rigo has a brain the size of the planet, and PyPy *has* already made much real, practical progress. -- --Guido van Rossum (python.org/~guido)
On Wed, Aug 10, 2011 at 1:43 PM, Guido van Rossum <guido@python.org> wrote:
On Wed, Aug 10, 2011 at 7:32 AM, David Beazley <dave@dabeaz.com> wrote:
On Aug 10, 2011, at 6:15 AM, Nick Coghlan wrote:
On Wed, Aug 10, 2011 at 9:09 PM, David Beazley <dave@dabeaz.com> wrote:
You're forgetting step 5.
5. Put fine-grain locks around all reference counting operations (or rewrite all of Python's memory management and garbage collection from scratch). ... After implementing the aforementioned step 5, you will find that the performance of everything, including the threaded code, will be quite a bit worse. Frankly, this is probably the most significant obstacle to have any kind of GIL-less Python with reasonable performance.
PyPy would actually make a significantly better basis for this kind of experimentation, since they *don't* use reference counting for their memory management.
That's an experiment that would be pretty interesting. I think the real question would boil down to what *else* do they have to lock to make everything work. Reference counting is a huge bottleneck for CPython to be sure, but it's definitely not the only issue that has to be addressed in making a free-threaded Python.
They have a specific plan, based on Software Transactional Memory: http://morepypy.blogspot.com/2011/06/global-interpreter-lock-or-how-to-kill....
Personally, I'm not holding my breath, because STM in other areas has so far captured many imaginations without bringing practical results (I keep hearing about it as this promising theory that needs more work to implement, sort-of like String Theory in theoretical physics).
Note that PyPy's plan does *not* assume the end result will be comparable in the single-threaded case. The goal is to be able to compile two *different* PyPy builds: one fast and single-threaded, one GIL-less but with a significant overhead. The trick is to get this working in a way that does not increase the maintenance burden. It's also research, so among other things it might not work. Cheers, fijal
Removing the GIL is interesting work, and probably multiple people are willing to contribute. Threading and synchronization is a deep topic, and it may be that one person toying with GIL removal on their own will not see a performance improvement (not meaning to offend anyone who has tried this, honestly). What about forking a branch for this work, with some good benchmarks in place, and having the community contribute? Say the first step is just replacing the GIL with fine-grained locks, with the expected performance degradation; afterwards we can try to improve on this incrementally.
Thank you, Vlad
On Wed, Aug 10, 2011 at 8:20 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
On Wed, Aug 10, 2011 at 1:43 PM, Guido van Rossum <guido@python.org> wrote:
On Wed, Aug 10, 2011 at 7:32 AM, David Beazley <dave@dabeaz.com> wrote:
On Aug 10, 2011, at 6:15 AM, Nick Coghlan wrote:
On Wed, Aug 10, 2011 at 9:09 PM, David Beazley <dave@dabeaz.com> wrote:
You're forgetting step 5.
5. Put fine-grain locks around all reference counting operations (or rewrite all of Python's memory management and garbage collection from scratch). ... After implementing the aforementioned step 5, you will find that the performance of everything, including the threaded code, will be quite a bit worse. Frankly, this is probably the most significant obstacle to have any kind of GIL-less Python with reasonable performance.
PyPy would actually make a significantly better basis for this kind of experimentation, since they *don't* use reference counting for their memory management.
That's an experiment that would be pretty interesting. I think the real question would boil down to what *else* do they have to lock to make everything work. Reference counting is a huge bottleneck for CPython to be sure, but it's definitely not the only issue that has to be addressed in making a free-threaded Python.
They have a specific plan, based on Software Transactional Memory:
http://morepypy.blogspot.com/2011/06/global-interpreter-lock-or-how-to-kill....
Personally, I'm not holding my breath, because STM in other areas has so far captured many imaginations without bringing practical results (I keep hearing about it as this promising theory that needs more work to implement, sort-of like String Theory in theoretical physics).
Note that PyPy's plan does *not* assume the end result will be comparable in the single-threaded case. The goal is to be able to compile two *different* PyPy builds: one fast and single-threaded, one GIL-less but with a significant overhead. The trick is to get this working in a way that does not increase the maintenance burden. It's also research, so among other things it might not work.
Cheers, fijal
On Wed, Aug 10, 2011 at 11:14, Vlad Riscutia <riscutiavlad@gmail.com> wrote:
Removing the GIL is interesting work, and probably multiple people are willing to contribute. Threading and synchronization is a deep topic, and it may be that one person toying with GIL removal on their own will not see a performance improvement (not meaning to offend anyone who has tried this, honestly). What about forking a branch for this work, with some good benchmarks in place, and having the community contribute? Say the first step is just replacing the GIL with fine-grained locks, with the expected performance degradation; afterwards we can try to improve on this incrementally.
Thank you, Vlad
Feel free to start this: http://hg.python.org/cpython
On Wed, Aug 10, 2011 at 10:19 AM, Brian Curtin <brian.curtin@gmail.com> wrote:
On Wed, Aug 10, 2011 at 11:14, Vlad Riscutia <riscutiavlad@gmail.com> wrote:
Removing the GIL is interesting work, and probably multiple people are willing to contribute. Threading and synchronization is a deep topic, and it may be that one person toying with GIL removal on their own will not see a performance improvement (not meaning to offend anyone who has tried this, honestly). What about forking a branch for this work, with some good benchmarks in place, and having the community contribute? Say the first step is just replacing the GIL with fine-grained locks, with the expected performance degradation; afterwards we can try to improve on this incrementally.
Thank you, Vlad
Feel free to start this: http://hg.python.org/cpython
+1 on not waiting for someone else to do it if you have an idea. :) Bitbucket makes it really easy for anyone to fork a repo into a new project and they keep an up to date mirror of the CPython repo: https://bitbucket.org/mirror/cpython/overview -eric
On 10.08.2011 13:43, Guido van Rossum wrote:
They have a specific plan, based on Software Transactional Memory: http://morepypy.blogspot.com/2011/06/global-interpreter-lock-or-how-to-kill....
Microsoft's experiment to use STM in .NET failed, though. And Linux got rid of the BKL without STM.
There is a similar but simpler paradigm called "bulk synchronous parallel" (BSP) which might work too. Threads work independently for a particular amount of time with private objects (e.g. copy-on-write memory), then enter a barrier; changes to global objects are synchronized and the GC collects garbage, after which worker threads leave the barrier, and the cycle repeats.
To communicate changes to shared objects between synchronization barriers, Python code must use explicit locks and flush statements. But for the C code in the interpreter, BSP should give the same atomicity for Python bytecodes as the GIL (there is just one active thread inside the barrier).
BSP is much simpler to implement than STM because of the barrier synchronization. BSP also cannot deadlock or livelock. And because threads in BSP work with private memory, there will be no thrashing (false sharing) from the reference counting GC.
Sturla
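The BSP cycle described above (private work, barrier, synchronize, repeat) can be sketched at the Python level with `threading.Barrier`. This is only an illustration of the shape of the scheme, assuming a hypothetical `run_bsp` function and a trivial "add up private counters" workload. It runs under the ordinary GIL and says nothing about how an interpreter-level BSP would be built.

```python
import threading


def run_bsp(n_workers=4, n_supersteps=3, work_per_step=1000):
    """Run a toy BSP computation and return the merged global total."""
    shared = {"total": 0}        # global state, written only inside the barrier
    private = [0] * n_workers    # one private slot per worker

    def merge():
        # Barrier action: executed by exactly one thread once all workers
        # have arrived, before any of them is released again.
        shared["total"] += sum(private)
        private[:] = [0] * n_workers

    barrier = threading.Barrier(n_workers, action=merge)

    def worker(index):
        for _step in range(n_supersteps):
            # Local phase: this worker writes only its own private slot.
            for _unit in range(work_per_step):
                private[index] += 1
            # Synchronization phase: wait for everyone, then merge() runs.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["total"]
```

With the defaults, `run_bsp()` merges 4 workers x 3 supersteps x 1000 increments. No worker ever takes a lock during its local phase; the only coordination point is the barrier.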
On Aug 10, 2011, at 4:15 AM, Nick Coghlan wrote:
After implementing the aforementioned step 5, you will find that the performance of everything, including the threaded code, will be quite a bit worse. Frankly, this is probably the most significant obstacle to have any kind of GIL-less Python with reasonable performance.
PyPy would actually make a significantly better basis for this kind of experimentation, since they *don't* use reference counting for their memory management.
Jython may be a better choice. It is all about concurrency. Its dicts are built on top of Java's ConcurrentHashMap for example. Raymond
On Wed, Aug 10, 2011 at 7:19 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
On Aug 10, 2011, at 4:15 AM, Nick Coghlan wrote:
After implementing the aforementioned step 5, you will find that the performance of everything, including the threaded code, will be quite a bit worse. Frankly, this is probably the most significant obstacle to have any kind of GIL-less Python with reasonable performance.
PyPy would actually make a significantly better basis for this kind of experimentation, since they *don't* use reference counting for their memory management.
Jython may be a better choice. It is all about concurrency. Its dicts are built on top of Java's ConcurrentHashMap for example.
Jython is kind of a boring choice because it does not have a GIL at all (same as IronPython). It might *work* for what you're trying to achieve, but GIL removal is not really that interesting there.
participants (9)
- Brian Curtin
- David Beazley
- Eric Snow
- Guido van Rossum
- Maciej Fijalkowski
- Nick Coghlan
- Raymond Hettinger
- Sturla Molden
- Vlad Riscutia