Hi, I know this may be tiresome by now, so feel free to ignore, but I'd like to share with the list an idea about the GIL, more specifically the reference counting mechanism. Simply put, make the reference counter a sharded one. That is, separate it into several subcounters, in this case one for each thread. The logic would then be something like this: - when increasing the refcount, a thread writes only to its own subcounter, creating one first if necessary. - similarly, when decreasing the refcount, there is no need to access other subcounters until that subcounter reaches zero. - when a subcounter gets to zero, delete it, and read the other subcounters to check if it was the last one. - delete the object only if there are no more subcounters. Contention could then be reduced to a minimum, since a thread only needs to read other subcounters when its own reaches zero or wants the total value. Depending on the implementation it might help with false sharing too, as subcounters may or may not be in the same cache-line. Unfortunately, in a crude test of mine there is already a severe performance degradation, and that is without rwlocks. I've used a basic linked list, and changed the INCREF/DECREF macros to functions to accommodate the extra logic so it may not be the best approach (too many dereferences). Does this makes sense to anyone? Thanks, Alfredo PS: At the very least, it might be another reason to keep the GIL.
On Mon, 01 Apr 2013 01:14:11 +0200, =?UTF-8?Q?Alfredo_Solano_Mart=C3=ADnez?=
Simply put, make the reference counter a sharded one. That is, separate it into several subcounters, in this case one for each thread.
It seems to me this has a family resemblance to some of the stuff Trent is doing in his parallel Python work. --David
On Mon, Apr 1, 2013 at 11:49 AM, R. David Murray
On Mon, 01 Apr 2013 01:14:11 +0200, =?UTF-8?Q?Alfredo_Solano_Mart=C3=ADnez?=
wrote: Simply put, make the reference counter a sharded one. That is, separate it into several subcounters, in this case one for each thread.
It seems to me this has a family resemblance to some of the stuff Trent is doing in his parallel Python work.
It's also a thread for python-ideas, not python-dev. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Hello,
On Mon, 1 Apr 2013 01:14:11 +0200
Alfredo Solano Martínez
The logic would then be something like this: - when increasing the refcount, a thread writes only to its own subcounter, creating one first if necessary. - similarly, when decreasing the refcount, there is no need to access other subcounters until that subcounter reaches zero. - when a subcounter gets to zero, delete it, and read the other subcounters to check if it was the last one.
But then you will decrement another subcounter, right? Meaning you need a lock around all increments / decrements.
Unfortunately, in a crude test of mine there is already a severe performance degradation, and that is without rwlocks.
Yes, there will be. Given how often INCREF and DECREF are called, any complication of their implementation will significantly decrease single-threaded performance. See also http://mail.python.org/pipermail/python-ideas/2009-November/006599.html Regards Antoine.
participants (4)
-
Alfredo Solano Martínez
-
Antoine Pitrou
-
Nick Coghlan
-
R. David Murray