I know this may be tiresome by now, so feel free to ignore, but I'd like to share with the list an idea about the GIL, more specifically the reference counting mechanism.
Simply put, make the reference counter a sharded one. That is, separate it into several subcounters, in this case one for each thread.
The logic would then be something like this:
- when increasing the refcount, a thread writes only to its own subcounter, creating one first if necessary.
- similarly, when decreasing the refcount, there is no need to access other subcounters until that subcounter reaches zero.
- when a subcounter gets to zero, delete it, and read the other subcounters to check if it was the last one.
- delete the object only if there are no more subcounters.
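To make the steps above concrete, here is a minimal sketch as a toy Python model. It is not the C change to INCREF/DECREF, only an illustration of the bookkeeping. The names ShardedRefCount, incref, decref, total and deleted are hypothetical. It assumes a thread only releases references it took itself, and it takes a lock only when a subcounter is created or removed, or when the total is read.

```python
import threading


class ShardedRefCount:
    """Toy model of a per-thread (sharded) reference counter."""

    def __init__(self):
        # thread id -> one-element list holding that thread's subcounter
        self._shards = {}
        # Taken only to create/remove a subcounter or to read the total;
        # the common incref/decref path does not touch it.
        self._lock = threading.Lock()
        self.deleted = False

    def incref(self):
        tid = threading.get_ident()
        shard = self._shards.get(tid)
        if shard is None:
            # First reference from this thread: create its subcounter.
            with self._lock:
                shard = self._shards.setdefault(tid, [0])
        shard[0] += 1  # only the owning thread writes here

    def decref(self):
        """Drop one reference; return True if the object was deleted."""
        tid = threading.get_ident()
        # Assumption: this thread took the reference it is releasing,
        # otherwise this lookup raises KeyError.
        shard = self._shards[tid]
        shard[0] -= 1
        if shard[0] > 0:
            return False  # fast path: no other subcounter is read
        with self._lock:
            # Our subcounter hit zero: remove it and check the others.
            del self._shards[tid]
            if not self._shards:
                self.deleted = True
        return self.deleted

    def total(self):
        # Reading the full refcount has to visit every subcounter.
        with self._lock:
            return sum(shard[0] for shard in self._shards.values())
```

In this model the object is considered dead once the last subcounter has been removed, which is the "delete the object only if there are no more subcounters" step. References handed from one thread to another are not covered by this sketch.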
Contention could then be reduced to a minimum, since a thread only needs to read the other subcounters when its own reaches zero or when it needs the total value. Depending on the implementation it might help with false sharing too, as subcounters may or may not be in the same cache line.
Unfortunately, a crude test of mine already shows a severe performance degradation, and that is without rwlocks. I used a basic linked list and changed the INCREF/DECREF macros to functions to accommodate the extra logic, so it may not be the best approach (too many dereferences).
Does this make sense to anyone?
PS: At the very least, it might be another reason to keep the GIL.