"Brett Cannon" <brett@python.org> wrote:
On 4/12/07, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I've been thinking about some ideas for reducing the amount of refcount adjustment that needs to be done, with a view to making GIL removal easier.
1) Permanent objects
In a typical Python program there are many objects that are created at the beginning and exist for the life of the program -- classes, functions, literals, etc. Refcounting these is a waste of effort, since they're never going to go away.
In reality this is true, but obviously not technically true. You could delete a class if you really wanted to. But obviously it rarely happens.
So perhaps there could be a way of marking such
objects as "permanent" or "immortal". Any refcount operation on a permanent object would be a no-op, so no locking would be needed. This would also have the benefit of eliminating any need to write to the object's memory at all when it's only being read.
2) Objects owned by a thread
Python code creates and destroys temporary objects at a high rate -- stack frames, argument tuples, intermediate results, etc. If the code is executed by a thread, those objects are rarely if ever seen outside of that thread. It would be beneficial if refcount operations on such objects could be carried out by the thread that created them without locking.
To achieve this, two extra fields could be added to the object header: an "owning thread id" and a "local reference count". (The existing refcount field will be called the "global reference count" in what follows.)
An object created by a thread has its owning thread id set to that thread. When adjusting an object's refcount, if the current thread is the object's owning thread, the local refcount is updated without locking. If the object has no owning thread, or belongs to a different thread, the object is locked and the global refcount is updated.
The object is considered garbage only when both refcounts drop to zero. Thus, after a decref, both refcounts would need to be checked to see if they are zero. When decrementing the local refcount and it reaches zero, the global refcount can be checked without locking, since a zero will never be written to it until it truly has zero non-local references remaining.
I suspect that these two strategies together would eliminate a very large proportion of refcount-related activities requiring locking, perhaps to the point where those remaining are infrequent enough to make GIL removal practical.
I wonder what the overhead is going to be. If for every INCREF or DECREF you have to check that an object is immortal or whether it is a thread-owned object is going to incur at least an 'if' check, if not more. I wonder what the performance hit is going to be.
The real question is whether the wasteful parallel if branches will be faster or slower than the locking non-parallel increments.
And for the second idea, adding two more fields to every object might be considered expensive by some in terms of memory.
In the worst case, it would double the size of an object (technically, a minimal Python instance can consist of a refcount and a type pointer). In the case of an integer, it would increase its space use by 2/3.
Also, how would this scenario be handled: object foo is created in thread A (does it have a local-thread refcount of 1, a global of 1, or are both 1?),
I would say global 0, thread 1.
is passed to thread B, and then DECREF'ed in thread B as the object is no longer needed by anyone. If the local-thread refcount is 1 then this would not work as it would fail with the global refcount already at 0. But if
If the object is still being used in thread A, its thread refcount should be at least 1. If thread B decrefs the global refcount, and it becomes 0, then it can check the thread refcount and notice it is nonzero and not deallocate, or if it notices that it *is* zero, then since it already has the GIL (necessary to have decrefed the global refcount), it can pass the object to the deallocator. Now, if thread B is using the object, and the object's thread refcount drops to zero, and thread B passes the object to thread C, then thread C is free to use the thread refcount. It takes a bit of work, but the system could be made to work. However, I am fairly certain that though it would remove the need to have the GIL during some object reference passing, specifically for objects whose whole lifetime is within a single thread, the larger definitions necessary for increfs and decrefs would put more pressure on processor cache, and regardless of locking, cache coherency requirements could ruin performance when two threads were running on different processors (due to cache line alignment). I ran a microbenchmark, but all it seemed to tell me was that dealing with the GIL is slow in multiple threads, but I didn't get conclusive results either way (either positive or negative). - Josiah