Ideas towards GIL removal
I've been thinking about some ideas for reducing the amount of refcount adjustment that needs to be done, with a view to making GIL removal easier.

1) Permanent objects

In a typical Python program there are many objects that are created at the beginning and exist for the life of the program -- classes, functions, literals, etc. Refcounting these is a waste of effort, since they're never going to go away. So perhaps there could be a way of marking such objects as "permanent" or "immortal". Any refcount operation on a permanent object would be a no-op, so no locking would be needed. This would also have the benefit of eliminating any need to write to the object's memory at all when it's only being read.

2) Objects owned by a thread

Python code creates and destroys temporary objects at a high rate -- stack frames, argument tuples, intermediate results, etc. If the code is executed by a thread, those objects are rarely if ever seen outside of that thread. It would be beneficial if refcount operations on such objects could be carried out by the thread that created them without locking.

To achieve this, two extra fields could be added to the object header: an "owning thread id" and a "local reference count". (The existing refcount field will be called the "global reference count" in what follows.)

An object created by a thread has its owning thread id set to that thread. When adjusting an object's refcount, if the current thread is the object's owning thread, the local refcount is updated without locking. If the object has no owning thread, or belongs to a different thread, the object is locked and the global refcount is updated.

The object is considered garbage only when both refcounts drop to zero. Thus, after a decref, both refcounts would need to be checked to see if they are zero.
When a decrement brings the local refcount to zero, the global refcount can be checked without locking, since a zero will never be written to it until it truly has zero non-local references remaining.

I suspect that these two strategies together would eliminate a very large proportion of refcount-related activities requiring locking, perhaps to the point where those remaining are infrequent enough to make GIL removal practical.

-- Greg
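[Editorial sketch] Greg's two-counter scheme can be made concrete with a toy Python simulation. All names here are invented for illustration (a real implementation would be C macros operating on the object header), and the sketch deliberately reproduces the open question about cross-thread handoff that comes up later in the thread:

```python
import threading

IMMORTAL = object()  # sentinel "owner" marking a permanent object

class Obj:
    """Toy object header; field names are hypothetical, not CPython's."""
    def __init__(self, immortal=False):
        self.owner = IMMORTAL if immortal else threading.get_ident()
        self.local_ref = 0 if immortal else 1   # touched only by the owner, unlocked
        self.global_ref = 0                     # touched under the per-object lock
        self.lock = threading.Lock()            # stands in for a per-object lock
        self.dead = False                       # stands in for deallocation

def incref(ob):
    if ob.owner is IMMORTAL:
        return                                  # permanent object: no-op, no memory write
    if ob.owner == threading.get_ident():
        ob.local_ref += 1                       # owning thread: no locking needed
    else:
        with ob.lock:
            ob.global_ref += 1

def decref(ob):
    if ob.owner is IMMORTAL:
        return
    if ob.owner == threading.get_ident():
        ob.local_ref -= 1
        # Local count hit zero: per Greg's argument, the global count can be
        # read without the lock, since no other thread writes a final zero
        # to it while non-local references remain.
        if ob.local_ref == 0 and ob.global_ref == 0:
            ob.dead = True
    else:
        with ob.lock:
            ob.global_ref -= 1
            if ob.global_ref == 0 and ob.local_ref == 0:
                ob.dead = True
```

Note that the non-owner path reads `local_ref` without any synchronization with the owning thread; that race is exactly the cross-thread problem debated downthread, and this sketch does not resolve it.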
On 4/12/07, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I've been thinking about some ideas for reducing the amount of refcount adjustment that needs to be done, with a view to making GIL removal easier.
1) Permanent objects
In a typical Python program there are many objects that are created at the beginning and exist for the life of the program -- classes, functions, literals, etc. Refcounting these is a waste of effort, since they're never going to go away.
In practice this is true, though not technically true: you could delete a class if you really wanted to, but it rarely happens. So perhaps there could be a way of marking such
objects as "permanent" or "immortal". Any refcount operation on a permanent object would be a no-op, so no locking would be needed. This would also have the benefit of eliminating any need to write to the object's memory at all when it's only being read.
2) Objects owned by a thread
Python code creates and destroys temporary objects at a high rate -- stack frames, argument tuples, intermediate results, etc. If the code is executed by a thread, those objects are rarely if ever seen outside of that thread. It would be beneficial if refcount operations on such objects could be carried out by the thread that created them without locking.
To achieve this, two extra fields could be added to the object header: an "owning thread id" and a "local reference count". (The existing refcount field will be called the "global reference count" in what follows.)
An object created by a thread has its owning thread id set to that thread. When adjusting an object's refcount, if the current thread is the object's owning thread, the local refcount is updated without locking. If the object has no owning thread, or belongs to a different thread, the object is locked and the global refcount is updated.
The object is considered garbage only when both refcounts drop to zero. Thus, after a decref, both refcounts would need to be checked to see if they are zero. When a decrement brings the local refcount to zero, the global refcount can be checked without locking, since a zero will never be written to it until it truly has zero non-local references remaining.
I suspect that these two strategies together would eliminate a very large proportion of refcount-related activities requiring locking, perhaps to the point where those remaining are infrequent enough to make GIL removal practical.
I wonder what the overhead is going to be. If for every INCREF or DECREF you have to check whether an object is immortal or thread-owned, that is going to incur at least an 'if' check, if not more, and I wonder what the performance hit of that would be.

And for the second idea, adding two more fields to every object might be considered expensive by some in terms of memory.

Also, how would this scenario be handled: object foo is created in thread A (does it have a local-thread refcount of 1, a global of 1, or are both 1?), is passed to thread B, and then DECREF'ed in thread B as the object is no longer needed by anyone. If the local-thread refcount is 1 then this would not work, as it would fail with the global refcount already at 0. But if objects start with a global refcount of 1 but a local refcount of 0 and the object is DECREF'ed locally, then wouldn't that fail for the same reason? I guess I am wondering how refcounts would be handled when objects cross between threads.

-Brett
"Brett Cannon" <brett@python.org> wrote:
On 4/12/07, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I've been thinking about some ideas for reducing the amount of refcount adjustment that needs to be done, with a view to making GIL removal easier.
1) Permanent objects
In a typical Python program there are many objects that are created at the beginning and exist for the life of the program -- classes, functions, literals, etc. Refcounting these is a waste of effort, since they're never going to go away.
In practice this is true, though not technically true: you could delete a class if you really wanted to, but it rarely happens.
So perhaps there could be a way of marking such
objects as "permanent" or "immortal". Any refcount operation on a permanent object would be a no-op, so no locking would be needed. This would also have the benefit of eliminating any need to write to the object's memory at all when it's only being read.
2) Objects owned by a thread
Python code creates and destroys temporary objects at a high rate -- stack frames, argument tuples, intermediate results, etc. If the code is executed by a thread, those objects are rarely if ever seen outside of that thread. It would be beneficial if refcount operations on such objects could be carried out by the thread that created them without locking.
To achieve this, two extra fields could be added to the object header: an "owning thread id" and a "local reference count". (The existing refcount field will be called the "global reference count" in what follows.)
An object created by a thread has its owning thread id set to that thread. When adjusting an object's refcount, if the current thread is the object's owning thread, the local refcount is updated without locking. If the object has no owning thread, or belongs to a different thread, the object is locked and the global refcount is updated.
The object is considered garbage only when both refcounts drop to zero. Thus, after a decref, both refcounts would need to be checked to see if they are zero. When a decrement brings the local refcount to zero, the global refcount can be checked without locking, since a zero will never be written to it until it truly has zero non-local references remaining.
I suspect that these two strategies together would eliminate a very large proportion of refcount-related activities requiring locking, perhaps to the point where those remaining are infrequent enough to make GIL removal practical.
I wonder what the overhead is going to be. If for every INCREF or DECREF you have to check whether an object is immortal or thread-owned, that is going to incur at least an 'if' check, if not more, and I wonder what the performance hit of that would be.
The real question is whether the wasteful parallel if branches will be faster or slower than the locking non-parallel increments.
And for the second idea, adding two more fields to every object might be considered expensive by some in terms of memory.
In the worst case, it would double the size of an object (technically, a minimal Python instance can consist of a refcount and a type pointer). In the case of an integer, it would increase its space use by 2/3.
Also, how would this scenario be handled: object foo is created in thread A (does it have a local-thread refcount of 1, a global of 1, or are both 1?),
I would say global 0, thread 1.
is passed to thread B, and then DECREF'ed in thread B as the object is no longer needed by anyone. If the local-thread refcount is 1 then this would not work as it would fail with the global refcount already at 0. But if
If the object is still being used in thread A, its thread refcount should be at least 1. If thread B decrefs the global refcount and it becomes 0, then it can check the thread refcount: if the thread refcount is nonzero, it does not deallocate; if it notices that it *is* zero, then since thread B already has the GIL (necessary to have decrefed the global refcount), it can pass the object to the deallocator.

Now, if thread B is using the object, and the object's thread refcount drops to zero, and thread B passes the object to thread C, then thread C is free to use the thread refcount. It takes a bit of work, but the system could be made to work.

However, I am fairly certain that though it would remove the need to hold the GIL during some object reference passing, specifically for objects whose whole lifetime is within a single thread, the larger definitions necessary for increfs and decrefs would put more pressure on the processor cache, and regardless of locking, cache coherency requirements could ruin performance when two threads were running on different processors (due to cache line alignment).

I ran a microbenchmark, but all it seemed to tell me was that dealing with the GIL is slow in multiple threads; I didn't get conclusive results either way.

- Josiah
Josiah Carlson wrote:
If thread B decrefs the global refcount, and it becomes 0, then it can check the thread refcount and notice it is nonzero and not deallocate, or if it notices that it *is* zero, then since it already has the GIL (necessary to have decrefed the global refcount), it can pass the object to the deallocator.
The problem with that is the owning thread needs to be able to manipulate the local refcount without holding any kind of lock. That's the whole point of it. -- Greg
Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Josiah Carlson wrote:
If thread B decrefs the global refcount, and it becomes 0, then it can check the thread refcount and notice it is nonzero and not deallocate, or if it notices that it *is* zero, then since it already has the GIL (necessary to have decrefed the global refcount), it can pass the object to the deallocator.
The problem with that is the owning thread needs to be able to manipulate the local refcount without holding any kind of lock. That's the whole point of it.
Certainly, but thread B isn't the owning thread; thread A was the owning thread. When thread A decrefs its thread count to zero, it acquires the GIL and checks the global refcount to determine whether someone else is responsible for the object's deallocation (global refcount > 0) or thread A itself is (global refcount == 0). - Josiah
Josiah Carlson wrote:
Certainly, but thread B isn't the owning thread; thread A was the owning thread. When thread A decrefs its thread count to zero, it acquires the GIL and checks the global refcount to determine whether someone else is responsible for the object's deallocation (global refcount > 0) or thread A itself is (global refcount == 0).
Thread B holding the GIL doesn't help, because the local refcount is not covered by the GIL. Thread A must be able to assume it has total ownership of the local refcount, otherwise there's no benefit in the scheme. -- Greg
Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Josiah Carlson wrote:
Certainly, but thread B isn't the owning thread; thread A was the owning thread. When thread A decrefs its thread count to zero, it acquires the GIL and checks the global refcount to determine whether someone else is responsible for the object's deallocation (global refcount > 0) or thread A itself is (global refcount == 0).
Thread B holding the GIL doesn't help, because the local refcount is not covered by the GIL. Thread A must be able to assume it has total ownership of the local refcount, otherwise there's no benefit in the scheme.
I seem to not be explaining myself well enough. What you describe is precisely what I described earlier. I don't believe we have a disagreement about the execution semantics of threads on an object with a local thread count. I was only mentioning A acquiring the GIL if/when it is finished with the object, to determine whether the object could be sent to the standard Python deallocation routines, and whether thread A should send it (as opposed to thread B, in the case where thread B was passed the object and was using it beyond the time that A did). - Josiah
Josiah Carlson wrote:
I was only mentioning A acquiring the GIL if/when it is finished with the object, to determine whether the object could be sent to the standard Python deallocation routines
Oh, yes, that part is fine. The problem is what happens if thread A stuffs a reference into another object that lives beyond A's interest in matters. Then another thread can see an object that still has local ref counts, even though the owning thread no longer cares about it and is never going to get rid of those local refcounts itself. I haven't thought of a non-expensive way of fixing that yet. -- Greg
Brett Cannon wrote:
In practice this is true, though not technically true: you could delete a class if you really wanted to, but it rarely happens.
And if it does, the worst that will happen is that the original version will hang around, tying up a small amount of memory.
I wonder what the overhead is going to be. If for every INCREF or DECREF you have to check that an object is immortal or whether it is a thread-owned object is going to incur at least an 'if' check, if not more.
Clearly, there will be a small increase in overhead. But it may be worth it if it avoids the need for a rather expensive lock/unlock. It was pointed out earlier that, even using the special instructions provided by some processors, this can take a great many times longer than a normal memory access or two.
And for the second idea, adding two more fields to every object might be considered expensive by some in terms of memory.
Again, it's a tradeoff. If it enables removal of the GIL and massive threading on upcoming multi-core CPUs, it might be considered worth the cost.
Also, how would this scenario be handled: object foo is created in thread A ... is passed to thread B, and then DECREF'ed in thread B as the object is no longer needed by anyone.
I'll have to think about that. If a thread gives away a reference to another thread, it really needs to be a global reference rather than a local one. The tricky part is telling when this happens.
But if objects start with a global refcount of 1 but a local refcount of 0 and it is DECREF'ed locally then wouldn't that fail for the same reason?
That one's easier -- if the local refcount is 0 on a decref, you need to lock the object and decrement the global refcount. -- Greg
Greg Ewing wrote:
I've been thinking about some ideas for reducing the amount of refcount adjustment that needs to be done, with a view to making GIL removal easier.
(omitted)

I'm thinking along similar lines, but my approach is to eliminate refcounting entirely. (Note that the incref and decref macros could still exist for backwards compatibility, but would do nothing.)

Garbage collection for concurrent systems is an active area of research; however, it appears that many of the research systems out there have settled on a few basic design parameters. Most of them use a copying collector for the "young" generation (with a separate heap per thread, for exactly the reasons you suggest), and a shared mark-and-sweep heap for the older generation that uses a traditional free list.

For my own amusement and curiosity, I've been playing around with implementing such a collector, using a heap allocator design that's loosely based on the one from dlmalloc, an open source malloc implementation with very good overall performance. The idea is to build a stand-alone garbage collection library, similar to the popular Boehm collector, but parallel by design and intended for "cooperative" language interpreters rather than uncooperative languages such as C and C++.

Of course, this is purely a hobby-level effort, and I admit that I really have no clue as to what I am doing here - the real point of the exercise is to educate myself about the problem space, not to actually produce a working library, although that might be a possible side effect (equally likely is that I'll never finish it). I doubt that an untutored amateur such as myself can actually create a robust, efficient parallel implementation, given how hard such programming actually is, and how inexperienced I am. But it's interesting and enjoyable to work on, and that's the only reason I need.
(And also produces some interesting side effects - I wasn't happy with the various graphical front-ends to gdb, so I took a couple of days off and wrote one using wxPython :)

Now, all that being said, even if such a GC library were to exist, that is a long way from removal of the GIL, although it is a necessary step. For example, take the case of a dictionary in which more than one thread is inserting values. Clearly, that will require a lock or some other mechanism to prevent corruption of the hash table as it is updated. I think we want to avoid the Java situation where every object has its own lock. Instead, we'd have to require the user to provide a lock around that insertion operation. But what about dictionaries that the user isn't aware of, such as class methods and module contents? In a world without a GIL, what kind of steps need to be taken to ensure that shared data structures can be updated without creating chaos?

-- Talin
On Fri, 13 Apr 2007 08:51:11 +0200, Talin <talin@acm.org> wrote:
Now, all that being said, even if such a GC library were to exist, that is a long way from removal of the GIL, although it is a necessary step. For example, take the case of a dictionary in which more than one thread is inserting values. Clearly, that will require a lock or some other mechanism to prevent corruption of the hash table as it is updated. I think we want to avoid the Java situation where every object has its own lock. Instead, we'd have to require the user to provide a lock around that insertion operation. But what about dictionaries that the user isn't aware of, such as class methods and module contents? In a world without a GIL, what kind of steps need to be taken to ensure that shared data structures can be updated without creating chaos?
In the case of hashtables, a nonblocking variant could perhaps be an option. There was a nice article on reddit some time ago: http://blogs.azulsystems.com/cliff/2007/03/a_nonblocking_h.html -- the author claims that it's competitive in speed with non-lock-protected (so thread-unsafe) implementations. Nonblocking algorithms don't exist for all data structures, but perhaps they exist for most of the ones used in the Python interpreter? - Jan
Talin wrote:
I'm thinking along similar lines, but my approach is to eliminate refcounting entirely.
That's a possibility, although refcounting does have some nice properties -- it's cache-friendly, and it's usually fairly easy to get it to work with other libraries that have their own scheme for managing memory and don't know about Python's one.
For example, take the case of a dictionary in which more than one thread is inserting values. .. I think we want to avoid the Java situation where every object has its own lock.
Having to lock dictionaries mightn't be so bad, as long as it can be done using special instructions. It's still a much larger-grained locking unit than an incref or decref.

But I'm wondering whether the problem might get solved for us from the hardware end if we wait long enough. If we start seeing massively-multicore CPUs, I expect there will be a lot of pressure to come up with more efficient ways of doing fine-grained locking in order to make effective use of them. Maybe a special lump of high-speed multi-port memory used just for locks, with surrounding hardware designed for using it as such.

-- Greg
Talin wrote:
I'm thinking along similar lines, but my approach is to eliminate refcounting entirely. (Note that the incref and decref macros could still exist for backwards compatibility, but would do nothing.)
There was a PyPy talk at this year's EuroPython about "writing code for different GCs". Certainly worth reading in this context. Bottom-line being that it's always worth thinking about the effect of ref-counting on code and if the code would still work if there was no ref-counting (as in Jython, PyPy and IronPython). Sometimes not as obvious as it might seem... Stefan
Greg Ewing wrote:
2) Objects owned by a thread
Python code creates and destroys temporary objects at a high rate -- stack frames, argument tuples, intermediate results, etc. If the code is executed by a thread, those objects are rarely if ever seen outside of that thread. It would be beneficial if refcount operations on such objects could be carried out by the thread that created them without locking.
While we are on the topic of reference counting, I'd like to direct your attention to Recycler, an IBM research project:

"The Recycler is a concurrent multiprocessor garbage collector with extremely low pause times (maximum of 6 milliseconds over eight benchmarks) while remaining competitive with the best throughput-oriented collectors in end-to-end execution times. This paper describes the overall architecture of the Recycler, including its use of reference counting and concurrent cycle collection, and presents extensive measurements of the system comparing it to a parallel, stop-the-world mark-and-sweep collector."

There are a bunch of research papers describing Recycler which can be found at the following link:

http://www.research.ibm.com/people/d/dfb/publications.html

I'd start with the papers entitled "Java without the Coffee Breaks: A Non-intrusive Multiprocessor Garbage Collector" and "Concurrent Cycle Collection in Reference Counted Systems".

Let me describe a bit about how the Recycler works and how it relates to what you've proposed.

The basic idea is that for each thread, there is a set of thread-local data (TLD) that contains a pair of "refcount buffers", one buffer for increfs and one buffer for decrefs. Each refcount buffer is a flat array of pointers which starts empty and gradually fills up.

The "incref" operation does not actually touch the reference count field of the object. Instead, an "incref" appends a pointer to the object to the end of the incref buffer for that thread. Similarly, a decref operation appends a pointer to the object to the decref buffer. Since the refcount buffers are thread-local, there is no need for locking or synchronization.

When one of the buffers gets full, both buffers are swapped out for new ones, and the old buffers are placed on a queue which is processed by the collector thread.
The collector thread is the only thread which is allowed to actually touch the reference counts of the individual objects, and it's the only thread which is allowed to delete objects.

Processing the buffers is relatively simple: first, the incref buffer is processed. The collector thread scans through each pointer in the buffer and increments the refcount of each object. Then the decref buffer is processed in a similar way, decrementing the refcount. However, the collector also needs to process the buffers for the other threads before the memory can be reclaimed.

Recycler defines a series of "epochs" (i.e. intervals between collections). Within a refcount buffer, each epoch is represented as a contiguous range of values within the array -- all of the increfs and decrefs which occurred during that epoch. An auxiliary array records the high-water mark for each epoch. Using this information, the collector thread is able to process, for all threads, only those increfs and decrefs which occurred before the current epoch. This does mean that objects whose refcount falls to zero during the current epoch will not be deleted until the next collection cycle.

Recycler also handles cyclic garbage via cycle detection, which is described in the paper. It does not use a "mark and sweep" type of algorithm, but is instead able to detect cycles locally without scanning the entire heap.

Thus, the Recycler's use of refcount buffers achieves what you were trying to achieve, which is refcounting without locking. However, it does require access to thread-local data for each incref/decref operation. The performance of this scheme will greatly depend on how quickly the code can get access to thread-local data. The fastest possible method would be to dedicate a register, but that's infeasible on most systems. Another idea is for large functions to look up the TLD and stuff it in a local variable at the beginning of the function.
For older source code the existing, backwards-compatible incref and decref macros could each individually get access to the TLD, but these would be slower than the more optimized methods in which the TLD was supplied as a parameter. -- Talin
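[Editorial sketch] Talin's description of the refcount buffers can be condensed into a toy, single-thread, single-epoch Python model. All names here are invented, and the real Recycler is fully concurrent and far more elaborate; the point is only to show that incref/decref become lock-free appends, with the true counts touched solely by the collector:

```python
from collections import defaultdict

class RecyclerSketch:
    """Toy single-thread, single-epoch model of Recycler-style refcount buffers."""

    def __init__(self):
        self.inc_buffer = []               # thread-local incref buffer
        self.dec_buffer = []               # thread-local decref buffer
        self.refcount = defaultdict(int)   # true counts: collector-only
        self.freed = []                    # stands in for deallocation

    def incref(self, obj):
        # No lock and no write to the object itself: just log the operation.
        self.inc_buffer.append(obj)

    def decref(self, obj):
        self.dec_buffer.append(obj)

    def collect(self):
        # Collector thread: apply buffered increfs before decrefs, so a
        # live object's count never transiently reaches zero.
        for obj in self.inc_buffer:
            self.refcount[obj] += 1
        for obj in self.dec_buffer:
            self.refcount[obj] -= 1
            if self.refcount[obj] == 0:
                self.freed.append(obj)     # reclamation happens only here
        self.inc_buffer.clear()
        self.dec_buffer.clear()
```

Note that decref never frees anything directly; reclamation waits for the next collect(), which is the delayed-collection behaviour this approach trades for lock-free counting.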
On 4/15/07, Talin <talin@acm.org> wrote:
Greg Ewing wrote:
2) Objects owned by a thread
Python code creates and destroys temporary objects at a high rate -- stack frames, argument tuples, intermediate results, etc. If the code is executed by a thread, those objects are rarely if ever seen outside of that thread. It would be beneficial if refcount operations on such objects could be carried out by the thread that created them without locking.
While we are on the topic of reference counting, I'd like to direct your attention to Recycler, an IBM research project:
"The Recycler is a concurrent multiprocessor garbage collector with extremely low pause times (maximum of 6 milliseconds over eight benchmarks) while remaining competitive with the best throughput-oriented collectors in end-to-end execution times. This paper describes the overall architecture of the Recycler, including its use of reference counting and concurrent cycle collection, and presents extensive measurements of the system comparing it to a parallel, stop-the-world mark-and-sweep collector."
There are a bunch of research papers describing Recycler which can be found at the following link:
http://www.research.ibm.com/people/d/dfb/publications.html
I'd start with the papers entitled "Java without the Coffee Breaks: A Non-intrusive Multiprocessor Garbage Collector" and "Concurrent Cycle Collection in Reference Counted Systems".
Let me describe a bit about how the Recycler works and how it relates to what you've proposed.
The basic idea is that for each thread, there is a set of thread local data (TLD) that contains a pair of "refcount buffers", one buffer for increfs and one buffer for decrefs. Each refcount buffer is a flat array of pointers which starts empty and gradually fills up.
The "incref" operation does not actually touch the reference count field of the object. Instead, an "incref" appends a pointer to the object to the end of the incref buffer for that thread. Similarly, a decref operation appends a pointer to the object to the decref buffer. Since the refcount buffers are thread-local, there is no need for locking or synchronization.
When one of the buffers gets full, both buffers are swapped out for new ones, and the old buffers are placed on a queue which is processed by the collector thread. The collector thread is the only thread which is allowed to actually touch the reference counts of the individual objects, and it's the only thread which is allowed to delete objects.
Processing the buffers is relatively simple: First, the incref buffer is processed. The collector thread scans through each pointer in the buffer, and increments the refcount of each object. Then the decref buffer is processed in a similar way, decrementing the refcount.
However, it also needs to process the buffers for the other threads before the memory can be reclaimed. Recycler defines a series of "epochs" (i.e. intervals between collections). Within a refcount buffer, each epoch is represented as a contiguous range of values within the array -- all of the increfs and decrefs which occurred during that epoch. An auxiliary array records the high water mark for each epoch.
Huh, interesting idea. I downloaded the papers and plan to give them a read. The one issue I can see with this is that, because of these epochs, using a buffer instead of immediately manipulating the refcount introduces a delay. And I know some people love the (mostly) instantaneous garbage collection when the refcount reaches 0. Anyway, I will give the paper a read before I make any more ignorant statements about the design. =) -Brett
Brett Cannon wrote:
And I know some people love the (mostly) instantaneous garbage collection when the refcount should be at 0.
Yeah, and even a 6 millisecond pause could be too long in some applications, such as high frame rate animation. 6 milliseconds is a *long* time for today's GHz processors. -- Greg
On Mon, Apr 16, 2007, Greg Ewing wrote:
Brett Cannon wrote:
And I know some people love the (mostly) instantaneous garbage collection when the refcount should be at 0.
Yeah, and even a 6 millisecond pause could be too long in some applications, such as high frame rate animation. 6 milliseconds is a *long* time for today's GHz processors.
Hrm. Assuming that really is the maximum, I don't think it's an issue for Python applications: 6ms slices still allow for >150 frames/sec. If overall object creation/destruction throughput is increased or this allows greater throughput on multi-core machines without dramatically decreasing performance for single-CPU machines, it may well be worthwhile. But my primary concern remains the issue of Python being a glue language with external libraries: the GIL is one of our best weapons for making it easy. This approach seems like it would be easier to integrate than other options I've seen. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "Look, it's your affair if you want to play with five people, but don't go calling it doubles." --John Cleese anticipates Usenet
Aahz wrote:
On Mon, Apr 16, 2007, Greg Ewing wrote:
even a 6 millisecond pause could be too long in some applications, such as high frame rate animation.
6ms slices still allow for >150 frames/sec.
Only if you don't want to do anything else during your frame. At 50fps, 6ms is nearly one-third of your frame time. However, I tend to agree that this approach has the greatest chance of succeeding of any suggestion I've seen so far. -- Greg
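[Editorial note] The arithmetic behind both positions, for the record:

```python
pause_ms = 6.0                     # Recycler's reported maximum pause

# Aahz's point: even back-to-back 6 ms slices allow more than 150 per second.
slices_per_second = 1000.0 / pause_ms      # about 166
assert slices_per_second > 150

# Greg's point: at 50 fps the frame budget is 20 ms, so a single 6 ms
# pause consumes nearly one-third of a frame.
frame_budget_ms = 1000.0 / 50
fraction_of_frame = pause_ms / frame_budget_ms
assert fraction_of_frame == 0.30
```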
On 4/12/07, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I've been thinking about some ideas for reducing the amount of refcount adjustment that needs to be done, with a view to making GIL removal easier.
1) Permanent objects
I have some vague memory (but couldn't find the references) that someone tried this and it was too expensive. INCREF and DECREF on something in the header of an object you need anyhow were just too cheap to beat once you added any logic. (That said, the experiment was pretty old, and the results may have changed.)
2) Objects owned by a thread
[Create an owner-refcount separate from the global count]

Some distributed systems already take advantage of the fact that the actual count is irrelevant. They use weights, so that other stores don't need to be updated until the (local) weight hits zero. While it would be reasonable for a thread to only INCREF once and then keep its internal refcount elsewhere ... it is really hard to beat "(add 1 to/subtract 1 from) an int already at a known location in cache."

Also note that Mark Miller and Ping Yee http://www.eros-os.org/pipermail/e-lang/1999-May/002590.html suggested a way to mark objects as "expensive" (==> release as soon as possible).

Combining this: today's Python looks only at an object's size when determining which memory pool to use. There might be some value in also categorizing types based on their instances' typical memory usage. Examples:

(1) Default pool, like today.
(2) Permanent pool: expected to be small and permanent. Maybe skip the refcount entirely? Or at least ignore it going to zero, so you don't need to lock for updates?
(3) Thread-local: has an "external refcount" field that would normally be zero.
(4) Expensive: going to a rooted GC won't release it fast enough.

-jJ
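[Editorial sketch] Jim's aside about weights is the classic weighted reference counting idea from distributed systems. A minimal Python sketch (all names invented, locking omitted): copying a reference splits its weight locally, so the shared count is touched only when a reference is destroyed:

```python
INITIAL_WEIGHT = 1 << 16   # a power of two, so weights split evenly

class SharedObj:
    """The shared store: tracks one total weight instead of a refcount."""
    def __init__(self):
        self.total = INITIAL_WEIGHT   # updates here would need locking
        self.dead = False

    def release(self, weight):
        self.total -= weight
        if self.total == 0:
            self.dead = True          # stands in for deallocation

class Ref:
    """A reference carrying part of the object's total weight."""
    def __init__(self, obj, weight=INITIAL_WEIGHT):
        self.obj, self.weight = obj, weight

    def copy(self):
        # Duplicating a reference halves its weight locally; the shared
        # total is untouched, so no lock is taken.  (Real schemes insert
        # an indirection node when a weight of 1 must be split; omitted.)
        half = self.weight // 2
        self.weight -= half
        return Ref(self.obj, half)

    def drop(self):
        # Only destruction touches the shared count.
        self.obj.release(self.weight)
        self.weight = 0
```

This shows why "the actual count is irrelevant": the object only ever needs to know when the outstanding weight reaches zero, not how many references exist.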
Jim Jewett wrote:
I have some vague memory (but couldn't find the references) that someone tried and it was too expensive.
Too expensive compared to what? The question isn't whether it's more expensive than the current scheme, but whether it helps when there's no GIL and you have to lock the object to update the refcount. -- Greg
participants (8)
- Aahz
- Brett Cannon
- Greg Ewing
- Jan Kanis
- Jim Jewett
- Josiah Carlson
- Stefan Behnel
- Talin