
A few days ago the idea of untracking, or somehow removing, immutable and provably acyclic tuples from garbage collection was bounced about. At the time the idea was to untrack a tuple once it was noticed that it contained no cyclic objects. This proves annoying to implement because of unfilled tuples, and because of C-API calls that mutate the tuple (highly unusual, it seems).

Lazily tracking the tuples seems to be a much simpler proposition: a tuple can only be a member of a cycle once an element of ob_item is set to an object that could contain a cycle. Presumably items are set in a tuple far less frequently than they are accessed (by normal code or the collector), so I imagine this method is also more efficient.

The changes required are fairly small: PyObject_GC_IS_TRACKED is added to determine if an object is tracked, the gc track and untrack operations in tupleobject.c are modified to account for the tuple not always being tracked, and PyTuple_SET_ITEM and PyTuple_SetItem are modified to begin tracking the tuple if a) it is not already tracked, and b) the item being set _could_ contain a cycle.

At the moment I use PyObject_IS_GC as a conservative approximation to "_could_ contain a cycle", which works well except for nested tuples. In theory, if v is a tuple, _SetItem(o, i, v) should only be called when v has been completely filled, so if v is untracked then we know it also cannot be a member of a cycle, which solves the nested-tuple problem. In practice making the nested-tuple change did not cause any change in behavior on the test suite, but I have left it out for simplicity. Large trees of atomic data represented as tuples seem like a case where it could possibly make a difference.

A 2.23 patch for the changes outlined above is at: http://www.geocities.com/evilzr/lazytuplegc/ (I wasn't sure if such things are appropriate for the SF patch manager - yes/no?)
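The lazy-tracking rule above can be sketched as a toy model (Python stand-ins for the C-level machinery; `LazyTuple`, `set_item`, and `could_contain_cycle` are illustrative names, not the real API): a tuple starts life untracked, and the SET_ITEM path begins tracking it only when a stored item could itself participate in a reference cycle.

```python
def could_contain_cycle(item):
    """Conservative stand-in for PyObject_IS_GC: container types might
    participate in cycles; atomic types (ints, strings, ...) cannot."""
    return isinstance(item, (list, dict, set, tuple))

class LazyTuple:
    def __init__(self, size):
        self.items = [None] * size
        self.tracked = False               # starts untracked

    def set_item(self, i, item):           # models PyTuple_SET_ITEM
        self.items[i] = item
        if not self.tracked and could_contain_cycle(item):
            self.tracked = True            # first cycle-capable item: start tracking

t = LazyTuple(2)
t.set_item(0, 1)        # atomic -> still untracked
t.set_item(1, "x")      # atomic -> still untracked
assert not t.tracked
t.set_item(1, [])       # a list could be in a cycle -> now tracked
assert t.tracked
```

Note the conservatism: a nested tuple always triggers tracking here, matching the simple PyObject_IS_GC test described above rather than the refined "untracked nested tuples are safe" variant.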
If you want to actually test the changes you need a recent checkout; there was a minor collector bug that the lazy tracking exploited heavily.

Some stats:

                                cvs      | lazytuplegc
    -----------------------------------------------------
    iterzip_test, N=500000
      juststore:                1.46     | 1.43
        gc size                 1143     | 788
      justtups:                 1.40     | 1.30
        gc size                 1143     | 788
      storetups:                8.48     | 4.11
        gc size                 501143   | 788
    base gc set size
      python:                   1081     | 742
      idle:                     7754     | 3818
    pybench 1.0 avg time
      (n=10, warp=20)           69813.40 | 69708.04

The iterzip_test is only slightly munged from the one Tim Peters posted on the iterzip() thread. The pybench averages are almost identical; however, there are large fluctuations on the individual tests, see http://www.geocities.com/evilzr/lazytuplegc/cmp.txt

The base gc set numbers are obtained by starting python (or idle) and immediately running a collection with gc.DEBUG_STATS on. All gc size numbers are size(gen0)+size(gen1)+size(gen2).

This patch causes one regrtest to fail (test_gc, in test_staticmethod, for reasons I do not yet understand - help!). ... make of it what you will :)

Oh, and not to make this post longer, but here's the first-pydev-list-posting info: I'm Daniel Dunbar. My introduction to Python was embedding it into the Blender 3D package (www.blender3d.com, but we went out of business - oops). Since then I have used it for lots of esoteric little tasks, but only recently, after wrapping some of my C-based libraries, have I started writing complex (i.e. applications, not utilities) programs with it. My main interest related to Python development is making it go faster (so I can optimize my code less)... but that's probably not rare.

=====
daniel dunbar
evilzr@yahoo.com

Daniel Dunbar <evilzr@yahoo.com> writes:
Lazily tracking the tuples seems to be a much simpler proposition
This proposal sounds good, but there are a few issues.
and PyTuple_SET_ITEM and PyTuple_SetItem are modified
This is a problem: PyTuple_SET_ITEM is compiled into extension modules, so you need to bump the API version (or else old modules that do not have your change would still be used).
I think it is safe to make this assumption. The worst case is that you get cyclic garbage that won't be collected. Given that it is a bug to refer to an "incomplete" tuple, I think this consequence of that bug is acceptable.
If you think this patch is reasonably complete, and you wish that it is included in Python, then yes, put it onto SF. Regards, Martin

I'm not sure I like this approach. Slowing down PyTuple_SET_ITEM to optimize a micro-benchmark seems dubious. However, dropping the number of tracked objects by 50% _does_ seem worthwhile. Couldn't we get the same effect by checking and untracking the appropriate tuples after finishing a gen1 collection? Each tuple would be checked once, and short-lived tuples would probably not be checked at all. Neil

[Neil Schemenauer, on Daniel Dunbar's interesting tuple-untrack scheme]
I agree with Neil here. Do you really mean after a gen1 collection? Or after a gen0 collection? Or before a gen2 collection <wink>?

The micro-optimizer in me would like to combine testing for "safe to untrack" objects with the whole-list crawl being done anyway in update_refs() at the start of a collection. Offhand I guess I'd do that at the start of a gen0 collection. If that untracks a tuple that was going to die soon anyway, it's not really a waste of time, because untracking at the very start saves the rest of the collection from crawling over it twice (subtract_refs won't see it if it gets untracked, and ditto move_roots). So that seems a net win. OTOH, if it decides a tuple isn't safe to untrack, and the tuple was going to die soon anyway, it's an "extra" peek at the tuple guts. That's a net loss. But I expect an overwhelming majority of tuples are safe to untrack, so playing a frequent net win against an infrequent net loss seems a net win.
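Tim's suggestion can be sketched as a toy model (not the real gcmodule.c; `update_refs`, `is_gc_type`, and `tuple_safe_to_untrack` are illustrative): fold the "safe to untrack" test for tuples into the whole-list crawl that already happens at the start of a collection, so an untracked tuple is never seen again by subtract_refs or move_roots.

```python
def is_gc_type(obj):
    # conservative: any container might participate in a cycle
    return isinstance(obj, (list, dict, set, tuple))

def tuple_safe_to_untrack(t):
    # safe iff no element can itself participate in a cycle
    # (the real check would also require no NULL slots)
    return all(not is_gc_type(x) for x in t)

def update_refs(tracked):
    """Start-of-collection crawl: while touching each object anyway
    (the real code copies refcounts here), untrack tuples that
    cannot possibly be in a cycle."""
    still_tracked = []
    for obj in tracked:
        if isinstance(obj, tuple) and tuple_safe_to_untrack(obj):
            continue                   # untrack: drop from the gc list
        still_tracked.append(obj)
    return still_tracked

gen0 = [(1, 2), ("a", "b"), [1, 2], (1, [2])]
assert update_refs(gen0) == [[1, 2], (1, [2])]
```

Because `is_gc_type` treats nested tuples as potentially cyclic, `(1, (2, 3))` would stay tracked under this sketch; that is the conservative end of the trade-off discussed in the thread.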

[martin@v.loewis.de]
To start with, sure, and maybe to end with. Tuples are the only type where we have actual evidence that it can make a lick of difference. I don't want to delay getting an instant win there. If somebody cares enough to invent a Thoroughly General Protocol, and can convince Guido it's not just YAGNI, fine. The 6 lines of tuple-checking code can be reworked easily to use the protocol when it exists. Believe me <wink>: we already spent more time typing about this than it would have taken to write the tuple code ten times over, and tuples are still the only specific type anyone has mentioned as being a potential winner for this trick.

[Tim]
Do you really mean after a gen1 collection?
[Neil]
Yes. By that time the tuple has probably been around for a while. Also, the chances that it will be mutated are smaller.
I won't bother arguing <wink>, but then why not at the start of a gen1 collection? Nothing that transpires within a gen1 collection is going to change a given tuple's trackable-vs-untrackable nature, and if untrackable tuples are indeed much more likely, there seems nothing to gain (but something to lose) by making gen1 scan untrackable tuples multiple times.

(replying to Neil & Tim) --- Tim Peters <tim_one@email.msn.com> wrote:
That was basically the opinion I reached as well - changing PyTuple_SET_ITEM seems nasty in principle, and the performance results were dubious at best... The 50% drop in tracked objects also seemed like the most interesting thing to me; however, it's not clear that that percentage would hold under the dynamic conditions of a real application.
The other thing to note is that untracking such tuples is a very small win regardless: if they were not untracked, the tuple visit routine will quickly scan down the tuple (which I guess is likely to be small to start with), and the GC visitor will quickly reject adding any of the contained objects because they are not PyObject_IS_GC(). That suggests that the only cases where it would make a difference are a) when there are soooo many untrackable tuples that simply traversing them takes a large amount of time (i.e. the iterzip case), or b) when there is a massive nesting of untrackable tuples - a large AST or binary tree might benefit from this, but there are perhaps better solutions to that problem (see below).

< snip - Tim talks about issues with removing tuples in the collector >

I still don't understand how you 'check the tuple' and then untrack it - even if it has no NULL slots it still isn't safe to remove.

I don't think I agree that special-casing tuples in the collector is a good idea. This whole issue just came up because of the iterzip test, where it was noticed that the collector really didn't have to check all the objects - which is just a special-case solution to the problem that many of the objects in the gen* lists are not going to be participating in a cycle. A solution to that general problem would be much more beneficial, although presumably much more work ;) If the collector could somehow prioritize objects, it seems like various heuristics could be used to gauge the likelihood of an object participating in a cycle vs. the 'good' of collecting that cycle.

As an aside: are there some nice established tests for estimating the benefit of modifications to the collector? Something that needs the collector (so getting cycles early keeps memory consumption down, improving performance) and also has dynamic object-allocation properties that mimic some ideal of real applications?
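The per-tuple cost being debated can be made concrete with a rough model (toy code, not CPython internals): even when nothing is collectable, the collector's traverse pass still calls visit() on every slot of every tracked tuple, so "zillions" of atomic tuples mean "zillions" of wasted calls.

```python
visits = 0

def visit(obj):
    # real gc would test PyObject_IS_GC here and immediately
    # reject atomic objects -- cheap per call, but not free
    global visits
    visits += 1

def tuple_traverse(t):
    # models tupletraverse() in tupleobject.c: visit every slot
    for item in t:
        visit(item)

# 1000 tuples of atomic data, none of which can be in a cycle
tracked = [(i, i + 1, i + 2) for i in range(1000)]
for t in tracked:
    tuple_traverse(t)

assert visits == 3000   # 3 wasted visit() calls per untrackable tuple
```

Untracking those tuples eliminates the traversal entirely; whether that matters depends, as Daniel says, on how many such tuples the collector rescans per collection.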

[Daniel Dunbar]
I don't know that it's small. It's a function call per tuple scan, and apps slinging millions of tuples aren't rare: just touching the memory to scan all of them is expensive.
Maybe.
It's very likely safe, and if we do this I can easily guarantee it's 100% safe for the core distribution. It's not possible to create a tuple with NULL slots from Python code, so it's just a matter of checking the core C code that creates tuples. Most of it is utterly and obviously safe. A little bit isn't. For example, there's a very small hole in PySequence_Tuple() when the allocated tuple turns out not to be big enough: invoking the input sequence iterator can trigger gc, and that might erroneously untrack the tuple being constructed. But that hole is easily plugged, and I expect that all such holes are just as easily plugged.

Extension modules aren't supposed to call _PyTuple_Resize() at all (it's part of the private API), and it's very unlikely that anything not calling that routine is vulnerable. It's only called from 5 places in the core, and 2 of them are in PySequence_Tuple(). Still, it would be a new restriction, and may fool something *somewhere*. If it does, and they're tuples that turn out to be in a cycle (itself rare, especially for tuples that grow), they may get a leak. BFD <wink>.
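The hole Tim describes, and the obvious plug, can be sketched in miniature (toy objects; `ToyTuple`, `maybe_untrack`, and `resize` are illustrative names): if gc untracks a partially built tuple and a later resize grows it, the grown tuple could go on to join a cycle while never being tracked again, so a resize must re-track.

```python
class ToyTuple:
    def __init__(self, items):
        self.items = items
        self.tracked = True            # tracked at creation

def maybe_untrack(t):
    # the collector's check: untrack if no element could be cyclic
    if all(not isinstance(x, ToyTuple) for x in t.items):
        t.tracked = False

def resize(t, newsize):
    # models the plug for the _PyTuple_Resize hole: a tuple being
    # resized must be tracked again, since the caller may store
    # cycle-capable objects in the new slots
    t.items = t.items + [None] * (newsize - len(t.items))
    t.tracked = True

t = ToyTuple([1, 2])
maybe_untrack(t)        # gc runs mid-construction and untracks it
assert not t.tracked
resize(t, 4)            # PySequence_Tuple discovers it needs more room
assert t.tracked        # hole plugged: the tuple is tracked again
```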
As above, apps slinging millions of tuples aren't rare, and it's not doing anyone any good in cases where they survive an unbounded number of collections. They may be important not because it's pretty clear how to exempt them, but because they may be important <0.9 wink>.
Neil has since generalized the gc module to make it easy to add more generations. That's a general and effective approach to reducing useless scans. We're never going to have anything like, say, write barriers in Python, and OS-specific and platform-specific tricks are right out. GC in Python lives within many severe constraints.
Not that I know of. The cycle gc has performed (IMO), overall, very well since its inception. Improvements are driven more by the rare pathological cases that pop up than by systematic experimentation. Neil seems to use one of his large apps to "try things out on", and that's as close to an ideal real-life app as I expect anyone is going to hand Neil. Glitches tend to show up as timing anomalies (iterzip was typical in that respect).

On Wed, 8 May 2002, Tim Peters wrote:
Well, I've found one case where the garbage collector _does_ cause serious performance problems, and coincidentally enough it is due to the creation of zillions of cycle-less tuples. Our current solution has been to disable GC in some parts of our code, and then manually trigger collection at the correct points. When I can find some free time, I will definitely test-drive any available tuple-untracking patches to see if they make any substantial difference in our code. Regards, -Kevin -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com

[Kevin Jacobs]
Well, I'm told it's a 3-day weekend <wink>, and Neil has a short patch you can try: http://www.python.org/sf/558745 If you can, please do. I view this as a disaster-insurance change, not as a generally applicable speedup (it's almost certainly not, and especially not when using a distinct pass to weed out tuples -- I'd combine it with the copy-refcounts pass), so this lives or dies on whether it helps people suffering pathologies in real life. That's you. yes-you-really-are-the-center-of-the-universe-ly y'rs - tim

On Fri, 24 May 2002, Tim Peters wrote:
It doesn't seem to make much difference in our app. There seems to be some speedup, but nothing dramatic. Turning GC on and off at the right times is still slightly faster. The good news is that another (unrelated) part of our code just became about 20-40% faster with this patch, though I need to do some fairly major surgery to isolate why this is so. So more sleuthing required, -Kevin

[Kevin Jacobs, on Neil's tuple-untracking patch]
I'm confused. You earlier said: I've found one case where the garbage collector _does_ cause serious performance problems, and coincidentally enough it is due to the creation of zillions of cycle-less tuples. Our current solution has been to disable GC in some parts of our code, and then manually trigger collection at the correct points. Now I'm hearing that turning GC on and off is only "slightly faster", and that the speedup is so small you're not even sure there is one ("turning GC on and off" is "slightly faster" than a trial where there "seems to be some speedup"). This is not "serious performance problems" except to nanosecond counters like me <wink>.
Keep us informed! I suspect you're suffering from an app that's more than 20 lines long; I try to stay away from those. most-days-it's-hard-enough-out-thinking-one-line-ly y'rs - tim

On Tue, 28 May 2002, Tim Peters wrote:
Sorry, I wasn't very clear here. The patch _does_ fix the performance problem by untracking cycle-less tuples when we use the naive version of our code (i.e., the one that does not play with the garbage collector). However, the performance of the patched GC when compared to our GC-tuned code is very similar.
Try 20Kloc for this one. Our unit tests tend to be too fine-grained to detect these kinds of interactions, so it may be a while before I isolate exactly why this is. Thanks, -Kevin

[Kevin Jacobs, on Neil's tuple-untracking patch]
Then Neil's patch is doing all that we could wish of it in this case (you seem to have counted it as a strike against the patch that it didn't do better than you can by turning off gc by hand, but that's unrealistic if so), and then some:
That makes it a winner if it doesn't slow non-pathological cases "too much" (counting your cases as pathological, just because they are <wink>).

On Mon, 3 Jun 2002, Tim Peters wrote:
I didn't count it as a strike against the patch -- I had just hoped that untracking tuples would result in faster execution than turning GC off and letting my heap swell obscenely. One extreme case could happen if I turn off GC, run my code, and let it fill all of my real memory with tuples, and start swapping to disk. Clearly, keeping GC enabled with the tuple-untracking patch would result in huge performance gains there. This is not the situation I was dealing with, though I was hoping for a relatively smaller improvement from having a more compact and (hopefully) less fragmented heap. -Kevin

[Kevin Jacobs]
Kevin, keep in mind that we haven't run your app: the only things we know about it are what you've told us. That your heap swells obscenely when gc is off is news to me. *Does* your heap swell obscenely when turning gc off, but does not swell obscenely if you keep gc on? If so, you should keep in mind too that the best way to speed gc is not to create cyclic trash to begin with.
Sorry, this isn't clear at all. If your tuples aren't actually in cycles, then whether GC is on or off is irrelevant to how long they live, and to how much memory they consume. It doesn't cost any extra memory (not even one byte) for a tuple to live in a gc list; on the flip side, no memory is saved by Neil's patch.
Neil's patch should have no effect on memory use or fragmentation. It's only aiming at reducing the *time* spent uselessly scanning and rescanning and rescanning and ... tuples in bad cases. So long as your tuples stay alive, they're going to consume the same amount of memory with or without the patch, and whether or not you disable gc.

On Mon, 3 Jun 2002, Tim Peters wrote:
This part of my app allocates a mix of several kinds of objects. Some are cyclic tuple trees, many are acyclic tuples, and others are complex class instances. To make things worse, the object lifetimes vary from ephemeral to very long-lived. Due to this mix, turning GC off *does* cause the heap to swell, and we have to very carefully monitor the algorithm to relieve the swelling frequently enough to prevent eating too much system memory, but infrequently enough that we don't spend all of our time in the GC tracking through the many acyclic tuples. Also, due to the dynamic nature of the algorithm, it is not trivial to avoid creating cyclic trash.
But I do generate cyclic trash, and it does build up when I turn off GC. So, if it runs long enough with GC turned off, the machine will run out of real memory. This is all that I am saying.
With Neil's patch and GC turned ON: Python untracks many, many tuples that live for several generations and would otherwise have to be scanned to dispose of the accumulating cyclic garbage. Some GC overhead is observed, due to normal periodic scanning and the extra work needed to untrack acyclic tuples.

Without Neil's patch and automatic GC turned OFF: Python does not spend any time scanning for garbage, and things are very fast until we run out of core memory, or the patterns of allocation result in an extremely fragmented heap due to interleaved allocation of objects with very heterogeneous lifetimes. At certain intervals, GC would be triggered manually to release the cyclic trash.

My hope was that the more compact and hopefully less fragmented heap would more than offset the overhead of automatic GC scans and the tuple-untracking sweep. Hopefully my comments are now a little clearer... -Kevin

participants (6)
-
Daniel Dunbar
-
Kevin Jacobs
-
martin@v.loewis.de
-
Neil Schemenauer
-
Tim Peters