recursive closures - reference leak
Hello there. Consider this code: def factorial(n): def helper(n): if n: return n*helper(n-1) else: return 1 return helper(n) Ok, this is a bit contrived, but there are other valid reasons for calling a closure recursively. The problem with this is that once you have called factorial() once, you end up with a recursive cycle. "factorial" has become a cell object, referencing the "helper" function, which again refers to the outer cell object. This requires "gc" to clean up. Also, it is entirely non-obvious. the problem becomes worse if the inner function also refers to some large, temporary variable, since it will get caught up in the reference loop. A workaround is this. Replace the final line with: try: return helper(n) finally: helper = None This clears the cell before function exit. Note that it is not possible to "del" it, you get a syntax error for that. Now, the reason I'm posting this here, and not to python-lang, is this: Is there a way to implement this differently so that this pattern doesn't result in a cycle? Possible ideas would be: 1) clear "helper" on function exit. Not good if the internal function is exported, though. 2) Use have 'helper' store a weakref to the cell object Cheers, Kristján
Kristján Valur Jónsson wrote:
The problem with this is that once you have called factorial() once, you end up with a recursive cycle. „factorial“ has become a cell object, referencing the „helper“ function, which again refers to the outer cell object. This requires „gc“ to clean up. Also, it is entirely non-obvious. the problem becomes worse if the inner function also refers to some large, temporary variable, since it will get caught up in the reference loop.
What problem are you referring to? Python has a gc exactly to deal with situations like this one. Surely you are aware that the cycle collector is invoked automatically without requiring user intervention. What specific issue are you trying to work around?
-----Original Message----- From: Hrvoje Niksic [mailto:hrvoje.niksic@avl.com] Sent: 8. desember 2009 13:52 To: Kristján Valur Jónsson Cc: python-dev@python.org Subject: Re: [Python-Dev] recursive closures - reference leak What problem are you referring to? Python has a gc exactly to deal with situations like this one. Surely you are aware that the cycle collector is invoked automatically without requiring user intervention. What specific issue are you trying to work around?
Ah, yes. In my particular case, I'm running a cluster of hundreds of nodes, supporting 50.000 players in a real-time space simulation. We disable GC because of its unpredictable performance impact and are careful to avoid reference cycles. We use gc from time to time to _find_ those cases that our programmers have missed, and fix them. This is how I stumbled upon this particular reference cycle that even a high level programmer would not have expected to have created. This is, IMHO the best use you can make of "gc": Help you code well, not let you cope with sloppy code :) K
Ah, yes. In my particular case, I'm running a cluster of hundreds of nodes, supporting 50.000 players in a real-time space simulation. We disable GC because of its unpredictable performance impact and are careful to avoid reference cycles. We use gc from time to time to _find_ those cases that our programmers have missed, and fix them. This is how I stumbled upon this particular reference cycle that even a high level programmer would not have expected to have created. This is, IMHO the best use you can make of "gc": Help you code well, not let you cope with sloppy code :)
K
Then it is a bit your fault. There is nothing particularly wrong with creating reference cycles (ie you can't avoid having a gc running in Java or Jython or anything else basically). Note that disabling gc does not mean that you will not have unpredictable pauses. Consider for example that if you loose a reference to a very long chain of objects, you can have arbitrarily many frees being called before anything else can happen. Cheers, fijal
2009/12/8 Maciej Fijalkowski
Note that disabling gc does not mean that you will not have unpredictable pauses. Consider for example that if you loose a reference to a very long chain of objects, you can have arbitrarily many frees being called before anything else can happen.
That strikes me as a *predictable* long pause. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com
Yes. Just –in-time releaseing of objects is much preferable to delayed release.
They tend to be in the cache, the individual runs of releases are smaller and have less of an individual impact.
a gc.collect() cycle visits a large amount of objects that it won‘t release causing cache thrashing.
There is a reason we disabled ‚gc‘, and it is simply because we get lower cpu and smoother execution.
K
From: daniel.stutzbach@gmail.com [mailto:daniel.stutzbach@gmail.com] On Behalf Of Daniel Stutzbach
Sent: 8. desember 2009 15:04
To: Maciej Fijalkowski
Cc: Kristján Valur Jónsson; python-dev@python.org
Subject: Re: [Python-Dev] recursive closures - reference leak
2009/12/8 Maciej Fijalkowski
Kristján Valur Jónsson
a gc.collect() cycle visits a large amount of objects that it won‘t release causing cache thrashing.
There is a reason we disabled ‚gc‘, and it is simply because we get lower cpu and smoother execution.
Could you try to enable the gc with Python trunk and report whether it does fewer and/or shorter gc collections? The gc has been optimized a bit compared to 2.6 and earlier. Thank you Antoine.
Kristján Valur Jónsson
The problem with this is that once you have called factorial() once, you end up with a recursive cycle.
You don't need a closure to exhibit a reference cycle. A global function is enough:
def helper(n): ... if n: ... return n*helper(n-1) ... else: ... return 1 ... helper.func_globals['helper'] is helper True
If you really want to avoid this you can prevent the function from depending on its outside environment:
from functools import partial def helper2(rec, n): ... if n: ... return n*rec(rec, n-1) ... else: ... return 1 ... factorial = partial(helper2, helper2) "helper2" in factorial.func.func_globals True del helper2 "helper2" in factorial.func.func_globals False factorial(3) 6
Regards Antoine.
-----Original Message----- From: python-dev-bounces+kristjan=ccpgames.com@python.org [mailto:python-dev-bounces+kristjan=ccpgames.com@python.org] On Behalf Of Antoine Pitrou Sent: 8. desember 2009 14:55 To: python-dev@python.org Subject: Re: [Python-Dev] recursive closures - reference cycle
Kristján Valur Jónsson
writes: The problem with this is that once you have called factorial() once, you end up with a recursive cycle.
You don't need a closure to exhibit a reference cycle. A global function is enough:
def helper(n): ... if n: ... return n*helper(n-1) ... else: ... return 1 ... helper.func_globals['helper'] is helper True
yes, because: func_globals is globals() == True You don't need recursion for this to be true. And as soon as you delete "helper" from globals() it goes away. w = wearref.weakref(helper) del helper w() == False.
If you really want to avoid this you can prevent the function from depending on its outside environment:
from functools import partial def helper2(rec, n): ... if n: ... return n*rec(rec, n-1) ... else: ... return 1 ... factorial = partial(helper2, helper2) "helper2" in factorial.func.func_globals True del helper2 "helper2" in factorial.func.func_globals False factorial(3) 6
Interesting, pass itself in as an argument. Yes, there are ways around this, I know. But you have to agree that it is unexpected, no? Somethign for the "reference cycle FAQ." Anyway, I´ll return to my lair now, thanks for your time python-dev :) K
You could use a class: class factorial(): def fac(self, n): if n == 0: return 1 else: return n * self.fac(n - 1) def __call__(self, n): return self.fac(n) factorial = factorial() print factorial(5) -- Greg
-----Original Message----- From: python-dev-bounces+kristjan=ccpgames.com@python.org [mailto:python-dev-bounces+kristjan=ccpgames.com@python.org] On Behalf Of Greg Ewing Sent: 8. desember 2009 22:09 To: python-dev@python.org Subject: Re: [Python-Dev] recursive closures - reference cycle
You could use a class:
Yes, and a number of different workarounds. That's not really the issue. The issue is that what looks like a perfectly safe idiom (calling a function recursively) happens to create a reference cycle if that function is a closure. This is a non-obvious "gotcha" that I must now educate my team about. Note that at least parts of the library strive to avoid reference cycles even though "gc" exists. An example being xlm.minidom.Document.unlink() Cheers, Kristján
Kristján Valur Jónsson wrote:
Yes, and a number of different workarounds. That's not really the issue. The issue is that what looks like a perfectly safe idiom (calling a function recursively) happens to create a reference cycle if that function is a closure. This is a non-obvious "gotcha" that I must now educate my team about.
Documentation seems to be about the only thing that can be done about this. Recursion inherently involve self-reference, and that's going to create reference cycles somewhere, somehow. You get a reference cycle with top-level recursion too, the only difference is that just one cycle is created when the module is imported, rather than one every time you call the function. This is only a problem if you care about avoiding cycles, and if you do, this is just one of many subtle ways of creating them. It's hard to tell where abouts in the docs to put information about this, however. It really relates to a certain pattern of using language features rather than to any particular feature. Maybe there should be a section devoted to avoidance of reference cycles where all of these known pitfalls can be pointed out. -- Greg
Yes, and a number of different workarounds. That's not really the issue. The issue is that what looks like a perfectly safe idiom (calling a function recursively) happens to create a reference cycle if that function is a closure. This is a non-obvious "gotcha" that I must now educate my team about.
I'm not sure why you call that non-obvious. As Antoine(?) pointed out, *any* recursive function calls a reference cycle, and that is not surprising, because the function refers to itself, so the function definition is *obviously* cyclic (i.e. from within the function body, you can get back to the function object). I know you dismissed the global function example as non-relevant, because the cycle is through the globals dictionary (i.e. any global function is in a cycle, whether it is recursive or not), and because it is possible to break the cycle (by deleting the function from the globals). However, that's beside the point, because a) it is an optimization that closures are only cyclic when they need to be (i.e. a nested function only keeps references to those name bindings that are mentioned in its body, rather than referencing the entire closure), and b) it is also possible to break the cycle in the closure case, by setting the cell target to None (say), and c) the actual problem that you are having is not that the nested functions create cycles, but that you get a fresh one every time you call the outer function (so that you get the impression of a memory leak).
Note that at least parts of the library strive to avoid reference cycles even though "gc" exists. An example being xlm.minidom.Document.unlink()
That doesn't avoid reference cycles. Instead, it helps breaking them (as a minidom tree *is* cyclic). minidom has this explicit API for breaking cycles because it predated the introduction of cyclic GC in Python. Regards, Martin
participants (7)
-
"Martin v. Löwis"
-
Antoine Pitrou
-
Daniel Stutzbach
-
Greg Ewing
-
Hrvoje Niksic
-
Kristján Valur Jónsson
-
Maciej Fijalkowski