Hi all,

I'm trying again to look at the gc-del branch, but I'm kind of failing. The current state is: http://buildbot.pypy.org/summary?branch=gc-del

The failures are all more or less obscure. They may all have to do with subtly broken things in the finalizer ordering code. But I now believe that the attempted implementation was a bad idea.

For reference: you call "rgc.register_finalizer(method)" on objects where you want a finalizer to be called (instead of "def __del__()", which is now limited to "lightweight" destructors). It's a good idea by itself because (1) you can register a finalizer only after the object is ready, and not have to worry about seeing half-initialized objects in the finalizer; and (2) you can register a finalizer at any time, or not at all, on various instances of the same class, which is very convenient for implementing app-level objects that may or may not have a user-level __del__ method.

The issue is that the finalizers (particularly the app-level ones) should only be called at known points in time, e.g. between bytecodes, and not from random places. I tried to do this: if a finalizer raises "FinalizeLater", then it suspends calling finalizers, both the current one and all future ones. Later, between bytecodes, we call rgc.progress_through_finalizer_queue() to resume.

This looks innocent enough, but it's a major mess. First, in test runs: the finalizer queue is not attached to any space, and raising FinalizeLater suspends calling any finalizer from there, in order to maintain the correct order. Also, I suspect, in real runs: we're getting cases where the app-level __del__() is not called and I'm really unsure why, but I suspect that some finalizer raises FinalizeLater without making sure that we'll later call rgc.progress_through_finalizer_queue().

Basically, this API looks broken now. It becomes messy when you consider that some helpers in rpython.rlib really need rgc.register_finalizer(), like rmmap.
These helpers don't know about the space. The hack of the previous paragraph was an attempt to still have rmmap objects finalized in the globally correct order.

Help! :-) Any cleaner ideas welcome!

Would it make sense to *not* call any finalizer automatically, and instead require that the RPython program calls rgc.flush_finalizer_queue() regularly? Maybe with some minimal signal-like event from the GC to mean "there are finalizers now"? It would seem closer to what e.g. Java does, which just puts finalizable objects on a queue from the GC.

A bientôt,

Armin.
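
The queue-based alternative floated at the end of this message could look roughly like the toy model below. It is plain Python, not RPython and not the real rgc module: the class FinalizerQueue, its methods gc_found_dead() and flush(), and the has_pending flag are all made-up names, standing in for the proposed rgc.flush_finalizer_queue() and the "there are finalizers now" signal.

```python
class FinalizerQueue(object):
    """Hypothetical stand-in for the GC side of the proposal.

    The GC only enqueues objects; it never calls a finalizer itself.
    """

    def __init__(self):
        self._pending = []
        # Plays the role of the minimal "there are finalizers now" signal.
        self.has_pending = False

    def gc_found_dead(self, obj, finalizer):
        # Called "by the GC" (here: by hand) when obj is found unreachable.
        self._pending.append((obj, finalizer))
        self.has_pending = True

    def flush(self):
        # Called by the program at a point it considers safe,
        # e.g. between two bytecodes. Runs finalizers in queue order.
        while self._pending:
            obj, finalizer = self._pending.pop(0)
            finalizer(obj)
        self.has_pending = False


calls = []
queue = FinalizerQueue()
queue.gc_found_dead("rmmap object", calls.append)
queue.gc_found_dead("app-level object", calls.append)
ran_before_flush = list(calls)   # still empty: nothing ran on its own

if queue.has_pending:            # the interpreter's check at a safe point
    queue.flush()
```

In this model a helper like rmmap would not need to know about the space: it only hands objects to the queue, and whoever owns the main loop decides when flush() is called.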
Re-hi,

Update after discussion on IRC (thanks cfbolz).

First, there are two problems that should be separated. One is the change in interface for RPython programs, calling rgc.register_finalizer() instead of having regular __del__()s. The other is making these finalizers be called more quickly in case of a chain of objects that all have finalizers.

Let's focus on the second problem. We found a more reasonable way to address it, or at least to reduce its bad effects to a large extent. Recall that finalizers are only run after major collections. Why? Because there would be little point in running them after minor collections. Consider the case of two objects A -> B, both with finalizers. Even if we discovered during a minor collection that they are dead, they would both still need to be promoted to old objects, after which we can call A.__del__, but not B.__del__. The latter needs to wait until the following collection, and that means the following *major* collection, because at this point B is an old object.

The proposed fix is to make these objects young again. In other words, when we find that A and B are not reachable, all objects that are referenced only from A or B still survive of course, but they all become young again. Then the next *minor* collection will deal with them again. In detail:

* we run a major or minor collection and find that A -> B are not reachable
* A and B and objects only reachable from them are made young again
* we immediately schedule A.__del__ to be called
* at the next *minor* collection we're again likely to find that B is not reachable
* B and objects only reachable from it are made young again
* we schedule B.__del__ to be called

...and so on if we have a full chain of objects with finalizers rather than just two. They would be finalized at the rhythm of one per minor collection. Even if not perfect, this is much, much better than the current rhythm, which is one per major collection.
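
The difference in rhythm can be illustrated with a small counting model. This is only a sketch under simplifying assumptions, not GC code: the function simulate() and its parameters are invented for the example, every minors_per_major-th collection is assumed to be a major one, and each collection that examines the dead chain is assumed to schedule exactly one finalizer (the head of what is left).

```python
def simulate(chain, minors_per_major, reyoung, starts_young=True):
    """Toy model of finalizing a dead chain chain[0] -> chain[1] -> ...

    Returns a list of (collection_number, name) pairs, in the order in
    which the finalizers get scheduled. Requires minors_per_major >= 1.
    """
    pending = list(chain)
    young = starts_young      # is the rest of the chain currently young?
    log = []
    n = 0
    while pending:
        n += 1
        major = (n % minors_per_major == 0)
        # Without the fix, finalizers are only handled by major
        # collections. With it, a minor collection also handles the
        # chain, provided the chain is young.
        examined = major or (reyoung and young)
        if not examined:
            continue
        # Only the head can be finalized now; the rest survives.
        log.append((n, pending.pop(0)))
        # Survivors become old, unless they are made young again.
        young = reyoung
    return log


current = simulate(["A", "B", "C"], minors_per_major=10, reyoung=False)
proposed = simulate(["A", "B", "C"], minors_per_major=10, reyoung=True)
```

With these made-up numbers, the current scheme needs collections 10, 20 and 30 to get through the chain, while the proposed one needs collections 1, 2 and 3. If the chain is already old when it dies (starts_young=False), the first step waits for a major collection and the rest follows at one per minor collection.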
The advantage of this approach is that it's done without RPython changes, just by tweaks in the GC.

A bientôt,

Armin.
On 12 August 2013 17:38, Armin Rigo <arigo@tunes.org> wrote:
> The advantage of this approach is that it's done without RPython changes, just by tweaks in the GC.
Do you know what changes to the GC interface you expect to make, if any?

--
William Leslie
Hi William,

On Mon, Aug 12, 2013 at 10:00 AM, William ML Leslie <william.leslie.ttg@gmail.com> wrote:
> On 12 August 2013 17:38, Armin Rigo <arigo@tunes.org> wrote:
>> The advantage of this approach is that it's done without RPython changes, just by tweaks in the GC.
> Do you know what changes to the GC interface you expect to make, if any?
What do you call the GC interface? The way the GC internally interacts with the rest of the RPython program? Or the API exposed by rpython.rlib.rgc? Or are you asking from the point of view of a PyPy user, how we will change the following sentence on http://pypy.readthedocs.org/en/latest/cpython_differences.html?

"""Note that if you have a long chain of objects, each with a reference to the next one, and each with a __del__, PyPy’s GC will perform badly."""

A bientôt,

Armin.
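
For concreteness, the kind of structure that sentence describes can be sketched in plain Python as follows. The names Node and build_chain are made up for illustration. How quickly the __del__ methods run once the head is dropped depends on the interpreter and its GC, so the sketch makes no claim about that.

```python
class Node(object):
    """One link of the chain: a reference to the next link and a __del__."""

    finalized = []   # names of the nodes whose __del__ has run so far

    def __init__(self, name, next=None):
        self.name = name
        self.next = next

    def __del__(self):
        Node.finalized.append(self.name)


def build_chain(length):
    # Build n0 -> n1 -> ... -> n(length-1) and return the head n0.
    head = None
    for i in reversed(range(length)):
        head = Node("n%d" % i, head)
    return head


chain = build_chain(4)
names = []
node = chain
while node is not None:
    names.append(node.name)
    node = node.next
```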