[Python-3000] __close__ method

Giovanni Bajo rasky at develer.com
Sun Sep 24 02:04:36 CEST 2006


Michael,

many thanks for your interesting mail, which nicely summarized the outcome of
the previous thread. Let me try to answer some of your questions about
__close__.

> But I'm hearing general agreement (at least among those contributing
> to this thread) that it might be wise to change the status quo.

Status quo of __del__:

Pros:
- Easy syntax: very simple to use in simple situations.
- Easy semantics: familiar to beginners (similar to destructors in other
programming languages), and being the "opposite" of __init__ makes it easy to
teach.

Cons:
- Makes reference loops uncollectable (see the sketch below) -> people quickly
learn to avoid it in most classes.
- Allows resurrection, which is a headache for the Python core developers.
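
As a minimal illustration of the first con, with the cyclic GC as it behaves
today: a cycle whose objects define __del__ is never collected and ends up
parked in gc.garbage instead (Node and its attributes are made up just for the
example):

    import gc

    class Node(object):
        def __del__(self):
            pass                     # any __del__ at all is enough

    a, b = Node(), Node()
    a.other = b                      # create a reference cycle
    b.other = a
    del a, b

    gc.collect()
    print(gc.garbage)                # both Node instances sit here, uncollected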


> The two kinds of solutions I'm hearing are (1) those that are based
> around making a helper object that gets stored as an attribute in
> the object, or a list of weakrefs, or something like that, and (2)
> the __close__ proposal (or perhaps keep the name __del__ but change
> the semantics).
>
> The difficulties with (1) that have been acknowledged so far are
> that the way you code things becomes somewhat less obvious, and
> that there is the possibility of accidentally creating immortal
> objects through reference loops.

Exactly. To code these finalizers correctly, you need to be much more
Python-savvy than you need to be in order to use __del__, because you have to
understand and to some extent master:

- weakrefs
- the early binding of default arguments of functions

which are not exactly the two most obvious corners of Python.
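
For the record, here is a minimal sketch of what the helper-object approach (1)
looks like in practice (the dummy resource class stands in for whatever
acquire_resource() would return in your example below). The callback must not
reference self, or the object would never become garbage, so the resource is
smuggled in through an early-bound default argument, and the weakref itself has
to be kept alive somewhere outside the object:

    import weakref

    _finalizer_refs = []                  # keeps the weakrefs (not the objects) alive

    class _DummyResource(object):         # stand-in for a real external resource
        def release(self):
            print("resource released")

    class MyClass(object):
        def __init__(self):
            self.resource = _DummyResource()
            # No reference to self here: the resource is captured via a
            # default argument, which is bound early, at "def" time.
            def cleanup(ref, resource=self.resource):
                _finalizer_refs.remove(ref)
                resource.release()
            _finalizer_refs.append(weakref.ref(self, cleanup))

    x = MyClass()
    del x                                 # prints "resource released"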

[ (2) the __close__ proposal ]
> I would like to hear someone address the weaknesses of (2).
> The first I know of is that the code in your __close__ method (or
> __del__) must assume that it might have been in a reference loop
> which was broken in some arbitrary place. As a result, it cannot
> assume that all references it holds are still valid. To avoid
> crashing the system, we'd probably have to set the broken
> references to None (is that the best choice?), but can people
> really write code that has to run assuming that its references
> might be invalid?

I might be wrong, but given the constraint that __close__ could be called
multiple times on the same object, I don't see how this situation could arise.
The cyclic GC could:

1) call __close__ on the instances *BEFORE* dropping any reference. The code in
__close__ could break the cycle itself (a sketch follows below).
2) only after that, assume that __close__ did not dispose of anything related
to the loop itself, and thus drop an arbitrary reference in the chain. This may
trigger further calls to __close__ on the instances, which should be
essentially no-ops since they have already run.
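
Under that scheme, a user-written __close__ would only have to meet two
requirements: tolerate being called a second time, and drop its own references
into the loop. A sketch, assuming the proposed semantics and a resource object
with a release() method:

    class Node(object):
        def __init__(self, resource):
            self.resource = resource
            self.next = None              # may point back into a cycle

        def __close__(self):
            if self.resource is not None:
                self.resource.release()
                self.resource = None      # a later call becomes a no-op
            self.next = None              # break the cycle ourselves (step 1)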

BTW: would it be possible to somehow "nullify" the __close__ method after it
has been executed once, so that it does not get executed twice on the same
instance? A single bit in the instance (meaning "already closed") should be
sufficient. If this is possible, the above algorithm becomes easier to
implement, and __close__ methods themselves become easier to write.
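
Even without interpreter support, the "already closed" bit can be emulated at
the Python level; a hypothetical mixin, just to illustrate the idea:

    class ClosableMixin(object):
        _already_closed = False           # the "already closed" bit

        def _do_close(self):
            pass                          # subclasses put the real cleanup here

        def __close__(self):
            if not self._already_closed:
                self._already_closed = True
                self._do_close()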

> A second problem I know of is, what if the code stores a reference
> to self someplace? The ability for __del__ methods to resurrect
> the object being finalized is one of the major sources of
> complexity in the GC module, and changing the semantics to
> __close__ doesn't fix this.

Indeed, I don't think __close__ can solve this problem. Strictly speaking,
though, I wouldn't consider it a weakness specific to __close__.


> -------- examples only below this line --------
>
> class MyClass2(object):
>     def __init__(self, resource1_name, resource2_name):
>         self.resource1 = acquire_resource(resource1_name)
>         self.resource2 = acquire_resource(resource2_name)
>     def flush(self):
>         self.resource1.flush()
>         self.resource2.flush()
>         if hasattr(self, 'next'):
>             self.next.flush()
>     def close(self):
>         self.resource1.release()
>         self.resource2.release()
>     def __close__(self):
>         self.flush()
>         self.close()
>
> x = MyClass2('db1', 'db2')
> y = MyClass2('db3', 'db4')
> x.next = y
> y.next = x
>
> This version will encounter a problem. When the GC sees
> the x <--> y loop it will break it somewhere... without
> loss of generality, let us say it breaks the y -> x link
> by setting y.next to None. Now y will be freed, so
> __close__ will be called. __close__ will invoke self.flush()
> which will then try to invoke self.next.flush(). But
> self.next is None, so we'll get an exception and never
> make it to invoking self.close().

With the algorithm I sketched above, the following would happen:

0) I assume that the resources can be flush()ed even after having been
released() without raising strange exceptions... Otherwise the code should be
more defensive and drop the references to the resources after disposal (see
the sketch below).
1) The GC first calls __close__ on either instance (let's say x). This closes
the instance by releasing its resources, x is marked as "already closed", and
y.flush() gets invoked along the way.
2) The GC then calls __close__ on y. This releases y's resources and invokes
x.flush(), which either has no side effects or is coded defensively against
resource1/resource2 being None (since x's resources were already disposed of
at step 1).
3) The loop was not broken, so the GC drops an arbitrary reference. Let's say
it breaks the y -> x link. This causes x to be disposed of; x is marked as
"already closed", so __close__ is not invoked. During disposal, the reference
to y held in x.next is dropped.
4) y is disposed of. It is marked as "already closed", so __close__ is not
invoked.
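
For completeness, the defensive variant hinted at in step 0 could look roughly
like this (same hypothetical acquire_resource() as in your example; flush() and
close() now tolerate already-released resources, and flush() also guards
against chasing an intact cycle forever):

    class MyClass2(object):
        def __init__(self, resource1_name, resource2_name):
            self.resource1 = acquire_resource(resource1_name)
            self.resource2 = acquire_resource(resource2_name)
            self.next = None
            self._flushing = False

        def flush(self):
            if self._flushing:
                return                    # already flushing around the cycle
            self._flushing = True
            try:
                if self.resource1 is not None:
                    self.resource1.flush()
                if self.resource2 is not None:
                    self.resource2.flush()
                if self.next is not None:
                    self.next.flush()
            finally:
                self._flushing = False

        def close(self):
            if self.resource1 is not None:
                self.resource1.release()
                self.resource1 = None     # drop the reference after disposal
            if self.resource2 is not None:
                self.resource2.release()
                self.resource2 = None

        def __close__(self):
            self.flush()
            self.close()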



> ------
>
> The other problem I discussed is illustrated by the following
> malicious code:
>
> evil_list = []
>
> class MyEvilClass(object):
>     def __close__(self):
>         evil_list.append(self)
>
> Do the proponents of __close__ propose a way of prohibiting
> this behavior? Or do we continue to include complicated
> logic in the GC module to support it? I don't think anyone
> cares how this code behaves so long as it doesn't segfault.

I can see how this can confuse the GC, but I really don't know the details. I
don't have any proposal for how to avoid this situation.

Giovanni Bajo


