[Python-Dev] Fun with ExitStack
Martin Panter
vadmium+py at gmail.com
Mon Jul 18 21:57:50 EDT 2016
On 18 July 2016 at 23:40, Barry Warsaw <barry at python.org> wrote:
> I was trying to debug a problem in some work code and I ran into some
> interesting oddities with contextlib.ExitStack and other context managers in
> Python 3.5.
>
> This program creates a temporary directory, and I wanted to give it a --keep
> flag to not automatically delete the tempdir at program exit. I was using an
> ExitStack to manage a bunch of resources, and the temporary directory is the
> first thing pushed into the ExitStack. At that point in the program, I check
> the value of --keep and if it's set, I use ExitStack.pop_all() to clear the
> stack, and thus, presumably, prevent the temporary directory from being
> deleted.
>
> There's this relevant quote in the contextlib documentation:
>
> """
> Each instance [of an ExitStack] maintains a stack of registered callbacks that
> are called in reverse order when the instance is closed (either explicitly or
> implicitly at the end of a with statement). Note that callbacks are not
> invoked implicitly when the context stack instance is garbage collected.
> """
>
> However if I didn't save the reference to the pop_all'd ExitStack, the tempdir
> would be immediately deleted. If I did save a reference to the pop_all'd
> ExitStack, the tempdir would live until the saved reference went out of scope
> and got refcounted away.
>
> As best I can tell this happens because TemporaryDirectory.__init__() creates
> a weakref finalizer which ends up calling the _cleanup() function. Although
> it's rather difficult to trace, it does appear that when the ExitStack is
> gc'd, this finalizer gets triggered (via weakref.detach()), thus causing the
> _cleanup() method to be called and the tmpdir to get deleted.
This seems to be missing from the documentation, but when you delete a
TemporaryDirectory instance without using a context manager or
cleanup(), it complains, and then cleans up anyway:
>>> d = TemporaryDirectory()
>>> del d
/usr/lib/python3.5/tempfile.py:797: ResourceWarning: Implicitly
cleaning up <TemporaryDirectory '/tmp/tmpfzf4g96l'>
_warnings.warn(warn_message, ResourceWarning)
Perhaps you will have to use the lower-level mkdtemp() function
instead if you want the option of making the “temporary” directory
long-lived.
> I "fix" this by
> doing:
>
> def __init__(self):
> tmpdir = TemporaryDirectory()
> self._tmpdir = (tmpdir.name if keep
> else self.resources.enter_context(tmpdir))
>
> There must be more to the story because when __init__() exits in the --keep
> case, tmpdir should have gotten refcounted away and the directory deleted, but
> it doesn't. I haven't dug down deep enough to figure that out.
>
> Now, while I was debugging that behavior, I ran across more interesting bits.
> I put this in a file to drive some tests:
>
> ------snip snip-----
> with ExitStack() as resources:
> print('enter context')
> tmpdir = resources.enter_context(X())
> resources.pop_all()
> print('exit context')
> ------snip snip-----
>
> Let's say X is:
>
> class X:
> def __enter__(self):
> print('enter Foo')
> return self
>
> def __exit__(self, *args, **kws):
> print('exit Foo')
> return False
>
> the output is:
>
> enter context
> enter Foo
> exit context
>
> So far so good. A fairly standard context manager class doesn't get its
> __exit__() called even when the program exits. Let's try this:
>
> @contextmanager
> def X():
> print('enter bar')
> yield
> print('exit bar')
>
> still good:
>
> enter context
> enter bar
> exit context
>
> Let's modify X a little bit to be a more common idiom:
>
> @contextmanager
> def X():
> print('enter foo')
> try:
> yield
> finally:
> print('exit foo')
>
> enter context
> enter foo
> exit foo
> exit context
>
> Ah, the try-finally changes the behavior! There's probably some documentation
> somewhere that defines how a generator gets finalized, and that triggers the
> finally clause, whereas in the previous example, nothing after the yield gets
> run. I just can't find that anything that would describe the observed
> behavior.
I suspect the documentation doesn’t spell everything out, but my
understanding is that garbage collection of a generator instance
effectively calls its close() method, triggering any “finally” and
__exit__() handlers.
IMO, in some cases if a generator would execute these handlers and
gets garbage collected, it is a programming error because the
programmer should have explicitly called generator.close(). In these
cases, it would be nice to emit a ResourceWarning, just like
forgetting to close a file, or delete your temporay directory above.
But maybe there are other cases where there is no valid reason to emit
a warning. I have hesitated in suggesting this change in the past, but
I don’t remember why. One reason is it might be an annoyance with code
that also wants to handle non-generator iterators that don’t have a
close() method. In a world before “yield from”, imagine having to
change:
# Rough equivalent of “yield from g(...)”
for item in g(...):
yield item
to:
with contextlib.closing(g(...)) as subgen:
for item in subgen:
yield item
It would be even worse if you had to support g() returning some other
iterator not supporting close().
> It's all very twisty, and I'm not sure Python is doing anything wrong, but I'm
> also not sure it's *not* doing anything wrong. ;)
>
> In any case, the contextlib documentation quoted above should probably be more
> liberally sprinkled with salty caveats. Just calling .pop_all() isn't
> necessarily enough to ensure that resources managed by an ExitStack will
> survive its garbage collection.
More information about the Python-Dev
mailing list