Draft PEP on protecting finally clauses

Hi,

I've finally made a PEP. Any feedback is appreciated.

--
Paul

PEP: XXX
Title: Protecting cleanup statements from interruptions
Version: $Revision$
Last-Modified: $Date$
Author: Paul Colomiets <paul@colomiets.name>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 06-Apr-2012
Python-Version: 3.3


Abstract
========

This PEP proposes a way to protect Python code from being interrupted
inside a ``finally`` clause or a context manager.


Rationale
=========

Python has two nice ways to do cleanup.  One is the ``finally``
statement and the other is a context manager (the ``with``
statement).  However, neither of them is protected from
``KeyboardInterrupt`` or ``generator.throw()``.  For example::

    lock.acquire()
    try:
        print('starting')
        do_something()
    finally:
        print('finished')
        lock.release()

If ``KeyboardInterrupt`` occurs just after the second ``print`` call
is executed, the lock will not be released.  Similarly, the following
code using the ``with`` statement is affected::

    from threading import Lock

    class MyLock:

        def __init__(self):
            self._lock_impl = Lock()

        def __enter__(self):
            self._lock_impl.acquire()
            print("LOCKED")

        def __exit__(self, exc_type, exc_value, traceback):
            print("UNLOCKING")
            self._lock_impl.release()

    lock = MyLock()
    with lock:
        do_something()

If ``KeyboardInterrupt`` occurs near any of the ``print`` calls, the
lock will never be released.


Coroutine Use Case
------------------

A similar case occurs with coroutines.  Usually coroutine libraries
want to interrupt a coroutine with a timeout.  There is a
``generator.throw()`` method for this use case, but there is no way
to know whether the generator is currently suspended inside a
``finally`` clause.

An example that uses yield-based coroutines follows.  The code looks
similar using any of the popular coroutine libraries Monocle [1]_,
Bluelet [2]_, or Twisted [3]_.
::

    def run_locked():
        yield connection.sendall('LOCK')
        try:
            yield do_something()
            yield do_something_else()
        finally:
            yield connection.sendall('UNLOCK')

    with timeout(5):
        yield run_locked()

In the example above, ``yield something`` means: pause the current
coroutine and execute the coroutine ``something`` until it finishes.
The library therefore keeps the stack of generators itself.  The
``connection.sendall`` call waits until the socket is writable and
then does something similar to what ``socket.sendall`` does.

The ``with`` statement ensures that all of that code is executed
within a 5 second timeout.  It does so by registering a callback in
the main loop, which calls ``generator.throw()`` on the top-most
frame in the coroutine stack when the timeout happens.

The ``greenlets`` extension works in a similar way, except that it
doesn't need ``yield`` to enter a new stack frame.  Otherwise the
considerations are similar.


Specification
=============

Frame Flag 'f_in_cleanup'
-------------------------

A new flag on the frame object is proposed.  It is set to ``True`` if
this frame is currently executing a ``finally`` suite.  Internally it
must be implemented as a counter of nested ``finally`` statements
currently being executed.

The internal counter is also incremented when entering the
``SETUP_WITH`` and ``WITH_CLEANUP`` bytecodes, and is decremented
when leaving them.  This allows ``__enter__`` and ``__exit__``
methods to be protected too.


Function 'sys.setcleanuphook'
-----------------------------

A new function for the ``sys`` module is proposed.  This function
sets a callback which is executed every time ``f_in_cleanup`` becomes
``False``.  The callback gets the ``frame`` as its sole argument, so
it can get some evidence of where it is called from.

The setting is thread local and is stored inside the
``PyThreadState`` structure.


Inspect Module Enhancements
---------------------------

Two new functions are proposed for the ``inspect`` module:
``isframeincleanup`` and ``getcleanupframe``.

``isframeincleanup``, given a ``frame`` object or a ``generator``
object as its sole argument, returns the value of the
``f_in_cleanup`` attribute of the frame itself or of the ``gi_frame``
attribute of the generator.

``getcleanupframe``, given a ``frame`` object as its sole argument,
returns the innermost frame which has a true value of
``f_in_cleanup``, or ``None`` if no frame in the stack has the
attribute set.  It starts inspecting from the specified frame and
walks to outer frames using ``f_back`` pointers, just like
``getouterframes`` does.


Example
=======

An example implementation of a ``SIGINT`` handler that interrupts
safely might look like::

    import inspect, sys, functools

    def sigint_handler(sig, frame):
        if inspect.getcleanupframe(frame) is None:
            raise KeyboardInterrupt()
        sys.setcleanuphook(functools.partial(sigint_handler, 0))

A coroutine example is out of scope for this document, because its
implementation depends very much on the trampoline (or main loop)
used by the coroutine library.


Unresolved Issues
=================

Interruption Inside With Statement Expression
---------------------------------------------

Given the statement::

    with open(filename):
        do_something()

Python can be interrupted after ``open`` is called, but before the
``SETUP_WITH`` bytecode is executed.  There are two possible
decisions:

* Protect the expression inside the ``with`` statement.  This would
  need another bytecode, since currently there is no delimiter at
  the start of the ``with`` expression.

* Let the user write a wrapper if they consider it important for
  their use case.
  Safe wrapper code might look like the following::

      class FileWrapper(object):

          def __init__(self, filename, mode):
              self.filename = filename
              self.mode = mode

          def __enter__(self):
              self.file = open(self.filename, self.mode)
              return self.file

          def __exit__(self, exc_type, exc_value, traceback):
              self.file.close()

  Alternatively it can be written using a generator-based context
  manager::

      from contextlib import contextmanager

      @contextmanager
      def open_wrapper(filename, mode):
          file = open(filename, mode)
          try:
              yield file
          finally:
              file.close()

  This code is safe, as the first part of the generator (before
  ``yield``) is executed inside the ``SETUP_WITH`` bytecode of the
  caller.


Exception Propagation
---------------------

Sometimes a ``finally`` block or an ``__enter__``/``__exit__`` method
can be exited with an exception.  Usually this is not a problem,
since a more important exception like ``KeyboardInterrupt`` or
``SystemExit`` should be thrown instead.  But it may be nice to be
able to keep the original exception inside a ``__context__``
attribute.  So the cleanup hook signature may grow an exception
argument::

    def sigint_handler(sig, frame):
        if inspect.getcleanupframe(frame) is None:
            raise KeyboardInterrupt()
        sys.setcleanuphook(retry_sigint)

    def retry_sigint(frame, exception=None):
        if inspect.getcleanupframe(frame) is None:
            raise KeyboardInterrupt() from exception

.. note::

    There is no need to have three arguments like in the ``__exit__``
    method, since exceptions have a ``__traceback__`` attribute in
    Python 3.x.

However, this will set ``__cause__`` for the exception, which is not
exactly what is intended.  So some hidden interpreter logic may be
used to put a ``__context__`` attribute on every exception raised in
the cleanup hook.


Interruption Between Acquiring Resource and Try Block
-----------------------------------------------------

The example from the first section is not totally safe.  Let's look
closer::

    lock.acquire()
    try:
        do_something()
    finally:
        lock.release()

An interruption between ``lock.acquire()`` and the ``try`` statement
leaves the lock held.  There is no way this can be fixed without
modifying the code.  The actual fix depends very much on the use
case.

Usually the code can be fixed using a ``with`` statement::

    with lock:
        do_something()

However, for coroutines you usually can't use the ``with`` statement,
because you need to ``yield`` for both the acquire and the release
operations.  So the code might be rewritten as follows::

    try:
        yield lock.acquire()
        do_something()
    finally:
        yield lock.release()

The actual lock code might need more code to support this use case,
but the implementation is usually trivial: check whether the lock
has been acquired and unlock it if so.


Setting Interruption Context Inside Finally Itself
--------------------------------------------------

Some coroutine libraries may need to set a timeout for the
``finally`` clause itself.  For example::

    try:
        do_something()
    finally:
        with timeout(0.5):
            try:
                yield do_slow_cleanup()
            finally:
                yield do_fast_cleanup()

With the current semantics, the timeout will either protect the whole
``with`` block or nothing at all, depending on the implementation of
the library.  What the author intended is to treat
``do_slow_cleanup`` as ordinary code, and ``do_fast_cleanup`` as
cleanup (a non-interruptible one).

A similar case might occur when using greenlets or tasklets.

This case can be fixed by exposing ``f_in_cleanup`` as a counter, and
by calling the cleanup hook on each decrement.  The coroutine library
may then remember the value at timeout start, and compare it on each
hook execution.

But in practice the example is considered too obscure to take into
account.


Alternative Python Implementations Support
==========================================

We consider ``f_in_cleanup`` an implementation detail.  The actual
implementation may have some fake frame-like object passed to the
signal handler and the cleanup hook, and returned from
``getcleanupframe``.  The only requirement is that the ``inspect``
module functions work as expected on those objects.  For this reason
we also allow a ``generator`` object to be passed to the
``isframeincleanup`` function, which removes the need to use the
``gi_frame`` attribute.

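The two ``inspect`` helpers from the Specification can be emulated in
pure Python against such a fake frame-like object.  The sketch below
is an illustration only: ``FakeFrame`` is a hypothetical stand-in
(real frames have no ``f_in_cleanup`` attribute today), and the
helper bodies are one reading of the specification above, not a
reference implementation.

```python
import types


class FakeFrame:
    """Hypothetical frame-like object with only the attributes used below."""

    def __init__(self, f_back=None, f_in_cleanup=False):
        self.f_back = f_back
        self.f_in_cleanup = f_in_cleanup


def isframeincleanup(obj):
    """Return the cleanup flag of a frame, or of a generator's frame."""
    if isinstance(obj, types.GeneratorType):
        obj = obj.gi_frame
    # Real frames (and the None frame of a finished generator) lack the
    # proposed attribute, so they are reported as "not in cleanup".
    return bool(getattr(obj, "f_in_cleanup", False))


def getcleanupframe(frame):
    """Walk outwards via f_back; return the first frame in cleanup, or None."""
    while frame is not None:
        if isframeincleanup(frame):
            return frame
        frame = frame.f_back
    return None


# A three-frame stack in which only the outermost frame is in cleanup.
outer = FakeFrame(f_in_cleanup=True)
middle = FakeFrame(f_back=outer)
inner = FakeFrame(f_back=middle)
```

A signal handler written against these helpers only needs
``getcleanupframe(frame) is None`` to decide whether it is safe to
raise, which is why the frame objects themselves can stay opaque.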
It may need to be specified that ``getcleanupframe`` must return the
same object that will be passed to the cleanup hook at its next
invocation.


Alternative Names
=================

The original proposal had an ``f_in_finally`` flag.  The original
intention was to protect ``finally`` clauses.  But as the proposal
grew to protect ``__enter__`` and ``__exit__`` methods too, the
``f_in_cleanup`` name seems better.  Although the ``__enter__``
method is not a cleanup routine, it at least relates to cleanup done
by context managers.

``setcleanuphook``, ``isframeincleanup`` and ``getcleanupframe`` can
be made more readable as ``set_cleanup_hook``,
``is_frame_in_cleanup`` and ``get_cleanup_frame``, although the
current names follow the conventions of their respective modules.


Alternative Proposals
=====================

Propagating 'f_in_cleanup' Flag Automatically
---------------------------------------------

This can make ``getcleanupframe`` unnecessary.  But for yield-based
coroutines you need to propagate it yourself.  Making the flag
writable leads to somewhat unpredictable behavior of
``setcleanuphook``.


Add Bytecodes 'INCR_CLEANUP', 'DECR_CLEANUP'
--------------------------------------------

These bytecodes can be used to protect the expression inside the
``with`` statement, as well as to make counter increments more
explicit and easier to debug (visible inside a disassembly).  Some
middle ground might be chosen, such as having ``END_FINALLY`` and
``SETUP_WITH`` implicitly decrement the counter (``END_FINALLY`` is
present at the end of every ``with`` suite).

However, adding new bytecodes must be considered very carefully.


Expose 'f_in_cleanup' as a Counter
----------------------------------

The original intention was to expose the minimum needed
functionality.  However, as we consider the frame flag
``f_in_cleanup`` an implementation detail, we may expose it as a
counter.

Similarly, if we have a counter we may need to have the cleanup hook
called on every counter decrement.
It is unlikely to have much performance impact, as nested ``finally``
clauses are an uncommon case.


Add code object flag 'CO_CLEANUP'
---------------------------------

As an alternative to setting the flag inside the ``SETUP_WITH`` and
``WITH_CLEANUP`` bytecodes, we can introduce a flag ``CO_CLEANUP``.
When the interpreter starts to execute code with ``CO_CLEANUP`` set,
it sets ``f_in_cleanup`` for the whole function body.  This flag is
set for the code objects of the ``__enter__`` and ``__exit__``
special methods.  Technically it might be set on any function called
``__enter__`` or ``__exit__``.

This seems to be a less clear solution.  It also covers the case
where ``__enter__`` and ``__exit__`` are called manually.  This may
be accepted either as a feature or as an unnecessary side effect
(it is unlikely to be considered a bug).

It may also pose a problem when the ``__enter__`` or ``__exit__``
functions are implemented in C, as there is usually no frame to check
for the ``f_in_cleanup`` flag.


Have Cleanup Callback on Frame Object Itself
--------------------------------------------

The frame may be extended to have an ``f_cleanup_callback`` which is
called when ``f_in_cleanup`` is reset to 0.  It would help to
register different callbacks for different coroutines.

Despite its apparent beauty, this solution doesn't add anything, as
there are two primary use cases:

* Set the callback in a signal handler.  The callback is inherently
  a single one for this case.

* Use a single callback per loop for the coroutine use case.  In
  almost all cases there is only one loop per thread.


No Cleanup Hook
---------------

The original proposal included no cleanup hook specification, as
there are a few ways to achieve the same using current tools:

* Use ``sys.settrace`` and the ``f_trace`` callback.  It may pose
  some problems for debugging, and has a big performance impact
  (although interrupting doesn't happen very often).

* Sleep a bit more and try again.  For a coroutine library this is
  easy.  For signals it may be achieved using ``signal.alarm``.

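The second option, retrying later, could be sketched for a coroutine
main loop roughly as follows.  All names here are hypothetical:
``in_cleanup`` stands for whatever check the library has available
(for example the proposed ``inspect.isframeincleanup``),
``interrupt`` stands for the call to ``generator.throw()``, and the
standard ``sched`` module stands in for the library's own main loop.

```python
import sched
import time


def interrupt_when_safe(scheduler, in_cleanup, interrupt, retry_delay=0.01):
    """Call interrupt() as soon as in_cleanup() reports False.

    While the target is still inside cleanup code, the attempt is
    rescheduled retry_delay seconds later instead of interrupting.
    """
    def attempt():
        if in_cleanup():
            scheduler.enter(retry_delay, 1, attempt)
        else:
            interrupt()

    attempt()


# Simulated coroutine state: inside cleanup for the first two checks only.
checks = iter([True, True, False])
log = []

loop = sched.scheduler(time.monotonic, time.sleep)
interrupt_when_safe(loop, lambda: next(checks), lambda: log.append("interrupted"))
loop.run()
```

The interruption is delivered only on the third check.  The cost of
this approach is the polling itself: the interruption is delayed by
up to ``retry_delay`` after the cleanup code finishes.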
Both methods are considered too impractical, so a way to catch the
exit from a ``finally`` statement is proposed.


References
==========

.. [1] Monocle
   https://github.com/saucelabs/monocle

.. [2] Bluelet
   https://github.com/sampsyo/bluelet

.. [3] Twisted: inlineCallbacks
   http://twistedmatrix.com/documents/8.1.0/api/twisted.internet.defer.html

.. [4] Original discussion
   http://mail.python.org/pipermail/python-ideas/2012-April/014705.html


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

I've published this PEP as PEP-419:
http://www.python.org/dev/peps/pep-0419/

Thank you, Paul.

On Sat, Apr 7, 2012 at 12:04 AM, Paul Colomiets <paul@colomiets.name> wrote:

--
Thanks,
Andrew Svetlov
