[Python-checkins] peps: Add PEP 521 by Nathaniel Smith

berker.peksag python-checkins at python.org
Thu Jun 9 05:08:50 EDT 2016


https://hg.python.org/peps/rev/c244d09c7874
changeset:   6359:c244d09c7874
user:        Berker Peksag <berker.peksag at gmail.com>
date:        Thu Jun 09 12:08:48 2016 +0300
summary:
  Add PEP 521 by Nathaniel Smith

files:
  pep-0521.txt |  387 +++++++++++++++++++++++++++++++++++++++
  1 files changed, 387 insertions(+), 0 deletions(-)


diff --git a/pep-0521.txt b/pep-0521.txt
new file mode 100644
--- /dev/null
+++ b/pep-0521.txt
@@ -0,0 +1,387 @@
+PEP: 521
+Title: Managing global context via 'with' blocks in generators and coroutines
+Version: $Revision$
+Last-Modified: $Date$
+Author: Nathaniel J. Smith <njs at pobox.com>
+Status: Deferred
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 27-Apr-2015
+Python-Version: 3.6
+Post-History: 29-Apr-2015
+
+
+Abstract
+========
+
+While we generally try to avoid global state when possible, there
+nonetheless exist a number of situations where it is agreed to be the
+best approach.  In Python, the standard way of handling such cases is
+to store the global state in global or thread-local storage, and then
+use ``with`` blocks to limit modifications of this global state to a
+single dynamic scope. Examples where this pattern is used include the
+standard library's ``warnings.catch_warnings`` and
+``decimal.localcontext``, NumPy's ``numpy.errstate`` (which exposes
+the error-handling settings provided by the IEEE 754 floating point
+standard), and the handling of logging context or HTTP request context
+in many server application frameworks.
+
+However, there is currently no ergonomic way to manage such local
+changes to global state when writing a generator or coroutine. For
+example, this code::
+
+  def f():
+      with warnings.catch_warnings():
+          for x in g():
+              yield x
+
+may or may not successfully catch warnings raised by ``g()``, and may
+or may not inadvertently swallow warnings triggered elsewhere in the
+code.  The context manager, which was intended to apply only to ``f``
+and its callees, ends up having a dynamic scope that encompasses
+arbitrary and unpredictable parts of its call\ **ers**. This problem
+becomes particularly acute when writing asynchronous code, where
+essentially all functions become coroutines.
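The failure mode is easy to demonstrate in current Python: a generator suspended inside ``warnings.catch_warnings`` keeps capturing warnings raised by completely unrelated code. A minimal sketch (the ``simplefilter`` call is our addition, to make the capture deterministic):

```python
import warnings

def f():
    # The manager is meant to apply only to this generator's body...
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        yield "step"   # ...but we suspend while it is still active
        yield caught

gen = f()
next(gen)                   # enter the with block, then suspend
warnings.warn("unrelated")  # raised far away from f(), yet...
caught = next(gen)          # ...f()'s manager swallowed it
assert caught[0].message.args[0] == "unrelated"
gen.close()  # run __exit__, restoring the global warnings state
```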
+
+Here, we propose to solve this problem by notifying context managers
+whenever execution is suspended or resumed within their scope,
+allowing them to restrict their effects appropriately.
+
+
+Specification
+=============
+
+Two new, optional, methods are added to the context manager protocol:
+``__suspend__`` and ``__resume__``.  If present, these methods will be
+called whenever a frame's execution is suspended or resumed from
+within the context of the ``with`` block.
+
+More formally, consider the following code::
+
+  with EXPR as VAR:
+      PARTIAL-BLOCK-1
+      f((yield foo))
+      PARTIAL-BLOCK-2
+
+Currently this is equivalent to the following code, copied from PEP 343::
+
+  mgr = (EXPR)
+  exit = type(mgr).__exit__  # Not calling it yet
+  value = type(mgr).__enter__(mgr)
+  exc = True
+  try:
+      try:
+          VAR = value  # Only if "as VAR" is present
+          PARTIAL-BLOCK-1
+          f((yield foo))
+          PARTIAL-BLOCK-2
+      except:
+          exc = False
+          if not exit(mgr, *sys.exc_info()):
+              raise
+  finally:
+      if exc:
+          exit(mgr, None, None, None)
+
+This PEP proposes to modify ``with`` block handling to instead become::
+
+  mgr = (EXPR)
+  exit = type(mgr).__exit__  # Not calling it yet
+  ### --- NEW STUFF ---
+  if the_block_contains_yield_points:  # known statically at compile time
+      suspend = getattr(type(mgr), "__suspend__", lambda mgr: None)
+      resume = getattr(type(mgr), "__resume__", lambda mgr: None)
+  ### --- END OF NEW STUFF ---
+  value = type(mgr).__enter__(mgr)
+  exc = True
+  try:
+      try:
+          VAR = value  # Only if "as VAR" is present
+          PARTIAL-BLOCK-1
+          ### --- NEW STUFF ---
+          suspend(mgr)
+          tmp = yield foo
+          resume(mgr)
+          f(tmp)
+          ### --- END OF NEW STUFF ---
+          PARTIAL-BLOCK-2
+      except:
+          exc = False
+          if not exit(mgr, *sys.exc_info()):
+              raise
+  finally:
+      if exc:
+          exit(mgr, None, None, None)
+
+Analogous suspend/resume calls are also wrapped around the ``yield``
+points embedded inside the ``yield from``, ``await``, ``async with``,
+and ``async for`` constructs.
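Since no interpreter implements this expansion, the intended call sequence can only be emulated by hand, but doing so shows what a hook-aware manager would look like. A sketch under that assumption (``Tracked`` and ``run`` are hypothetical names; the explicit hook calls stand in for what the compiler would emit):

```python
class Tracked:
    """Hypothetical manager implementing the proposed optional hooks."""
    def __init__(self):
        self.events = []
    def __enter__(self):
        self.events.append("enter")
        return self
    def __suspend__(self):
        self.events.append("suspend")
    def __resume__(self):
        self.events.append("resume")
    def __exit__(self, *exc):
        self.events.append("exit")
        return False

def run(mgr):
    # Hand-emulate the proposed expansion: __suspend__ just before
    # yielding, __resume__ just after being resumed.
    with mgr:
        mgr.__suspend__()
        yield "suspended"
        mgr.__resume__()

mgr = Tracked()
gen = run(mgr)
next(gen)           # enter, then suspend at the yield
# ...arbitrary other code runs here, outside mgr's effects...
for _ in gen:       # resume and finish the block
    pass
# mgr.events == ["enter", "suspend", "resume", "exit"]
```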
+
+
+Nested blocks
+-------------
+
+Given this code::
+
+  def f():
+      with OUTER:
+          with INNER:
+              yield VALUE
+
+then we perform the following operations in the following sequence::
+
+  INNER.__suspend__()
+  OUTER.__suspend__()
+  yield VALUE
+  OUTER.__resume__()
+  INNER.__resume__()
+
+Note that this ensures that the following is a valid refactoring::
+
+  def f():
+      with OUTER:
+          yield from g()
+
+  def g():
+      with INNER:
+          yield VALUE
+
+Similarly, ``with`` statements with multiple context managers suspend
+from right to left, and resume from left to right.
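The ordering can again be checked with a hand-rolled emulation of the proposed expansion (``CM`` is a hypothetical tracking manager; the explicit hook calls stand in for compiler-generated code):

```python
log = []

class CM:
    """Hypothetical manager that records its suspend/resume hooks."""
    def __init__(self, name):
        self.name = name
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        return False
    def __suspend__(self):
        log.append(f"{self.name}.__suspend__")
    def __resume__(self):
        log.append(f"{self.name}.__resume__")

OUTER, INNER = CM("OUTER"), CM("INNER")

def f():
    # Emulate the proposed semantics by hand:
    # suspend inside-out, resume outside-in.
    with OUTER:
        with INNER:
            INNER.__suspend__()
            OUTER.__suspend__()
            yield "VALUE"
            OUTER.__resume__()
            INNER.__resume__()

assert list(f()) == ["VALUE"]
# log == ["INNER.__suspend__", "OUTER.__suspend__",
#         "OUTER.__resume__", "INNER.__resume__"]
```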
+
+
+Other changes
+-------------
+
+``__suspend__`` and ``__resume__`` methods are added to
+``warnings.catch_warnings`` and ``decimal.localcontext``.
+
+
+Rationale
+=========
+
+In the abstract, we gave an example of plausible but incorrect code::
+
+  def f():
+      with warnings.catch_warnings():
+          for x in g():
+              yield x
+
+To make this correct in current Python, we need to instead write
+something like::
+
+  def f():
+      with warnings.catch_warnings():
+          it = iter(g())
+      while True:
+          with warnings.catch_warnings():
+              try:
+                  x = next(it)
+              except StopIteration:
+                  break
+          yield x
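A runnable check of the workaround above, under added assumptions: the ``simplefilter`` calls and the warning-raising ``g()`` are ours, chosen so the suppression is observable.

```python
import warnings

def g():
    warnings.warn("from g")   # should be confined to f()'s manager
    yield 1

def f():
    # The workaround: re-enter catch_warnings around each next() call,
    # so the manager is never active while f() is suspended.
    it = iter(g())
    while True:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # suppress only around next()
            try:
                x = next(it)
            except StopIteration:
                break
        yield x

with warnings.catch_warnings(record=True) as outer:
    warnings.simplefilter("always")
    result = list(f())
# result == [1] and outer == []: g()'s warning was suppressed, while a
# warning raised between next() calls would still reach `outer`
```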
+
+OTOH, if this PEP is accepted then the original code will become
+correct as-is.  Or if this isn't convincing, then here's another
+example of broken code; fixing it requires even greater gyrations, and
+these are left as an exercise for the reader::
+
+  def f2():
+      with warnings.catch_warnings(record=True) as w:
+          for x in g():
+              yield x
+      assert len(w) == 1
+      assert "xyzzy" in w[0].message
+
+And notice that this last example isn't artificial at all -- if you
+squint, it turns out to be exactly how you write a test that an
+asyncio-using coroutine ``g`` correctly raises a warning.  Similar
+issues arise for pretty much any use of ``warnings.catch_warnings``,
+``decimal.localcontext``, or ``numpy.errstate`` in asyncio-using code.
+So there's clearly a real problem to solve here, and the growing
+prominence of async code makes it increasingly urgent.
+
+
+Alternative approaches
+----------------------
+
+The main alternative that has been proposed is to create some kind of
+"task-local storage", analogous to "thread-local storage"
+[#yury-task-local-proposal]_. In essence, the idea would be that the
+event loop would take care to allocate a new "task namespace" for each
+task it schedules, and provide an API to at any given time fetch the
+namespace corresponding to the currently executing task.  While there
+are many details to be worked out [#task-local-challenges]_, the basic
+idea seems doable, and it is an especially natural way to handle the
+kind of global context that arises at the top-level of async
+application frameworks (e.g., setting up context objects in a web
+framework).  But it also has a number of flaws:
+
+* It only solves the problem of managing global state for coroutines
+  that ``yield`` back to an asynchronous event loop.  But there
+  actually isn't anything about this problem that's specific to
+  asyncio -- as shown in the examples above, simple generators run
+  into exactly the same issue.
+
+* It creates an unnecessary coupling between event loops and code that
+  needs to manage global state. Obviously an async web framework needs
+  to interact with some event loop API anyway, so it's not a big deal
+  in that case. But it's weird that ``warnings`` or ``decimal`` or
+  NumPy should have to call into an async library's API to access
+  their internal state when they themselves involve no async code.
+  Worse, since there are multiple event loop APIs in common use, it
+  isn't clear how to choose which to integrate with.  (This could be
+  somewhat mitigated by CPython providing a standard API for creating
+  and switching "task-local domains" that asyncio, Twisted, tornado,
+  etc. could then work with.)
+
+* It's not at all clear that this can be made acceptably fast.  NumPy
+  has to check the floating point error settings on every single
+  arithmetic operation.  Checking a piece of data in thread-local
+  storage is absurdly quick, because modern platforms have put massive
+  resources into optimizing this case (e.g. dedicating a CPU register
+  for this purpose); calling a method on an event loop to fetch a
+  handle to a namespace and then doing lookup in that namespace is
+  much slower.
+
+  More importantly, this extra cost would be paid on *every* access to
+  the global data, even for programs which are not otherwise using an
+  event loop at all.  This PEP's proposal, by contrast, only affects
+  code that actually mixes ``with`` blocks and ``yield`` statements,
+  meaning that the users who experience the costs are the same users
+  who also reap the benefits.
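For contrast, the thread-local pattern that libraries in the NumPy mold rely on today is a one-line lookup per access, and settings are isolated per OS thread. A minimal sketch (all names are illustrative, not any library's real API):

```python
import threading

_state = threading.local()

def get_mode():
    # Cheap per-thread read; this is the fast path the bullet refers to.
    return getattr(_state, "mode", "default")

_state.mode = "raise"   # set in the main thread only

seen = []
worker = threading.Thread(target=lambda: seen.append(get_mode()))
worker.start()
worker.join()
# seen == ["default"]: the setting does not leak across threads
```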
+
+On the other hand, such tight integration between task context and the
+event loop does potentially allow other features that are beyond the
+scope of the current proposal.  For example, an event loop could note
+which task namespace was in effect when a task called ``call_soon``,
+and arrange that the callback when run would have access to the same
+task namespace.  Whether this is useful, or even well-defined in the
+case of cross-thread calls (what does it mean to have task-local
+storage accessed from two threads simultaneously?), is left as a
+puzzle for event loop implementors to ponder -- nothing in this
+proposal rules out such enhancements as well.  It does seem though
+that such features would be useful primarily for state that already
+has a tight integration with the event loop -- while we might want a
+request id to be preserved across ``call_soon``, most people would not
+expect::
+
+  with warnings.catch_warnings():
+      loop.call_soon(f)
+
+to result in ``f`` being run with warnings disabled, which would be
+the result if ``call_soon`` preserved global context in general.
+
+
+Backwards compatibility
+=======================
+
+Because ``__suspend__`` and ``__resume__`` are optional and default to
+no-ops, all existing context managers continue to work exactly as
+before.
+
+Speed-wise, this proposal adds additional overhead when entering a
+``with`` block (where we must now check for the additional methods;
+failed attribute lookup in CPython is rather slow, since it involves
+allocating an ``AttributeError``), and additional overhead at
+suspension points.  Since the position of ``with`` blocks and
+suspension points is known statically, the compiler can
+straightforwardly optimize away this overhead in all cases except
+where one actually has a ``yield`` inside a ``with``.
+
+
+Interaction with PEP 492
+========================
+
+PEP 492 added new asynchronous context managers, which are like
+regular context managers but instead of having regular methods
+``__enter__`` and ``__exit__`` they have coroutine methods
+``__aenter__`` and ``__aexit__``.
+
+There are a few options for how to handle these:
+
+1) Add ``__asuspend__`` and ``__aresume__`` coroutine methods.
+
+   One potential difficulty here is that this would add a complication
+   to an already complicated part of the bytecode
+   interpreter. Consider code like::
+
+     async def f():
+         async with MGR:
+             await g()
+
+     @types.coroutine
+     def g():
+         yield 1
+
+   In 3.5, ``f`` gets desugared to something like::
+
+     @types.coroutine
+     def f():
+         yield from MGR.__aenter__()
+         try:
+             yield from g()
+         finally:
+             yield from MGR.__aexit__()
+
+   With the addition of ``__asuspend__`` / ``__aresume__``, the
+   ``yield from`` would have to be replaced by something like::
+
+         for SUBVALUE in g():
+             yield from MGR.__asuspend__()
+             yield SUBVALUE
+             yield from MGR.__aresume__()
+
+   Notice that we've had to introduce a new temporary ``SUBVALUE`` to
+   hold the value yielded from ``g()`` while we yield from
+   ``MGR.__asuspend__()``. Where does this temporary go? Currently
+   ``yield from`` is a single bytecode that doesn't modify the stack
+   while looping. Also, the above code isn't even complete, because it
+   skips over the issue of how to direct ``send``/``throw`` calls to
+   the right place at the right time...
+
+2) Add plain ``__suspend__`` and ``__resume__`` methods.
+
+3) Leave async context managers alone for now until we have more
+   experience with them.
+
+It isn't entirely clear what use cases even exist in which an async
+context manager would need to set coroutine-local state (like
+thread-local state, but for a coroutine stack instead of an OS
+thread) and couldn't do so via coordination with the coroutine
+runner. So this draft tentatively goes with option (3) and punts on
+this question until later.
+
+
+References
+==========
+
+.. [#yury-task-local-proposal] https://groups.google.com/forum/#!topic/python-tulip/zix5HQxtElg
+   https://github.com/python/asyncio/issues/165
+
+.. [#task-local-challenges] For example, we would have to decide
+   whether there is a single task-local namespace shared by all users
+   (in which case we need a way for multiple third-party libraries to
+   adjudicate access to this namespace), or else if there are multiple
+   task-local namespaces, then we need some mechanism for each library
+   to arrange for their task-local namespaces to be created and
+   destroyed at appropriate moments.  The preliminary patch linked
+   from the github issue above doesn't seem to provide any mechanism
+   for such lifecycle management.
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+

+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End:

-- 
Repository URL: https://hg.python.org/peps


More information about the Python-checkins mailing list