Python-ideas
August 2017
Hi,
This is a new PEP to implement Execution Contexts in Python.
The PEP is in-flight to python.org, and in the meanwhile can
be read on GitHub:
https://github.com/python/peps/blob/master/pep-0550.rst
(it contains a few diagrams and charts, so please read it there.)
Thank you!
Yury
PEP: 550
Title: Execution Context
Version: $Revision$
Last-Modified: $Date$
Author: Yury Selivanov <yury(a)magic.io>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2017
Python-Version: 3.7
Post-History: 11-Aug-2017
Abstract
========
This PEP proposes a new mechanism to manage execution state--the
logical environment in which a function, a thread, a generator,
or a coroutine executes.
A few examples of where having a reliable state storage is required:
* Context managers like decimal contexts, ``numpy.errstate``,
and ``warnings.catch_warnings``;
* Storing request-related data such as security tokens and request
data in web applications;
* Profiling, tracing, and logging in complex and large code bases.
The usual solution for storing state is to use a Thread-local Storage
(TLS), implemented in the standard library as ``threading.local()``.
Unfortunately, TLS does not work for isolating state of generators or
asynchronous code because such code shares a single thread.
Rationale
=========
Traditionally, Thread-local Storage (TLS) is used for storing the
state. However, the major flaw of using TLS is that it works only
for multi-threaded code. It is not possible to reliably contain the
state within a generator or a coroutine. For example, consider
the following generator::
def calculate(precision, ...):
with decimal.localcontext() as ctx:
# Set the precision for decimal calculations
# inside this block
ctx.prec = precision
yield calculate_something()
yield calculate_something_else()
Decimal context uses TLS to store its state, and because TLS is
not aware of generators, the state can leak. The above code will
not work correctly if a user iterates over ``calculate()``
generators created with different precisions in parallel::
g1 = calculate(100)
g2 = calculate(50)
items = list(zip(g1, g2))
# items[0] will be a tuple of:
# first value from g1 calculated with 100 precision,
# first value from g2 calculated with 50 precision.
#
# items[1] will be a tuple of:
# second value from g1 calculated with 50 precision,
# second value from g2 calculated with 50 precision.
An even scarier example would be using decimals to represent money
in an async/await application: decimal calculations can suddenly
lose precision in the middle of processing a request. Currently,
bugs like this are extremely hard to find and fix.
Another common need for web applications is to have access to the
current request object, or security context, or, simply, the request
URL for logging or submitting performance tracing data::
async def handle_http_request(request):
context.current_http_request = request
await ...
# Invoke your framework code, render templates,
# make DB queries, etc, and use the global
# 'current_http_request' in that code.
# This isn't currently possible to do reliably
# in asyncio out of the box.
These examples are just a few out of many, where a reliable way to
store context data is absolutely needed.
The inability to use TLS for asynchronous code has led to the
proliferation of ad-hoc solutions, which are limited to code that was
explicitly written to work with them.
The current status quo is that any library, including the standard
library, that uses TLS will likely not work as expected in
asynchronous code or with generators (see [3]_ for an example issue.)
Some languages that have coroutines or generators recommend manually
passing a ``context`` object to every function; see [1]_ for a
description of this pattern in Go. This approach, however, has limited
use for Python, where we have a huge ecosystem that was built to work
with a TLS-like context. Moreover, passing the context explicitly
does not work at all for libraries like ``decimal`` or ``numpy``,
which use operator overloading.
The .NET runtime, which supports async/await, has a generic
solution to this problem, called ``ExecutionContext`` (see [2]_).
On the surface, working with it is very similar to working with a TLS,
but the former explicitly supports asynchronous code.
Goals
=====
The goal of this PEP is to provide a more reliable alternative to
``threading.local()``. It should be explicitly designed to work with
the Python execution model, equally supporting threads, generators, and
coroutines.
An acceptable solution for Python should meet the following
requirements:
* Transparent support for code executing in threads, coroutines,
and generators with an easy to use API.
* Negligible impact on the performance of the existing code or the
code that will be using the new mechanism.
* Fast C API for packages like ``decimal`` and ``numpy``.
Explicit is still better than implicit, hence the new APIs should only
be used when there is no option to pass the state explicitly.
With this PEP implemented, it should be possible to update a context
manager like the below::
_local = threading.local()
@contextmanager
def context(x):
old_x = getattr(_local, 'x', None)
_local.x = x
try:
yield
finally:
_local.x = old_x
to a more robust version that can be reliably used in generators
and async/await code, with a simple transformation::
@contextmanager
def context(x):
old_x = get_execution_context_item('x')
set_execution_context_item('x', x)
try:
yield
finally:
set_execution_context_item('x', old_x)
Specification
=============
This proposal introduces a new concept called Execution Context (EC),
along with a set of Python APIs and C APIs to interact with it.
EC is implemented using an immutable mapping. Every modification
of the mapping produces a new copy of it. To illustrate what it
means let's compare it to how we work with tuples in Python::
a0 = ()
a1 = a0 + (1,)
a2 = a1 + (2,)
# a0 is an empty tuple
# a1 is (1,)
# a2 is (1, 2)
Manipulating an EC object would be similar::
a0 = EC()
a1 = a0.set('foo', 'bar')
a2 = a1.set('spam', 'ham')
# a0 is an empty mapping
# a1 is {'foo': 'bar'}
# a2 is {'foo': 'bar', 'spam': 'ham'}
In CPython, every thread that can execute Python code has a
corresponding ``PyThreadState`` object. It encapsulates important
runtime information like a pointer to the current frame, and is
used extensively by the ceval loop. We add a new field to
``PyThreadState``, called ``exec_context``, which points to the
current EC object.
We also introduce a set of APIs to work with Execution Context.
In this section we will only cover two functions that are needed to
explain how Execution Context works. See the full list of new APIs
in the `New APIs`_ section.
* ``sys.get_execution_context_item(key, default=None)``: lookup
``key`` in the EC of the executing thread. If not found,
return ``default``.
* ``sys.set_execution_context_item(key, value)``: get the
current EC of the executing thread. Add a ``key``/``value``
item to it, which will produce a new EC object. Set the
new object as the current one for the executing thread.
In pseudo-code::
tstate = PyThreadState_GET()
ec = tstate.exec_context
ec2 = ec.set(key, value)
tstate.exec_context = ec2
Note that some important implementation details and optimizations
are omitted here; they will be covered in later sections of this PEP.
Now let's see how Execution Contexts work with regular multi-threaded
code, generators, and coroutines.
Regular & Multithreaded Code
----------------------------
For regular Python code, EC behaves just like a thread-local. Any
modification of the EC object produces a new one, which is immediately
set as the current one for the thread state.
.. figure:: pep-0550/functions.png
:align: center
:width: 90%
Figure 1. Execution Context flow in a thread.
As Figure 1 illustrates, if a function calls
``set_execution_context_item()``, the modification of the execution
context will be visible to all subsequent calls and to the caller::
def set_foo():
set_execution_context_item('foo', 'spam')
set_execution_context_item('foo', 'bar')
print(get_execution_context_item('foo'))
set_foo()
print(get_execution_context_item('foo'))
# will print:
# bar
# spam
Coroutines
----------
Python :pep:`492` coroutines are used to implement cooperative
multitasking. For a Python end-user they are similar to threads,
especially when it comes to sharing resources or modifying
the global state.
An event loop is needed to schedule coroutines. Coroutines that
are explicitly scheduled by the user are usually called Tasks.
When a coroutine is scheduled, it can schedule other coroutines using
an ``await`` expression. In the async/await world, awaiting a coroutine
can be viewed as a different calling convention: Tasks are similar to
threads, and awaiting on coroutines within a Task is similar to
calling functions within a thread.
By drawing a parallel between regular multithreaded code and
async/await, it becomes apparent that any modification of the
execution context within one Task should be visible to all coroutines
scheduled within it. Any execution context modifications, however,
must not be visible to other Tasks executing within the same thread.
To achieve this, a small set of modifications to the coroutine object
is needed:
* When a coroutine object is instantiated, it saves a reference to
the current execution context object to its ``cr_execution_context``
attribute.
* Coroutine's ``.send()`` and ``.throw()`` methods are modified as
follows (in pseudo-C)::
    if coro->cr_isolated_execution_context:
        # Save a reference to the current execution context.
        old_context = tstate->exec_context

        # Set our saved execution context as the current
        # one for the current thread.
        tstate->exec_context = coro->cr_execution_context

        try:
            # Perform the actual `Coroutine.send()` or
            # `Coroutine.throw()` call.
            return coro->send(...)
        finally:
            # Save a reference to the updated execution context.
            # We will need it later, when `.send()` or `.throw()`
            # are called again.
            coro->cr_execution_context = tstate->exec_context

            # Restore the thread's execution context to what it was
            # before invoking this coroutine.
            tstate->exec_context = old_context
    else:
        # Perform the actual `Coroutine.send()` or
        # `Coroutine.throw()` call.
        return coro->send(...)
* ``cr_isolated_execution_context`` is a new attribute on coroutine
objects. Set to ``True`` by default, it makes any execution context
modifications performed by the coroutine visible only to that
coroutine.
When the Python interpreter sees an ``await`` instruction, it flips
``cr_isolated_execution_context`` to ``False`` for the coroutine
that is about to be awaited. This makes any changes to the execution
context made by nested coroutine calls within a Task visible
throughout the Task.
Because the top-level coroutine (Task) cannot be scheduled with
``await`` (in asyncio you need to call ``loop.create_task()`` or
``asyncio.ensure_future()`` to schedule a Task), all execution
context modifications are guaranteed to stay within the Task.
* We always work with ``tstate->exec_context``. We use
``coro->cr_execution_context`` only to store the coroutine's execution
context when it is not executing.
Figure 2 below illustrates how execution context mutations work with
coroutines.
.. figure:: pep-0550/coroutines.png
:align: center
:width: 90%
Figure 2. Execution Context flow in coroutines.
In the above diagram:
* When "coro1" is created, it saves a reference to the current
execution context "2".
* If it makes any change to the context, it will have its own
execution context branch "2.1".
* When it awaits on "coro2", any subsequent changes it makes to
the execution context are visible to "coro1", but not outside
of it.
In code::
async def inner_foo():
print('inner_foo:', get_execution_context_item('key'))
set_execution_context_item('key', 2)
async def foo():
print('foo:', get_execution_context_item('key'))
set_execution_context_item('key', 1)
await inner_foo()
print('foo:', get_execution_context_item('key'))
set_execution_context_item('key', 'spam')
print('main:', get_execution_context_item('key'))
asyncio.get_event_loop().run_until_complete(foo())
print('main:', get_execution_context_item('key'))
which will output::
main: spam
foo: spam
inner_foo: 1
foo: 2
main: spam
Generator-based coroutines (generators decorated with
``types.coroutine`` or ``asyncio.coroutine``) behave exactly as
native coroutines with regards to execution context management:
their ``yield from`` expression is semantically equivalent to
``await``.
Generators
----------
Generators in Python, while similar to coroutines, are used in a
fundamentally different way. They are producers of data, and
they use the ``yield`` expression to suspend/resume their execution.
A crucial difference between ``await coro`` and ``yield value`` is
that the former expression guarantees that ``coro`` will be
executed to the end, while the latter produces ``value`` and
suspends the generator until it is iterated again.
Generators share 99% of their implementation with coroutines, and
thus have similar new attributes ``gi_execution_context`` and
``gi_isolated_execution_context``. Similar to coroutines, generators
save a reference to the current execution context when they are
instantiated. They have the same implementation of the ``.send()`` and
``.throw()`` methods.
The only difference is that
``gi_isolated_execution_context`` is always set to ``True``, and
is never modified by the interpreter. A ``yield from o`` expression in
regular generators that are not decorated with ``types.coroutine``
is semantically equivalent to ``for v in o: yield v``.
.. figure:: pep-0550/generators.png
:align: center
:width: 90%
Figure 3. Execution Context flow in a generator.
In the above diagram:
* When "gen1" is created, it saves a reference to the current
execution context "2".
* If it makes any change to the context, it will have its own
execution context branch "2.1".
* When "gen2" is created, it saves a reference to the current
execution context for it -- "2.1".
* Any subsequent execution context updates in "gen2" will only
be visible to "gen2".
* Likewise, any context changes that "gen1" makes after it
created "gen2" will not be visible to "gen2".
In code::
def inner_foo():
for i in range(3):
print('inner_foo:', get_execution_context_item('key'))
set_execution_context_item('key', i)
yield i
def foo():
set_execution_context_item('key', 'spam')
print('foo:', get_execution_context_item('key'))
inner = inner_foo()
while True:
val = next(inner, None)
if val is None:
break
yield val
print('foo:', get_execution_context_item('key'))
set_execution_context_item('key', 'ham')
print('main:', get_execution_context_item('key'))
list(foo())
print('main:', get_execution_context_item('key'))
which will output::
main: ham
foo: spam
inner_foo: spam
foo: spam
inner_foo: 0
foo: spam
inner_foo: 1
foo: spam
main: ham
As we see, any modification of the execution context in a generator
is visible only to the generator itself.
There is one use case where it is desirable for generators to affect
the surrounding execution context: the ``contextlib.contextmanager``
decorator. To make the following work::
@contextmanager
def context(x):
old_x = get_execution_context_item('x')
set_execution_context_item('x', x)
try:
yield
finally:
set_execution_context_item('x', old_x)
we modified ``contextmanager`` to flip the
``gi_isolated_execution_context`` flag to ``False`` on its generator.
Greenlets
---------
Greenlet is an alternative implementation of cooperative
scheduling for Python. Although the greenlet package is not part of
CPython, popular frameworks like gevent rely on it, and it is
important that greenlet can be modified to support execution
contexts.
In a nutshell, the greenlet design is very similar to the design of
generators. The main difference is that for generators, the stack
is managed by the Python interpreter. Greenlet works outside of the
Python interpreter, and manually saves some ``PyThreadState``
fields and pushes/pops the C-stack. Since Execution Context is
implemented on top of ``PyThreadState``, it's easy to add
transparent support for it to greenlet.
New APIs
========
Even though this PEP adds a number of new APIs, please keep in mind
that most Python users will likely only ever use two of them:
``sys.get_execution_context_item()`` and
``sys.set_execution_context_item()``.
Python
------
1. ``sys.get_execution_context_item(key, default=None)``: lookup
``key`` for the current Execution Context. If not found,
return ``default``.
2. ``sys.set_execution_context_item(key, value)``: set
``key``/``value`` item for the current Execution Context.
If ``value`` is ``None``, the item will be removed.
3. ``sys.get_execution_context()``: return the current Execution
Context object: ``sys.ExecutionContext``.
4. ``sys.set_execution_context(ec)``: set the passed
``sys.ExecutionContext`` instance as a current one for the current
thread.
5. ``sys.ExecutionContext`` object.
Implementation detail: ``sys.ExecutionContext`` wraps a low-level
``PyExecContextData`` object. ``sys.ExecutionContext`` has a
mutable mapping API, abstracting away the real immutable
``PyExecContextData``.
* ``ExecutionContext()``: construct a new, empty, execution
context.
* ``ec.run(func, *args)`` method: run ``func(*args)`` in the
``ec`` execution context.
* ``ec[key]``: lookup ``key`` in ``ec`` context.
* ``ec[key] = value``: assign ``key``/``value`` item to the ``ec``.
* ``ec.get()``, ``ec.items()``, ``ec.values()``, ``ec.keys()``, and
``ec.copy()`` are similar to those of the ``dict`` object.
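For illustration only (this sketch merely combines the APIs listed
above and is not part of the specification), typical usage could look
like this::

    import sys

    # Read and write items in the current Execution Context.
    sys.set_execution_context_item('request_id', '12345')
    assert sys.get_execution_context_item('request_id') == '12345'

    # Capture the current EC and run a function in it later.
    ec = sys.get_execution_context()

    def report():
        # Sees the captured EC, including 'request_id'.
        print(sys.get_execution_context_item('request_id'))

    ec.run(report)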
C API
-----
The C API is different from the Python one because it operates directly
on the low-level immutable ``PyExecContextData`` object.
1. New ``PyThreadState->exec_context`` field, pointing to a
``PyExecContextData`` object.
2. ``PyThreadState_SetExecContextItem`` and
``PyThreadState_GetExecContextItem`` similar to
``sys.set_execution_context_item()`` and
``sys.get_execution_context_item()``.
3. ``PyThreadState_GetExecContext``: similar to
``sys.get_execution_context()``. Always returns a
``PyExecContextData`` object. If ``PyThreadState->exec_context``
is ``NULL``, a new empty one will be created and assigned
to ``PyThreadState->exec_context``.
4. ``PyThreadState_SetExecContext``: similar to
``sys.set_execution_context()``.
5. ``PyExecContext_New``: create a new empty ``PyExecContextData``
object.
6. ``PyExecContext_SetItem`` and ``PyExecContext_GetItem``.
The exact layout of ``PyExecContextData`` is private, which allows
us to switch it to a different implementation later. More on that
in the `Implementation Details`_ section.
Modifications in Standard Library
=================================
* ``contextlib.contextmanager`` was updated to flip the new
``gi_isolated_execution_context`` attribute on the generator.
* ``asyncio.events.Handle`` object now captures the current
execution context when it is created, and uses the saved
execution context to run the callback (with
``ExecutionContext.run()`` method.) This makes
``loop.call_soon()`` run callbacks in the execution context
in which they were scheduled.
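To illustrate the idea (this is only a sketch of the behaviour
described above, not the actual asyncio implementation), a
``Handle``-like object could capture and use the execution context
roughly as follows::

    import sys

    class Handle:
        def __init__(self, callback, args):
            self._callback = callback
            self._args = args
            # Capture the EC that is current at scheduling time,
            # e.g. when loop.call_soon() is called.
            self._context = sys.get_execution_context()

        def _run(self):
            # Run the callback in the captured EC rather than in
            # whatever EC the event loop has at call time.
            self._context.run(self._callback, *self._args)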
No modifications in ``asyncio.Task`` or ``asyncio.Future`` were
necessary.
Some standard library modules like ``warnings`` and ``decimal``
can be updated to use new execution contexts. This will be considered
in separate issues if this PEP is accepted.
Backwards Compatibility
=======================
This proposal preserves 100% backwards compatibility.
Performance
===========
Implementation Details
----------------------
The new ``PyExecContextData`` object wraps a ``dict`` object.
Any modification requires creating a shallow copy of the dict.
While working on the reference implementation of this PEP, we were
able to optimize the ``dict.copy()`` operation by **5.5x**; see [4]_
for details.
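As a rough pure-Python model of this approach (for illustration only;
the actual object is implemented in C), every ``set()`` makes a shallow
copy of the wrapped dict and leaves the original object untouched::

    class ImmutableDict:
        # Illustrative model: every set() copies the wrapped dict.

        def __init__(self, data=None):
            self._data = {} if data is None else data

        def set(self, key, value):
            new_data = self._data.copy()    # O(n) shallow copy per write
            new_data[key] = value
            return ImmutableDict(new_data)  # the original is unchanged

        def get(self, key, default=None):
            return self._data.get(key, default)  # O(1) lookup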
.. figure:: pep-0550/dict_copy.png
:align: center
:width: 100%
Figure 4.
Figure 4 shows that the performance of an immutable dict implemented
with shallow copying is, as expected, O(n) for the ``set()`` operation.
However, this cost is tolerable as long as the dict has fewer than
100 items (one ``set()`` takes about a microsecond.)
Judging by the number of modules that need the EC in the Standard
Library, it is likely that real-world Python applications will use
significantly fewer than 100 execution context variables.
The important point is that the cost of accessing a key in
Execution Context is always O(1).
If the ``set()`` operation performance is a major concern, we discuss
alternative approaches that have O(1) or close ``set()`` performance
in `Alternative Immutable Dict Implementation`_, `Faster C API`_, and
`Copy-on-write Execution Context`_ sections.
Generators and Coroutines
-------------------------
Using a microbenchmark for generators and coroutines from :pep:`492`
([12]_), it was possible to observe 0.5 to 1% performance degradation.
asyncio echoserver microbenchmarks from the uvloop project [13]_
showed 1-1.5% performance degradation for asyncio code.
asyncpg benchmarks [14]_, which execute more code and are closer to a
real-world application, did not exhibit any noticeable performance
change.
Overall Performance Impact
--------------------------
The total number of changed lines in the ceval loop is 2 -- in the
``YIELD_FROM`` opcode implementation. Only performance of generators
and coroutines can be affected by the proposal.
This was confirmed by running Python Performance Benchmark Suite
[15]_, which demonstrated that there is no difference between
3.7 master branch and this PEP reference implementation branch
(full benchmark results can be found here [16]_.)
Design Considerations
=====================
Alternative Immutable Dict Implementation
-----------------------------------------
Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT)
to implement high performance immutable collections [5]_, [6]_.
Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N)
performance for both ``set()`` and ``get()`` operations, which will
be essentially O(1) for relatively small mappings in EC.
To assess if HAMT can be used for Execution Context, we implemented
it in CPython [7]_.
.. figure:: pep-0550/hamt_vs_dict.png
:align: center
:width: 100%
Figure 5. Benchmark code can be found here: [9]_.
Figure 5 shows that HAMT indeed displays O(1) performance for all
benchmarked dictionary sizes. For dictionaries with fewer than 100
items, HAMT is a bit slower than Python dict/shallow copy.
.. figure:: pep-0550/lookup_hamt.png
:align: center
:width: 100%
Figure 6. Benchmark code can be found here: [10]_.
Figure 6 shows a comparison of lookup costs between Python dict
and an HAMT immutable mapping. HAMT lookup time is 30-40% worse
than Python dict lookups on average, which is a very good result,
considering how well Python dicts are optimized.
Note that, according to [8]_, the HAMT design can be further improved.
The bottom line is that the current approach of implementing
an immutable mapping with a shallow-copied dict will likely perform
adequately in real-life applications. The HAMT solution is more
future-proof, however.
The proposed API is designed in such a way that the underlying
implementation of the mapping can be changed completely without
affecting the Execution Context `Specification`_, which allows
us to switch to HAMT at some point if necessary.
Copy-on-write Execution Context
-------------------------------
The implementation of Execution Context in .NET is different from
this PEP. .NET uses a copy-on-write mechanism and a regular mutable
mapping.
One way to implement this in CPython would be to have two new
fields in ``PyThreadState``:
* ``exec_context`` pointing to the current Execution Context mapping;
* ``exec_context_copy_on_write`` flag, set to ``0`` initially.
The idea is that whenever we are modifying the EC, the copy-on-write
flag is checked, and if it is set to ``1``, the EC is copied.
Modifications to Coroutine and Generator ``.send()`` and ``.throw()``
methods described in the `Coroutines`_ section will be almost the
same, except that in addition to the ``gi_execution_context`` they
will have a ``gi_exec_context_copy_on_write`` flag. When a coroutine
or a generator starts, the flag will be set to ``1``. This will
ensure that any modification of the EC performed within a coroutine
or a generator will be isolated.
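For illustration only, the copy-on-write behaviour described above can
be modelled in pure Python roughly as follows (the class and attribute
names are invented for this sketch)::

    class COWContext:
        # A mutable mapping that is copied on the first write after
        # the copy-on-write flag has been set.

        def __init__(self, mapping=None):
            self.mapping = {} if mapping is None else mapping
            self.copy_on_write = False

        def set(self, key, value):
            if self.copy_on_write:
                self.mapping = self.mapping.copy()
                self.copy_on_write = False
            self.mapping[key] = value

        def get(self, key, default=None):
            return self.mapping.get(key, default)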
This approach has one advantage:
* For Execution Context that contains a large number of items,
copy-on-write is a more efficient solution than the shallow-copy
dict approach.
However, we believe that copy-on-write disadvantages are more
important to consider:
* Copy-on-write behaviour for generators and coroutines makes
EC semantics less predictable.
With the immutable EC approach, generators and coroutines always
execute in the EC that was current at the moment of their
creation. Any modifications to the outer EC while a generator
or a coroutine is executing are not visible to them::
def generator():
yield 1
print(get_execution_context_item('key'))
yield 2
set_execution_context_item('key', 'spam')
gen = iter(generator())
next(gen)
set_execution_context_item('key', 'ham')
next(gen)
The above script will always print 'spam' with immutable EC.
With a copy-on-write approach, the above script will print 'ham'.
Now, consider that ``generator()`` was refactored to call some
library function, that uses Execution Context::
def generator():
yield 1
some_function_that_uses_decimal_context()
print(get_execution_context_item('key'))
yield 2
Now, the script will print 'spam', because
``some_function_that_uses_decimal_context`` forced the EC to copy,
and ``set_execution_context_item('key', 'ham')`` line did not
affect the ``generator()`` code after all.
* Similarly to the previous point, ``sys.ExecutionContext.run()``
method will also become less predictable, as
``sys.get_execution_context()`` would still return a reference to
the current mutable EC.
We can't modify ``sys.get_execution_context()`` to return a shallow
copy of the current EC, because this would seriously harm
performance of ``asyncio.call_soon()`` and similar places, where
it is important to propagate the Execution Context.
* Even though copy-on-write requires shallow copying the execution
context object less frequently, copying will still take place
in coroutines and generators. In that case, the HAMT approach will
perform better for medium to large sized execution contexts.
All in all, we believe that the copy-on-write approach introduces
very subtle corner cases that could lead to bugs that are
exceptionally hard to discover and fix.
The immutable EC solution in comparison is always predictable and
easy to reason about. Therefore we believe that any slight
performance gain that the copy-on-write solution might offer is not
worth it.
Faster C API
------------
Packages like numpy and standard library modules like decimal need
to frequently query the global state for some local context
configuration. It is important that the APIs they use are as
fast as possible.
The proposed ``PyThreadState_SetExecContextItem`` and
``PyThreadState_GetExecContextItem`` functions need to get the
current thread state with ``PyThreadState_GET()`` (fast) and then
perform a hash lookup (relatively slow). We can eliminate the hash
lookup by adding three additional C API functions:
* ``Py_ssize_t PyExecContext_RequestIndex(char *key_name)``:
a function similar to the existing ``_PyEval_RequestCodeExtraIndex``
introduced in :pep:`523`. The idea is to request a unique index
that can later be used to look up context items.
The ``key_name`` can later be used by ``sys.ExecutionContext`` to
introspect items added with this API.
* ``PyThreadState_SetExecContextIndexedItem(Py_ssize_t index, PyObject *val)``
and ``PyThreadState_GetExecContextIndexedItem(Py_ssize_t index)``
to set and get an item by its index, avoiding the cost of a hash lookup.
Why does setting a key to None remove the item?
------------------------------------------------
Consider a context manager::
@contextmanager
def context(x):
old_x = get_execution_context_item('x')
set_execution_context_item('x', x)
try:
yield
finally:
set_execution_context_item('x', old_x)
Because a ``set_execution_context_item(key, None)`` call removes the
``key``, the user doesn't need to write additional code to remove
the ``key`` if it wasn't in the execution context already.
An alternative design with ``del_execution_context_item()`` method
would look like the following::
@contextmanager
def context(x):
not_there = object()
old_x = get_execution_context_item('x', not_there)
set_execution_context_item('x', x)
try:
yield
finally:
if old_x is not_there:
del_execution_context_item('x')
else:
set_execution_context_item('x', old_x)
Can we fix ``PyThreadState_GetDict()``?
---------------------------------------
``PyThreadState_GetDict`` is a TLS, and some of its existing users
might depend on it being just a TLS. Changing its behaviour to follow
the Execution Context semantics would break backwards compatibility.
PEP 521
-------
:pep:`521` proposes an alternative solution to the problem:
enhance Context Manager Protocol with two new methods: ``__suspend__``
and ``__resume__``. To make it compatible with async/await,
the Asynchronous Context Manager Protocol will also need to be
extended with ``__asuspend__`` and ``__aresume__``.
This would allow implementing context managers like the decimal
context and ``numpy.errstate`` for generators and coroutines.
The following code::
class Context:
def __enter__(self):
self.old_x = get_execution_context_item('x')
set_execution_context_item('x', 'something')
def __exit__(self, *err):
set_execution_context_item('x', self.old_x)
would become this::
class Context:
def __enter__(self):
self.old_x = get_execution_context_item('x')
set_execution_context_item('x', 'something')
def __suspend__(self):
set_execution_context_item('x', self.old_x)
def __resume__(self):
set_execution_context_item('x', 'something')
def __exit__(self, *err):
set_execution_context_item('x', self.old_x)
Besides complicating the protocol, the implementation will likely
negatively impact performance of coroutines, generators, and any code
that uses context managers, and will notably complicate the
interpreter implementation. It also does not solve the leaking state
problem for greenlet/gevent.
:pep:`521` also does not provide any mechanism to propagate state
in a local context, like storing a request object in an HTTP request
handler to have better logging.
Can Execution Context be implemented outside of CPython?
--------------------------------------------------------
Because async/await code needs an event loop to run it, an EC-like
solution can be implemented in a limited way for coroutines.
Generators, on the other hand, do not have an event loop or
trampoline, making it impossible to intercept their ``yield`` points
outside of the Python interpreter.
Reference Implementation
========================
The reference implementation can be found here: [11]_.
References
==========
.. [1] https://blog.golang.org/context
.. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.…
.. [3] https://github.com/numpy/numpy/issues/9444
.. [4] http://bugs.python.org/issue31179
.. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie
.. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashma…
.. [7] https://github.com/1st1/cpython/tree/hamt
.. [8] https://michael.steindorfer.name/publications/oopsla15.pdf
.. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd
.. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e
.. [11] https://github.com/1st1/cpython/tree/pep550
.. [12] https://www.python.org/dev/peps/pep-0492/#async-await
.. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.…
.. [14] https://github.com/MagicStack/pgbench
.. [15] https://github.com/python/performance
.. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c
Copyright
=========
This document has been placed in the public domain.
[replying to the list]
On Sun, Aug 13, 2017 at 6:14 AM, Nick Coghlan <ncoghlan(a)gmail.com> wrote:
> On 13 August 2017 at 16:01, Yury Selivanov <yselivanov.ml(a)gmail.com> wrote:
>> On Sat, Aug 12, 2017 at 10:56 PM, Nick Coghlan <ncoghlan(a)gmail.com> wrote:
>> [..]
>>> As Nathaniel suggested, getting/setting/deleting individual items in
>>> the current context would be implemented as methods on the ContextItem
>>> objects, allowing the return value of "get_context_items" to be a
>>> plain dictionary, rather than a special type that directly supported
>>> updates to the underlying context.
>>
>> The current PEP 550 design returns a "snapshot" of the current EC with
>> sys.get_execution_context().
>>
>> I.e. if you do
>>
>> ec = sys.get_execution_context()
>> ec['a'] = 'b'
>>
>> # sys.get_execution_context_item('a') will return None
>>
>> You did get a snapshot and you modified it -- but your modifications
>> are not visible anywhere. You can run a function in that modified EC
>> with `ec.run(function)` and that function will see that new 'a' key,
>> but that's it. There's no "magical" updates to the underlying context.
>
> In that case, I think "get_execution_context()" is quite misleading as
> a name, and is going to be prone to exactly the confusion we currently
> have with the mapping returned by locals(), which is that regardless
> of whether writes to it affect the target namespace or not, it's going
> to be surprising in at least some situations.
>
> So despite being initially in favour of exposing a mapping-like API at
> the Python level, I'm now coming around to Armin Ronacher's point of
> view: the copy-on-write semantics for the active context are
> sufficiently different from any other mapping type in Python that we
> should just avoid the use of __setitem__ and __delitem__ as syntactic
> sugar entirely.
I agree. I'll be redesigning the PEP to use the following API (please
ignore the naming peculiarities, there are so many proposals at this
point that I'll just stick to something I have in my head):
1. sys.new_execution_context_key('description') -> sys.ContextItem (or
maybe we should just expose the sys.ContextItem type and let people
instantiate it?)
A key (or "token") to use with the execution context. Besides
eliminating the names collision issue, it'll also have a slightly
better performance, because its __hash__ method will always return a
constant. (Strings cache their __hash__, but other types don't).
2. ContextItem.has(), ContextItem.get(), ContextItem.set(),
ContextItem.delete() -- pretty self-explanatory.
3. sys.get_active_context() -> sys.ExecutionContext -- an immutable
object, has no methods to modify the context.
3a. sys.ExecutionContext.run(callable, *args) -- run a callable(*args)
in some execution context.
3b. sys.ExecutionContext.items() -- an iterator of ContextItem ->
value for introspection and debugging purposes.
4. No sys.set_execution_context() method. At this point I'm not sure
it's a good idea to allow users to change the current execution
context to something else entirely. For use cases like enabling
concurrent.futures to run your function within the current EC, you
just use the sys.get_active_context()/ExecutionContext.run
combination. If anything, we can add this function later.
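For illustration only (using the names above; handle_request and
request are hypothetical placeholders), usage would look roughly like:

    request_id = sys.new_execution_context_key('request id')

    request_id.set('12345')
    assert request_id.has()
    assert request_id.get() == '12345'
    request_id.delete()

    # Snapshot the active context and run a callable in it later:
    ctx = sys.get_active_context()
    ctx.run(handle_request, request)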
> Instead, we'd lay out the essential primitive operations that *only*
> the interpreter can provide and define procedural interfaces for
> those, and if anyone wanted to build a higher level object-oriented
> interface on top of those primitives, they'd be free to do so, with
> the procedural API acting as the abstraction layer that decouples "how
> interpreters actually implement it" (e.g. copy-on-write mappings) from
> "how libraries and frameworks model it for their own use" (e.g. rich
> application context objects). That way, each interpreter would also be
> free to define their *internal* object model in whichever way made the
> most sense for them, rather than enshrining a point-in-time snapshot of
> CPython's preferred implementation model as part of the language
> definition.
I agree. I like that this idea gives us more flexibility with the
exact implementation strategy.
[..]
> The essential capabilities for active context manipulation would then be:
>
> - get_active_context_token()
> - set_active_context(context_token)
As I mentioned above, at this point I'm not entirely sure that we even
need "set_active_context". The only useful thing for it that I can
imagine is creating a decorator that isolates any changes to the
context, but the only use case I see for this is unit tests.
But even for unittests, a better solution is to use a decorator that
detects keys that were added but not deleted during the test (leaks).
> - implicitly saving and reverting the active context around various operations
Usually we need to save/revert one particular context item, not the
whole context.
> - accessing the active context id for suspended coroutines and
> generators (so parent contexts can opt-in to seeing changes made in
> child contexts)
Yes, this might be useful, let's keep it.
>
> Running commands in a particular context *wouldn't* be a primitive
> operation given those building blocks, since you can implement that
> for yourself using the above primitives:
>
> def run_in_context(target_context_token, func, *args, **kwds):
> old_context_token = get_active_context_token()
> set_active_context(target_context_token)
> try:
> func(*args, **kwds)
> finally:
> set_active_context(old_context_token)
I'd still prefer to implement this as part of the spec. There are
some tricks that I want to use to make ExecutionContext.run() much
faster than a pure Python version. This is a highly performance
critical part of the PEP -- call_soon in asyncio is a VERY frequent
thing.
Besides, having ExecutionContext.run eliminates the need for
sys.set_active_context() -- again, we need to discuss this, but I see
less and less utility for it now.
>
> The public manipulation API here would be deliberately based on opaque
> tokens to make it clear that creating and mutating execution contexts
> is entirely within the realm of the interpreter implementation, and
> user level code can only control *which* execution context is active
> in the current thread, not create arbitrary new execution contexts of
> its own (at least, not without writing a CPython-specific C
> extension).
>
> For manipulation of values within the active context, looking at other
> comparable APIs, I think the main prior art within the language would
> be:
>
> 1. threading.local(), which uses the descriptor protocol to handle
> arbitrary attributes
> 2. Cell variable references in function `__closure__` attributes,
> which also uses the descriptor protocol by way of the "cell_contents"
> attribute
>
> In 3.7, those two examples are being brought closer by way of
> `cell_contents` becoming a read/write attribute:
>
> >>> def f(i):
> ... def g():
> ... nonlocal i
> ... return i
> ... return g
> ...
> >>> g = f(0)
> >>> g()
> 0
> >>> cell = g.__closure__[0]
> >>> cell.cell_contents
> 0
> >>> cell.cell_contents = 5
> >>> g()
> 5
> >>> del cell.cell_contents
> >>> g()
> Traceback (most recent call last):
> ...
> NameError: free variable 'i' referenced before assignment in enclosing scope
> >>> cell.cell_contents = 0
> >>> g()
> 0
>
> This is very similar to the way manipulation of entries within a
> thread local namespace works, but with each cell containing exactly
> one attribute.
>
> For context items, I agree with Nathaniel that the cell-style
> one-value-per-item approach is likely to be the way to go. To
> emphasise that changes to that attribute only affect the *active*
> context, I think "active_value" would be a good name:
>
> >>> request_id =
> sys.create_context_item("my_web_framework.request_id", "Request
> identifier for my_web_framework")
> >>> request_id.active_value
> Traceback (most recent call last):
> ...
> RuntimeError: Context item "my_web_framework.request" not set in
> context <context token>
> >>> request_id.active_value = "12345"
> >>> request_id.active_value
> '12345'
I myself prefer a functional API to __getattr__. I don't like the
"del local.x" syntax. I don't think we are forced to follow the
threading.local() API here, are we?
Yury
>
> Finally, given opaque context tokens, and context items that worked
> like closure cells (only accessing the active context rather than
> lexically scoped variables), the one introspection primitive the
> *interpreter* would need to provide is either:
>
> 1. Given a context token, return a mapping from context items to their
> defined values in the given context
> 2. A way to get a listing of the context items defined in the active context
>
> Since either of those can be defined in terms of the other, my own
> preference goes to the first one, since using it to implement the
> second alternative just requires a simple
> `sys.get_active_context_token()` call, while implementing the first
> one in terms of the second one requires a helper like
> `run_in_context()` above to manipulate the active context in the
> current thread.
>
> The first one also makes it fairly straightforward to *diff* a given
> context against the active one - get the mappings for both contexts,
> check which keys they have in common, compare the values for the
> common keys, and then report on
>
> - keys that appear in one context but not the other
> - values which differ between them for common keys
> - (optionally) values which are the same for common keys
>
> Cheers,
> Nick.
Thank you for your consideration.
Get Outlook for Android<https://aka.ms/ghei36>
From: python-ideas-request(a)python.org
Sent: Monday, August 14, 03:14
Subject: Python-ideas Digest, Vol 129, Issue 44
To: python-ideas(a)python.org
Yury Selivanov wrote:
> This is a new PEP to implement Execution Contexts in Python.
The idea is of course great!
A couple of issues for decimal:
> Moreover, passing the context explicitly does not work at all for
> libraries like ``decimal`` or ``numpy``, which use operator overloading.
Instead of "with localcontext() ...", each coroutine can create a new
Context() and use its methods, without any loss of functionality.
All one loses is the inline operator syntax sugar.
I'm aware you know all this, but the entire decimal paragraph sounds a bit
as if this option did not exist.
> Fast C API for packages like ``decimal`` and ``numpy``.
_decimal relies on caching the most recently used thread-local context,
which gives a speedup of about 25% for inline operators:
https://github.com/python/cpython/blob/master/Modules/_decimal/_decimal.c#L…
Can this speed be achieved with the execution contexts? IOW, can the lookup
of an execution context be as fast as PyThreadState_GET()?
Stefan Krah
So, I'm hardly an expert when it comes to things like this, but there are
two things about this that don't seem right to me. (Also, I'd love to
respond inline, but that's kind of difficult from a mobile phone.)
The first is how set/get_execution_context_item take strings. Inevitably,
people are going to do things like:
CONTEXT_ITEM_NAME = 'foo-bar'
...
sys.set_execution_context_item(CONTEXT_ITEM_NAME, 'stuff')
IMO it would be nicer if there could be a key object used instead, e.g.
my_key = sys.execution_context_key('name-here-for-debugging-purposes')
sys.set_execution_context_item(my_key, 'stuff')
The advantage here would be no need for string constants and no potential
naming conflicts (the string passed to the key creator would be used just
for debugging, kind of like Thread names).
Second thing is this:
def context(x):
old_x = get_execution_context_item('x')
set_execution_context_item('x', x)
try:
yield
finally:
set_execution_context_item('x', old_x)
If this were done frequently, a context manager would be a *lot* more
Pythonic, e.g.:
with sys.temp_change_execution_context('x', new_x):
# ...
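Such a helper could be built on top of the proposed primitives; a rough
sketch only (essentially the PEP's own context() example under this
suggested name):

    from contextlib import contextmanager
    import sys

    @contextmanager
    def temp_change_execution_context(key, value):
        old = sys.get_execution_context_item(key)
        sys.set_execution_context_item(key, value)
        try:
            yield
        finally:
            sys.set_execution_context_item(key, old)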
--
Ryan (ライアン)
Yoko Shimomura, ryo (supercell/EGOIST), Hiroyuki Sawano >> everyone else
http://refi64.com
On Aug 11, 2017 at 5:38 PM, <Yury Selivanov <yselivanov.ml(a)gmail.com>>
wrote:
Hi,
This is a new PEP to implement Execution Contexts in Python.
The PEP is in-flight to python.org, and in the meanwhile can
be read on GitHub:
https://github.com/python/peps/blob/master/pep-0550.rst
(it contains a few diagrams and charts, so please read it there.)
Thank you!
Yury
PEP: 550
Title: Execution Context
Version: $Revision$
Last-Modified: $Date$
Author: Yury Selivanov <yury(a)magic.io>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2017
Python-Version: 3.7
Post-History: 11-Aug-2017
Abstract
========
This PEP proposes a new mechanism to manage execution state--the
logical environment in which a function, a thread, a generator,
or a coroutine executes in.
A few examples of where having a reliable state storage is required:
* Context managers like decimal contexts, ``numpy.errstate``,
and ``warnings.catch_warnings``;
* Storing request-related data such as security tokens and request
data in web applications;
* Profiling, tracing, and logging in complex and large code bases.
The usual solution for storing state is to use a Thread-local Storage
(TLS), implemented in the standard library as ``threading.local()``.
Unfortunately, TLS does not work for isolating state of generators or
asynchronous code because such code shares a single thread.
Rationale
=========
Traditionally a Thread-local Storage (TLS) is used for storing the
state. However, the major flaw of using the TLS is that it works only
for multi-threaded code. It is not possible to reliably contain the
state within a generator or a coroutine. For example, consider
the following generator::
def calculate(precision, ...):
with decimal.localcontext() as ctx:
# Set the precision for decimal calculations
# inside this block
ctx.prec = precision
yield calculate_something()
yield calculate_something_else()
Decimal context is using a TLS to store the state, and because TLS is
not aware of generators, the state can leak. The above code will
not work correctly, if a user iterates over the ``calculate()``
generator with different precisions in parallel::
g1 = calculate(100)
g2 = calculate(50)
items = list(zip(g1, g2))
# items[0] will be a tuple of:
# first value from g1 calculated with 100 precision,
# first value from g2 calculated with 50 precision.
#
# items[1] will be a tuple of:
# second value from g1 calculated with 50 precision,
# second value from g2 calculated with 50 precision.
An even scarier example would be using decimals to represent money
in an async/await application: decimal calculations can suddenly
lose precision in the middle of processing a request. Currently,
bugs like this are extremely hard to find and fix.
Another common need for web applications is to have access to the
current request object, or security context, or, simply, the request
URL for logging or submitting performance tracing data::
async def handle_http_request(request):
context.current_http_request = request
await ...
# Invoke your framework code, render templates,
# make DB queries, etc, and use the global
# 'current_http_request' in that code.
# This isn't currently possible to do reliably
# in asyncio out of the box.
These examples are just a few out of many, where a reliable way to
store context data is absolutely needed.
The inability to use TLS for asynchronous code has lead to
proliferation of ad-hoc solutions, limited to be supported only by
code that was explicitly enabled to work with them.
Current status quo is that any library, including the standard
library, that uses a TLS, will likely not work as expected in
asynchronous code or with generators (see [3]_ as an example issue.)
Some languages that have coroutines or generators recommend to
manually pass a ``context`` object to every function, see [1]_
describing the pattern for Go. This approach, however, has limited
use for Python, where we have a huge ecosystem that was built to work
with a TLS-like context. Moreover, passing the context explicitly
does not work at all for libraries like ``decimal`` or ``numpy``,
which use operator overloading.
.NET runtime, which has support for async/await, has a generic
solution of this problem, called ``ExecutionContext`` (see [2]_).
On the surface, working with it is very similar to working with a TLS,
but the former explicitly supports asynchronous code.
Goals
=====
The goal of this PEP is to provide a more reliable alternative to
``threading.local()``. It should be explicitly designed to work with
Python execution model, equally supporting threads, generators, and
coroutines.
An acceptable solution for Python should meet the following
requirements:
* Transparent support for code executing in threads, coroutines,
and generators with an easy to use API.
* Negligible impact on the performance of the existing code or the
code that will be using the new mechanism.
* Fast C API for packages like ``decimal`` and ``numpy``.
Explicit is still better than implicit, hence the new APIs should only
be used when there is no option to pass the state explicitly.
With this PEP implemented, it should be possible to update a context
manager like the below::
_local = threading.local()
@contextmanager
def context(x):
old_x = getattr(_local, 'x', None)
_local.x = x
try:
yield
finally:
_local.x = old_x
to a more robust version that can be reliably used in generators
and async/await code, with a simple transformation::
@contextmanager
def context(x):
    old_x = get_execution_context_item('x')
    set_execution_context_item('x', x)
    try:
        yield
    finally:
        set_execution_context_item('x', old_x)
Specification
=============
This proposal introduces a new concept called Execution Context (EC),
along with a set of Python APIs and C APIs to interact with it.
EC is implemented using an immutable mapping. Every modification
of the mapping produces a new copy of it. To illustrate what it
means let's compare it to how we work with tuples in Python::
a0 = ()
a1 = a0 + (1,)
a2 = a1 + (2,)
# a0 is an empty tuple
# a1 is (1,)
# a2 is (1, 2)
Manipulating an EC object would be similar::
a0 = EC()
a1 = a0.set('foo', 'bar')
a2 = a1.set('spam', 'ham')
# a0 is an empty mapping
# a1 is {'foo': 'bar'}
# a2 is {'foo': 'bar', 'spam': 'ham'}
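A minimal pure-Python sketch of such an immutable mapping, built on
shallow ``dict`` copies, could look like this (illustrative only; the
class and method names are not part of the proposal)::

class EC:
    """Immutable mapping: every set() returns a new mapping."""

    def __init__(self, data=None):
        self._data = dict(data) if data else {}

    def set(self, key, value):
        new_data = dict(self._data)    # shallow copy, O(n)
        new_data[key] = value
        return EC(new_data)

    def get(self, key, default=None):
        return self._data.get(key, default)    # O(1) lookup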
In CPython, every thread that can execute Python code has a
corresponding ``PyThreadState`` object. It encapsulates important
runtime information like a pointer to the current frame, and is
used extensively by the ceval loop. We add a new field to
``PyThreadState``, called ``exec_context``, which points to the
current EC object.
We also introduce a set of APIs to work with Execution Context.
In this section we will only cover two functions that are needed to
explain how Execution Context works. See the full list of new APIs
in the `New APIs`_ section.
* ``sys.get_execution_context_item(key, default=None)``: lookup
``key`` in the EC of the executing thread. If not found,
return ``default``.
* ``sys.set_execution_context_item(key, value)``: get the
current EC of the executing thread. Add a ``key``/``value``
item to it, which will produce a new EC object. Set the
new object as the current one for the executing thread.
In pseudo-code::
tstate = PyThreadState_GET()
ec = tstate.exec_context
ec2 = ec.set(key, value)
tstate.exec_context = ec2
Note that some important implementation details and optimizations
are omitted here; they will be covered in later sections of this PEP.
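The thread-level behaviour of these two functions can be emulated in
pure Python with a per-thread slot holding the current mapping (a
sketch only; the real implementation lives in C, stores the mapping
on ``PyThreadState``, and also covers the generator and coroutine
semantics described below)::

import threading

_state = threading.local()     # stand-in for PyThreadState

def get_execution_context_item(key, default=None):
    ec = getattr(_state, 'exec_context', {})
    return ec.get(key, default)

def set_execution_context_item(key, value):
    ec = getattr(_state, 'exec_context', {})
    new_ec = dict(ec)                  # produce a new mapping ...
    new_ec[key] = value
    _state.exec_context = new_ec       # ... and make it current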
Now let's see how Execution Contexts work with regular multi-threaded
code, generators, and coroutines.
Regular & Multithreaded Code
----------------------------
For regular Python code, EC behaves just like a thread-local. Any
modification of the EC object produces a new one, which is immediately
set as the current one for the thread state.
.. figure:: pep-0550/functions.png
:align: center
:width: 90%
Figure 1. Execution Context flow in a thread.
As Figure 1 illustrates, if a function calls
``set_execution_context_item()``, the modification of the execution
context will be visible to all subsequent calls and to the caller::
def set_foo():
    set_execution_context_item('foo', 'spam')

set_execution_context_item('foo', 'bar')
print(get_execution_context_item('foo'))

set_foo()
print(get_execution_context_item('foo'))

# will print:
#   bar
#   spam
Coroutines
----------
Python :pep:`492` coroutines are used to implement cooperative
multitasking. For a Python end-user they are similar to threads,
especially when it comes to sharing resources or modifying
the global state.
An event loop is needed to schedule coroutines. Coroutines that
are explicitly scheduled by the user are usually called Tasks.
When a coroutine is scheduled, it can schedule other coroutines using
an ``await`` expression. In async/await world, awaiting a coroutine
can be viewed as a different calling convention: Tasks are similar to
threads, and awaiting on coroutines within a Task is similar to
calling functions within a thread.
By drawing a parallel between regular multithreaded code and
async/await, it becomes apparent that any modification of the
execution context within one Task should be visible to all coroutines
scheduled within it. Any execution context modifications, however,
must not be visible to other Tasks executing within the same thread.
To achieve this, a small set of modifications to the coroutine object
is needed:
* When a coroutine object is instantiated, it saves a reference to
the current execution context object to its ``cr_execution_context``
attribute.
* Coroutine's ``.send()`` and ``.throw()`` methods are modified as
follows (in pseudo-C)::
if coro->cr_isolated_execution_context:
    # Save a reference to the current execution context
    old_context = tstate->execution_context

    # Set our saved execution context as the current
    # for the current thread.
    tstate->execution_context = coro->cr_execution_context

    try:
        # Perform the actual `Coroutine.send()` or
        # `Coroutine.throw()` call.
        return coro->send(...)
    finally:
        # Save a reference to the updated execution_context.
        # We will need it later, when `.send()` or `.throw()`
        # are called again.
        coro->cr_execution_context = tstate->execution_context

        # Restore thread's execution context to what it was before
        # invoking this coroutine.
        tstate->execution_context = old_context
else:
    # Perform the actual `Coroutine.send()` or
    # `Coroutine.throw()` call.
    return coro->send(...)
* ``cr_isolated_execution_context`` is a new attribute on coroutine
objects. Set to ``True`` by default, it makes any execution context
modifications performed by the coroutine stay visible only to that
coroutine.
When the Python interpreter sees an ``await`` instruction, it flips
``cr_isolated_execution_context`` to ``False`` for the coroutine
that is about to be awaited. This makes any changes to the execution
context made by nested coroutine calls within a Task visible
throughout the Task.
Because the top-level coroutine (Task) cannot be scheduled with
``await`` (in asyncio you need to call ``loop.create_task()`` or
``asyncio.ensure_future()`` to schedule a Task), all execution
context modifications are guaranteed to stay within the Task.
* We always work with ``tstate->exec_context``. We use
``coro->cr_execution_context`` only to store coroutine's execution
context when it is not executing.
Figure 2 below illustrates how execution context mutations work with
coroutines.
.. figure:: pep-0550/coroutines.png
:align: center
:width: 90%
Figure 2. Execution Context flow in coroutines.
In the above diagram:
* When "coro1" is created, it saves a reference to the current
execution context "2".
* If it makes any change to the context, it will have its own
execution context branch "2.1".
* When it awaits on "coro2", any subsequent changes it does to
the execution context are visible to "coro1", but not outside
of it.
In code::
async def inner_foo():
    print('inner_foo:', get_execution_context_item('key'))
    set_execution_context_item('key', 2)

async def foo():
    print('foo:', get_execution_context_item('key'))
    set_execution_context_item('key', 1)

    await inner_foo()
    print('foo:', get_execution_context_item('key'))

set_execution_context_item('key', 'spam')
print('main:', get_execution_context_item('key'))

asyncio.get_event_loop().run_until_complete(foo())
print('main:', get_execution_context_item('key'))
which will output::
main: spam
foo: spam
inner_foo: 1
foo: 2
main: spam
Generator-based coroutines (generators decorated with
``types.coroutine`` or ``asyncio.coroutine``) behave exactly as
native coroutines with regards to execution context management:
their ``yield from`` expression is semantically equivalent to
``await``.
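For example (a sketch assuming this PEP is implemented, using the
unqualified ``get_execution_context_item()`` and
``set_execution_context_item()`` names as in the examples above), a
generator-based coroutine shares the Task's context the same way a
native coroutine does::

import asyncio

@asyncio.coroutine
def legacy_step():
    set_execution_context_item('key', 'updated')
    yield from asyncio.sleep(0)

async def task():
    await legacy_step()
    # The modification made in legacy_step() is visible here,
    # exactly as if it were a native coroutine.
    assert get_execution_context_item('key') == 'updated'

asyncio.get_event_loop().run_until_complete(task())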
Generators
----------
Generators in Python, while similar to Coroutines, are used in a
fundamentally different way. They are producers of data, and
they use ``yield`` expression to suspend/resume their execution.
A crucial difference between ``await coro`` and ``yield value`` is
that the former expression guarantees that the ``coro`` will be
executed to the end, while the latter is producing ``value`` and
suspending the generator until it gets iterated again.
Generators share 99% of their implementation with coroutines, and
thus have similar new attributes ``gi_execution_context`` and
``gi_isolated_execution_context``. Similar to coroutines, generators
save a reference to the current execution context when they are
instantiated. They have the same implementation of the ``.send()``
and ``.throw()`` methods.
The only difference is that
``gi_isolated_execution_context`` is always set to ``True``, and
is never modified by the interpreter. The ``yield from o`` expression
in regular generators that are not decorated with ``types.coroutine``
is semantically equivalent to ``for v in o: yield v``.
.. figure:: pep-0550/generators.png
:align: center
:width: 90%
Figure 3. Execution Context flow in a generator.
In the above diagram:
* When "gen1" is created, it saves a reference to the current
execution context "2".
* If it makes any change to the context, it will have its own
execution context branch "2.1".
* When "gen2" is created, it saves a reference to the current
execution context for it -- "2.1".
* Any subsequent execution context updates in "gen2" will only
be visible to "gen2".
* Likewise, any context changes that "gen1" makes after it
created "gen2" will not be visible to "gen2".
In code::
def inner_foo():
    for i in range(3):
        print('inner_foo:', get_execution_context_item('key'))
        set_execution_context_item('key', i)
        yield i

def foo():
    set_execution_context_item('key', 'spam')
    print('foo:', get_execution_context_item('key'))

    inner = inner_foo()

    while True:
        val = next(inner, None)
        if val is None:
            break
        yield val
        print('foo:', get_execution_context_item('key'))

set_execution_context_item('key', 'ham')
print('main:', get_execution_context_item('key'))

list(foo())
print('main:', get_execution_context_item('key'))
which will output::
main: ham
foo: spam
inner_foo: spam
foo: spam
inner_foo: 0
foo: spam
inner_foo: 1
foo: spam
main: ham
As we see, any modification of the execution context in a generator
is visible only to the generator itself.
There is one use case where it is desirable for generators to affect
the surrounding execution context: the ``contextlib.contextmanager``
decorator. To make the following work::
@contextmanager
def context(x):
    old_x = get_execution_context_item('x')
    set_execution_context_item('x', x)
    try:
        yield
    finally:
        set_execution_context_item('x', old_x)

we modified ``contextmanager`` to flip the
``gi_isolated_execution_context`` flag to ``False`` on its generator.
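A heavily reduced sketch of where that flip happens (this is not the
real ``contextlib`` code, which also forwards exceptions into the
generator, and the attribute only exists if this PEP is implemented)::

class _SimpleGeneratorCM:
    """Reduced stand-in for contextlib._GeneratorContextManager."""

    def __init__(self, gen):
        self.gen = gen
        # The one relevant change: let the generator's EC
        # modifications be visible inside the ``with`` block.
        self.gen.gi_isolated_execution_context = False

    def __enter__(self):
        return next(self.gen)

    def __exit__(self, *exc_info):
        # Exception propagation into the generator is omitted here.
        try:
            next(self.gen)
        except StopIteration:
            pass
        return False

def contextmanager(func):
    def helper(*args, **kwds):
        return _SimpleGeneratorCM(func(*args, **kwds))
    return helper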
Greenlets
---------
Greenlet is an alternative implementation of cooperative
scheduling for Python. Although the greenlet package is not part of
CPython, popular frameworks like gevent rely on it, and it is
important that greenlet can be modified to support execution
contexts.

In a nutshell, the greenlet design is very similar to the design of
generators. The main difference is that for generators, the stack
is managed by the Python interpreter. Greenlet works outside of the
Python interpreter, and manually saves some ``PyThreadState``
fields and pushes/pops the C stack. Since Execution Context is
implemented on top of ``PyThreadState``, it's easy to add
transparent support for it to greenlet.
New APIs
========
Even though this PEP adds a number of new APIs, please keep in mind
that most Python users will likely only ever use two of them:
``sys.get_execution_context_item()`` and
``sys.set_execution_context_item()``.
Python
------
1. ``sys.get_execution_context_item(key, default=None)``: lookup
``key`` for the current Execution Context. If not found,
return ``default``.
2. ``sys.set_execution_context_item(key, value)``: set
``key``/``value`` item for the current Execution Context.
If ``value`` is ``None``, the item will be removed.
3. ``sys.get_execution_context()``: return the current Execution
Context object: ``sys.ExecutionContext``.
4. ``sys.set_execution_context(ec)``: set the passed
``sys.ExecutionContext`` instance as a current one for the current
thread.
5. ``sys.ExecutionContext`` object.
Implementation detail: ``sys.ExecutionContext`` wraps a low-level
``PyExecContextData`` object. ``sys.ExecutionContext`` has a
mutable mapping API, abstracting away the real immutable
``PyExecContextData``.
* ``ExecutionContext()``: construct a new, empty, execution
context.
* ``ec.run(func, *args)`` method: run ``func(*args)`` in the
``ec`` execution context.
* ``ec[key]``: lookup ``key`` in ``ec`` context.
* ``ec[key] = value``: assign ``key``/``value`` item to the ``ec``.
* ``ec.get()``, ``ec.items()``, ``ec.values()``, ``ec.keys()``, and
``ec.copy()`` are similar to those of the ``dict`` object.
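An illustrative (not normative) usage sketch of the APIs listed
above, assuming this PEP is implemented::

import sys

# Item-level API -- what most code would use:
sys.set_execution_context_item('request_id', 'abc123')
assert sys.get_execution_context_item('request_id') == 'abc123'

# Whole-context API -- what frameworks and event loops would use:
ec = sys.get_execution_context()      # snapshot of the current EC

def report():
    print(sys.get_execution_context_item('request_id'))

ec.run(report)                        # would print 'abc123'

empty = sys.ExecutionContext()        # a new, empty context
empty['request_id'] = 'xyz789'        # mutable-mapping facade
sys.set_execution_context(empty)      # make it current for this thread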
C API
-----
The C API is different from the Python one because it operates
directly on the low-level immutable ``PyExecContextData`` object.
1. New ``PyThreadState->exec_context`` field, pointing to a
``PyExecContextData`` object.
2. ``PyThreadState_SetExecContextItem`` and
``PyThreadState_GetExecContextItem`` similar to
``sys.set_execution_context_item()`` and
``sys.get_execution_context_item()``.
3. ``PyThreadState_GetExecContext``: similar to
``sys.get_execution_context()``. Always returns a
``PyExecContextData`` object. If ``PyThreadState->exec_context``
is ``NULL``, a new empty one will be created and assigned
to ``PyThreadState->exec_context``.
4. ``PyThreadState_SetExecContext``: similar to
``sys.set_execution_context()``.
5. ``PyExecContext_New``: create a new empty ``PyExecContextData``
object.
6. ``PyExecContext_SetItem`` and ``PyExecContext_GetItem``.
The exact layout of ``PyExecContextData`` is private, which allows
us to switch to a different implementation later. More on that
in the `Implementation Details`_ section.
Modifications in Standard Library
=================================
* ``contextlib.contextmanager`` was updated to flip the new
``gi_isolated_execution_context`` attribute on the generator.
* The ``asyncio.events.Handle`` object now captures the current
execution context when it is created, and uses the saved
execution context to run the callback (with the
``ExecutionContext.run()`` method.) This makes
``loop.call_soon()`` run callbacks in the execution context in
which they were scheduled; a rough sketch is shown below.
No modifications in ``asyncio.Task`` or ``asyncio.Future`` were
necessary.
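A rough sketch of the ``Handle`` change (not the actual asyncio
patch; attribute names are illustrative)::

import sys

class Handle:
    def __init__(self, callback, args, loop):
        self._callback = callback
        self._args = args
        self._loop = loop
        # Capture the EC that is current when the callback is
        # scheduled (i.e. when loop.call_soon() is called).
        self._exec_context = sys.get_execution_context()

    def _run(self):
        # Run the callback in the captured EC rather than in
        # whatever EC the event loop has at call time.
        self._exec_context.run(self._callback, *self._args)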
Some standard library modules like ``warnings`` and ``decimal``
can be updated to use new execution contexts. This will be considered
in separate issues if this PEP is accepted.
Backwards Compatibility
=======================
This proposal preserves 100% backwards compatibility.
Performance
===========
Implementation Details
----------------------
The new ``PyExecContextData`` object wraps a ``dict`` object.
Any modification requires creating a shallow copy of the dict.
While working on the reference implementation of this PEP, we were
able to optimize the ``dict.copy()`` operation by **5.5x**, see [4]_
for details.
.. figure:: pep-0550/dict_copy.png
:align: center
:width: 100%
Figure 4.
Figure 4 shows that the performance of an immutable dict implemented
with shallow copying is expectedly O(n) for the ``set()`` operation.
However, this is tolerable until the dict has more than 100 items
(one ``set()`` takes about a microsecond.)
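The shallow-copy cost is easy to reproduce with a micro-benchmark
(numbers vary by machine; this only illustrates the O(n) growth)::

import timeit

for n in (10, 100, 1000, 10000):
    d = {i: i for i in range(n)}
    t = timeit.timeit('dict(d)', globals={'d': d}, number=10000)
    print(n, '%.2f us per copy' % (t / 10000 * 1e6))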
Judging by the number of modules that need EC in the standard
library, it is likely that real-world Python applications will use
significantly fewer than 100 execution context variables.
The important point is that the cost of accessing a key in
Execution Context is always O(1).
If ``set()`` operation performance is a major concern, we discuss
alternative approaches that have O(1) or close-to-O(1) ``set()``
performance in the `Alternative Immutable Dict Implementation`_,
`Faster C API`_, and `Copy-on-write Execution Context`_ sections.
Generators and Coroutines
-------------------------
Using a microbenchmark for generators and coroutines from :pep:`492`
([12]_), it was possible to observe a 0.5 to 1% performance
degradation. The asyncio echoserver microbenchmarks from the uvloop
project [13]_ showed a 1-1.5% performance degradation for asyncio
code. asyncpg benchmarks [14]_, which execute more code and are
closer to a real-world application, did not exhibit any noticeable
performance change.
Overall Performance Impact
--------------------------
The total number of changed lines in the ceval loop is 2 -- in the
``YIELD_FROM`` opcode implementation. Only the performance of
generators and coroutines can be affected by the proposal.
This was confirmed by running the Python Performance Benchmark Suite
[15]_, which demonstrated that there is no difference between the
3.7 master branch and this PEP's reference implementation branch
(full benchmark results can be found here: [16]_.)
Design Considerations
=====================
Alternative Immutable Dict Implementation
-----------------------------------------
Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT)
to implement high performance immutable collections [5]_, [6]_.
Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N)
performance for both ``set()`` and ``get()`` operations, which will
be essentially O(1) for relatively small mappings in EC.
To assess if HAMT can be used for Execution Context, we implemented
it in CPython [7]_.
.. figure:: pep-0550/hamt_vs_dict.png
:align: center
:width: 100%
Figure 5. Benchmark code can be found here: [9]_.
Figure 5 shows that HAMT indeed displays O(1) performance for all
benchmarked dictionary sizes. For dictionaries with less than 100
items, HAMT is a bit slower than Python dict/shallow copy.
.. figure:: pep-0550/lookup_hamt.png
:align: center
:width: 100%
Figure 6. Benchmark code can be found here: [10]_.
Figure 6 shows a comparison of lookup costs between Python dict
and an HAMT immutable mapping. HAMT lookup time is 30-40% slower
than Python dict lookups on average, which is a very good result,
considering how well Python dicts are optimized.
Note that, according to [8]_, the HAMT design can be further improved.
The bottom line is that the current approach of implementing
an immutable mapping with a shallow-copied dict will likely perform
adequately in real-life applications. The HAMT solution is more
future-proof, however.
The proposed API is designed in such a way that the underlying
implementation of the mapping can be changed completely without
affecting the Execution Context `Specification`_, which allows
us to switch to HAMT at some point if necessary.
Copy-on-write Execution Context
-------------------------------
The implementation of Execution Context in .NET is different from
this PEP: .NET uses a copy-on-write mechanism and a regular mutable
mapping.
One way to implement this in CPython would be to have two new
fields in ``PyThreadState``:
* ``exec_context`` pointing to the current Execution Context mapping;
* ``exec_context_copy_on_write`` flag, set to ``0`` initially.
The idea is that whenever we are modifying the EC, the copy-on-write
flag is checked, and if it is set to ``1``, the EC is copied.
Modifications to Coroutine and Generator ``.send()`` and ``.throw()``
methods described in the `Coroutines`_ section will be almost the
same, except that in addition to the ``gi_execution_context`` they
will have a ``gi_exec_context_copy_on_write`` flag. When a coroutine
or a generator starts, the flag will be set to ``1``. This will
ensure that any modification of the EC performed within a coroutine
or a generator will be isolated.
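In pseudo-code, the copy-on-write ``set`` operation would look
roughly like this (names are illustrative)::

def set_execution_context_item_cow(tstate, key, value):
    if tstate.exec_context_copy_on_write:
        # First modification after the flag was raised: copy the
        # mapping once, then keep mutating the private copy in place.
        tstate.exec_context = dict(tstate.exec_context)
        tstate.exec_context_copy_on_write = False
    tstate.exec_context[key] = value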
This approach has one advantage:
* For Execution Context that contains a large number of items,
copy-on-write is a more efficient solution than the shallow-copy
dict approach.
However, we believe that copy-on-write disadvantages are more
important to consider:
* Copy-on-write behaviour for generators and coroutines makes
EC semantics less predictable.
With the immutable EC approach, generators and coroutines always
execute in the EC that was current at the moment of their
creation. Any modifications to the outer EC while a generator
or a coroutine is executing are not visible to them::
def generator():
    yield 1
    print(get_execution_context_item('key'))
    yield 2

set_execution_context_item('key', 'spam')
gen = iter(generator())
next(gen)

set_execution_context_item('key', 'ham')
next(gen)
The above script will always print 'spam' with immutable EC.
With a copy-on-write approach, the above script will print 'ham'.
Now, consider that ``generator()`` was refactored to call some
library function that uses Execution Context::

def generator():
    yield 1
    some_function_that_uses_decimal_context()
    print(get_execution_context_item('key'))
    yield 2
Now, the script will print 'spam', because
``some_function_that_uses_decimal_context`` forced the EC to copy,
and ``set_execution_context_item('key', 'ham')`` line did not
affect the ``generator()`` code after all.
* Similarly to the previous point, ``sys.ExecutionContext.run()``
method will also become less predictable, as
``sys.get_execution_context()`` would still return a reference to
the current mutable EC.
We can't modify ``sys.get_execution_context()`` to return a shallow
copy of the current EC, because this would seriously harm
performance of ``asyncio.call_soon()`` and similar places, where
it is important to propagate the Execution Context.
* Even though copy-on-write requires shallow-copying the execution
context object less frequently, copying will still take place
in coroutines and generators. In that case, the HAMT approach will
perform better for medium to large execution contexts.
All in all, we believe that the copy-on-write approach introduces
very subtle corner cases that could lead to bugs that are
exceptionally hard to discover and fix.
The immutable EC solution in comparison is always predictable and
easy to reason about. Therefore we believe that any slight
performance gain that the copy-on-write solution might offer is not
worth it.
Faster C API
------------
Packages like numpy and standard library modules like decimal need
to frequently query the global state for some local context
configuration. It is important that the APIs they use are as fast
as possible.
The proposed ``PyThreadState_SetExecContextItem`` and
``PyThreadState_GetExecContextItem`` functions need to get the
current thread state with ``PyThreadState_GET()`` (fast) and then
perform a hash lookup (relatively slow). We can eliminate the hash
lookup by adding three additional C API functions:
* ``Py_ssize_t PyExecContext_RequestIndex(char *key_name)``:
a function similar to the existing ``_PyEval_RequestCodeExtraIndex``
introduced in :pep:`523`. The idea is to request a unique index
that can later be used to look up context items.
The ``key_name`` can later be used by ``sys.ExecutionContext`` to
introspect items added with this API.
* ``PyThreadState_SetExecContextIndexedItem(Py_ssize_t index, PyObject *val)``
and ``PyThreadState_GetExecContextIndexedItem(Py_ssize_t index)``
to request an item by its index, avoiding the cost of hash lookup.
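The idea can be modelled in pure Python as follows (hypothetical
names; the real API would be C-level)::

_registered_keys = []            # index -> key name, for introspection

def request_index(key_name):
    """Model of the proposed PyExecContext_RequestIndex()."""
    _registered_keys.append(key_name)
    return len(_registered_keys) - 1

# A module requests its index once, at import time ...
DECIMAL_CTX_INDEX = request_index('decimal_context')

def get_indexed_item(indexed_storage, index, default=None):
    # ... and later reads a flat array slot instead of hashing a
    # string key on every lookup.  ``indexed_storage`` stands in for
    # the per-thread storage the C implementation would use.
    try:
        return indexed_storage[index]
    except IndexError:
        return default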
Why does setting a key to None remove the item?
-----------------------------------------------
Consider a context manager::
@contextmanager
def context(x):
    old_x = get_execution_context_item('x')
    set_execution_context_item('x', x)
    try:
        yield
    finally:
        set_execution_context_item('x', old_x)
With ``set_execution_context_item(key, None)`` call removing the
``key``, the user doesn't need to write additional code to remove
the ``key`` if it wasn't in the execution context already.
An alternative design with ``del_execution_context_item()`` method
would look like the following::
@contextmanager
def context(x):
    not_there = object()
    old_x = get_execution_context_item('x', not_there)
    set_execution_context_item('x', x)
    try:
        yield
    finally:
        if old_x is not_there:
            del_execution_context_item('x')
        else:
            set_execution_context_item('x', old_x)
Can we fix ``PyThreadState_GetDict()``?
---------------------------------------
``PyThreadState_GetDict()`` is a TLS-like storage, and some of its
existing users might depend on it being just that. Changing its
behaviour to follow the Execution Context semantics would break
backwards compatibility.
PEP 521
-------
:pep:`521` proposes an alternative solution to the problem:
enhance Context Manager Protocol with two new methods: ``__suspend__``
and ``__resume__``. To make it compatible with async/await,
the Asynchronous Context Manager Protocol will also need to be
extended with ``__asuspend__`` and ``__aresume__``.
This allows implementing context managers like decimal context and
``numpy.errstate`` for generators and coroutines.
The following code::
class Context:
    def __enter__(self):
        self.old_x = get_execution_context_item('x')
        set_execution_context_item('x', 'something')

    def __exit__(self, *err):
        set_execution_context_item('x', self.old_x)
would become this::
class Context:
    def __enter__(self):
        self.old_x = get_execution_context_item('x')
        set_execution_context_item('x', 'something')

    def __suspend__(self):
        set_execution_context_item('x', self.old_x)

    def __resume__(self):
        set_execution_context_item('x', 'something')

    def __exit__(self, *err):
        set_execution_context_item('x', self.old_x)
Besides complicating the protocol, the implementation will likely
negatively impact performance of coroutines, generators, and any code
that uses context managers, and will notably complicate the
interpreter implementation. It also does not solve the leaking state
problem for greenlet/gevent.
:pep:`521` also does not provide any mechanism to propagate state
in a local context, like storing a request object in an HTTP request
handler to have better logging.
Can Execution Context be implemented outside of CPython?
--------------------------------------------------------
Because async/await code needs an event loop to run it, an EC-like
solution can be implemented in a limited way for coroutines.
Generators, on the other hand, do not have an event loop or
trampoline, making it impossible to intercept their ``yield`` points
outside of the Python interpreter.
Reference Implementation
========================
The reference implementation can be found here: [11]_.
References
==========
.. [1] https://blog.golang.org/context
.. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.…
.. [3] https://github.com/numpy/numpy/issues/9444
.. [4] http://bugs.python.org/issue31179
.. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie
.. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashma…
.. [7] https://github.com/1st1/cpython/tree/hamt
.. [8] https://michael.steindorfer.name/publications/oopsla15.pdf
.. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd
.. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e
.. [11] https://github.com/1st1/cpython/tree/pep550
.. [12] https://www.python.org/dev/peps/pep-0492/#async-await
.. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.…
.. [14] https://github.com/MagicStack/pgbench
.. [15] https://github.com/python/performance
.. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c
Copyright
=========
This document has been placed in the public domain.
Before I don my firesuit, I'd like to say that I much prefer Python and I rail on JS whenever I can. However, these days it is quite common to be doing work in both Python and JavaScript. Harmonizing the two would help JS developers pick up the language, as well as people like me who are stuck working in JS.
TIOBE has Python at 5 and JS at 8 https://www.tiobe.com/tiobe-index/
Redmonk: 1 and 1, respectively http://redmonk.com/sogrady/2017/06/08/language-rankings-6-17/
PYPL: 2 and 5 respectively http://pypl.github.io/PYPL.html
While JS is strongly for web (Node.JS, browsers) and Python has a weak showing there (Tornado, Flask), Python is very popular for everything else on the backend where JS isn't and isn't likely to be. The point I'm making is not to choose a 'winner', but to observe that, given the tight clustering of the two languages, there will be considerable overlap. People like me are asked to do both quite frequently, so I'd like a little more harmony to aid my day-to-day. I have just as many Python files as JS files open in my editor at this moment.
There are several annoyances that if removed, would go a long way.
1. Object literals: JS: {a:1} vs Python: {'a':1}
Making my fingers dance on ' or " is not a good use of keystrokes, and it decreases readability. However, a counter-argument here is: what about when a is a variable? JS allows o[a] as a way to assign to a property named by a variable. Python of course offers functions that do this, but for simple objects this would be very much appreciated.
The point here is this is
2. Join: JS: [].join(s) vs Python: s.join([])
I've read the justification for putting join on a string, and it makes sense. But I think we should put it on the list too.
3. Allow C/C++/JS style comments: JS:[ //, /* ] vs Python #
This one is pretty self-explanatory.
Some might want even more harmony, but I don't know the repercussions of all of that. I think the above could be implemented without breaking anything. What I do know is that 85% of my friction would be removed if the above were implemented.
The generator syntax, (x for x in i if c), currently always creates a
new generator. I find this quite inefficient:
{x for x in integers if 1000 <= x < 1000000} # never completes, because
it's trying to iterate over all integers
What if, somehow, object `integers` could hook the generator and produce
the equivalent of {x for x in range(1000, 1000000)}, which does complete?
What if (x for x in integers if 1000 <= x < 1000000) was syntax sugar
for (x for x in range(1000, 1000000))?
(I like mathy syntax. Do you like mathy syntax?)
Hi all,
First time emailer, so please be kind. Also, if this is not the right
mailing list for PyPA talk, I apologize. Please point me in the right
direction if so. The main reason I have emailed here is I believe it may be
PEP time to standardize the JSON metadata that PyPI makes available, like
what was done for the `'simple API` described in PEP503.
I've been doing a bit of work on `bandersnatch` (I didn't name it), which
is a PEP 381 mirroring package, and I wanted to enhance it to also mirror the
handy JSON metadata PyPI generates and makes available @
https://pypi.python.org/pypi/PKG_NAME/json.
I've done a PR on bandersnatch as a POC that mirrors both the PyPI
directory structure (URL/pypi/PKG_NAME/json) and creates a standardizable
URL/json/PKG_NAME that the former symlinks to (to be served by NGINX / some
other proxy). I'm also contemplating naming the directory 'metadata' rather
than 'json', so that if some new hotness comes along or we want to change the
format down the line, we're not stuck with json as the dirname. The PR can be found here:
https://bitbucket.org/pypa/bandersnatch/pull-requests/33/save-json-metadata…
My main use case is to write a very simple async 'verifier' tool that will
crawl all the JSON files and then ensure the packages directory on each of
my internal mirrors (I have a mirror per region / datacenter) have all the
files they should. I sync centrally (to save resource on the PyPI
infrastructure) and then rsync out all the diffs to each region /
datacenter, and under some failure scenarios I could miss a file or many.
So I feel using JSON pulled down from the authoritative source will allow
an async job to verify the MD5 of all the package files on each mirror.
What are people's thoughts here? Is it worth a PEP similar to PEP 503 going
forward? Can people enhance / share some thoughts on this idea?
Thanks,
Cooper Lees
me(a)cooperlees.com <me(a)copperlees.com>
https://cooperlees.com/
Hi,
A friend and I have hit a funny situation with the `mimetypes.py` library
guessing the type for a '.json' file. Is there a reason why '.json' hasn't
been
added to the mapping?
Without `mailcap` installed:
[root@de169da8cc46 /]# python3 -m mimetypes build.json
I don't know anything about type build.json
With `mailcap` installed:
[root@de169da8cc46 /]# python3 -m mimetypes build.json
type: application/json encoding: None
We experimented with adding a mapping for '.json' to 'application/json' to
`mimetypes.py` and it seems to work fine for us. It looks like it has been
registered with IANA and everything.
Proposed diff:
ntangsurat@derigible ~/git/e4r7hbug.cpython/Lib master $ git diff
diff --git a/Lib/mimetypes.py b/Lib/mimetypes.py
index 3d68694864..5919b45a9b 100644
--- a/Lib/mimetypes.py
+++ b/Lib/mimetypes.py
@@ -439,6 +439,7 @@ def _default_mime_types():
'.jpeg' : 'image/jpeg',
'.jpg' : 'image/jpeg',
'.js' : 'application/javascript',
+ '.json' : 'application/json',
'.ksh' : 'text/plain',
'.latex' : 'application/x-latex',
'.m1v' : 'video/mpeg',
Nate.
Hi folks
I was thinking about how a function sometimes acts on instances of an
existing class and behaves very much like a method. Adding new methods to
existing classes is currently somewhat difficult, and having pseudo-methods
would make that easier.
Code example: (The syntax can most likely be improved upon)
def has_vowels(self: str):
    for vowel in ["a", "e", "i", "o", "u"]:
        if vowel in self:
            return True
    return False
This allows one to write `string.has_vowels()` instead of `has_vowels(string)`,
which would make it easier to read, and would make it easier to add
functionality to existing classes without having to extend them. This would be
useful for builtins or imported libraries, so one can fill in "missing" methods.
* Simple way to extend classes
* Improves readability
* Easy to understand
~Paul