There's a whole matrix of these and I'm wondering why the matrix is
currently sparse rather than implementing them all. Or rather, why we
can't stack them as:
class foo(object):
    @classmethod
    @property
    def bar(cls, ...):
        ...
Essentially the permutations are, I think:
{unadorned | abc.abstract} x {normal | static | class} x {method | property | non-callable attribute}.
concreteness | implicit first arg | type                   | name                                      | comments
-------------+--------------------+------------------------+-------------------------------------------+------------
unadorned    | unadorned          | method                 | def foo():                                | exists now
unadorned    | unadorned          | property               | @property                                 | exists now
unadorned    | unadorned          | non-callable attribute | x = 2                                     | exists now
unadorned    | static             | method                 | @staticmethod                             | exists now
unadorned    | static             | property               | @staticproperty                           | proposing
unadorned    | static             | non-callable attribute | {degenerate case - variables              | unnecessary
             |                    |                        |  don't have arguments}                    |
unadorned    | class              | method                 | @classmethod                              | exists now
unadorned    | class              | property               | @classproperty or @classmethod;@property  | proposing
unadorned    | class              | non-callable attribute | {degenerate case - variables              | unnecessary
             |                    |                        |  don't have arguments}                    |
abc.abstract | unadorned          | method                 | @abc.abstractmethod                       | exists now
abc.abstract | unadorned          | property               | @abc.abstractproperty                     | exists now
abc.abstract | unadorned          | non-callable attribute | @abc.abstractattribute or                 | proposing
             |                    |                        |  @abc.abstract;@attribute                 |
abc.abstract | static             | method                 | @abc.abstractstaticmethod                 | exists now
abc.abstract | static             | property               | @abc.abstractstaticproperty               | proposing
abc.abstract | static             | non-callable attribute | {degenerate case - variables              | unnecessary
             |                    |                        |  don't have arguments}                    |
abc.abstract | class              | method                 | @abc.abstractclassmethod                  | exists now
abc.abstract | class              | property               | @abc.abstractclassproperty                | proposing
abc.abstract | class              | non-callable attribute | {degenerate case - variables              | unnecessary
             |                    |                        |  don't have arguments}                    |
I think the meanings of the new ones are pretty straightforward, but in
case they are not...
@staticproperty - like @property only without an implicit first
argument. Allows the property to be called directly from the class
without requiring a throw-away instance.
@classproperty - like @property, only the implicit first argument to the
method is the class. Allows the property to be called directly from the
class without requiring a throw-away instance.
@abc.abstractattribute - a simple, non-callable variable that must be
overridden in subclasses
@abc.abstractstaticproperty - like @abc.abstractproperty only for
@staticproperty
@abc.abstractclassproperty - like @abc.abstractproperty only for
@classproperty
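In case it helps, here's a rough pure-Python sketch of the read-only half of
@classproperty using the descriptor protocol (illustrative only, not a
proposed implementation):

class classproperty:
    """Read-only class-level property (illustrative sketch)."""
    def __init__(self, fget):
        self.fget = fget
    def __get__(self, instance, owner=None):
        # Pass the class, not the instance, as the implicit first argument.
        if owner is None:
            owner = type(instance)
        return self.fget(owner)

class Foo:
    _registry = {"a": 1, "b": 2}
    @classproperty
    def size(cls):
        return len(cls._registry)

assert Foo.size == 2     # no throw-away instance needed
assert Foo().size == 2   # also works via an instance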
--rich
At the moment, the array module of the standard library allows to
create arrays of different numeric types and to initialize them from
an iterable (eg, another array).
What's missing is the possibility to specify the final size of the
array (number of items), especially for large arrays.
I'm thinking of suffix arrays (a text indexing data structure) for
large texts, eg the human genome and its reverse complement (about 6
billion characters from the alphabet ACGT).
The suffix array is a long int array of the same size (8 bytes per
number, so it occupies about 48 GB memory).
At the moment I am extending an array in chunks of several million
items at a time, which is slow and not elegant.
The function below also initializes each item in the array to a given
value (0 by default).
Is there a reason why the array.array constructor does not allow
you to simply specify the number of items that should be allocated? (I do
not really care about the contents.)
Would this be a worthwhile addition to / modification of the array module?
My suggestion is to modify array construction in such a way that you
could pass an iterator (as now) as second argument, but if you pass a
single integer value, it should be treated as the number of items to
allocate.
Here is my current workaround (which is slow):
import array

def filled_array(typecode, n, value=0, bsize=(1 << 22)):
    """returns a new array with given typecode
    (eg, "l" for long int, as in the array module)
    with n entries, initialized to the given value (default 0)
    """
    a = array.array(typecode, [value] * bsize)
    x = array.array(typecode)
    r = n
    while r >= bsize:
        x.extend(a)
        r -= bsize
    x.extend([value] * r)
    return x
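(In the meantime, two workarounds that I believe allocate the whole array in
one step, in case they help:)

import array

n = 10**6  # number of items (example size)

# Sequence repetition builds the full array in one step:
a = array.array("l", [0]) * n
assert len(a) == n

# A zero-filled array can also be created from a bytes object of the right size:
z = array.array("l", bytes(n * array.array("l").itemsize))
assert len(z) == n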
Hi folks,
I normally wouldn't bring something like this up here, except I think
that there is a possibility of something to be done--a language
documentation clarification if nothing else, though possibly an actual
code change as well.
I've been having an argument with a colleague over the last couple of
days over the proper order of statements when setting up a
try/finally to perform cleanup of some action. On some level we're
both being stubborn I think, and I'm not looking for resolution as to
who's right/wrong or I wouldn't bring it to this list in the first
place. The original argument was over setting and later restoring
os.environ, but we ended up arguing over
threading.Lock.acquire/release which I think is a more interesting
example of the problem, and he did raise a good point that I do want
to bring up.
</prologue>
My colleague's contention is that given
lock = threading.Lock()
this is simply *wrong*:
lock.acquire()
try:
    do_something()
finally:
    lock.release()
whereas this is okay:
with lock:
    do_something()
Ignoring other details of how threading.Lock is actually implemented, and
assuming that Lock.__enter__ calls acquire() and Lock.__exit__ calls
release(), then as far as I've known ever since Python 2.5 first came
out these two examples are semantically *equivalent*, and I can't find
any way of reading PEP 343 or the Python language reference that would
suggest otherwise.
However, there *is* a difference, and has to do with how signals are
handled, particularly w.r.t. context managers implemented in C (hence
we are talking CPython specifically):
If Lock.__enter__ is a pure Python method (even if it maybe calls some
C methods), and a SIGINT is handled during execution of that method,
then in almost all cases a KeyboardInterrupt exception will be raised
from within Lock.__enter__--this means the suite under the with:
statement is never evaluated, and Lock.__exit__ is never called. You
can be fairly sure the KeyboardInterrupt will be raised from somewhere
within a pure Python Lock.__enter__ because there will usually be at
least one remaining opcode to be evaluated, such as RETURN_VALUE.
Because of how delayed execution of signal handlers is implemented in
the pyeval main loop, this means the signal handler for SIGINT will be
called *before* RETURN_VALUE, resulting in the KeyboardInterrupt
exception being raised. Standard stuff.
However, if Lock.__enter__ is a PyCFunction things are quite
different. If you look at how the SETUP_WITH opcode is implemented,
it first calls the __enter__ method with _PyObject_CallNoArg. If this
returns NULL (i.e. an exception occurred in __enter__) then "goto
error" is executed and the exception is raised. However, if it returns
non-NULL, the finally block is set up with PyFrame_BlockSetup and
execution proceeds to the next opcode. At this point a potentially
waiting SIGINT is handled, resulting in KeyboardInterrupt being raised
while inside the with statement's suite, so the finally block, and hence
Lock.__exit__, is entered.
Long story short, because Lock.__enter__ is a C function, assuming
that it succeeds normally then
with lock:
    do_something()
always guarantees that Lock.__exit__ will be called if a SIGINT was
handled inside Lock.__enter__, whereas with
lock.acquire()
try:
    ...
finally:
    lock.release()
there is at least a small possibility that the SIGINT handler is called
after the CALL_FUNCTION op but before the try/finally block is entered
(e.g. before executing POP_TOP or SETUP_FINALLY). So the end result
is that the lock is held and never released after the
KeyboardInterrupt (whether or not it's handled somehow).
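For the curious, the window is easy to eyeball in the bytecode; the exact
opcodes vary by CPython version, but on 3.6/3.7 I'd expect to see the
acquire() call, a POP_TOP and then SETUP_FINALLY before the protected block:

import dis

def do_something():
    pass

def locked(lock):
    lock.acquire()
    try:
        do_something()
    finally:
        lock.release()

dis.dis(locked)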
Whereas, again, if Lock.__enter__ is a pure Python function there's
less likely to be any difference (though I don't think the possibility
can be ruled out entirely).
At the very least I think this quirk of CPython should be mentioned
somewhere (since in all other cases the semantic meaning of the
"with:" statement is clear). However, I think it might be possible to
gain more consistency between these cases if pending signals are
checked/handled after any direct call to PyCFunction from within the
ceval loop.
Sorry for the tl;dr; any thoughts?
Hi,
For technical reasons, many functions of the Python standard library
implemented in C have positional-only parameters. Example:
-------
$ ./python
Python 3.7.0a0 (default, Feb 25 2017, 04:30:32)
>>> help(str.replace)
replace(self, old, new, count=-1, /) # <== notice "/" at the end
...
>>> "a".replace("x", "y") # ok
'a'
>>> "a".replace(old="x", new="y") # ERR!
TypeError: replace() takes at least 2 arguments (0 given)
-------
When converting the methods of the builtin str type to the internal
"Argument Clinic" tool (tool to generate the function signature,
function docstring and the code to parse arguments in C), I asked if
we should add support for keyword arguments in str.replace(). The
answer was quick: no! It's a deliberate design choice.
Quote of Yury Selivanov's message:
"""
I think Guido explicitly stated that he doesn't like the idea to
always allow keyword arguments for all methods. I.e. `str.find('aaa')`
just reads better than `str.find(needle='aaa')`. Essentially, the idea
is that for most of the builtins that accept one or two arguments,
positional-only parameters are better.
"""
http://bugs.python.org/issue29286#msg285578
I just noticed a module on PyPI to implement this behaviour on Python functions:
https://pypi.python.org/pypi/positional
My question is: would it make sense to implement this feature in
Python directly? If yes, what should be the syntax? Use "/" marker?
Use the @positional() decorator?
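For illustration, a pure-Python decorator along these lines is easy to sketch
(this is not the API of the PyPI module above, just a rough idea of what
@positional-style enforcement could look like):

import functools
import inspect

def positional_only(n):
    """Reject keyword use of the first n parameters (illustrative sketch)."""
    def decorator(func):
        param_names = list(inspect.signature(func).parameters)[:n]
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bad = [name for name in param_names if name in kwargs]
            if bad:
                raise TypeError(
                    f"{func.__name__}() got positional-only arguments "
                    f"passed as keywords: {', '.join(bad)}")
            return func(*args, **kwargs)
        return wrapper
    return decorator

@positional_only(2)
def replace(old, new, count=-1):
    return (old, new, count)

replace("x", "y")            # ok
# replace(old="x", new="y")  # would raise TypeError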
Do you see concrete cases where it's a deliberate choice to deny
passing arguments as keywords?
Don't you like writing int(x="123") instead of int("123")? :-) (I know
that Serhiy Storchaka hates the name of the "x" parameter of the int
constructor ;-))
By the way, I read that the "/" marker is unknown to almost all Python
developers, and that the [...] syntax should be preferred, but
inspect.signature() doesn't support this syntax. Maybe we should fix
signature() and use [...] format instead?
Replace "replace(self, old, new, count=-1, /)" with "replace(self,
old, new[, count=-1])" (or maybe even not document the default
value?).
Python 3.5 help (docstring) uses "S.replace(old, new[, count])".
Victor
I just spent a few minutes staring at a bug caused by a missing comma
-- I got a mysterious argument count error because instead of foo('a',
'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint
rule against this (there was also a Python dialect used for some
specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate
compile-time string literal concatenation with the '+' operator, so
there's really no reason to support 'a' 'b' any more. (The reason was
always rather flimsy; I copied it from C but the reason why it's
needed there doesn't really apply to Python, as it is mostly useful
inside macros.)
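(If I'm not mistaken, on current CPython both spellings already end up as a
single constant, so there's no runtime cost to requiring the explicit '+':)

import dis

dis.dis(compile("x = 'a' 'b'", "<demo>", "exec"))    # LOAD_CONST 'ab'
dis.dis(compile("x = 'a' + 'b'", "<demo>", "exec"))  # LOAD_CONST 'ab' (constant-folded)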
Would it be reasonable to start deprecating this and eventually remove
it from the language?
--
--Guido van Rossum (python.org/~guido)
Previously I posted PEP 560 two weeks ago, while several other PEPs were
also posted, so it didn't get much of attention. Here I post the PEP 560
again, now including the full text for convenience of commenting.
--
Ivan
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
PEP: 560
Title: Core support for generic types
Author: Ivan Levkivskyi <levkivskyi(a)gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 03-Sep-2017
Python-Version: 3.7
Post-History: 09-Sep-2017
Abstract
========
Initially, PEP 484 was designed in such a way that it would not introduce
*any* changes to the core CPython interpreter. Now type hints and
the ``typing`` module are extensively used by the community, e.g. PEP 526
and PEP 557 extend the usage of type hints, and the backport of ``typing``
on PyPI has 1M downloads/month. Therefore, this restriction can be removed.
It is proposed to add two special methods ``__class_getitem__`` and
``__subclass_base__`` to the core CPython for better support of
generic types.
Rationale
=========
The restriction to not modify the core CPython interpreter led to some
design decisions that became questionable when the ``typing`` module started
to be widely used. There are three main points of concerns:
performance of the ``typing`` module, metaclass conflicts, and the large
number of hacks currently used in ``typing``.
Performance:
------------
The ``typing`` module is one of the heaviest and slowest modules in
the standard library even with all the optimizations made. Mainly this is
because subscripted generic types (see PEP 484 for definition of terms
used in this PEP) are class objects (see also [1]_). There are three main
ways the performance can be improved with the help of the proposed special
methods:
- Creation of generic classes is slow since the ``GenericMeta.__new__`` is
very slow; we will not need it anymore.
- Very long MROs for generic classes will be half as long; they are present
because we duplicate the ``collections.abc`` inheritance chain
in ``typing``.
- Time of instantiation of generic classes will be improved
(this is minor however).
Metaclass conflicts:
--------------------
All generic types are instances of ``GenericMeta``, so if a user uses
a custom metaclass, then it is hard to make a corresponding class generic.
This is particularly hard for library classes that a user doesn't control.
A workaround is to always mix-in ``GenericMeta``::
    class AdHocMeta(GenericMeta, LibraryMeta):
        pass

    class UserClass(LibraryBase, Generic[T], metaclass=AdHocMeta):
        ...
but this is not always practical or even possible. With the help of the
proposed special attributes the ``GenericMeta`` metaclass will not be
needed.
Hacks and bugs that will be removed by this proposal:
-----------------------------------------------------
- The ``_generic_new`` hack, which exists because ``__init__`` is not called
  on instances whose type differs from the type whose ``__new__`` was called,
  for example ``C[int]().__class__ is C``.
- The ``_next_in_mro`` speed hack will not be necessary since subscription will
not create new classes.
- The ugly ``sys._getframe`` hack; this one is particularly nasty, since it
looks like we can't remove it without changes outside ``typing``.
- Currently generics do dangerous things with private ABC caches
to fix large memory consumption that grows at least as O(N\ :sup:`2`),
see [2]_. This point is also important because it was recently proposed to
re-implement ``ABCMeta`` in C.
- Problems with sharing attributes between subscripted generics,
see [3]_. The current solution already uses ``__getattr__`` and
``__setattr__``, but it is still incomplete, and solving this without the
current proposal will be hard and will need ``__getattribute__``.
- ``_no_slots_copy`` hack, where we clean-up the class dictionary on every
subscription thus allowing generics with ``__slots__``.
- General complexity of the ``typing`` module: the new proposal will not
only allow removal of the above-mentioned hacks/bugs, but also simplify
the implementation, so that it will be easier to maintain.
Specification
=============
The idea of ``__class_getitem__`` is simple: it is an exact analog of
``__getitem__``, with the exception that it is called on the class that
defines it, not on its instances; this allows us to avoid
``GenericMeta.__getitem__`` for things like ``Iterable[int]``.
``__class_getitem__`` is automatically a class method, does not require
the ``@classmethod`` decorator (similar to ``__init_subclass__``), and is
inherited like normal attributes.
For example::
    class MyList:
        def __getitem__(self, index):
            return index + 1

        def __class_getitem__(cls, item):
            return f"{cls.__name__}[{item.__name__}]"

    class MyOtherList(MyList):
        pass

    assert MyList()[0] == 1
    assert MyList[int] == "MyList[int]"

    assert MyOtherList()[0] == 1
    assert MyOtherList[int] == "MyOtherList[int]"
Note that this method is used as a fallback, so if a metaclass defines
``__getitem__``, then that will have the priority.
If an object that is not a class object appears in the bases of a class
definition, the ``__subclass_base__`` is searched on it. If found,
it is called with the original tuple of bases as an argument. If the result
of the call is not ``None``, then it is substituted instead of this object.
Otherwise (if the result is ``None``), the base is just removed. This is
necessary to avoid inconsistent MRO errors that are currently prevented by
manipulations in ``GenericMeta.__new__``. After creating the class,
the original bases are saved in ``__orig_bases__`` (currently this is also
done by the metaclass).
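For illustration, a sketch of an object participating in this protocol could
look like the following (the names here are illustrative, and the
commented-out class statement requires the interpreter support proposed in
this PEP)::

    class ListAlias:
        """Stands in for a subscripted generic such as List[int]."""

        def __init__(self, origin, item):
            self.__origin__ = origin
            self.__item__ = item

        def __subclass_base__(self, bases):
            # Called during class creation when this (non-class) object
            # appears among the bases; the returned class is substituted
            # for it in the bases tuple, avoiding inconsistent MRO errors.
            return self.__origin__

    # With this PEP:
    #
    #     class MyList(ListAlias(list, int)):
    #         ...
    #
    # would create a class whose actual base is ``list`` and whose
    # ``__orig_bases__`` would contain the ``ListAlias`` instance.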
NOTE: These two method names are reserved for exclusive use by
the ``typing`` module and the generic types machinery, and any other use is
strongly discouraged. The reference implementation (with tests) can be found
in [4]_, the proposal was originally posted and discussed on
the ``typing`` tracker, see [5]_.
Backwards compatibility and impact on users who don't use ``typing``:
=====================================================================
This proposal may break code that currently uses the names
``__class_getitem__`` and ``__subclass_base__``.
This proposal will support almost complete backwards compatibility with
the current public generic types API; moreover, the ``typing`` module is
still provisional. The only two exceptions are that currently
``issubclass(List[int], List)`` returns ``True``, whereas with this proposal
it will raise ``TypeError``, and that ``issubclass(collections.abc.Iterable,
typing.Iterable)`` will return ``False``, which is probably desirable, since
currently we have a (virtual) inheritance cycle between these two classes.
With the reference implementation I measured negligible performance effects
(under 1% on a micro-benchmark) for regular (non-generic) classes.
References
==========
.. [1] Discussion following Mark Shannon's presentation at Language Summit
(https://github.com/python/typing/issues/432)
.. [2] Pull Request to implement shared generic ABC caches
(https://github.com/python/typing/pull/383)
.. [3] An old bug with setting/accessing attributes on generic types
(https://github.com/python/typing/issues/392)
.. [4] The reference implementation
(https://github.com/ilevkivskyi/cpython/pull/2/files)
.. [5] Original proposal
(https://github.com/python/typing/issues/468)
Copyright
=========
This document has been placed in the public domain.
I have written a short PEP as a complement/alternative to PEP 549.
I will be grateful for comments and suggestions. The PEP should
appear online soon.
--
Ivan
***********************************************************
PEP: 562
Title: Module __getattr__
Author: Ivan Levkivskyi <levkivskyi(a)gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 09-Sep-2017
Python-Version: 3.7
Post-History: 09-Sep-2017
Abstract
========
It is proposed to support ``__getattr__`` function defined on modules to
provide basic customization of module attribute access.
Rationale
=========
It is sometimes convenient to customize or otherwise have control over
access to module attributes. A typical example is managing deprecation
warnings. Typical workarounds are assigning ``__class__`` of a module object
to a custom subclass of ``types.ModuleType`` or substituting ``sys.modules``
item with a custom wrapper instance. It would be convenient to simplify this
procedure by recognizing ``__getattr__`` defined directly in a module that
would act like a normal ``__getattr__`` method, except that it will be
defined
on module *instances*. For example::
    # lib.py

    from warnings import warn

    deprecated_names = ["old_function", ...]

    def _deprecated_old_function(arg, other):
        ...

    def __getattr__(name):
        if name in deprecated_names:
            warn(f"{name} is deprecated", DeprecationWarning)
            return globals()[f"_deprecated_{name}"]
        raise AttributeError(f"module {__name__} has no attribute {name}")

    # main.py

    from lib import old_function  # Works, but emits the warning
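For comparison, the ``__class__`` assignment workaround mentioned in the
Rationale looks roughly like this (a sketch, assuming the same ``lib.py``
module contents as above)::

    # lib.py (current workaround: replace the module's class)

    import sys
    import types
    from warnings import warn

    deprecated_names = ["old_function", ...]

    def _deprecated_old_function(arg, other):
        ...

    class _DeprecationModule(types.ModuleType):
        def __getattr__(self, name):
            if name in deprecated_names:
                warn(f"{name} is deprecated", DeprecationWarning)
                return globals()[f"_deprecated_{name}"]
            raise AttributeError(
                f"module {__name__} has no attribute {name}")

    sys.modules[__name__].__class__ = _DeprecationModule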
There is a related proposal, PEP 549, that proposes to support instance
properties for similar functionality. The difference is that this PEP proposes
a faster and simpler mechanism, but provides more basic customization.
An additional motivation for this proposal is that PEP 484 already defines
the use of module ``__getattr__`` for this purpose in Python stub files,
see [1]_.
Specification
=============
The ``__getattr__`` function at the module level should accept one argument
which is the name of an attribute, and return the computed value or raise
an ``AttributeError``::
    def __getattr__(name: str) -> Any: ...
This function will be called only if ``name`` is not found in the module
through the normal attribute lookup.
The reference implementation for this PEP can be found in [2]_.
Backwards compatibility and impact on performance
=================================================
This PEP may break code that uses module level (global) name
``__getattr__``.
The performance implications of this PEP are minimal, since ``__getattr__``
is called only for missing attributes.
References
==========
.. [1] PEP 484 section about ``__getattr__`` in stub files
(https://www.python.org/dev/peps/pep-0484/#stub-files)
.. [2] The reference implementation
(https://github.com/ilevkivskyi/cpython/pull/3/files)
Copyright
=========
This document has been placed in the public domain.
Hi all,
as promised, here is a draft PEP for context variable semantics and
implementation. Apologies for the slight delay; I had a not-so-minor
autosave accident and had to retype the majority of this first draft.
During the past years, there has been growing interest in something like
task-local storage or async-local storage. This PEP proposes an alternative
approach to solving the problems that are typically stated as motivation
for such concepts.
This proposal is based on sketches of solutions since spring 2015, with
some minor influences from the recent discussion related to PEP 550. I can
also see some potential implementation synergy between this PEP and PEP
550, even if the proposed semantics are quite different.
So, here it is. This is the first draft and some things are still missing,
but the essential things should be there.
-- Koos
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
PEP: 999
Title: Context-local variables (contextvars)
Version: $Revision$
Last-Modified: $Date$
Author: Koos Zevenhoven
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: DD-Mmm-YYYY
Post-History: DD-Mmm-YYYY
Abstract
========
Sometimes, in special cases, it is desired that code can pass information
down the function call chain to the callees without having to explicitly
pass the information as arguments to each function in the call chain. This
proposal describes a construct which allows code to explicitly switch in
and out of a context where a certain context variable has a given value
assigned to it. This is a modern alternative to some uses of things like
global variables in traditional single-threaded (or thread-unsafe) code and
of thread-local storage in traditional *concurrency-unsafe* code (single-
or multi-threaded). In particular, the proposed mechanism can also be used
with more modern concurrent execution mechanisms such as asynchronously
executed coroutines, without the concurrently executed call chains
interfering with each other's contexts.
The "call chain" can consist of normal functions, awaited coroutines, or
generators. The semantics of context variable scope are equivalent in all
cases, allowing code to be refactored freely into *subroutines* (which here
refers to functions, sub-generators or sub-coroutines) without affecting
the semantics of context variables. Regarding implementation, this proposal
aims at simplicity and minimum changes to the CPython interpreter and to
other Python interpreters.
Rationale
=========
Consider a modern Python *call chain* (or call tree), which in this
proposal refers to any chained (nested) execution of *subroutines*, using
any possible combinations of normal function calls, or expressions using
``await`` or ``yield from``. In some cases, passing necessary *information*
down the call chain as arguments can substantially complicate the required
function signatures, or it can even be impossible to achieve in practice.
In these cases, one may search for another place to store this information.
Let us look at some historical examples.
The most naive option is to assign the value to a global variable or
similar, where the code down the call chain can access it. However, this
immediately makes the code thread-unsafe, because with multiple threads,
all threads assign to the same global variable, and another thread can
interfere at any point in the call chain.
A somewhat less naive option is to store the information as per-thread
information in thread-local storage, where each thread has its own "copy"
of the variable which other threads cannot interfere with. Although
non-ideal, this has been the best solution in many cases. However, thanks
to generators and coroutines, the execution of the call chain can be
suspended and resumed, allowing code in other contexts to run concurrently.
Therefore, using thread-local storage is *concurrency-unsafe*, because
other call chains in other contexts may interfere with the thread-local
variable.
Note that in the above two historical approaches, the stored information
has the *widest* available scope without causing problems. For a third
solution along the same path, one would first define an equivalent of a
"thread" for asynchronous execution and concurrency. This could be seen as
the largest amount of code and nested calls that is guaranteed to be
executed sequentially without ambiguity in execution order. This might be
referred to as concurrency-local or task-local storage. In this meaning of
"task", there is no ambiguity in the order of execution of the code within
one task. (This concept of a task is close to equivalent to a ``Task`` in
``asyncio``, but not exactly.) In such concurrency-locals, it is possible
to pass information down the call chain to callees without another code
path interfering with the value in the background.
Common to the above approaches is that they indeed use variables with a
wide but just-narrow-enough scope. Thread-locals could also be called
thread-wide globals---in single-threaded code, they are indeed truly
global. And task-locals could be called task-wide globals, because tasks
can be very big.
The issue here is that neither global variables, thread-locals nor
task-locals are really meant to be used for this purpose of passing
information of the execution context down the call chain. Instead of the
widest possible variable scope, the scope of the variables should be
controlled by the programmer, typically of a library, to have the desired
scope---not wider. In other words, task-local variables (and globals and
thread-locals) have nothing to do with the kind of context-bound
information passing that this proposal intends to enable, even if
task-locals can be used to emulate the desired semantics. Therefore, in the
following, this proposal describes the semantics and the outlines of an
implementation for *context-local variables* (or context variables,
contextvars). In fact, as a side effect of this PEP, an async framework can
use the proposed feature to implement task-local variables.
Proposal
========
Because the proposed semantics are not a direct extension to anything
already available in Python, this proposal is first described in terms of
semantics and API at a fairly high level. In particular, Python ``with``
statements are heavily used in the description, as they are a good match
with the proposed semantics. However, the underlying ``__enter__`` and
``__exit__`` methods correspond to functions in the lower-level
speed-optimized (C) API. For clarity of this document, the lower-level
functions are not explicitly named in the definition of the semantics.
After describing the semantics and high-level API, the implementation is
described, going to a lower level.
Semantics and higher-level API
------------------------------
Core concept
''''''''''''
A context-local variable is represented by a single instance of
``contextvars.Var``, say ``cvar``. Any code that has access to the ``cvar``
object can ask for its value with respect to the current context. In the
high-level API, this value is given by the ``cvar.value`` property::
    cvar = contextvars.Var(default="the default value",
                           description="example context variable")

    assert cvar.value == "the default value"  # default still applies

    # In code examples, all ``assert`` statements should
    # succeed according to the proposed semantics.
No assignments to ``cvar`` have been applied for this context, so
``cvar.value`` gives the default value. Assigning new values to contextvars
is done in a highly scope-aware manner::
    with cvar.assign(new_value):
        assert cvar.value is new_value
        # Any code here, or down the call chain from here, sees:
        #     cvar.value is new_value
        # unless another value has been assigned in a
        # nested context
        assert cvar.value is new_value
    # the assignment of ``cvar`` to ``new_value`` is no longer visible
    assert cvar.value == "the default value"
Here, ``cvar.assign(value)`` returns another object, namely
``contextvars.Assignment(cvar, new_value)``. The essential part here is
that applying a context variable assignment (``Assignment.__enter__``) is
paired with a de-assignment (``Assignment.__exit__``). These operations set
the bounds for the scope of the assigned value.
Assignments to the same context variable can be nested to override the
outer assignment in a narrower context::
    assert cvar.value == "the default value"
    with cvar.assign("outer"):
        assert cvar.value == "outer"
        with cvar.assign("inner"):
            assert cvar.value == "inner"
        assert cvar.value == "outer"
    assert cvar.value == "the default value"
Also multiple variables can be assigned to in a nested manner without
affecting each other::
    cvar1 = contextvars.Var()
    cvar2 = contextvars.Var()

    assert cvar1.value is None  # default is None by default
    assert cvar2.value is None

    with cvar1.assign(value1):
        assert cvar1.value is value1
        assert cvar2.value is None
        with cvar2.assign(value2):
            assert cvar1.value is value1
            assert cvar2.value is value2
        assert cvar1.value is value1
        assert cvar2.value is None
    assert cvar1.value is None
    assert cvar2.value is None
Or with more convenient Python syntax::
    with cvar1.assign(value1), cvar2.assign(value2):
        assert cvar1.value is value1
        assert cvar2.value is value2
In another *context*, in another thread or otherwise concurrently executed
task or code path, the context variables can have a completely different
state. The programmer thus only needs to worry about the context at hand.
Refactoring into subroutines
''''''''''''''''''''''''''''
Code using contextvars can be refactored into subroutines without affecting
the semantics. For instance::
    assi = cvar.assign(new_value)

    def apply():
        assi.__enter__()

    assert cvar.value == "the default value"
    apply()
    assert cvar.value is new_value
    assi.__exit__()
    assert cvar.value == "the default value"
Or similarly in an asynchronous context where ``await`` expressions are
used. The subroutine can now be a coroutine::
    assi = cvar.assign(new_value)

    async def apply():
        assi.__enter__()

    assert cvar.value == "the default value"
    await apply()
    assert cvar.value is new_value
    assi.__exit__()
    assert cvar.value == "the default value"
Or when the subroutine is a generator::
    def apply():
        yield
        assi.__enter__()
which is called using ``yield from apply()`` or with calls to ``next`` or
``.send``. This is discussed further in later sections.
Semantics for generators and generator-based coroutines
'''''''''''''''''''''''''''''''''''''''''''''''''''''''
Generators, coroutines and async generators act as subroutines in much the
same way that normal functions do. However, they have the additional
possibility of being suspended by ``yield`` expressions. Assignment
contexts entered inside a generator are normally preserved across yields::
    def genfunc():
        with cvar.assign(new_value):
            assert cvar.value is new_value
            yield
            assert cvar.value is new_value

    g = genfunc()
    next(g)
    assert cvar.value == "the default value"
    with cvar.assign(another_value):
        next(g)
However, the outer context visible to the generator may change state across
yields::
    def genfunc():
        assert cvar.value is value2
        yield
        assert cvar.value is value1
        yield
        with cvar.assign(value3):
            assert cvar.value is value3

    with cvar.assign(value1):
        g = genfunc()
        with cvar.assign(value2):
            next(g)
        next(g)
        next(g)
        assert cvar.value is value1
Similar semantics apply to async generators (defined by ``async def ...
yield ...``).
By default, values assigned inside a generator do not leak through yields
to the code that drives the generator. However, the assignment contexts
entered and left open inside the generator *do* become visible outside the
generator after the generator has finished with a ``StopIteration`` or
another exception::
    assi = cvar.assign(new_value)

    def genfunc():
        yield
        assi.__enter__()
        yield

    g = genfunc()
    assert cvar.value == "the default value"
    next(g)
    assert cvar.value == "the default value"
    next(g)  # assi.__enter__() is called here
    assert cvar.value == "the default value"
    next(g)
    assert cvar.value is new_value
    assi.__exit__()
Special functionality for framework authors
-------------------------------------------
Frameworks, such as ``asyncio`` or third-party libraries, can use
additional functionality in ``contextvars`` to achieve the desired
semantics in cases which are not determined by the Python interpreter. Some
of the semantics described in this section are also afterwards used to
describe the internal implementation.
Leaking yields
''''''''''''''
Using the ``contextvars.leaking_yields`` decorator, one can choose to leak
the context through ``yield`` expressions into the outer context that
drives the generator::
    @contextvars.leaking_yields
    def genfunc():
        assert cvar.value == "outer"
        with cvar.assign("inner"):
            yield
            assert cvar.value == "inner"
        assert cvar.value == "outer"

    g = genfunc()
    with cvar.assign("outer"):
        assert cvar.value == "outer"
        next(g)
        assert cvar.value == "inner"
        next(g)
        assert cvar.value == "outer"
Capturing contextvar assignments
''''''''''''''''''''''''''''''''
Using ``contextvars.capture()``, one can capture the assignment contexts
that are entered by a block of code. The changes applied by the block of
code can then be reverted and subsequently reapplied, even in another
context::
    assert cvar1.value is None  # default
    assert cvar2.value is None  # default

    assi1 = cvar1.assign(value1)
    assi2 = cvar1.assign(value2)
    with contextvars.capture() as delta:
        assi1.__enter__()
        with cvar2.assign("not captured"):
            assert cvar2.value == "not captured"
        assi2.__enter__()
    assert cvar1.value is value2
    delta.revert()
    assert cvar1.value is None
    assert cvar2.value is None

    ...

    with cvar1.assign(1), cvar2.assign(2):
        delta.reapply()
        assert cvar1.value is value2
        assert cvar2.value == 2
However, reapplying the "delta" if its net contents include deassignments
may not be possible (see also Implementation and Open Issues).
Getting a snapshot of context state
'''''''''''''''''''''''''''''''''''
The function ``contextvars.get_local_state()`` returns an object
representing the applied assignments to all context-local variables in the
context where the function is called. This can be seen as equivalent to
using ``contextvars.capture()`` to capture all context changes from the
beginning of execution. The returned object supports methods ``.revert()``
and ``reapply()`` as above.
Running code in a clean state
'''''''''''''''''''''''''''''
Although it is possible to revert all applied context changes using the
above primitives, a more convenient way to run a block of code in a clean
context is provided::
    with contextvars.clean_context():
        # here, all context vars start off with their default values
        ...

    # here, the state is back to what it was before the with block
Implementation
--------------
This section describes, at a varying level of detail, how the described
semantics can be implemented. At present, an implementation aimed at
simplicity but with sufficient features is described. More details will be
added later.
Alternatively, a somewhat more complicated implementation offers minor
additional features while adding some performance overhead and requiring
more code in the implementation.
Data structures and implementation of the core concept
''''''''''''''''''''''''''''''''''''''''''''''''''''''
Each thread of the Python interpreter keeps its own stack of
``contextvars.Assignment`` objects, each having a pointer to the previous
(outer) assignment like in a linked list. The local state (also returned by
``contextvars.get_local_state()``) then consists of a reference to the top
of the stack and a pointer/weak reference to the bottom of the stack. This
allows efficient stack manipulations. An object produced by
``contextvars.capture()`` is similar, but refers to only a part of the
stack with the bottom reference pointing to the top of the stack as it was
in the beginning of the capture block.
Now, the stack evolves according to the assignment ``__enter__`` and
``__exit__`` methods. For example::
    cvar1 = contextvars.Var()
    cvar2 = contextvars.Var()

    # stack: []
    assert cvar1.value is None
    assert cvar2.value is None

    with cvar1.assign("outer"):
        # stack: [Assignment(cvar1, "outer")]
        assert cvar1.value == "outer"
        with cvar1.assign("inner"):
            # stack: [Assignment(cvar1, "outer"),
            #         Assignment(cvar1, "inner")]
            assert cvar1.value == "inner"
            with cvar2.assign("hello"):
                # stack: [Assignment(cvar1, "outer"),
                #         Assignment(cvar1, "inner"),
                #         Assignment(cvar2, "hello")]
                assert cvar2.value == "hello"
            # stack: [Assignment(cvar1, "outer"),
            #         Assignment(cvar1, "inner")]
            assert cvar1.value == "inner"
            assert cvar2.value is None
        # stack: [Assignment(cvar1, "outer")]
        assert cvar1.value == "outer"
    # stack: []
    assert cvar1.value is None
    assert cvar2.value is None
Getting a value from the context using ``cvar1.value`` can be implemented
as finding the topmost occurrence of a ``cvar1`` assignment on the stack
and returning the value there, or the default value if no assignment is
found on the stack. However, this can be optimized to instead be an O(1)
operation in most cases. Still, even searching through the stack may be
reasonably fast since these stacks are not intended to grow very large.
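For illustration, a minimal per-thread toy model of this naive lookup
(ignoring generators, ``capture()`` and the O(1) optimization; not the
proposed implementation) could look like this::

    import threading

    _state = threading.local()

    def _assignment_stack():
        # Per-thread stack of Assignment objects (simplified).
        if not hasattr(_state, "stack"):
            _state.stack = []
        return _state.stack

    class Assignment:
        def __init__(self, var, value):
            self.var, self.value = var, value

        def __enter__(self):
            _assignment_stack().append(self)

        def __exit__(self, *exc):
            top = _assignment_stack().pop()
            assert top is self  # deassignments happen in reverse order

    class Var:
        def __init__(self, default=None, description=""):
            self.default = default
            self.description = description

        def assign(self, value):
            return Assignment(self, value)

        @property
        def value(self):
            # Topmost matching assignment wins; otherwise the default.
            for assignment in reversed(_assignment_stack()):
                if assignment.var is self:
                    return assignment.value
            return self.default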
The above description is already sufficient for implementing the core
concept. Suspendable frames require some additional attention, as explained
in the following.
Implementation of generator and coroutine semantics
'''''''''''''''''''''''''''''''''''''''''''''''''''
Within generators, coroutines and async generators, assignments and
deassignments are handled in exactly the same way as anywhere else.
However, some changes are needed in the builtin generator methods ``send``,
``__next__``, ``throw`` and ``close``. Here is the Python equivalent of the
changes needed in ``send`` for a generator (here ``_old_send`` refers to
the behavior in Python 3.6)::
    def send(self, value):
        # if decorated with contextvars.leaking_yields
        if self.gi_contextvars is LEAK:
            # nothing needs to be done to leak context through yields :)
            return self._old_send(value)
        try:
            with contextvars.capture() as delta:
                if self.gi_contextvars:
                    # non-zero captured content from previous iteration
                    self.gi_contextvars.reapply()
                ret = self._old_send(value)
        except Exception:
            raise
        else:
            # suspending, revert context changes but save them for
            # when the generator is resumed
            delta.revert()
            self.gi_contextvars = delta
            return ret
The corresponding modifications to the other methods are essentially
identical. The same applies to coroutines and async generators.
For code that does not use ``contextvars``, the additions are O(1) and
essentially reduce to a couple of pointer comparisons. For code that does
use ``contextvars``, the additions are still O(1) in most cases.
More on implementation
''''''''''''''''''''''
The rest of the functionality, including ``contextvars.leaking_yields``,
``contextvars.capture()``, ``contextvars.get_local_state()`` and
``contextvars.clean_context()``, is in fact quite straightforward to
implement, but the implementation will be discussed further in later
versions of this proposal. Caching of assigned values is somewhat more
complicated, and will be discussed later, but it seems that most cases
should achieve O(1) complexity.
Backwards compatibility
=======================
There are no *direct* backwards-compatibility concerns, since a completely
new feature is proposed.
However, various traditional uses of thread-local storage may need a smooth
transition to ``contextvars`` so they can be concurrency-safe. There are
several approaches to this, including emulating task-local storage with a
little bit of help from async frameworks. A fully general implementation
cannot be provided, because the desired semantics may depend on the design
of the framework.
Another way to deal with the transition is for code to first look for a
context created using ``contextvars``. If that fails because a new-style
context has not been set or because the code runs on an older Python
version, a fallback to thread-local storage is used.
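For example, a library could use a pattern roughly like the following during
the transition (an illustrative sketch; ``get_current_request`` and
``set_request_context`` are hypothetical library functions, and the
``contextvars`` names follow this proposal)::

    try:
        import contextvars
        _request_var = contextvars.Var(default=None,
                                       description="current request")

        def get_current_request():
            return _request_var.value

        def set_request_context(request):
            # Used as a context manager by the caller.
            return _request_var.assign(request)

    except ImportError:
        import contextlib
        import threading

        _tls = threading.local()

        def get_current_request():
            return getattr(_tls, "request", None)

        @contextlib.contextmanager
        def set_request_context(request):
            old = get_current_request()
            _tls.request = request
            try:
                yield
            finally:
                _tls.request = old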
Open Issues
===========
Out-of-order de-assignments
---------------------------
In this proposal, all variable deassignments are made in the opposite order
compared to the preceding assignments. This has two useful properties: it
encourages using ``with`` statements to define assignment scope, and it has a
tendency to catch errors early (forgetting a ``.__exit__()`` call often
results in a meaningful error). Making this a requirement is also beneficial
in terms of implementation simplicity and performance.
Nevertheless, allowing out-of-order context exits is not completely out of
the question, and reasonable implementation strategies for that do exist.
Rejected Ideas
==============
Dynamic scoping linked to subroutine scopes
-------------------------------------------
The scope of value visibility should not be determined by the way the code
is refactored into subroutines. It is necessary to have per-variable
control of the assignment scope.
Acknowledgements
================
To be added.
References
==========
To be added.
--
+ Koos Zevenhoven + http://twitter.com/k7hoven +
Forwarding my reply, since Google Groups still can't get the Reply-To
headers for the mailing list right, and we still don't know how to
categorically prohibit posting from there.
---------- Forwarded message ----------
From: Nick Coghlan <ncoghlan(a)gmail.com>
Date: 26 September 2017 at 12:51
Subject: Re: [Python-ideas] Fwd: A PEP to define basical metric which
allows to guarantee minimal code quality
To: Alexandre GALODE <alexandre.galode(a)gmail.com>
Cc: python-ideas <python-ideas(a)googlegroups.com>
On 25 September 2017 at 21:49, <alexandre.galode(a)gmail.com> wrote:
> Hi,
>
> Sorry for being late, I was on a professional trip to PyCon FR.
>
> I see that the subject divides opinions.
>
> Reading the responses, I have the impression that my proposal has been seen
> as mandatory, which of course I don't want. As previously said, I see this "PEP"
> as an informational PEP. So it's a guideline, not mandatory. Each
> developer will have the right to ignore it, as each developer can choose to
> ignore PEP 8 or PEP 20.
>
> A perfect solution does not exist, I know, but I think this "PEP" could,
> partially, be a good guideline.
Your question is essentially "Are python-dev prepared to offer generic
code quality assessment advice to Python developers?"
The answer is "No, we're not". It's not our role, and it's not a role
we're the least bit interested in taking on. Just because we're the
ones making the software equivalent of hammers and saws doesn't mean
we're also the ones that should be drafting or signing off on people's
building codes :)
Python's use cases are too broad, and what's appropriate for my ad hoc
script to download desktop wallpaper backgrounds, isn't going to be
what's appropriate for writing an Ansible module, which in turn isn't
going to be the same as what's appropriate for writing a highly
scalable web service or a complex data analysis job.
So the question of "What does 'good enough for my purposes' actually
mean?" is something for end users to tackle for themselves, either
individually or collaboratively, without seeking specific language
designer endorsement of their chosen criteria.
However, as mentioned earlier in the thread, it would be *entirely*
appropriate for the folks participating in PyCQA to decide to either
take on this work themselves, or else endorse somebody else taking it
on. I'd see such an effort as being similar to the way that
packaging.python.org originally started as an independent PyPA project
hosted at python-packaging-user-guide.readthedocs.io, with a fair bit
of content already being added before we later requested and received
the python.org subdomain.
Cheers,
Nick.
--
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
Hi folks:
I was recently looking for an entry-level cpython task to work on in
my spare time and plucked this off of someone's TODO list.
"Make optimizations more fine-grained than just -O and -OO"
There are currently three supported optimization levels (0, 1, and 2).
Briefly summarized, they do the following.
0: no optimizations
1: remove assert statements and __debug__ blocks
2: remove docstrings, assert statements, and __debug__ blocks
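(For anyone who wants to see the existing behavior for themselves, a quick
check along these lines works; run it with no flag, with -O, and with -OO:)

# demo.py
"""Module docstring (stripped at -OO)."""
import sys

print("optimize level:", sys.flags.optimize)
print("module docstring:", __doc__)
print("__debug__ is", __debug__)
try:
    assert False, "assert statements are active"
    print("assert was stripped")  # only reached under -O / -OO
except AssertionError as exc:
    print("assert raised:", exc)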
From what I gather, their use-case is assert statements in production
code. More specifically, they want to be able to optimize away
docstrings, but keep the assert statements, which currently isn't
possible with the existing optimization levels.
As a first baby-step, I considered just adding a new optimization
level 3 that keeps asserts but continues to remove docstrings and
__debug__ blocks.
3: remove docstrings and __debug__ blocks
From a command-line perspective, there is already support for
additional optimization levels. That is, without making any changes,
the optimization level will increase with the number of 0s provided.
$ python -c "import sys; print(sys.flags.optimize)"
0
$ python -OO -c "import sys; print(sys.flags.optimize)"
2
$ python -OOOOOOO -c "import sys; print(sys.flags.optimize)"
7
And the PYTHONOPTIMIZE environment variable will happily assign
something like 42 to sys.flags.optimize.
$ unset PYTHONOPTIMIZE
$ python -c "import sys; print(sys.flags.optimize)"
0
$ export PYTHONOPTIMIZE=2
$ python -c "import sys; print(sys.flags.optimize)"
2
$ export PYTHONOPTIMIZE=42
$ python -c "import sys; print(sys.flags.optimize)"
42
Finally, the resulting __pycache__ folder also already contains the
expected bytecode files for the new optimization levels (
__init__.cpython-37.opt-42.pyc was created for optimization level 42,
for example).
$ tree
.
└── test
├── __init__.py
└── __pycache__
├── __init__.cpython-37.opt-1.pyc
├── __init__.cpython-37.opt-2.pyc
├── __init__.cpython-37.opt-42.pyc
├── __init__.cpython-37.opt-7.pyc
└── __init__.cpython-37.pyc
Adding optimization level 3 is an easy change to make. Here's that
quick proof of concept (minus changes to the docs, etc). I've also
attached that diff as 3.diff.
https://github.com/dianaclarke/cpython/commit/4bd7278d87bd762b2989178e5bfed…
I was initially looking for a more elegant solution that allowed you
to specify exactly which optimizations you wanted, and when I floated
this naive ("level 3") approach off-list to a few core developers,
their feedback confirmed my hunch (too hacky).
So for my second pass at this task, I started with the following two
pronged approach.
1) Changed the various compile signatures to accept a set of
string optimization flags rather than an int value.
2) Added a new command line option N that allows you to specify
any number of individual optimization flags.
For example:
python -N nodebug -N noassert -N nodocstring
The existing optimization options (-O and -OO) still exist in this
approach, but they are mapped to the new optimization flags
("nodebug", "noassert", "nodocstring").
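Concretely, that mapping would presumably look something like this (a sketch;
the flag spellings are just the ones used above):

OPTIMIZATION_FLAG_MAP = {
    0: frozenset(),                                        # (no flag)
    1: frozenset({"nodebug", "noassert"}),                 # -O
    2: frozenset({"nodebug", "noassert", "nodocstring"}),  # -OO
}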
With the exception of the builtin compile() function, all underlying
compile functions would only accept optimization flags going forward,
and the builtin compile() function would accept either an integer
optimize value or a set of optimization flags for backwards
compatibility.
You can find that work-in-progress approach here on github (also
attached as N.diff).
https://github.com/dianaclarke/cpython/commit/3e36cea1fc8ee6f4cdc584851e4c1…
All in all, that approach is going fairly well, but there's a lot of
work remaining, and that diff is already getting quite large (for my
new-contributor status).
Note for example, that I haven't yet tackled adding bytecode files to
__pycache__ that reflect these new optimization flags. Something like:
$ tree
.
└── test
├── __init__.py
└── __pycache__
├── __init__.cpython-37.opt-nodebug-noassert.pyc
├── __init__.cpython-37.opt-nodebug-nodocstring.pyc
├── __init__.cpython-37.opt-nodebug-noassert-nodocstring.pyc
└── __init__.cpython-37.pyc
I'm also not certain if the various compile signatures are even open
for change (int optimize => PyObject *optimizations), or if that's a
no-no.
And there are still a ton of references to "-O", "-OO",
"sys.flags.optimize", "Py_OptimizeFlag", "PYTHONOPTIMIZE", "optimize",
etc that all need to be audited and their implications considered.
I've really enjoyed this task and I'm learning a lot about the c api,
but I think this is a good place to stop and solicit feedback and
direction.
My gut says that the amount of churn and resulting risk is too high to
continue down this path, but I would love to hear thoughts from others
(alternate approaches, ways to limit scope, confirmation that the
existing approach is too entrenched for change, etc).
Regardless, I think the following subset change could merge without
any bigger picture changes, as it just adds test coverage for a case
not yet covered. I can reopen that pull request once I clean up the
commit message a bit (I closed it in the mean time).
https://github.com/python/cpython/pull/3450/commits/bfdab955a94a7fef431548f…
Thanks for your time!
Cheers,
--diana