On 13 October 2017 at 04:21, Martin Teichmann <lkb.teichmann(a)gmail.com>
> For me, dataclasses were a typical use case for inheritance, or to be
> more precise, for metaclasses. I was astonished to see them
> implemented using decorators, and I was not the only one, citing:
> > I think it would be useful to write 1-2 sentences about the problem with
> > inheritance -- in that case you pretty much have to use a metaclass, and
> > use of a metaclass makes life harder for people who want to use their own
> > metaclass (since metaclasses don't combine without some manual
> > intervention).
> Python is at a weird point here. With about every new release of Python,
> a new idea shows up that could easily be implemented using metaclasses, yet
> every time we hesitate to use them, because of said necessary manual
> intervention for metaclass combination.
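For reference, the combination problem being described looks roughly like
this (the metaclass names are invented purely for illustration):

    class MetaA(type):        # imagine this one adds registration behaviour
        pass

    class MetaB(type):        # imagine this one adds validation behaviour
        pass

    class A(metaclass=MetaA):
        pass

    class B(metaclass=MetaB):
        pass

    try:
        class C(A, B):        # wants to inherit from both hierarchies
            pass
    except TypeError as exc:
        print(exc)            # "metaclass conflict: the metaclass of a derived
                              #  class must be a (non-strict) subclass ..."

    # The manual intervention: write the combined metaclass yourself.
    class MetaAB(MetaA, MetaB):
        pass

    class C(A, B, metaclass=MetaAB):
        pass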
Metaclasses currently tend to serve two distinct purposes:
1. Actually altering the runtime behaviour of a class and its children in
non-standard ways (e.g. enums, ABCs, ORMs)
2. Boilerplate reduction in class definitions, reducing the amount of code
you need to write as the author of that class
Nobody has a problem with using metaclasses for the first purpose - that's
what they're for.
It's the second use case where they're problematic, as the fact that
they're preserved on the class becomes a leaky implementation detail, and
the lack of a JIT in CPython means they can also end up being expensive
from a runtime performance perspective.
Mixin classes have the same problem: something that the author may want to
handle as an internal implementation detail leaks through to the runtime
state of the class object.
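A small sketch of that leak, using a hypothetical boilerplate-reduction
metaclass (nothing here is real stdlib machinery):

    # A metaclass used purely for boilerplate reduction stays attached to
    # every subclass, and remains visible to runtime introspection:
    class AutoReprMeta(type):
        """Hypothetical metaclass that injects a default __repr__."""
        def __new__(mcls, name, bases, ns):
            cls = super().__new__(mcls, name, bases, ns)
            if "__repr__" not in ns:
                cls.__repr__ = lambda self: "<%s instance>" % type(self).__name__
            return cls

    class Base(metaclass=AutoReprMeta):
        pass

    class Child(Base):        # the author of Child never asked for this...
        pass

    print(type(Child))        # <class '__main__.AutoReprMeta'>, not <class 'type'>
    # ...so Child cannot freely mix in bases that use a different metaclass,
    # and every class creation in the hierarchy pays for the extra machinery.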
Code-generating decorators like functools.total_ordering and
dataclasses.dataclass (aka attr.s) instead aim at the boilerplate reduction
problem directly: they let you declare in the class body the parts that you
need to specify as the class designer, and then fill in at class definition
time the parts that can be inferred from that base.
If all you have access to is the runtime class, it behaves almost exactly
as if you had written out all the autogenerated methods by hand (there may
be subtle differences in the method metadata, such as the values of
`__qualname__` and `__globals__`).
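A rough sketch of the pattern, using an invented auto_repr decorator (this is
not how dataclasses or attrs are actually implemented, just the general shape):

    def auto_repr(cls):
        """Hypothetical code-generating decorator: build a __repr__ from the
        PEP 526 annotations declared in the class body."""
        fields = list(cls.__dict__.get("__annotations__", ()))

        def __repr__(self):
            values = ", ".join("%s=%r" % (name, getattr(self, name))
                               for name in fields)
            return "%s(%s)" % (type(self).__name__, values)

        cls.__repr__ = __repr__    # injected once, at class definition time
        return cls                 # the class object itself is unchanged

    @auto_repr
    class Point:
        x: int
        y: int
        def __init__(self, x, y):
            self.x, self.y = x, y

    print(Point(1, 2))    # Point(x=1, y=2)
    print(type(Point))    # <class 'type'> - no metaclass involved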
Such decorators also do more work at class definition time in order to
reduce the amount of runtime overhead introduced by reliance on chained
method calls in a non-JITted Python runtime.
As such, the code generating decorators have a clear domain of
applicability: boilerplate reduction for class definitions without
impacting the way instances behave (other than attribute and method
injection), and without implicitly impacting subclass definitions (other
than through regular inheritance behaviour).
As far as the dataclass interaction with `__slots__` goes, that's a problem
largely specific to slots (and `__metaclass__` before it), in that they're
the only characteristics of a class definition that affect how CPython
allocates memory for the class object itself (the descriptors for the slots
are stored as a pointer array after the class struct, rather than only in
the class dict).
Given PEP 526 variable annotations, __slots__ could potentially benefit
from a __metaclass__ style makeover, allowing an "infer_slots=True" keyword
argument to type.__new__ to request that the list of slots be inferred from
__annotations__ (Slot inference would conflict with setting class level
default values, but that's a real conflict, as you'd be trying to use the
same name on the class object for both the slot descriptor and the default
value).
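For illustration only, that inference could be approximated today with a
hypothetical class decorator rather than a type.__new__ keyword argument:

    def infer_slots(cls):
        """Hypothetical approximation of "infer_slots=True": rebuild the class
        with __slots__ taken from its PEP 526 annotations."""
        ns = dict(cls.__dict__)
        annotations = ns.get("__annotations__", {})
        for name in annotations:
            if name in ns:
                # The conflict mentioned above: the same name cannot be both
                # a slot descriptor and a class-level default value.
                raise TypeError("%r is both annotated (slot) and set on the class" % name)
        ns.pop("__dict__", None)       # drop descriptors from the original class
        ns.pop("__weakref__", None)
        ns["__slots__"] = tuple(annotations)
        return type(cls)(cls.__name__, cls.__bases__, ns)

    @infer_slots
    class Point:
        x: int
        y: int

    p = Point()
    p.x = 1        # fine: "x" is a slot
    # p.z = 2      # AttributeError: 'Point' object has no attribute 'z'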
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
It looks like the discussion about the execution context has become
extremely hard to follow. There are many opinions on what the spec for
generators should look like. What seems to be a "natural" behaviour or
example to one person seems completely unreasonable to other people.
Recent emails from Guido indicate that he doesn't want to implement
execution contexts for generators (at least in 3.7).
In another thread Guido said this: "... Because coroutines and
generators are similar under the covers, Yury demonstrated the issue
with generators instead of coroutines (which are unfamiliar to many
people). And then somehow we got hung up about fixing the problem in
generators."
And Guido is right. My initial motivation to write PEP 550 was to
solve my own pain point: to have a solution for async code.
'threading.local' is completely unusable there, but complex code bases
demand a working solution. I thought that because coroutines and
generators are so similar under the hood, I could design a simple
solution that would cover all edge cases. It turns out it is not
possible to do it in one pass.
Therefore, in order to make some progress, I propose to split the
problem in half:
Stage 1. A new execution context PEP to solve the problem *just for
async code*. The PEP will target Python 3.7 and completely ignore
synchronous generators and asynchronous generators. It will be based
on PEP 550 v1 (no chained lookups, immutable mapping or CoW as an
optimization) and borrow some good API decisions from PEP 550 v3+
(contextvars module, ContextVar class). The API (and C-API) will be
designed to be future proof and ultimately allow a transition to the
full semantics (including generators) later on.
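For illustration, the kind of usage stage 1 aims to enable in async code
might look roughly like this (names follow PEP 550 v3+; the final API may
well differ):

    import asyncio
    from contextvars import ContextVar    # proposed module/class names

    request_id = ContextVar("request_id", default="-")

    async def handle(rid):
        request_id.set(rid)           # per-task value, unlike threading.local
        await asyncio.sleep(0)        # another task runs in between...
        print(rid, request_id.get())  # ...but this task still sees its own value

    async def main():
        await asyncio.gather(handle("a"), handle("b"))

    asyncio.get_event_loop().run_until_complete(main())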
Stage 2. When Python 3.7 is out, we'll see how people use execution
contexts for async code and collect feedback. If we recognize that
Python users want execution contexts for generators/asynchronous
generators, we'll make a new PEP to add support for them in Python
3.8. That future discussion will be centred on generators
specifically, and therefore I expect it to be somewhat more focused.
I will start working on the new PEP for stage 1 tomorrow. I expect to
have a first version by the end of the week.
I will also publish PEP 550 v1 as a separate PEP (as v1 is a totally
different PEP anyway).
Since the beginning of the year, I have been working on a tool to track
whether all security vulnerabilities are fixed in all maintained Python
versions (versions still accepting security fixes):
Currently, five branches are maintained: 2.7, 3.4, 3.5, 3.6 and master.
Thanks to Ned Deily and Georg Brandl, Python 3.3 reached its
end-of-life (EOL) last month, after 5 years of good service (as
expected). It reduced the number of maintained branches from six to
five :-) Python 3.3.7, released last month, contains the last security
fixes for the 3.3 branch.
The good news is that we got releases last month with fixes for
almost all security vulnerabilities. Only Python 3.4 and Python 3.5
still have two known vulnerabilities, but fortunately I consider their
severity to be low.
"Expat 2.2.3" is not fixed yet in Python 3.4 and 3.5, but I'm not sure
that Python is really affected by fixed Expat vulnerabilities, since
Python uses its own code to generate a secret key for the Expat "hash
secret". Our embedded expat copy is used on Windows and macOS, but not
"update zlib to 1.2.11" was fixed in the Python 3.4 branch, but no
release was made yet. This issue only impacts Windows. Linux and macOS
use the system zlib.
It's great to see that so many developers are working on speeding up
Python's startup. The improvements are going to make Python more
suitable for command line scripts. However, I'm worried that some
approaches are going to make other use cases slower and less efficient.
I'm talking about the downsides of lazy initialization and deferred imports.
For short running command line scripts, lazy initialization of regular
expressions and deferred import of rarely used modules can greatly
reduce startup time and reduce memory usage.
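For reference, the pattern under discussion looks roughly like this (the
names are illustrative):

    import functools

    @functools.lru_cache(maxsize=None)
    def _date_pattern():
        import re                                   # deferred import
        return re.compile(r"\d{4}-\d{2}-\d{2}")     # compiled only on first use

    def extract_date(text):
        return _date_pattern().search(text)

    # A short-lived script that never calls extract_date() never pays for the
    # import or the compilation; a long-running or forking server pays for it
    # at the worst possible moment: the first request, in every child process.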
For long running processes, deferring imports and initialization can be
a huge performance problem. A typical server application should
initialize as much as possible at startup and then signal its partners
that it is ready to serve requests. A deferred import of a module is
going to slow down the first request that happens to require the module.
This is unacceptable for some applications, e.g. Raymond's example of
It's even worse for forking servers. A forking HTTP server handles each
request in a forked child. Each child process has to compile a lazy
regular expression or import a deferred module over and over.
uWSGI's emperor / vassal mode uses a pre-fork model with multiple server
processes to efficiently share memory with copy-on-write semantics. Lazy
imports will make that approach less efficient and slow down the forking
of new worker processes.
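Such servers want the opposite strategy, roughly sketched below (the module
list is only an example):

    import importlib
    import re

    WARM_UP_MODULES = ["json", "ssl", "email.parser"]   # illustrative list
    PATTERNS = {}

    def warm_up():
        # Run in the master process before forking, so the already-imported
        # modules and compiled patterns are shared copy-on-write by all workers.
        for name in WARM_UP_MODULES:
            importlib.import_module(name)
        PATTERNS["date"] = re.compile(r"\d{4}-\d{2}-\d{2}")

    warm_up()   # ...then signal readiness and start forking workers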
TL;DR: please refrain from moving imports into functions or implementing
lazy modes until we have figured out how to satisfy the requirements of
both scripts and long running services. We probably need a PEP...
Python 3.7.0a2 is the second of four planned alpha previews of Python 3.7,
the next feature release of Python. During the alpha phase, Python 3.7
remains under heavy development: additional features will be added
and existing features may be modified or deleted. Please keep in mind
that this is a preview release and its use is not recommended for
production environments. The next preview, 3.7.0a3, is planned for
2017-11-27. You can find Python 3.7.0a2 and more information here:
nad(a)python.org -- 
Python uses a few categories to group bugs (on bugs.python.org) and
NEWS entries (in the Python changelog). List used by the blurb tool:
#.. section: Security
#.. section: Core and Builtins
#.. section: Library
#.. section: Documentation
#.. section: Tests
#.. section: Build
#.. section: Windows
#.. section: macOS
#.. section: IDLE
#.. section: Tools/Demos
#.. section: C API
My problem is that almost all changes go into the "Library" category. When
I read long changelogs, it's sometimes hard to quickly identify the
context (e.g. the impacted modules) of a change.
It's also hard to find open bugs for a specific module on
bugs.python.org, since almost all bugs are in the very generic
"Library" category, and using full-text search returns false positives.
I would prefer to see more specific categories like:
* Buildbots: only issues specific to buildbots
* Networking: socket, asyncio, asyncore, asynchat modules
* Security: ssl module but also vulnerabilities in any other part of
CPython -- we already added a Security category in NEWS/blurb
* Parallelism: multiprocessing and concurrent.futures modules
It's hard to find categories generic enough to contain more than a
single item, yet not so broad that they contain too many items. Other ideas:
* XML: xml.dom, xml.etree, xml.parsers, xml.sax modules
* Import machinery: imp and importlib modules
* Typing: abc and typing modules
The best would be to have a mapping of module names to categories,
and to make sure that all modules have a category. We might try to count
the number of commits and NEWS entries of the last 12 months to decide
whether a category has the right size.
I don't think that we need a distinct category for each module. We can
put many uncommon modules in a generic category.
By the way, maybe we also need a new "module name" field in the bug
tracker. But then comes the question of normalizing module names: for
example, should "email.message" be normalized to "email"? Maybe store
"email.message" but use "email" for search, display the module in the
issue title, etc.
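To make the mapping idea concrete, it could look roughly like this (the
category assignments below are only examples):

    MODULE_CATEGORY = {
        "socket": "Networking",
        "asyncio": "Networking",
        "ssl": "Security",
        "multiprocessing": "Parallelism",
        "concurrent.futures": "Parallelism",
        "xml.etree": "XML",
        "importlib": "Import machinery",
        "typing": "Typing",
    }

    def category_for(module):
        # Try the full dotted name first, then its top-level package, so that
        # "email.message" is normalized to "email" for lookup purposes.
        for candidate in (module, module.split(".")[0]):
            if candidate in MODULE_CATEGORY:
                return MODULE_CATEGORY[candidate]
        return "Library"    # generic bucket for everything uncommon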
FYI I just merged two pull requests into Python 2.7, each adding a new
environment variable that changes the behaviour in debug mode:
bpo-31733: don't dump "[xxx refs]" into stderr by default anymore
bpo-31692, bpo-19527: don't dump allocations counts by default anymore
I have never used "[xxx refs]" to detect a reference leak. I only use
"./python -m test -R 3:3 test_xxx" to detect reference leaks. To be
honest, I usually only run such a test explicitly when one of our
"Refleaks" buildbots starts to complain :-)
I have implemented a C preprocessor written in Python which gives some
useful visualisations of source code, particularly macro usage:
I have been running this on the CPython source code and it occurs to me
that this might be useful to the python-dev community.
For example, the Python dictionary source code is visualised here:
index_dictobject.c_a3f5bfec1ed531371fb1a2bcdcb2e9c2.html
I found this really useful when I was getting a segfault during a
dictionary insert from my C code. The segfault was on this line
http://cpip.readthedocs.io/en/
c2.html#1130 but it is hard to see what is going on with macros inside
macros. If you click on the link at the left end of the line, it takes you
to the full expansion of the macros
http://cpip.readthedocs.io/en/latest/_static/dictobject.c/dictobject.c.html#1130
as this is what the compiler and debugger see.
I could examine these values in GDB and figure out what was going on. I
could also figure out what that MAINTAIN_TRACKING macro was doing by
looking at the macros page generated by CPIP: http://cpip.readthedocs.io/en/
and following those links.
I was wondering if it would be valuable to python-dev developers if this
tool were run regularly over the CPython source tree(s). A single source
tree takes about 12 CPU hours to process and generates 8 GB of HTML/SVG.
If this is useful, then where should it be hosted?