I have three hashing-related patches for Python 3.6 that are waiting for
review. Altogether the three patches add ten new hash algorithms to the
hashlib module: SHA3 (224, 256, 384, 512), SHAKE (SHA3 XOF 128, 256),
BLAKE2 (blake2b, blake2s) and truncated SHA512 (224, 256).
SHA-3 / SHAKE: https://bugs.python.org/issue16113
SHA512/224 / SHA512/256: https://bugs.python.org/issue26834
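For context, once these patches land the new algorithms would be used through the familiar hashlib interface. A quick sketch (runs on Python 3.6+, where the SHA3, SHAKE, and BLAKE2 families are all available):

```python
import hashlib

# SHA-3 fixed-length digests (224/256/384/512-bit variants)
h = hashlib.sha3_256(b"hello")
print(h.hexdigest())

# SHAKE is an extendable-output function (XOF): the caller picks the length
x = hashlib.shake_128(b"hello")
print(x.hexdigest(16))  # request 16 bytes of output

# BLAKE2 supports keyed hashing and a configurable digest size
b = hashlib.blake2b(b"hello", digest_size=32)
print(b.hexdigest())
```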
I'd like to push the patches during the sprints at PyCon. Please assist
with the reviews.
A problem has surfaced just this week in 3.5.1. Obviously this is a
good time to fix it for 3.5.2. But there's a big argument over what is
"broken" and what is an appropriate "fix".
As 3.5 Release Manager, I can put my foot down and make rulings, and
AFAIK the only way to overrule me is with the BDFL. In two of the three
cases I've put my foot down. In the third I'm pretty sure I'm right,
but IIUC literally everyone else with a stated opinion disagrees with
me. So I thought it best I escalate it. Note that 3.5.2 is going to
wait until the issue is settled and any changes to behavior are written
and checked in.
(Blanket disclaimer for the below: in some places I'm trying to
communicate other people's positions. I apologize if I misrepresented
yours; please reply and correct my mistake. Also, sorry for the length
of this email. But feel even sorrier for me: this debate has already
eaten two days this week.)
For 3.5 os.urandom() was changed: instead of reading from /dev/urandom,
it uses the new system call getrandom() where available. This is a new
system call on Linux (which has already been cloned by Solaris).
getrandom(), as CPython uses it, reads from the same PRNG that
/dev/urandom gets its bits from. But because it's a system call you
don't have to mess around with file handles. Also it always works in
chrooted environments. Sounds like a fine idea.
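Both access paths draw from the same kernel pool; the difference is only the interface. A quick sketch of the two paths (Linux-flavored; the direct device read is the pre-3.5 behavior):

```python
import os

# The portable wrapper: on Linux 3.17+ under Python 3.5.x this is
# implemented with the getrandom() syscall -- no file descriptor needed.
a = os.urandom(16)

# The pre-3.5 path: read the device directly.  It requires a working
# /dev/urandom (a problem in bare chroots), but it never blocks.
with open('/dev/urandom', 'rb') as f:
    b = f.read(16)

print(len(a), len(b))
```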
Also for 3.5, several other places where CPython internally needs random
bits were switched from reading from /dev/urandom to calling
getrandom(). The two that I know of: choosing the seed for hash
randomization, and initializing the default Mersenne Twister for the
random module.
There's one subtle but important difference between /dev/urandom and
getrandom(). At startup, Linux seeds the urandom PRNG from the entropy
pool. If the entropy pool is uninitialized, what happens? CPython's
calls to getrandom() will block until the entropy pool is initialized,
which is usually just a few seconds (or less) after startup. But
/dev/urandom *guarantees* that reads will *always* work. If the entropy
pool hasn't been initialized, it pulls numbers from the PRNG before it's
been properly seeded. What this results in depends on various aspects
of the configuration (do you have ECC RAM? how long was the machine
powered down? does the system have a correct realtime clock?). In
extreme circumstances this may mean the "random" numbers are shockingly
predictable.
Under normal circumstances this minor difference is irrelevant. After
all, when would the entropy pool ever be uninitialized?
(warning, the issue is now astonishingly long, and exhausting to read,
and various bits of it are factually wrong)
A user reports that when starting CPython soon after startup on a fresh
virtual machine, the process would hang for a long time. Someone on the
issue reported observing delays of over 90 seconds. Later we found out:
it wasn't 90 seconds before CPython became usable, these 90 seconds
delays were before systemd timed out and simply killed the process.
It's not clear what the upper bound on the delay might be.
The issue author had already identified the cause: CPython was blocking
on getrandom() in order to initialize hash randomization. On this fresh
virtual machine the entropy pool started out uninitialized. And since
the only thing running on the machine was CPython, and since CPython was
blocked on initialization, the entropy pool was initializing very, very
slowly.
Other posters to the thread pointed out that the same thing would happen
in "import random", if your code could get that far. The constructor
for the Random() object would seed the Mersenne Twister, which would
call getrandom() and block.
Naturally, callers to os.urandom() could also block for an unbounded
period for the same reason.
MY RULINGS SO FAR
1) The change in 3.5 that means "import random" may block for an
unbounded period of time on Linux due to the switch to getrandom() must
be backed out or amended so that it never blocks.
I *think* everyone agrees with this. The Mersenne Twister is not a
CPRNG, so seeding it with crypto-quality bits isn't necessary. And
unbounded delays are bad.
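The non-CSPRNG point is easy to demonstrate: the Mersenne Twister's entire output stream is a deterministic function of its seed, so the "crypto quality" of the seed only affects predictability, never correctness:

```python
import random

# Two generators with the same seed produce identical streams;
# crypto-quality seed bits buy nothing for correctness.
r1 = random.Random(12345)
r2 = random.Random(12345)
assert [r1.random() for _ in range(5)] == [r2.random() for _ in range(5)]

# A differently-seeded generator diverges immediately.
r3 = random.Random(54321)
print(r1.random() != r3.random())
```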
2) The change in 3.5 that means hash randomization initialization may
block for an unbounded period of time on Linux due to the switch to
getrandom() must be backed out or amended so that it never blocks.
I believe most people agree with me. The cryptography experts
disagree. IIUC both Alex Gaynor and Christian Heimes feel the blocking
is preferable to non-random hash "randomization".
Yes, the bad random data means the hashing will be predictable. Neither
choice is exactly what you want. But most people feel it's simply
unreasonable that in extreme corner cases CPython can block for an
unbounded amount of time before running user code.
Here's where it gets complicated--and where everyone else thinks I'm wrong.
os.urandom() is currently the best place for a Python programmer to get
high-quality random bits. The one-line summary for os.urandom() reads:
"Return a string of n random bytes suitable for cryptographic use."
On 3.4 and before, on Linux, os.urandom() would never block, but if the
entropy pool was uninitialized it could return very-very-poor-quality
random bits. On 3.5.0 and 3.5.1, on Linux, when using the getrandom()
call, it will instead block for an apparently unbounded period before
returning high-quality random bits. The question: is this new behavior
preferable, or should we return to the old behavior?
Since I'm the one writing this email, let me make the case for my
position: I think that os.urandom() should never block on Linux. Why?
1) Functions in the os module that look like OS functions should behave
predictably like thin wrappers over those OS functions.
Most of the time this is exactly what they are. In some cases they're
more sophisticated; examples include os.popen(), os.scandir(), and the
byzantine os.utime(). There are also some functions provided by the os
module that don't resemble any native functionality, but these have
unique names that don't look like anything provided by the OS.
This makes the behavior of the Python function easy to reason about: it
always behaves like your local OS function. Python provides os.stat()
and it behaves like the local stat(). So if you want to know how any os
module function behaves, just read your local man page. Therefore,
os.urandom() should behave exactly like a thin shell around reading
/dev/urandom.
On Linux, /dev/urandom guarantees that it will never block. This means
it has undesirable behavior if read immediately after a fresh boot. But
this guarantee is so strong that Theodore Ts'o couldn't break it to fix
the undesirable behavior. Instead he added the getrandom() system
call. But he left /dev/urandom alone. Therefore, on Linux, os.urandom()
should behave the same way, and also never block.
2) It's unfair to change the semantics of a well-established function to
such a radical degree.
os.urandom() has been in Python since at least 2.6--I was too lazy to go
back any further. From 2.6 to 3.4, it behaved exactly like
/dev/urandom, which meant that on Linux it would never block. As of
3.5, on Linux, it might now block for an unbounded period of time. Any
code that calls os.urandom() has had its behavior radically changed in
this extreme corner case.
3) os.urandom() doesn't actually guarantee it's suitable for cryptography.
The documentation for os.urandom() has contained this sentence,
untouched, since 2.6:
The returned data should be unpredictable enough for cryptographic
applications, though its exact quality depends on the OS
implementation. On a Unix-like system this will query /dev/urandom,
and on Windows it will use CryptGenRandom().
Of course, version 3.5 added this:
On Linux 3.17 and newer, the getrandom() syscall is now used when
available.
But the waffling about its suitability for cryptography remains
unchanged. So, while it's undesirable that os.urandom() might return
shockingly poor quality random bits, it is *permissible* according to
the documentation.
4) This really is a rare corner-case we're talking about.
I just want to re-state: this case on Linux where /dev/urandom returns
totally predictable bytes, and getrandom() will block, only happens when
the entropy pool for urandom is uninitialized. Although it has been seen
in the field, it's extremely rare. 99.99999%+ of the time, reading
/dev/urandom and calling getrandom() will both return the exact same
high-quality random bits without blocking.
5) This corner-case behavior is fixable externally to CPython.
I don't really understand the subject, but apparently it's entirely
reasonable to expect sysadmins to directly manage the entropy pools of
virtual machines. They should be able to spin up their VMs with a
pre-filled entropy pool. So it should be possible to ensure that
os.urandom() always returns the high-quality random bits we wanted, even
on freshly-booted VMs.
6) Guido and Tim Peters already decided once that os.urandom() should
behave like /dev/urandom.
In 2.7.10, os.urandom() was changed to call getentropy() instead of
reading /dev/urandom when getentropy() was available. getentropy() was
"stunningly slow" on Solaris, on the order of 300x slower than reading
/dev/urandom. Guido and Tim both participated in the discussion on the
issue; Guido also apparently discussed it via email with Theo De Raadt.
While it's not quite apples-to-apples, I think this establishes some
precedent that os.urandom() should
* behave like /dev/urandom, and
* be fast.
On the other side is... everybody else. I've already spent an enormous
amount of time researching and writing and re-writing this email.
Rather than try (and fail) to accurately present the other sides of this
debate, I'm just going to end the email here and let the other
participants reply and voice their views.
Bottom line: Guido, in this extreme corner case on Linux, should
os.urandom() return bad random data like it used to, or should it block
forever like it does in 3.5.0 and 3.5.1?
A question for each of the three release managers:
when is the earliest that you might tag your release and
cutoff submission of further patches for the release?
Terry Jan Reedy
Following discussion a few years back (and rough approval from Guido),
I started work on using OrderedDict for the class definition
namespace by default. The bulk of the effort lay in implementing
OrderedDict in C, which I got landed just in time for 3.5. The
remaining work was quite minimal and the actual change is quite small.
My intention was to land the patch soon, having gone through code
review during PyCon. However, Nick pointed out to me the benefit of
having a concrete point of reference for the change, as well as making
sure it isn't a problem for other implementations. So in that spirit,
here's a PEP for the change. Feedback is welcome, particularly from
other implementors.
Title: Ordered Class Definition Namespace
Author: Eric Snow <ericsnowcurrently(a)gmail.com>
Type: Standards Track
This PEP changes the default class definition namespace to ``OrderedDict``.
Furthermore, the order in which the attributes are defined in each class
body will now be preserved in ``type.__definition_order__``. This allows
introspection of the original definition order, e.g. by class decorators.
Note: just to be clear, this PEP is *not* about changing ``type.__dict__``.
Currently the namespace used during execution of a class body defaults
to dict. If the metaclass defines ``__prepare__()`` then the result of
calling it is used. Thus, before this PEP, if you needed your class
definition namespace to be ``OrderedDict`` you had to use a metaclass.
Metaclasses introduce an extra level of complexity to code and in some
cases (e.g. conflicts) are a problem. So reducing the need for them is
worth doing when the opportunity presents itself. Given that we now have
a C implementation of ``OrderedDict`` and that ``OrderedDict`` is the
common use case for ``__prepare__()``, we have such an opportunity by
defaulting to ``OrderedDict``.
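Under the status quo, the metaclass boilerplate looks roughly like the following sketch (the names ``OrderedMeta``, ``Spam``, and ``_order`` are purely for illustration):

```python
from collections import OrderedDict

class OrderedMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        # The class body executes against this mapping, preserving order.
        return OrderedDict()

    def __new__(mcls, name, bases, namespace, **kwds):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # Record the user-defined attribute order, skipping dunders.
        cls._order = tuple(k for k in namespace
                           if not k.startswith('__'))
        return cls

class Spam(metaclass=OrderedMeta):
    ham = None
    eggs = 5

print(Spam._order)  # -> ('ham', 'eggs')
```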
The usefulness of ``OrderedDict``-by-default is greatly increased if the
definition order is directly introspectable on classes afterward,
particularly by code that is independent of the original class definition.
One of the original motivating use cases for this PEP is generic class
decorators that make use of the definition order.
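As a hedged sketch of that use case (``record_fields`` is a hypothetical decorator; it prefers the proposed ``__definition_order__`` and falls back to ``__dict__`` iteration order where the attribute doesn't exist):

```python
def record_fields(cls):
    # Use the PEP's __definition_order__ when available; otherwise fall
    # back to the class __dict__ order as an approximation.
    order = getattr(cls, '__definition_order__', None)
    if order is None:
        order = tuple(k for k in vars(cls) if not k.startswith('__'))
    cls._fields = order
    return cls

@record_fields
class Point:
    x = 0
    y = 0

print(Point._fields)  # -> ('x', 'y')
```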
Changing the default class definition namespace has been discussed a
number of times, including on the mailing lists and in PEP 422 and
PEP 487 (see the References section below).
* the default class *definition* namespace is now ``OrderedDict``
* the order in which class attributes are defined is preserved in the
new ``__definition_order__`` attribute on each class
* "dunder" attributes (e.g. ``__init__``, ``__module__``) are ignored
* ``__definition_order__`` is a tuple
* ``__definition_order__`` is a read-only attribute
* ``__definition_order__`` is always set:
* if ``__definition_order__`` is defined in the class body then it
  is used as-is
* types that do not have a class definition (e.g. builtins) have
their ``__definition_order__`` set to ``None``
* types for which ``__prepare__()`` returned something other than
``OrderedDict`` (or a subclass) have their ``__definition_order__``
set to ``None``
The following code demonstrates roughly equivalent semantics::

    from collections import OrderedDict

    class Meta(type):
        @classmethod
        def __prepare__(cls, *args, **kwargs):
            return OrderedDict()

    class Spam(metaclass=Meta):
        ham = None
        eggs = 5
        __definition_order__ = tuple(k for k in locals()
                                     if not k.startswith('__'))
Note that [pep487_] proposes a similar solution, albeit as part of a
broader proposal.
This PEP does not break backward compatibility, except in the case that
someone relies *strictly* on dicts as the class definition namespace. This
shouldn't be a problem.
In addition to the class syntax, the following expose the new behavior:
``types.new_class()`` and ``types.prepare_class()``.
Other Python Implementations
Pending feedback, the impact on Python implementations is expected to
be minimal. If a Python implementation cannot support switching to
``OrderedDict``-by-default then it can always set ``__definition_order__``
to ``None``.
The implementation is found in the tracker. [impl_]
type.__dict__ as OrderedDict
Instead of storing the definition order in ``__definition_order__``,
the now-ordered definition namespace could be copied into a new
``OrderedDict``. This would mostly provide the same semantics.
However, using ``OrderedDict`` for ``type.__dict__`` would obscure the
relationship with the definition namespace, making it less useful.
Additionally, doing this would require significant changes to the
semantics of the concrete dict C-API.
A "namespace" Keyword Arg for Class Definition
PEP 422 introduced a new "namespace" keyword arg to class definitions
that effectively replaces the need for ``__prepare__()``. [pep422_]
However, the proposal was withdrawn in favor of the simpler PEP 487.
.. [impl] issue #24254
.. [pep422] PEP 422
.. [pep487] PEP 487
.. [orig] original discussion
.. [followup1] follow-up 1
.. [followup2] follow-up 2
This document has been placed in the public domain.
On 06/09/2016 05:00 PM, Steve Dower wrote:
> If the pattern is really going to be the hasattr check you posted
> earlier, can we just do it for people and save them writing code that
> won't work on different OSs?
No. That's what got us into this mess in the first place.
3.5.0 and 3.5.1 *already* changed to the new behavior, and it resulted
in the situation where CPython blocked forever at startup in certain
edge cases. os.urandom() has been around for more than a
decade, we can't unilaterally change its semantics now. os.urandom() in
3.5 has to go back to how it behaved on Linux in 3.4. And if I were
release manager for 3.6, I'd say "it has to stay that way in 3.6 too".
However, Guido's already said "don't add os.getrandom() in 3.5", so the
debate is somewhat irrelevant.
Larry Hastings wrote:
> On 3.4 and before, on Linux, os.urandom() would never block, but if the
> entropy pool was uninitialized it could return very-very-poor-quality
> random bits. On 3.5.0 and 3.5.1, on Linux, when using the getrandom()
> call, it will instead block for an apparently unbounded period before
> returning high-quality random bits.
Just a point of information here. Ted Ts'o commented on the quality of the
pre-initialization bits; it's not a given that they're "very very poor
quality". Even before the per-boot entropy pool is initialized, the kernel
has a few sources of randomness available to it - viz: interrupt timings,
RDRAND (on x86) and a little per-machine data (uname -a). If RDRAND is
trusted, this is enough to provide quite significant entropy; however,
that's not much help to all the ARM devices out there.
The most pressing issue from my perspective is the hash randomization
initialization; as there is currently nothing a script author can do to
influence its behavior (except setting PYTHONHASHSEED before invocation,
which might not be an option).
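For completeness, the PYTHONHASHSEED escape hatch mentioned above works like this; note it must be set before the interpreter starts, which is exactly why it may not be an option for a script author:

```python
import os
import subprocess
import sys

code = "print(hash('spam'))"
env = dict(os.environ, PYTHONHASHSEED='0')

# With a fixed seed, str hashes are reproducible across interpreter runs.
out1 = subprocess.check_output([sys.executable, '-c', code], env=env)
out2 = subprocess.check_output([sys.executable, '-c', code], env=env)
print(out1 == out2)  # -> True

# Without PYTHONHASHSEED set, each run picks a fresh random seed,
# so the hashes usually differ between runs.
```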
It should be possible, at least conceptually, for Python to be used to
implement /sbin/init. This isn't currently the case on Linux with Python
3.5.1 and Linux 3.17+.
For what it's worth, I do agree with Larry that os.urandom() should hew as
closely as possible to the OS-specific urandom implementation. Adding an
optional "blocking" boolean flag might be a useful addition for 3.6.
Colm Buckley / colm(a)tuatha.org / +353 87 2469146
For binary methods, such as __add__, either do not implement or return
NotImplemented if the other operand/class is not supported.
For non-binary methods, simply do not define.
Except for subclasses when the super-class defines __hash__ and the
subclass is not hashable -- then set __hash__ to None.
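A minimal sketch of both conventions (the class names are hypothetical):

```python
class Meters:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if not isinstance(other, Meters):
            # Give the other operand's __radd__ a chance; if nothing
            # handles the operation, Python raises TypeError for us.
            return NotImplemented
        return Meters(self.value + other.value)

class Unhashable(Meters):
    # Opt out of the hashability inherited from object.
    __hash__ = None

print((Meters(1) + Meters(2)).value)  # -> 3
```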
Are there any other methods that should be set to None to tell the
run-time that the method is not supported? Or is this a general
mechanism for subclasses to declare any method is unsupported?
There is a small flaw in PEP 492 design -- __aiter__ should not return
an awaitable object that resolves to an asynchronous iterator. It should
return an asynchronous iterator directly.
Let me explain this by showing some examples.
I've discovered this while working on a new asynchronous generators
PEP. Let's pretend that we have them already: if we have a 'yield'
expression in an 'async def' function, the function becomes an
"asynchronous generator function":
async def foo():
    await bar()
    yield 1
    await baz()
    yield 2

# foo -- is an `asynchronous generator function`
# foo() -- is an `asynchronous generator`
If we iterate through "foo()", it will await on "bar()", yield "1",
await on "baz()", and yield "2":
>>> async for el in foo():
...     print(el)
1
2
If we decide to have a class with an __aiter__ that is an async
generator, we'd write something like this:
    class Spam:
        async def __aiter__(self):
            yield 1
            yield 2
However, with the current PEP 492 design, the above code would be
invalid! The interpreter expects __aiter__ to return a coroutine, not
an async generator.
I'm still working on the PEP for async generators, targeting CPython
3.6. And once it is ready, it might still be rejected or deferred. But
in any case, this PEP 492 flaw has to be fixed now, in 3.5.2 (since PEP
492 is provisional).
I've created an issue on the bug tracker: http://bugs.python.org/issue27243
The proposed patch fixes the __aiter__ in a backwards compatible way:
1. ceval/GET_AITER opcode calls the __aiter__ method.
2. If the returned object has an '__anext__' method, GET_AITER silently
wraps it in an awaitable, which is equivalent to the following coroutine:
async def wrapper(aiter_result):
    return aiter_result
3. If the returned object does not have an '__anext__' method, a
DeprecationWarning is raised.
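Under the fixed semantics, a plain (non-awaitable) ``__aiter__`` that returns an object with ``__anext__`` works directly. A small self-contained sketch (``Ticker`` is hypothetical, and ``asyncio.run()`` from later Python versions is used for brevity):

```python
import asyncio

class Ticker:
    def __init__(self, n):
        self.i, self.n = 0, n

    def __aiter__(self):
        # Return the asynchronous iterator itself -- no coroutine,
        # no wrapper needed under the fixed semantics.
        return self

    async def __anext__(self):
        if self.i >= self.n:
            raise StopAsyncIteration
        self.i += 1
        return self.i

async def main():
    out = []
    async for x in Ticker(3):
        out.append(x)
    return out

print(asyncio.run(main()))  # -> [1, 2, 3]
```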