
On 24 June 2016 at 16:21, Victor Stinner <victor.stinner@gmail.com> wrote:
2016-06-24 22:05 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
As such, the idioms I currently have in PEP 522 are wrong - the "wait for the system RNG or not" decision wouldn't be one to be made on a per-call basis, but rather on a per-__main__ execution basis, with developers choosing which user experience they want to support on systems with a non-blocking /dev/urandom:
* this application will fail if you run it before the system RNG is ready (so you may need to add "ExecStartPre=python3 -c 'import secrets; secrets.wait_for_system_rng()'" in your systemd unit file)
In short, if an application is not run using systemd but directly on the command line, it *can* fail with a fatal BlockingIOError?
From the command line, the answer is equally simple: just run "python3 -c 'import secrets; secrets.wait_for_system_rng()'" before the command you actually care about.
As an added bonus, that will work even if the command you care about isn't written in Python 3, and even if it reads from /dev/urandom rather than using the new syscall.
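For illustration only, here is a rough sketch of what such a one-off wait could look like. `secrets.wait_for_system_rng()` is the name proposed in PEP 522 and is not assumed to exist in any released Python; the stand-in below assumes Python 3.6+ on Linux, where `os.getrandom()` called without flags blocks until the kernel RNG has been initialised, and it falls back to `os.urandom()` on other platforms.

```python
import os


def wait_for_system_rng():
    """Hypothetical stand-in for the PEP 522 proposal.

    Returns once the system RNG can supply data.
    """
    if hasattr(os, "getrandom"):
        # With no flags, getrandom() blocks until the kernel RNG
        # has been initialised, then returns the requested bytes.
        os.getrandom(1)
    else:
        # Assumption: without getrandom(), a plain os.urandom() read is
        # the closest available readiness check on this platform.
        os.urandom(1)


if __name__ == "__main__":
    # Run once before the command that actually needs random data.
    wait_for_system_rng()
```

Saved as a script, this could be run ahead of the real command in the same way as the `python3 -c` one-liner quoted above.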
Wait, I don't think that this is acceptable behaviour from the user's point of view.
Compared to Python 2.7, Python 3.4 and Python 3.5.2, where os.urandom() neither blocks nor raises an exception on Linux, such a behaviour change can be seen as a major regression.
The *only* way to get it to block (your PEP) or raise an exception (PEP 522) is to call os.urandom() (directly or indirectly) when the kernel RNG isn't ready - I consider the relevant analogy to be to PEP 476, where we turned the silent security failure of accepting an invalid or untrusted certificate (or one that didn't cover the named host) into the noisy error of failing to make the connection.
* this application implicitly calls "secrets.wait_for_system_rng()" and hence may block waiting for the system RNG if you run it before the system RNG is ready
It's hard to guess if os.urandom() is used in a third-party library. Maybe it's not. What if a new library version starts to use os.urandom()? Should you start to call secrets.wait_for_system_rng()?
To be safe, I expect that *all* applications should start with secrets.wait_for_system_rng()... It doesn't make sense to have to put such code in *all* applications.
Application developers porting to Python 3.6 can wait and see what their own testing reports and what their users report - they don't need to guess.
The main advantage of PEP 522 is control over how the "system urandom not initialized yet" case is handled. But you are increasingly saying that secrets.wait_for_system_rng() should be used to avoid BlockingIOError in most cases. Am I wrong?
I'm saying I think it's an application level decision, not a library level decision.
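A minimal sketch of that split, under stated assumptions: the library function simply asks for random bytes, and the application's entry point decides whether to wait up front or to fail with a message. The names `generate_token`, `main` and `rng_policy` are hypothetical, and the `BlockingIOError` branch reflects what PEP 522 proposes for `os.urandom()`; it is not assumed that any particular Python version actually raises it.

```python
import os
import sys


def generate_token(nbytes=16):
    # Library-level code: request random bytes and let any
    # BlockingIOError (as proposed by PEP 522) propagate to the caller.
    return os.urandom(nbytes).hex()


def main(rng_policy="fail"):
    """Hypothetical application entry point that owns the RNG policy."""
    try:
        if rng_policy == "wait" and hasattr(os, "getrandom"):
            # Application chose to wait: a blocking getrandom() call
            # returns only once the kernel RNG has been initialised.
            os.getrandom(1)
        print(generate_token())
    except BlockingIOError as exc:
        # Application chose to fail fast: report the problem and exit.
        print("system RNG not ready: %s" % exc, file=sys.stderr)
        return 1
    return 0
```

The point of the sketch is only that the policy lives in one place, `main()`, rather than in every library that happens to call `os.urandom()`.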
I expect that some libraries will start to use secrets.wait_for_system_rng() in their own code.
... In the end, it looks like you have basically reimplemented a blocking os.urandom(), no?
Potentially, but one of the important aspects of PEP 522 is that we're not imposing that outcome by fiat - we're letting developers choose the behaviour they want on a case by case basis, and seeing what the emergent consensus on correct behaviour turns out to be.

It's equally possible that the outcome will be that both Python and Linux developers conclude that this is an operating system integration issue, so systemd ends up adding a standard "kernelrng" target that components can wait for, and that then gets included as a requirement for getting to the singleuser state on most distros.

If we *do* reach a point where "always call secrets.wait_for_system_rng() before using secrets, random.SystemRandom or os.urandom" is the idiomatic advice for Pythonistas, *then* we can make os.urandom() blocking, and secrets.wait_for_system_rng() would be reduced to:

    def wait_for_system_rng():
        os.urandom(1)
--
Why do we have to bother *all* users with secrets.wait_for_system_rng(), when only a very few will really care about the exceptional case?
We don't - only the ones that actually get the exception, since they're necessarily the ones the problem is relevant to. Runtime system configuration related exceptions aren't something to be avoided at all costs - if they were, we'd never have made the changes we did to the way Unicode handling works. A good example of this at the library level is Armin Ronacher's click command line helper - when you run that in the C locale under Python 3, it just fails immediately, since the actual problem is that something has gone wrong and your system locale isn't configured properly. The right answer is almost always to fix the locale configuration settings, not to change anything in the Python code.
Why not add something for users who want to handle the exceptional case, but make os.urandom() blocking?
The main problem I have with the blocking solution is that if someone hits it unexpectedly, they're left staring at a blinking cursor (at best), with no helpful hints to get started on debugging the problem. If it's a component they didn't write, they also can't really give a good bug report beyond "It hangs when I try to run it".

By contrast, PEP 522 gives them an immediate exception and error message: "BlockingIOError: system random number generator is not ready". If they're a developer themselves, they can plug that into Google and hopefully find a relevant answer (which we can virtually guarantee by preseeding Stack Overflow with a suitable response). If they're *not* the application developer, they can paste the traceback into a bug report or support ticket and say "Hey, what's going on here?". At which point, the developer or support tech handling the ticket can do the appropriate Google search and respond accordingly.

Now, we could gain most of those debuggability benefits for a blocking solution by trying in non-blocking mode first, then falling back to blocking only if we get EAGAIN - that would let us print a Google-friendly warning message before we implicitly block.

That's where the argument of adopting a consistent approach of "try non-blocking first, then maybe fall back to something else if it doesn't work" comes into play - if os.urandom() (and hence indirectly the secrets module) is trying in non-blocking mode and falling back to an alternative, *and* SipHash initialisation is doing that, *and* importing the random module is doing that, it sends a strong message to me that the base primitive here is actually "try to read the system RNG, and maybe fail to do so", rather than "read the system RNG and only return when the requested data is available".

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
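The "try non-blocking first, then fall back to blocking" idea described above could be sketched roughly as follows. This is an illustration, not the implementation from either PEP: the function name `urandom_with_warning` is hypothetical, and the sketch assumes Python 3.6+ on Linux, where `os.getrandom()` with `os.GRND_NONBLOCK` raises `BlockingIOError` (EAGAIN) if the kernel RNG is not yet initialised.

```python
import os
import warnings


def urandom_with_warning(nbytes):
    """Probe the system RNG without blocking; warn before blocking if needed."""
    if hasattr(os, "getrandom"):
        try:
            # Non-blocking probe: raises BlockingIOError (EAGAIN) if the
            # kernel RNG has not been initialised yet.
            os.getrandom(1, os.GRND_NONBLOCK)
        except BlockingIOError:
            # Emit a searchable message before the implicit wait.
            warnings.warn(
                "system random number generator is not ready; "
                "blocking until it is"
            )
            # Blocking call: returns once the kernel RNG is initialised.
            os.getrandom(1)
    # Assumption: on platforms without getrandom() there is nothing to
    # probe, so the request goes straight to os.urandom().
    return os.urandom(nbytes)
```

On a system whose RNG is already initialised, the probe succeeds immediately and the function behaves like a plain `os.urandom()` call.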