
On 23 June 2016 at 15:54, Victor Stinner <victor.stinner@gmail.com> wrote:
The new exception would potentially be encountered in the following situations:
* Python code calling these APIs during Linux system initialization
I'm not sure that there is such a use case in practice.
Can you please try to describe a use case where you would need blocking system urandom *during the Python initialization*?
It looks like my use case 1, but I consider that os.urandom() is *not* called in such a use case: https://haypo-notes.readthedocs.io/pep_random.html#use-case-1-init-script
My preference for an exception comes from the fact that we can never prove the non-existence of proprietary software that does certain things, but we *can* ensure that such code gets an easy to debug exception rather than a potential deadlock if it does exist.

The argument chain runs:

- if such software doesn't exist, it doesn't matter which behaviour we choose
- if we're wrong and it does exist, we can choose how it fails:
  - blocking (with associated potential for init system deadlock)
  - throwing an exception

Given the choice between debugging an apparent system hang and an unexpected exception when testing against a new version of a platform, I'll choose the exception every time.
* Python code running on improperly initialized Linux systems (e.g. embedded hardware without adequate sources of entropy to seed the system random number generator, or Linux VMs that aren't configured to accept entropy from the VM host)
If the program doesn't use os.urandom(), well, we don't care, there is no issue :-)
IMO the interesting use case is when the application really requires a secure secret. That's my use case 2, a web server: https://haypo-notes.readthedocs.io/pep_random.html#use-case-2-web-server
I chose not to give the choice to the developer and to block in such a case. IMO it's acceptable because the application should not have to wait forever for urandom.
Should not, but actually can, depending on the characteristics of the underlying system and its runtime environment.
Changing ``os.urandom()`` on Linux
----------------------------------
This PEP proposes that in Python 3.6+, ``os.urandom()`` be updated to call the new Linux ``getrandom()`` syscall in non-blocking mode if available and raise ``BlockingIOError: system random number generator is not ready`` if the kernel reports that the call would block.
To be clear, the behaviour is unchanged on other platforms, right?
Cory Benfield pointed out that the proposal as currently written isn't clear as to whether or not it applies to recent versions of Solaris and Illumos, as they also provide a getrandom() syscall.
I'm just trying to understand the scope of the PEP. It looks like, as with mine, it is written for Linux. (Even if other platforms may implement the same behaviour later, if needed.)
If the restriction to Linux is deliberate, you could be more explicit, at least in the abstract.
It's in the PEP title: "Allow BlockingIOError in security sensitive APIs on Linux". However, I need to update it to indicate it applies to any system that provides a non-blocking getrandom() syscall.
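As a rough illustration of the proposed ``os.urandom()`` behaviour, here is a sketch built on ``os.getrandom()`` and ``os.GRND_NONBLOCK`` (the Python-level wrappers available on Linux since Python 3.6). The helper name ``nonblocking_urandom`` is made up for this example, and falling back to the existing ``os.urandom()`` when the syscall is unavailable is an assumption, not something the PEP text spells out:

```python
import errno
import os


def nonblocking_urandom(num_bytes):
    """Return num_bytes random bytes, or raise BlockingIOError if the
    kernel reports that the system RNG has not been seeded yet."""
    getrandom = getattr(os, "getrandom", None)
    if getrandom is None:
        # No getrandom() wrapper on this platform: behaviour is unchanged.
        return os.urandom(num_bytes)
    chunks = []
    remaining = num_bytes
    while remaining > 0:
        try:
            # GRND_NONBLOCK makes the kernel fail with EAGAIN (surfaced as
            # BlockingIOError) instead of waiting for the RNG to be seeded.
            chunk = getrandom(remaining, os.GRND_NONBLOCK)
        except OSError as exc:
            if exc.errno in (errno.ENOSYS, errno.EPERM):
                # Syscall missing (old kernel) or blocked by a sandbox
                # (assumed fallback: the pre-existing os.urandom()).
                return os.urandom(num_bytes)
            raise  # includes BlockingIOError: let the caller decide
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

The caller sees either the requested bytes or a ``BlockingIOError``, never a hang.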
--
By the way, are you aware of other programming languages or applications using an exception when random would block? (It's not a requirement, I'm just curious.)
No, but I haven't really gone looking either. It's also worth keeping in mind that it's only in the last 12 months folks have even had the *option* of doing better than just reading from /dev/urandom and hoping it's been initialised properly.
By contrast, if ``BlockingIOError`` is raised in those situations, then developers using Python 3.6+ can easily choose their desired behaviour:
1. Loop until the call succeeds (security sensitive)
Is this case different from a blocking os.urandom()?
Yes, as it's up to the application to decide when it wants to check for the system RNG being ready, and how it wants to report that to the user. For example, it may decide to emit a runtime warning before it enters the busy loop (I'm actually having a discussion with Donald in another thread regarding a possible design for a "secrets.wait_for_system_rng()" API that meshes well with the other changes proposed in PEP 522).
2. Switch to using the random module (non-security sensitive)
Hum, I disagree on this point. I don't think that you should start with os.urandom() and then fall back on random.
In fact, I only know *one* use case for this: create the random.Random instance when the random module is imported.
In my PEP, I proposed to have a special case for the random.Random constructor, implemented in C (to not have to expose anything at the Python level).
We have two use cases for a fallback just in the standard library (SipHash initialisation and random module initialisation). Rather than assuming no other use cases for the feature exist, we can expose the fallback mechanism we use ourselves and let people decide for themselves whether or not they want to do something similar.
3. Switch to reading ``/dev/urandom`` directly (non-security sensitive)
It is what I propose for the random.Random constructor when the random module is imported.
Again, the question is whether there is a real use case for it. And if so, is the use case common enough to justify the change?
The extreme case is that all applications using os.urandom() would need to be modified to add a try/except BlockingIOError. I only exaggerate to try to understand the impact of your PEP. I think that only a few applications will use such a try/except in practice.
That's where the idea of also adding secrets.wait_for_system_rng() comes in, rather than having to wrap every library call in a try/except block (or risk having those APIs become blocking ones such that async developers feel obliged to call them in a separate thread).
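To make the idea concrete, here is one possible shape for such a helper. This is only a sketch under the assumption that ``os.urandom()`` raises ``BlockingIOError`` as PEP 522 proposes; ``secrets.wait_for_system_rng()`` is a proposed API, and the ``poll_interval`` and ``probe`` parameters are hypothetical names introduced for the example:

```python
import os
import time
import warnings


def wait_for_system_rng(poll_interval=0.1, probe=os.urandom):
    """Block until probe(1) stops raising BlockingIOError.

    Returns the number of failed attempts before the RNG was ready.
    probe is a hypothetical hook so the loop can be exercised in tests.
    """
    failures = 0
    while True:
        try:
            probe(1)
        except BlockingIOError:
            if failures == 0:
                # Tell the user why startup is stalling, once.
                warnings.warn(
                    "system random number generator is not ready; waiting",
                    RuntimeWarning,
                )
            failures += 1
            time.sleep(poll_interval)
        else:
            return failures
```

An application would call this once at startup, after which later ``os.urandom()`` calls would not need their own try/except blocks.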
As I tried to explain in my PEP, with Python 3.5.2, "the bug" (block on random) became very unlikely.
Aye, I agree with that (hence the references to this being an obscure, Linux-specific problem in PEP 522). However, I think it makes sense to stipulate that someone porting to Python 3.6 *has* unexpectedly encountered the new behaviour, and is trying to debug what has gone wrong with their application/system when comparing the two designs for usability.
Issuing a warning for potentially predictable internal hash initialization
I don't recall Python logging warnings for similar issues. But I don't recall similar issues either :-)
It's a pretty unique problem, and not one we've been able to detect in the past.
The challenge for internal hash initialization is that it might be very important to initialize SipHash with a reliably unpredictable random seed (for processes that are exposed to potentially hostile input) or it might be totally unimportant (for processes that never have to deal with untrusted data).
From what I read, /dev/urandom is good even before it is considered as initialized, because the kernel collects various data, but doesn't increase the entropy estimate.
I'm not completely convinced that a warning is needed. I'm not against it either. I am doubtful. :-)
Well, let's say that we have a warning. What should the user do in such a case? Is it advice to dig into the urandom issue and try to get more entropy?
The warning is for users, no? I imagine that an application can work perfectly for the developer, but only emit the warning for some users depending on how they deploy their application.
It's a warning primarily for system integrators (i.e. the folks developing a distro, designing an embedded device or configuring a VM) that they need to either:

- reconfigure the application to start later in the boot process (e.g. after the network comes up)
- write a systemd ExecStartPre snippet that waits for the system RNG to be initialised (that will be particularly easy if it can be written as "python3 -c 'import secrets; secrets.wait_for_system_rng()'")
- add a better entropy source to their system

The kind of wording I'm thinking of is along the lines of:

"Python hash initialization: using potentially predictable fallback hash seed; avoid handling untrusted potentially hostile data in this process"
However, at the same time, since Python has no way to know whether any given invocation needs to handle untrusted data, when the default SipHash initialization fails this *might* indicate a genuine security problem, which should not be allowed to pass silently.
An alternative would be to provide a read-only flag which would indicate whether the hash secret is considered "secure" or not.
Applications concerned with security would check the flag and decide for themselves whether or not to emit a warning.
I really don't want to add any more knobs and dials that need to be documented and learned if we can possibly avoid it (and I think we can). In this case, turning off hash randomisation entirely will suppress the warning along with hash randomisation itself.
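For reference, the existing switch for turning hash randomisation off is the ``PYTHONHASHSEED`` environment variable (set to ``0``). The sketch below only demonstrates that existing switch; the helper name ``hash_in_child`` is made up for the example, and whether the proposed warning is suppressed in this mode is the PEP's proposal, not something this code shows:

```python
import os
import subprocess
import sys


def hash_in_child(text, hash_seed):
    """Return hash(text) as computed by a fresh interpreter started
    with PYTHONHASHSEED set to hash_seed."""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    )
    return int(result.stdout)


# With randomisation disabled, separate processes agree on str hashes.
first = hash_in_child("spam", "0")
second = hash_in_child("spam", "0")
```

With ``PYTHONHASHSEED=0`` the two child processes compute the same value, which is exactly the predictability that the default randomised seed is meant to avoid.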
Accordingly, if internal hash initialization needs to fall back to a potentially predictable seed due to the system random number generator not being ready, it will also emit a warning message on ``stderr`` to say that the system random number generator is not available and that processing potentially hostile untrusted data should be avoided.
I know that many of you disagree with me, but I'm not sure that the hash DoS is an important issue.
We should not overestimate the importance of this vulnerability.
It was never particularly important (the payload multiplier on the Denial-of-Service isn't that big), but it was high profile and splashy, and it's relatively cheap to take into account (since folks that know it doesn't apply to them can still turn randomization off entirely)
Affected security sensitive applications
----------------------------------------
Security sensitive applications would need to either change their system configuration so the application is only started after the operating system random number generator is ready for security sensitive operations, or else change their code to busy loop until the operating system is ready::

    import os

    def blocking_urandom(num_bytes):
        while True:
            try:
                return os.urandom(num_bytes)
            except BlockingIOError:
                pass

Such a busy loop may use a lot of CPU :-/ You need a time.sleep() or something like that, no?
Maybe - we can work out the exact details once I've added the secrets.wait_for_system_rng() proposal to the PEP.
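One way the loop could avoid spinning is to sleep between attempts with a capped exponential backoff. This is only a sketch, again assuming ``os.urandom()`` raises ``BlockingIOError`` as proposed; the delay values are arbitrary, and the ``urandom`` and ``sleep`` parameters are hypothetical hooks added so the loop can be tested:

```python
import os
import time


def blocking_urandom(num_bytes, initial_delay=0.01, max_delay=1.0,
                     urandom=os.urandom, sleep=time.sleep):
    """Retry urandom(num_bytes) until it succeeds, sleeping between
    attempts so the wait does not burn CPU."""
    delay = initial_delay
    while True:
        try:
            return urandom(num_bytes)
        except BlockingIOError:
            sleep(delay)
            # Double the wait each time, up to max_delay.
            delay = min(delay * 2, max_delay)
```

The backoff keeps the first few retries quick (the RNG is usually ready soon after boot) while bounding the polling rate if the wait turns out to be long.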
A blocking os.urandom() doesn't have such an issue ;-)
It also doesn't let an app fail gracefully if it opts not to support running without a pre-initialised system RNG :)
Is it possible that os.urandom() works, but the following os.urandom() call raises a BlockingIOError? If yes, there is an issue with "partial read", and we should use a dedicated exception to return partial data.
No, it's not possible with os.urandom(). (It *can* happen with /dev/random and with getentropy() on OpenBSD and Solaris, which is why folks say "don't use those for anything")
Fortunately, as I understand it, the issue doesn't occur in practice. os.urandom() starts with BlockingIOError. But once it "works", it will work forever. Well, at least on Linux.
I don't know how Solaris behaves. I hope that it behaves like Linux (once it works, it always works). At least, I see that Solaris getrandom() can also fail with EAGAIN.
It's the same logic as Linux (once a CSPRNG is properly seeded it can never run out of entropy, but seeding it in the first place does require entropy collection)
Affected non-security sensitive applications
--------------------------------------------
Non-security sensitive applications that don't want to assume access to ``/dev/urandom`` (or assume a non-blocking implementation of that device) can be updated to use the ``random`` module as a fallback option::

    import os
    import random

    def pseudorandom_fallback(num_bytes):
        try:
            return os.urandom(num_bytes)
        except BlockingIOError:
            # System RNG not ready: fall back to the non-cryptographic PRNG
            return random.getrandbits(num_bytes*8).to_bytes(num_bytes, "little")

Depending on the application, it may also be appropriate to skip accessing ``os.urandom`` at all, and instead rely solely on the ``random`` module.
Hum, I dislike such change. It overcomplicates applications for a corner-case.
If you use os.urandom(), you already expect security. I prefer to simplify use cases to two cases: (1) you really need security (2) you really don't care about security. If you don't care, use the random module directly. Don't bother with os.urandom() or with adding try/except BlockingIOError. No?
I *hope* that a regular application will never see BlockingIOError on os.urandom() in the wild.
Yeah, hence why I'm shifting more in favour of the secrets.wait_for_system_rng() idea (which folks can then use as inspiration to write their own "wait for the system RNG" helpers for earlier Python and operating system versions)
Affected Linux specific non-security sensitive applications
-----------------------------------------------------------
Non-security sensitive applications that don't need to worry about cross platform compatibility and are willing to assume that ``/dev/urandom`` on Linux will always retain its current behaviour can be updated to access ``/dev/urandom`` directly::

    def dev_urandom(num_bytes):
        with open("/dev/urandom", "rb") as f:
            return f.read(num_bytes)

Again, I'm against adding such complexity for a corner case. Just use os.urandom().
All of this would be triggered by *application* developers actually hitting the BlockingIOError and deciding it was the appropriate course of action for *their* application. The point of this part of the PEP is to highlight that there are some really simple 3-5 line functions that let developers get a wide variety of behaviours in ways that are compatible with single-source Python 2/3 code.
For additional background details beyond those captured in this PEP, also see Victor Stinner's summary at http://haypo-notes.readthedocs.io/pep_random.html
Oh, I didn't expect to have references to my document :-) I moved it to: https://haypo-notes.readthedocs.io/summary_python_random_issue.html
http://haypo-notes.readthedocs.io/pep_random.html is now really a PEP ;-)
Cool, I'll update the first reference and also add a reference to your draft PEP.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia