BDFL ruling request: should we block forever waiting for high-quality random bits?
A problem has surfaced just this week in 3.5.1. Obviously this is a good time to fix it for 3.5.2. But there's a big argument over what is "broken" and what is an appropriate "fix".

As 3.5 Release Manager, I can put my foot down and make rulings, and AFAIK the only way to overrule me is with the BDFL. In two of three cases I've put my foot down. In the third I'm pretty sure I'm right, but IIUC literally everyone else with a stated opinion disagrees with me. So I thought it best I escalate it. Note that 3.5.2 is going to wait until the issue is settled and any changes to behavior are written and checked in.

(Blanket disclaimer for the below: in some places I'm trying to communicate other people's positions. I apologize if I misrepresented yours; please reply and correct my mistake. Also, sorry for the length of this email. But feel even sorrier for me: this debate has already eaten two days this week.)

BACKGROUND

For 3.5 os.urandom() was changed: instead of reading from /dev/urandom, it uses the new system call getrandom() where available. This is a new system call on Linux (which has already been cloned by Solaris). getrandom(), as CPython uses it, reads from the same PRNG that /dev/urandom gets its bits from. But because it's a system call you don't have to mess around with file handles. Also it always works in chrooted environments. Sounds like a fine idea.

Also for 3.5, several other places where CPython internally needs random bits were switched from reading /dev/urandom to calling getrandom(). The two that I know of: choosing the seed for hash randomization, and initializing the default Mersenne Twister for the random module.

There's one subtle but important difference between /dev/urandom and getrandom(). At startup, Linux seeds the urandom PRNG from the entropy pool. If the entropy pool is uninitialized, what happens? CPython's calls to getrandom() will block until the entropy pool is initialized, which is usually just a few seconds (or less) after startup. But /dev/urandom *guarantees* that reads will *always* work. If the entropy pool hasn't been initialized, it pulls numbers from the PRNG before it's been properly seeded. What this results in depends on various aspects of the configuration (do you have ECC RAM? how long was the machine powered down? does the system have a correct realtime clock?). In extreme circumstances this may mean the "random" numbers are shockingly predictable!

Under normal circumstances this minor difference is irrelevant. After all, when would the entropy pool ever be uninitialized?

THE PROBLEM

Issue #26839: http://bugs.python.org/issue26839 (warning, the issue is now astonishingly long, and exhausting to read, and various bits of it are factually wrong)

A user reports that when starting CPython soon after boot on a fresh virtual machine, the process would hang for a long time. Someone on the issue reported delays of over 90 seconds. Later we found out: it wasn't 90 seconds before CPython became usable; those 90-second delays were how long it took before systemd timed out and simply killed the process. It's not clear what the upper bound on the delay might be.

The issue author had already identified the cause: CPython was blocking on getrandom() in order to initialize hash randomization. On this fresh virtual machine the entropy pool started out uninitialized. And since the only thing running on the machine was CPython, and since CPython was blocked on initialization, the entropy pool was initializing very, very slowly.
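To make the difference concrete, here is a minimal probe of the pool state, a sketch assuming x86-64 Linux (the syscall number 318 and the flag value 0x0001 are Linux- and architecture-specific assumptions; CPython itself does this in C):

    import ctypes, errno

    SYS_getrandom = 318          # x86-64 Linux; other architectures differ
    GRND_NONBLOCK = 0x0001

    libc = ctypes.CDLL(None, use_errno=True)
    buf = ctypes.create_string_buffer(16)

    n = libc.syscall(SYS_getrandom, buf, 16, GRND_NONBLOCK)
    if n == 16:
        print("entropy pool ready:", buf.raw.hex())
    elif n == -1 and ctypes.get_errno() == errno.EAGAIN:
        print("pool uninitialized: a plain getrandom() call would block here")
    elif n == -1 and ctypes.get_errno() == errno.ENOSYS:
        print("kernel older than 3.17: no getrandom() at all")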
Other posters to the thread pointed out that the same thing would happen in "import random", if your code could get that far. The constructor for the Random() object would seed the Mersenne Twister, which would call getrandom() and block. Naturally, callers to os.urandom() could also block for an unbounded period for the same reason.

MY RULINGS SO FAR

1) The change in 3.5 that means "import random" may block for an unbounded period of time on Linux due to the switch to getrandom() must be backed out or amended so that it never blocks. I *think* everyone agrees with this. The Mersenne Twister is not a CPRNG, so seeding it with crypto-quality bits isn't necessary. And unbounded delays are bad.

2) The change in 3.5 that means hash randomization initialization may block for an unbounded period of time on Linux due to the switch to getrandom() must be backed out or amended so that it never blocks. I believe most people agree with me. The cryptography experts disagree. IIUC both Alex Gaynor and Christian Heimes feel the blocking is preferable to non-random hash "randomization". Yes, the bad random data means the hashing will be predictable. Neither choice is exactly what you want. But most people feel it's simply unreasonable that in extreme corner cases CPython can block for an unbounded amount of time before running user code.

OS.URANDOM()

Here's where it gets complicated--and where everyone else thinks I'm wrong.

os.urandom() is currently the best place for a Python programmer to get high-quality random bits. The one-line summary for os.urandom() reads: "Return a string of n random bytes suitable for cryptographic use."

On 3.4 and before, on Linux, os.urandom() would never block, but if the entropy pool was uninitialized it could return very, very poor-quality random bits. On 3.5.0 and 3.5.1, on Linux, when using the getrandom() call, it will instead block for an apparently unbounded period before returning high-quality random bits.

The question: is this new behavior preferable, or should we return to the old behavior? Since I'm the one writing this email, let me make the case for my position: I think that os.urandom() should never block on Linux. Why?

1) Functions in the os module that look like OS functions should behave predictably, like thin wrappers over those OS functions. Most of the time this is exactly what they are. In some cases they're more sophisticated; examples include os.popen(), os.scandir(), and the byzantine os.utime(). There are also some functions provided by the os module that don't resemble any native functionality, but these have unique names that don't look like anything provided by the OS. This makes the behavior of the Python function easy to reason about: it always behaves like your local OS function. Python provides os.stat() and it behaves like the local stat(). So if you want to know how any os module function behaves, just read your local man page.

Therefore, os.urandom() should behave exactly like a thin shell around reading the local /dev/urandom. On Linux, /dev/urandom guarantees that it will never block. This means it has undesirable behavior if read immediately after a fresh boot. But this guarantee is so strong that Theodore Ts'o couldn't break it to fix the undesirable behavior. Instead he added the getrandom() system call, and left /dev/urandom alone. Therefore, on Linux, os.urandom() should behave the same way, and also never block.

2) It's unfair to change the semantics of a well-established function to such a radical degree.
os.urandom() has been in Python since at least 2.6--I was too lazy to go back any further. From 2.6 to 3.4, it behaved exactly like /dev/urandom, which meant that on Linux it would never block. As of 3.5, on Linux, it might now block for an unbounded period of time. Any code that calls os.urandom() has had its behavior radically changed in this extreme corner case.

3) os.urandom() doesn't actually guarantee it's suitable for cryptography. The documentation for os.urandom() has contained this sentence, untouched, since 2.6:

    The returned data should be unpredictable enough for cryptographic applications, though its exact quality depends on the OS implementation. On a Unix-like system this will query /dev/urandom, and on Windows it will use CryptGenRandom().

Of course, version 3.5 added this:

    On Linux 3.17 and newer, the getrandom() syscall is now used when available.

But the waffling about its suitability for cryptography remains unchanged. So, while it's undesirable that os.urandom() might return shockingly poor quality random bits, it is *permissible* according to the documentation.

4) This really is a rare corner case we're talking about. I just want to re-state: this case on Linux, where /dev/urandom returns totally predictable bytes and getrandom() will block, only happens when the entropy pool for urandom is uninitialized. Although it has been seen in the field, it's extremely rare. 99.99999%+ of the time, reading /dev/urandom and calling getrandom() will both return the exact same high-quality random bits without blocking.

5) This corner-case behavior is fixable externally to CPython. I don't really understand the subject, but apparently it's entirely reasonable to expect sysadmins to directly manage the entropy pools of virtual machines. They should be able to spin up their VMs with a pre-filled entropy pool. So it should be possible to ensure that os.urandom() always returns the high-quality random bits we wanted, even on freshly-booted VMs.

6) Guido and Tim Peters already decided once that os.urandom() should behave like /dev/urandom.

Issue #25003: http://bugs.python.org/issue25003

In 2.7.10, os.urandom() was changed to call getentropy() instead of reading /dev/urandom when getentropy() was available. getentropy() was "stunningly slow" on Solaris, on the order of 300x slower than reading /dev/urandom. Guido and Tim both participated in the discussion on the issue; Guido also apparently discussed it via email with Theo de Raadt. While it's not quite apples-to-apples, I think this establishes some precedent that os.urandom() should

* behave like /dev/urandom, and
* be fast.

--

On the other side is... everybody else. I've already spent an enormous amount of time researching and writing and re-writing this email. Rather than try (and fail) to accurately present the other sides of this debate, I'm just going to end the email here and let the other participants reply and voice their views.

Bottom line: Guido, in this extreme corner case on Linux, should os.urandom() return bad random data like it used to, or should it block forever like it does in 3.5.0 and 3.5.1?

//arry/
I understood that Christian Heimes and/or Donald Stufft are interested in working on a PEP. 2016-06-09 13:25 GMT+02:00 Larry Hastings <larry@hastings.org>:
A problem has surfaced just this week in 3.5.1. Obviously this is a good time to fix it for 3.5.2. But there's a big argument over what is "broken" and what is an appropriate "fix".
IMHO the bug is now fixed in 3.5.2 as I explained at: http://haypo-notes.readthedocs.io/pep_random.html#status-of-python-3-5-2
THE PROBLEM
Issue #26839:
http://bugs.python.org/issue26839
(warning, the issue is now astonishingly long, and exhausting to read, and various bits of it are factually wrong)
You may want to read my summary: http://haypo-notes.readthedocs.io/pep_random.html

I'm not interested in replying to Larry's email point by point. IMHO a formal PEP is now required for Python 3.6 (to enhance os.urandom and clarify Python's behaviour before urandom is initialized). Python 3.5.2 is fixed; there is no more urgency ;-)

Victor
On 2016-06-09 13:25, Larry Hastings wrote:
A problem has surfaced just this week in 3.5.1. Obviously this is a good time to fix it for 3.5.2. But there's a big argument over what is "broken" and what is an appropriate "fix".
As 3.5 Release Manager, I can put my foot down and make rulings, and AFAIK the only way to overrule me is with the BDFL. In two of three cases I've put my foot down. In the third I'm pretty sure I'm right, but IIUC literally everyone else with a stated opinion disagrees with me. So I thought it best I escalate it. Note that 3.5.2 is going to wait until the issue is settled and any changes to behavior are written and checked in.
(Blanket disclaimer for the below: in some places I'm trying to communicate other people's positions. I apologize if I misrepresented yours; please reply and correct my mistake. Also, sorry for the length of this email. But feel even sorrier for me: this debate has already eaten two days this week.)
Thanks for the digest, Larry. I would appreciate it if we could split the issue into three separate problems:

1) behavior of os.urandom()
2) initialization of _Py_HashSecret for byte, str and XML hash randomization
3) initialization of the default random.random Mersenne Twister

As of now, 2 and 3 are the culprits for the blocking at startup. Both happen to use _PyOS_URandom(), either directly or indirectly through os.urandom(). We chose to use the OS random source because it was convenient. It is not a necessity. The seed for the Mersenne Twister and the keys for hash randomization don't have to be strong cryptographic values in all cases. They just have to be hard to guess by an attacker. In the case of scripts in early boot, there are no viable attack scenarios.

Therefore I propose to fix problems 2 and 3:

- add a new random_seed member to _Py_HashSecret and use it to derive an initial Mersenne Twister state for the default random instance of the random module.
- try the CPRNG for _Py_HashSecret first, and fall back to a user-space RNG when the kernel's CPRNG would block.

For some operating systems like Windows and OS X, we can assume that the kernel CPRNG is always available. For Linux we can use getrandom() in non-blocking mode and handle EWOULDBLOCK. On BSD the seed state can be queried from /proc.

Christian
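A rough Python-level sketch of the fallback Christian describes, written against the os.getrandom() spelling that later shipped in 3.6 (CPython's actual fix lives in C; the helper name and the 24-byte default below are illustrative assumptions):

    import os, time

    def nonblocking_seed(n=24):
        # Take kernel CSPRNG bytes if the pool is ready; never block.
        try:
            return os.getrandom(n, os.GRND_NONBLOCK)
        except BlockingIOError:
            # Pool uninitialized: hard-to-guess but NON-cryptographic bits.
            # Acceptable for hash randomization or Mersenne Twister seeding,
            # never for key material.
            weak = os.getpid().to_bytes(4, "little")
            weak += int(time.monotonic() * 1e9).to_bytes(8, "little")
            weak += id(object()).to_bytes(8, "little")
            return (weak * (n // len(weak) + 1))[:n]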
On 9 Jun 2016, at 12:54, Christian Heimes <christian@python.org> wrote:
Therefore I propose to fix problems 2 and 3:
- add a new random_seed member to _Py_HashSecret and use it to derive an initial Mersenne Twister state for the default random instance of the random module.
- try the CPRNG for _Py_HashSecret first, and fall back to a user-space RNG when the kernel's CPRNG would block.
For some operating systems like Windows and OS X, we can assume that the kernel CPRNG is always available. For Linux we can use getrandom() in non-blocking mode and handle EWOULDBLOCK. On BSD the seed state can be queried from /proc.
I am in agreement with Christian here. Let me add:

Larry has suggested that it’s ok that os.urandom() can degrade to weak random numbers in part because "os.urandom() doesn't actually guarantee it's suitable for cryptography.” That’s true, that is what the documentation says. However, that documentation has been emphatically disagreed with by the entire Python ecosystem *including* the Python standard library. Both random.SystemRandom and the secrets module use os.urandom() to generate their random numbers. The secrets module says this right at the top: "The secrets module is used for generating cryptographically strong random numbers suitable for managing data such as passwords, account authentication, security tokens, and related secrets.”

Regressing the behaviour in os.urandom() would mean that this statement is not unequivocally true but only situationally true. It would be more accurate to say “The secrets module should generate cryptographically strong random numbers most of the time”. So I’d argue that while os.urandom() does not make these promises, the rest of the standard library behaves like it does.

While we’re here I should note that the cryptography project unequivocally recommends os.urandom [0], and that this aspect of Linux’s /dev/urandom behaviour is considered to be a dangerous misfeature by almost everyone in the crypto community. The Linux kernel can’t change this stuff easily because they mustn’t break userspace. Python *is* userspace, we can do what we like, and we should be aiming to make sure that doing the obvious thing in Python amounts to doing the *right* thing.

*Obviously* this shouldn’t block startup, and obviously we should fix that, but I disagree that we should be reverting the change to os.urandom().

Cory

[0]: https://cryptography.io/en/latest/random-numbers/
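For reference, the dependency Cory points to is visible from plain Python (the secrets module shipped later, in 3.6, via PEP 506):

    import random

    sysrand = random.SystemRandom()   # documented as backed by os.urandom()
    print(sysrand.getrandbits(64))

    # In 3.6+, secrets builds on the same machinery:
    import secrets
    print(secrets.token_bytes(16))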
On Thu, 09 Jun 2016 13:12:22 +0100, Cory Benfield <cory@lukasa.co.uk> wrote:
The Linux kernel can’t change this stuff easily because they mustn’t break userspace. Python *is* userspace, we can do what we like, and we
I don't have specific input on the rest of this discussion, but I disagree strongly with this statement. The environment in which Python programs run, i.e. the Python runtime and standard library, is *our* "userspace", and the same constraints apply to our making changes there as apply to the Linux kernel and its userspace... even though we knowingly break those constraints from time to time[*].

--David

[*] Which I think the twisted folks at least would argue we shouldn't be doing :)
Excerpts from R. David Murray's message of 2016-06-09 08:41:01 -0400:
On Thu, 09 Jun 2016 13:12:22 +0100, Cory Benfield <cory@lukasa.co.uk> wrote:
The Linux kernel can't change this stuff easily because they mustn't break userspace. Python *is* userspace, we can do what we like, and we
I don't have specific input on the rest of this discussion, but I disagree strongly with this statement. The environment in which Python programs run, i.e. the Python runtime and standard library, is *our* "userspace", and the same constraints apply to our making changes there as apply to the Linux kernel and its userspace... even though we knowingly break those constraints from time to time[*].
--David
[*] Which I think the twisted folks at least would argue we shouldn't be doing :)
I agree with David. We shouldn't break existing behavior in a way that might lead to someone else's software being unusable. Adding a new API that does block allows anyone to call that when they want guaranteed random values, and the decision about whether to block or not can be placed in the application developer's hands. Christian's points about separating the various cases and solutions also make sense. Doug
On Jun 9, 2016, at 8:53 AM, Doug Hellmann <doug@doughellmann.com> wrote:
Excerpts from R. David Murray's message of 2016-06-09 08:41:01 -0400:
On Thu, 09 Jun 2016 13:12:22 +0100, Cory Benfield <cory@lukasa.co.uk> wrote:
The Linux kernel can't change this stuff easily because they mustn't break userspace. Python *is* userspace, we can do what we like, and we
I don't have specific input on the rest of this discussion, but I disagree strongly with this statement. The environment in which Python programs run, i.e. the Python runtime and standard library, is *our* "userspace", and the same constraints apply to our making changes there as apply to the Linux kernel and its userspace... even though we knowingly break those constraints from time to time[*].
--David
[*] Which I think the twisted folks at least would argue we shouldn't be doing :)
I agree with David. We shouldn't break existing behavior in a way that might lead to someone else's software being unusable.
Adding a new API that does block allows anyone to call that when they want guaranteed random values, and the decision about whether to block or not can be placed in the application developer's hands.
I think this is a terrible compromise. The new API is going to be exactly the same as the old API in 99.9999% of cases and it's fighting against the entire software ecosystem's suggestion of what to use ("use urandom" is basically a meme at this point). This is like saying that we can't switch to verifying HTTPS by default because a one in a million connection might have different behavior instead of being silently insecure. — Donald Stufft
On 9 Jun 2016, at 13:53, Doug Hellmann <doug@doughellmann.com> wrote:
I agree with David. We shouldn't break existing behavior in a way that might lead to someone else's software being unusable.
What does ‘usable’ mean? Does it mean “the code must execute from beginning to end”? Or does it mean “the code must maintain the expected invariants”? If it’s the second, what reasonably counts as “the expected invariants”?

The problem here is that both definitions of ‘broken’ are unclear. If we leave os.urandom() as it is, there is a small-but-nonzero chance that your program will hang, potentially indefinitely. If we change it back, there is a small-but-nonzero chance your program will generate bad random numbers.

If we assume, for a moment, that os.urandom() doesn’t get called during Python startup (that is, that we adopt Christian’s approach to deal with random and SipHash as separate concerns), what we’ve boiled down to is: your application called os.urandom() so early that you’ve got weak random numbers; does it hang or proceed? Those are literally our two options.

These two options can be described a different way. If you didn’t actually need strong random numbers but were affected by the hang, that program failed obviously, and it failed closed. You *will* notice that your program didn’t start up, you’ll investigate, and you’ll take action. On the other hand, if you need strong random numbers but were affected by os.urandom() returning bad random numbers, you almost certainly will *not* notice, and your program will have failed *open*: that is, you are exposed to a security risk, and you have no way to be alerted to that fact.

For my part, I think the first failure mode is *vastly* better than the second, even if the first failure mode affects vastly more people than the second one does. Failing early, obviously, and safely is IMO much, much better than failing late, silently, and dangerously. I’d argue that all the security disagreements that happen on this list boil down to weighting that differently. For my part, I want code that expects to be used in a secure context to fail *as loudly as possible* if it is unable to operate securely. And for that reason:
Adding a new API that does block allows anyone to call that when they want guaranteed random values, and the decision about whether to block or not can be placed in the application developer's hands.
I’d rather flip this around. Add a new API that *does not* block. Right now, os.urandom() is trying to fill two niches, one of which is security focused. I’d much rather decide that os.urandom() is the secure API and fail as loudly as possible when people are using it insecurely than to decide that os.urandom() is the *insecure* API and require changes.

This is because, again, people very rarely notice this kind of new API introduction unless their code explodes when they migrate. If you think you can find a way to blow up the secure crypto code only, I’m willing to have that patch too, but otherwise I really think that those who expect this code to be safe should be prioritised over those who expect it to be 100% available.

My ideal solution: change os.urandom() to throw an exception if the kernel CSPRNG is not seeded, and add a new function for saying you don’t care if the CSPRNG isn’t seeded, with all the appropriate “don’t use this unless you’re sure” warnings on it.

Cory
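A sketch of the shape Cory describes; the function names here are invented for illustration, and nothing like this was agreed:

    import os

    def urandom(n):
        # Fail as loudly as possible if the kernel CSPRNG is not yet seeded.
        try:
            return os.getrandom(n, os.GRND_NONBLOCK)   # 3.6+ spelling
        except BlockingIOError:
            raise RuntimeError("system CSPRNG not seeded; refusing to return "
                               "potentially predictable bytes")

    def urandom_insecure(n):
        # Explicit opt-out: may return unseeded PRNG output. Don't use this
        # unless you are sure you don't need cryptographic quality.
        with open("/dev/urandom", "rb") as f:
            return f.read(n)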
On Jun 9, 2016, at 9:27 AM, Cory Benfield <cory@lukasa.co.uk> wrote:
On 9 Jun 2016, at 13:53, Doug Hellmann <doug@doughellmann.com> wrote:
I agree with David. We shouldn't break existing behavior in a way that might lead to someone else's software being unusable.
What does ‘usable’ mean? Does it mean “the code must execute from beginning to end”? Or does it mean “the code must maintain the expected invariants”? If it’s the second, what reasonably counts as “the expected invariants”?
The code must not cause the user’s computer to completely freeze in a way that makes their VM appear to be failing to boot?
The problem here is that both definitions of ‘broken’ are unclear. If we leave os.urandom() as it is, there is a small-but-nonzero chance that your program will hang, potentially indefinitely. If we change it back, there is a small-but-nonzero chance your program will generate bad random numbers.
If we assume, for a moment, that os.urandom() doesn’t get called during Python startup (that is that we adopt Christian’s approach to deal with random and SipHash as separate concerns), what we’ve boiled down to is: your application called os.urandom() so early that you’ve got weak random numbers, does it hang or proceed? Those are literally our two options.
I agree those are the two options. I want the application developer to make the choice, not us.
These two options can be described a different way. If you didn’t actually need strong random numbers but were affected by the hang, that program failed obviously, and it failed closed. You *will* notice that your program didn’t start up, you’ll investigate, and you’ll take action. On the other hand, if you need strong random numbers but were affected by os.urandom() returning bad random numbers, you almost certainly will *not* notice, and your program will have failed *open*: that is, you are exposed to a security risk, and you have no way to be alerted to that fact.
For my part, I think the first failure mode is *vastly* better than the second, even if the first failure mode affects vastly more people than the second one does. Failing early, obviously, and safely is IMO much, much better than failing late, silently, and dangerously.
I’d argue that all the security disagreements that happen in this list boil down to weighting that differently. For my part, I want code that expects to be used in a secure context to fail *as loudly as possible* if it is unable to operate securely. And for that reason:
Adding a new API that does block allows anyone to call that when they want guaranteed random values, and the decision about whether to block or not can be placed in the application developer's hands.
I’d rather flip this around. Add a new API that *does not* block. Right now, os.urandom() is trying to fill two niches, one of which is security focused. I’d much rather decide that os.urandom() is the secure API and fail as loudly as possible when people are using it insecurely than to decide that os.urandom() is the *insecure* API and require changes.
This is because, again, people very rarely notice this kind of new API introduction unless their code explodes when they migrate. If you think you can find a way to blow up the secure crypto code only, I’m willing to have that patch too, but otherwise I really think that those who expect this code to be safe should be prioritised over those who expect it to be 100% available.
My ideal solution: change os.urandom() to throw an exception if the kernel CSPRNG is not seeded, and add a new function for saying you don’t care if the CSPRNG isn’t seeded, with all the appropriate “don’t use this unless you’re sure” warnings on it.
All of which fails to be backwards compatible (new exceptions and hanging behavior), which means you’re breaking apps. Introducing a new API lets the developers who care about strong random values use them without breaking anyone else. Doug
On Jun 9, 2016, at 9:48 AM, Doug Hellmann <doug@doughellmann.com> wrote:
All of which fails to be backwards compatible (new exceptions and hanging behavior), which means you’re breaking apps. Introducing a new API lets the developers who care about strong random values use them without breaking anyone else.
I assert that the vast bulk of users of os.urandom are using it because they care about strong random values, not because they care about the nuances of its behavior on Linux. You're suggesting that almost every [1] single use of os.urandom in the wild should switch to this new API. Forcing the multitudes to adapt for the minority is just pointless churn and pain.

Besides, Python has never held backwards compatibility sacred above all else and regularly breaks it in X.Y+1 releases when there is good reason to do so. Just yesterday there was discussion on removing bytes(n) from Python 3.x, not because it's dangerous in any way, but because its behavior makes it slightly confusing in an extremely obvious way, in a PEP that appears like it has a reasonably good chance of being accepted.

[1] I would almost go as far as to call it every single use, but I'm sure someone can dig up one person somewhere who purposely used this behavior.

— Donald Stufft
Wow. I have to decide an issue on which lots of people I respect disagree strongly. So no matter how I decide some of you are going to hate me. Oh well. :-(

So let's summarize the easy part first. It seems that there is actually agreement that for the initialization of hash randomization and for the random module's Mersenne Twister initialization it is not worth waiting.

That leaves direct calls to os.urandom(). I don't think this should block either.

I'm not a security expert. I'm not really an expert in anything. But I often have a good sense for what users need or want. In this case it's clear what users want: they don't want Python to hang waiting for random numbers.

Take an example from asyncio. If os.urandom() could block, then an asyncio coroutine that wants to call it would have to move that call to a separate thread using loop.run_in_executor() and await the resulting Future, just to avoid blocking all I/O. But you can't test such code, because in practice when you're there to test it, it will never block anyway. So nobody will write it that way, and everybody's code will have a subtle bug (i.e. a coroutine may block without letting other coroutines run). And it's not just bare calls to os.urandom() -- it's any call to library code that might call os.urandom(). Who documents whether their library call uses os.urandom()? It's unknowable. And therein lies madness.

The problem with security experts is that they're always right when they say you shouldn't do something. The only truly secure computer is one that's disconnected and buried 6 feet under the ground. There's always a scenario through which an attacker could exploit a certain behavior. And there's always the possibility that the computer that's thus compromised is guarding a list of Chinese dissidents, or a million credit card numbers, or the key Apple uses to sign iPhone apps. But much more likely it just has my family photos and 100 cloned GitHub projects. And the only time when os.urandom() is going to block on me is probably when I'm rebooting a development VM and wondering why it's so slow.

Maybe we can put in a warning when getrandom(..., GRND_NONBLOCK) returns EAGAIN? And then award a prize to people who can make it print that warning. Maybe we'll find a way to actually test this code.

-- --Guido van Rossum (python.org/~guido)
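Spelled out, the defensive pattern Guido says nobody will actually write looks something like this (3.5-era asyncio; a sketch, not a recommendation):

    import asyncio, os

    async def token(loop):
        # Push the potentially blocking os.urandom() call onto a worker
        # thread so the event loop keeps servicing other coroutines.
        return await loop.run_in_executor(None, os.urandom, 16)

    loop = asyncio.get_event_loop()
    print(loop.run_until_complete(token(loop)))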
On Jun 9, 2016, at 11:52 AM, Guido van Rossum <guido@python.org> wrote:
Wow. I have to decide an issue on which lots of people I respect disagree strongly. So no matter how I decide some of you are going to hate me. Oh well. :-(
So let's summarize the easy part first. It seems that there is actually agreement that for the initialization of hash randomization and for the random module's Mersenne Twister initialization it is not worth waiting.
That leaves direct calls to os.urandom(). I don't think this should block either.
To be clear, it’s going to block until urandom has been initialized on most non-Linux OSs, so either way, if the requirement of someone calling os.urandom is “must never block”, then they can’t use os.urandom on most non-Linux systems.
I'm not a security expert. I'm not really an expert in anything. But I often have a good sense for what users need or want. In this case it's clear what users want: they don't want Python to hang waiting for random numbers.
Take an example from asyncio. If os.urandom() could block, then an asyncio coroutine that wants to call it would have to move that call to a separate thread using loop.run_in_executor() and await the resulting Future, just to avoid blocking all I/O. But you can't test such code, because in practice when you're there to test it, it will never block anyway. So nobody will write it that way, and everybody's code will have a subtle bug (i.e. a coroutine may block without letting other coroutines run). And it's not just bare calls to os.urandom() -- it's any call to library code that might call os.urandom(). Who documents whether their library call uses os.urandom()? It's unknowable. And therein lies madness.
The problem with security experts is that they're always right when they say you shouldn't do something. The only truly secure computer is one that's disconnected and buried 6 feet under the ground. There's always a scenario through which an attacker could exploit a certain behavior. And there's always the possibility that the computer that's thus compromised is guarding a list of Chinese dissidents, or a million credit card numbers, or the key Apple uses to sign iPhone apps. But much more likely it just has my family photos and 100 cloned GitHub projects.
And the only time when os.urandom() is going to block on me is probably when I'm rebooting a development VM and wondering why it's so slow.
Maybe we can put in a warning when getrandom(..., GRND_NONBLOCK) returns EAGAIN? And then award a prize to people who can make it print that warning. Maybe we'll find a way to actually test this code.
-- --Guido van Rossum (python.org/~guido)
— Donald Stufft
On 06/09/2016 08:52 AM, Guido van Rossum wrote:
That leaves direct calls to os.urandom(). I don't think this should block either.
Then it's you and me against the rest of the world ;-)

Okay, it's decided: os.urandom() must be changed for 3.5.2 to never block on a getrandom() call. It's permissible to take advantage of getrandom(GRND_NONBLOCK), but if it returns EAGAIN we must read from /dev/urandom.

It's already well established that this will upset the cryptography experts. As a concession to them, I propose adding a simple! predictable! function to Python 3.5.2: os.getrandom(). This would be a simple wrapper over getrandom(), only available on platforms that expose it. It would provide a way to use both extant flags, GRND_RANDOM and GRND_NONBLOCK, though possibly not exactly mirroring the native API. This would enable cryptography libraries to easily do what (IIUC) they regard as the "correct" thing on Linux for all supported versions of Python:

    if hasattr(os, "getrandom"):
        bits = os.getrandom(n)
    else:
        bits = os.urandom(n)

I'm not excited about adding a new function in 3.5.2, but on the other hand we are taking away functionality they had in 3.5.0 and 3.5.1, so it only seems fair. And the implementation of os.getrandom() should be very straightforward, and its semantics will mirror the native call, so I'm pretty confident we can get it solid in a couple of days, though we might slip 3.5.2rc1 by a day or two.

Guido: do you see this as an acceptable compromise?

Cryptographers: given that os.urandom() will no longer block in 3.5.2, do you want this?

Pointing out an alternate approach: Marc-Andre Lemburg proposes in issue #27279 ( http://bugs.python.org/issue27279 ) that we should add two "known best-practices" functions to get pseudo-random bits; one merely for pseudo-random bits, the other for crypto-strength pseudo-random bits. While I think this is a fine idea, the exact spelling, semantics, and per-platform implementation of these functions is far from settled, and nobody is proposing that we do something like that for 3.5.

//arry/
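For illustration, the proposed wrapper as it might be called; the flag spelling below is the one that eventually shipped in 3.6, and was still unsettled when Larry wrote this:

    import os

    bits = os.getrandom(16)                     # may block until the pool is seeded
    bits = os.getrandom(16, os.GRND_NONBLOCK)   # raise BlockingIOError instead
    bits = os.getrandom(16, os.GRND_RANDOM)     # draw from the /dev/random pool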
On 06/09/2016 03:22 PM, Larry Hastings wrote:
On 06/09/2016 08:52 AM, Guido van Rossum wrote:
That leaves direct calls to os.urandom(). I don't think this should block either.
Then it's you and me against the rest of the world ;-)
Okay, it's decided: os.urandom() must be changed for 3.5.2 to never block on a getrandom() call.
One way to not block is to raise an exception. Since this is such a rare occurrence anyway I don't see this being a problem, plus it keeps everybody mostly happy: normal users won't see it hang, crypto-folk won't see vulnerable-from-this-cause-by-default machines, and those running Python early in the boot sequence will have something they can figure out, plus an existing knob to work around it [hashseed, I think?].
As a concession to [the crypto experts], I propose adding a simple! predictable! function to Python 3.5.2: os.getrandom().
This would be unnecessary if we go the exception route.
And the implementation of os.getrandom() should be very straightforward, and its semantics will mirror the native call, so I'm pretty confident we can get it solid in a couple of days, though we might slip 3.5.2rc1 by a day or two.
I would think the exception route would also not take very long to make solid. Okay, I'll shut up now. ;) -- ~Ethan~
On 06/09/2016 03:44 PM, Ethan Furman wrote:
On 06/09/2016 03:22 PM, Larry Hastings wrote:
Okay, it's decided: os.urandom() must be changed for 3.5.2 to never block on a getrandom() call.
One way to not block is to raise an exception. Since this is such a rare occurrence anyway I don't see this being a problem, plus it keeps everybody mostly happy: normal users won't see it hang, crypto-folk won't see vulnerable-from-this-cause-by-default machines, and those running Python early in the boot sequence will have something they can figure out, plus an existing knob to work around it [hashseed, I think?].
Nope, I want the old behavior back. os.urandom() should read /dev/urandom if getrandom() would block. As the British say, "it should do what it says on the tin". //arry/
Can we get any new function on all platforms, deferring to urandom() if getrandom() isn't there?

Top-posted from my Windows Phone

(fat-fingered the send button, picking up where I left off)

If the pattern is really going to be the hasattr check you posted earlier, can we just do it for people and save them writing code that won't work on different OSs?

Cheers, Steve
On Jun 09 2016, Larry Hastings <larry@hastings.org> wrote:
On 06/09/2016 03:44 PM, Ethan Furman wrote:
On 06/09/2016 03:22 PM, Larry Hastings wrote:
Okay, it's decided: os.urandom() must be changed for 3.5.2 to never block on a getrandom() call.
One way to not block is to raise an exception. Since this is such a rare occurrence anyway I don't see this being a problem, plus it keeps everybody mostly happy: normal users won't see it hang, crypto-folk won't see vulnerable-from-this-cause-by-default machines, and those running Python early in the boot sequence will have something they can figure out, plus an existing knob to work around it [hashseed, I think?].
Nope, I want the old behavior back. os.urandom() should read /dev/urandom if getrandom() would block. As the British say, "it should do what it says on the tin".
Aeh, what the tin says is "return random bytes". What everyone uses it for (including the standard library) is to provide randomness for cryptographic purposes. What it does (in the problematic case) is return something that's not random. To me this sounds about as sensible as having open('/dev/zero') return non-zero values in some rare situations. And yes, for most people "the kernel running out of zeros" makes exactly as much sense as "the kernel runs out of random data". Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«
On 06/09/2016 07:38 PM, Nikolaus Rath wrote:
On Jun 09 2016, Larry Hastings <larry@hastings.org> wrote:
Nope, I want the old behavior back. os.urandom() should read /dev/urandom if getrandom() would block. As the British say, "it should do what it says on the tin". Aeh, what the tin says is "return random bytes".
What the tin says is "urandom", which has local man pages that dictate exactly how it behaves. On Linux the "urandom" man page says:

    A read from the /dev/urandom device will not block waiting for more entropy. If there is not sufficient entropy, a pseudorandom number generator is used to create the requested bytes.

os.urandom() needs to behave like that on Linux, which is how it behaved in Python 2.4 through 3.4.

//arry/
On Jun 09 2016, Larry Hastings <larry@hastings.org> wrote:
On 06/09/2016 07:38 PM, Nikolaus Rath wrote:
On Jun 09 2016, Larry Hastings <larry@hastings.org> wrote:
Nope, I want the old behavior back. os.urandom() should read /dev/urandom if getrandom() would block. As the British say, "it should do what it says on the tin". Aeh, what the tin says is "return random bytes".
What the tin says is "urandom", which has local man pages that dictate exactly how it behaves. [...]
I disagree. The authoritative source for the behavior of the Python 'urandom' function is the Python documentation, not the Linux manpage for the "urandom" device. And https://docs.python.org/3.4/library/os.html says first and foremost:

    os.urandom(n)
        Return a string of n random bytes suitable for cryptographic use.

Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«
[Nikolaus Rath]
Aeh, what the tin says is "return random bytes".
[Larry Hastings]
What the tin says is "urandom", which has local man pages that dictate exactly how it behaves. On Linux the "urandom" man page says:
A read from the /dev/urandom device will not block waiting for more entropy. If there is not sufficient entropy, a pseudorandom number generator is used to create the requested bytes.
os.urandom() needs to behave like that on Linux, which is how it behaved in Python 2.4 through 3.4.
I agree (with Larry). If the change hadn't already been made, nobody would get anywhere trying to make it now. So best to pretend it was never made to begin with ;-) The tin that _will_ say "return random bytes" in Python will be `secrets.token_bytes()`. That's self-evidently (to me) where the "possibly block forever" implementation belongs.
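A sketch of the wiring Tim suggests, assuming a blocking os.getrandom() is available (hypothetical; the secrets module as it actually shipped in 3.6 sits on os.urandom() and random.SystemRandom):

    import os

    def token_bytes(nbytes=32):
        # Block until the kernel CSPRNG is seeded, then return real randomness.
        if hasattr(os, "getrandom"):
            return os.getrandom(nbytes)   # blocking mode by default
        return os.urandom(nbytes)         # non-Linux kernels seed before use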
So secrets.py needs an upgrade; it currently uses random.SystemRandom. On Thursday, June 9, 2016, Tim Peters <tim.peters@gmail.com> wrote:
[Nikolaus Rath]
Aeh, what the tin says is "return random bytes".
[Larry Hastings]
What the tin says is "urandom", which has local man pages that dictate exactly how it behaves. On Linux the "urandom" man page says:
A read from the /dev/urandom device will not block waiting for more entropy. If there is not sufficient entropy, a pseudorandom number generator is used to create the requested bytes.
os.urandom() needs to behave like that on Linux, which is how it behaved in Python 2.4 through 3.4.
I agree (with Larry). If the change hadn't already been made, nobody would get anywhere trying to make it now. So best to pretend it was never made to begin with ;-)
The tin that _will_ say "return random bytes" in Python will be `secrets.token_bytes()`. That's self-evidently (to me) where the "possibly block forever" implementation belongs.
-- --Guido (mobile)
On Thu, Jun 9, 2016 at 3:22 PM, Larry Hastings <larry@hastings.org> wrote:
On 06/09/2016 08:52 AM, Guido van Rossum wrote:
That leaves direct calls to os.urandom(). I don't think this should block either.
Then it's you and me against the rest of the world ;-)
Okay, it's decided: os.urandom() must be changed for 3.5.2 to never block on a getrandom() call. It's permissible to take advantage of getrandom(GRND_NONBLOCK), but if it returns EAGAIN we must read from /dev/urandom.
It's already well established that this will upset the cryptography experts. As a concession to them, I propose adding a simple! predictable! function to Python 3.5.2: os.getrandom(). This would be a simple wrapper over getrandom, only available on platforms that expose it. It would provide a way to use both extant flags, GRND_RANDOM and GRND_NONBLOCK, though possibly not exactly mirroring the native API.
This would enable cryptography libraries to easily do what (IIUC) they regard as the "correct" thing on Linux for all supported versions of Python:
    if hasattr(os, "getrandom"):
        bits = os.getrandom(n)
    else:
        bits = os.urandom(n)
So I understand that the trade-offs between crypto users and regular users are tricky, but this resolution concerns me quite a bit :-( Specifically, it seems to me that:

1) we now have these two functions that need to be supported forever, and AFAICT in every case where someone is currently explicitly calling os.urandom and the behavior differs, they want os.getrandom instead. (This is based on the assumption that the only time that explicitly calling os.urandom is the best option is when one cares about the cryptographic strength of the result -- I'm explicitly distinguishing here between the hash seeding issue that triggered the original bug report and explicit calls to os.urandom.) So in practice this change makes it so that the only correct way of calling either of these functions is the if/else stanza above.

2) every piece of security-sensitive software is going to spend resources churning their code to implement the above,

3) every future security audit of Python software is going to spend resources making sure this is on their checklist of incredibly subtle gotchas that have to be audited for,

4) the crypto folks are going to have to spin up a whole evangelism effort to re-educate everyone that (contrary to what we've been telling everyone for years) os.urandom is no longer the right way to get cryptographic randomness.

OTOH if we allow explicit calls to os.urandom to block or raise an exception, then AFAICT from this thread this will break exactly zero projects.

Maybe this is just rehashing the same things that have already been discussed ad nauseam, in which case I apologize. But I really feel like this is one of those cases where the crypto folks aren't so much saying "oh BUT what if <incredibly unlikely situation involving oppressive regimes and ticking bombs>"; they're more saying "oh $#@ you're going to cause me a *massive* amount of real work and churn and ongoing costs for no perceivable gain and I'm exhausted even thinking about it".
I'm not excited about adding a new function in 3.5.2, but on the other hand we are taking away this functionality they had in 3.5.0 and 3.5.1 so only seems fair. And the implementation of os.getrandom() should be very straightforward, and its semantics will mirror the native call, so I'm pretty confident we can get it solid in a couple of days, though we might slip 3.5.2rc1 by a day or two.
Guido: do you see this as an acceptable compromise?
Cryptographers: given that os.urandom() will no longer block in 3.5.2, do you want this?
Pointing out an alternate approach: Marc-Andre Lemburg proposes in issue #27279 ( http://bugs.python.org/issue27279 ) that we should add two "known best-practices" functions to get pseudo-random bits; one merely for pseudo random bits, the other for crypto-strength pseudo random bits. While I think this is a fine idea, the exact spelling, semantics, and per-platform implementation of these functions is far from settled, and nobody is proposing that we do something like that for 3.5.
We already have a function for non-crypto-strength pseudo-random bits: random.getrandbits. os.urandom is the one for the cryptographers (I thought). -n -- Nathaniel J. Smith -- https://vorpus.org
I don't think we should add a new function. I think we should convince ourselves that there is not enough of a risk of an exploit even if os.urandom() falls back.
-- --Guido van Rossum (python.org/~guido)
On Jun 09 2016, Guido van Rossum <guido@python.org> wrote:
I don't think we should add a new function. I think we should convince ourselves that there is not enough of a risk of an exploit even if os.urandom() falls back.
That will be hard, because you have to consider an active, clever adversary. On the other hand, convincing yourself that in practice os.urandom would never block unless the setup is super exotic or there is active maliciousness seems much easier. Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«
On Thu, Jun 09, 2016 at 07:52:31PM -0700, Nikolaus Rath wrote:
On Jun 09 2016, Guido van Rossum <guido@python.org> wrote:
I don't think we should add a new function. I think we should convince ourselves that there is not enough of a risk of an exploit even if os.urandom() falls back.
That will be hard, because you have to consider an active, clever adversary.
We know that there are exploitable bugs from Linux systems due to urandom, e.g. the Raspberry Pi bug referenced elsewhere in this thread. https://www.raspberrypi.org/forums/viewtopic.php?f=66&t=126892
On the other hand, convincing yourself that in practice os.urandom would never block unless the setup is super exotic or there is active maliciousness seems much easier.
Not that super exotic. In my day job, I've seen processes hang for five or ten minutes during boot up, waiting for the OS to collect enough entropy, although this was not recent and it didn't involve Python. But VMs or embedded devices may take a long time to generate entropy. If the device doesn't have a hardware source of randomness, and isn't connected to an external source of noise like networking or a user who habitually fiddles with the mouse, it might take a very long time indeed to gather entropy...

If I have understood the consensus, I think we're on the right track:

(1) os.urandom should do whatever the OS says it should do, which on Linux is fall back on pseudo-random bytes when the entropy pool hasn't been initialised yet. It won't block and won't raise.

(2) os.getrandom will be added to 3.6, and it will block, or possibly raise, whichever the caller specifies.

(3) The secrets module in 3.6 will stop relying on os.urandom, and use os.getrandom. It may provide a switch to choose between blocking and non-blocking (raise an exception) behaviour. It WON'T fall back to predictable non-crypto bytes (unless the OS itself is completely broken).

(4) random will continue to seed itself from os.urandom, because it doesn't care if urandom provides degraded randomness. It just needs to be better than using the time as seed.

(5) What about random.SystemRandom? I think it should use os.getrandom.

(6) A bunch of stuff will happen to make the hash randomisation not break when systemd runs Python scripts early in the boot process, but I haven't been paying attention to that part :-)

Is this a good summary of where we are at?

-- Steve
In terms of API design, I'd prefer a flag to os.urandom() indicating a preference for:
- blocking
- raising an exception
- weaker random bits

To those still upset by the decision, please read Ted Ts'o's message.
-- --Guido (mobile)
On Jun 11, 2016, at 11:34 AM, Guido van Rossum <guido@python.org> wrote:
In terms of API design, I'd prefer a flag to os.urandom() indicating a preference for:
- blocking
- raising an exception
- weaker random bits
If os.urandom can’t block on Linux, then I feel like it’d be saner to add os.getrandom(). I feel like these flags are going to confuse people, particularly when you take into account that all 3 of them are only going to really matter on Linux (and particularly on newer Linux), and for things like “blocking” it’s going to get confused with the blocking that /dev/random does on Linux.

Right now there are two ways to access the system CSPRNG on *nix: there is /dev/urandom pretty much always, and then there is getrandom() (or arc4random, etc, depending on the specific OS you’re on).

Perhaps the right answer is to go back to making os.urandom always open(“/dev/urandom”).read() instead of trying to save a FD by using getrandom(), and just add os.getrandom() which will interface with getrandom()/arc4random()/etc and always in blocking mode. Why always in blocking mode? Because it’s the only way to get consistent behavior across different platforms: all non-Linux OSs either block, or they otherwise ensure that the CSPRNG is initialized before it is even possible to access it.

Using this, code can be smarter about what to do in edge cases than we can reasonably be in os.urandom, for example see https://bpaste.net/show/41d89e520913.

The reasons I think this is preferable to adding parameters to os.urandom are:

* If we add parameters to os.urandom, you can’t feature detect their existence easily; you have to use version checks.
* With flags, unless we add even more flags we can’t dictate what should happen if we’re on a system where the person’s desired preference can’t be satisfied. We either have to just silently do something that may be wrong, or add more flags.

By adding two functions, people can pick which of the following they want with some programming (see example, and the sketch following this message):

* Just try to get the strongest random, but fall back to maybe-not-random if it’s early enough in the boot process.
* Fail on old Linux rather than possibly get insecure random.
* Actually write cross-platform code to prevent blocking (since only Linux allows you to not block).
* Fail hard rather than block if we can’t get secure random bytes without blocking.
* Soft fail and get “probably good enough” random from os.urandom on Linux.
* Hard fail on non-Linux if we would block, since there’s no non-blocking and “probably good enough” interface.
* Soft fail and get “probably good enough” random from os.urandom on Linux, and use time/pid/memory offsets on non-Linux.
* Just use the best source of random available on the system, and block rather than fail.

I don’t see any way to get the same wide set of options by just adding flags to os.urandom, unless we add flags for every possible combination of what people may or may not want.

— Donald Stufft
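Donald's bpaste example has since expired, but a hedged reconstruction of the kind of policy-picking code he describes might look like the following, assuming a hypothetical os.getrandom() that blocks until the kernel CSPRNG is seeded:

    import os
    import sys

    def strict_bytes(n):
        # "Fail on old Linux rather than possibly get insecure random."
        if hasattr(os, "getrandom"):
            return os.getrandom(n)    # hypothetical: blocks until seeded
        if sys.platform.startswith("linux"):
            raise RuntimeError("cannot guarantee a seeded CSPRNG here")
        return os.urandom(n)          # non-Linux: never returns unseeded bytes

    def lenient_bytes(n):
        # "Soft fail and get 'probably good enough' random from os.urandom."
        return os.urandom(n)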
Is the feature detection desire about being able to write code that runs on older Python versions or for platforms that just don't have getrandom()?

My assumption was that nobody would actually use these flags except the secrets module and people writing code that generates long-lived secrets -- and the latter category should be checking platform and versions anyway since they need the whole stack to be secure (if I understand Ted Ts'o's email right).

My assumption is also that the flags should be hints (perhaps only relevant on Linux) -- platforms that can't perform the action desired (because their system's API doesn't support it) would just do their default action, assuming the system API does the best it can.

I think the problem with making os.urandom() go back to always reading /dev/urandom is that we've come to rely on it on all platforms, so we've passed that station.
-- --Guido van Rossum (python.org/~guido)
On Jun 11, 2016, at 1:39 PM, Guido van Rossum <guido@python.org> wrote:
Is the feature detection desire about being able to write code that runs on older Python versions or for platforms that just don't have getrandom()?
My assumption was that nobody would actually use these flags except the secrets module and people writing code that generates long-lived secrets -- and the latter category should be checking platform and versions anyway since they need the whole stack to be secure (if I understand Ted Ts'o's email right).
My assumption is also that the flags should be hints (perhaps only relevant on Linux) -- platforms that can't perform the action desired (because their system's API doesn't support it) would just do their default action, assuming the system API does the best it can.
The problem is that someone writing software that does os.urandom(block=True) or os.urandom(exception=True) and gets some bytes doesn’t know whether it got back cryptographically secure random because Python called getrandom(); or because it read /dev/urandom on a platform that defines that as always returning secure random; or because it’s on Linux and the urandom pool is initialized; or whether it got back bytes that are *not* cryptographically secure because it fell back to reading /dev/urandom on Linux prior to the pool being initialized.

The “silently does the wrong thing, even though I explicitly asked for it to do something different” is something that I would consider to be a footgun, and footguns in security sensitive code make me really worried.

Outside of the security side of things, if someone goes “Ok I need some random bytes and I need to make sure it doesn’t block”, then doing ``os.urandom(block=False, exception=False)`` isn’t going to make sure that it doesn’t block except on Linux.

In other words, it’s basically impossible to ensure you get the behavior you want with these flags, which I feel like will make everyone unhappy (both the people who want to ensure non-blocking, and the people who want to ensure cryptographically secure). These flags are an attractive nuisance that look like they do the right thing, but silently don’t.

Meanwhile, if we have os.urandom that reads from /dev/urandom and os.getrandom() which reads from blocking random, then we make it easier to ensure you get the behavior you want by using the function that best suits your needs:

* If you just want the best the OS has to offer, os.getrandom falling back to os.urandom.
* If you want to ensure you get cryptographically secure bytes, os.getrandom, falling back to os.urandom on non-Linux platforms and erroring on Linux.
* If you want to *ensure* that there’s no blocking, then os.urandom on Linux (or os.urandom wrapped with timeout code anywhere else, as that’s the only way to ensure not blocking cross-platform; sketch below).
* If you just don’t care, YOLO it up with either os.urandom or os.getrandom or random.random.
I think the problem with making os.urandom() go back to always reading /dev/urandom is that we've come to rely on it on all platforms, so we've passed that station.
Sorry, to be more specific I meant the 3.4 behavior, which was open(“/dev/urandom”).read() on *nix and CryptGenRandom on Windows. — Donald Stufft
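The "os.urandom wrapped with timeout code" option mentioned in the list above could be sketched like this (names are illustrative; note that Python threads cannot be killed, so on timeout the worker thread is merely abandoned):

    import concurrent.futures
    import os

    def urandom_with_timeout(n, timeout=5.0):
        # Give up rather than block indefinitely waiting for entropy.
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        try:
            return pool.submit(os.urandom, n).result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            raise TimeoutError("entropy pool not ready after %g seconds"
                               % timeout)
        finally:
            pool.shutdown(wait=False)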
On Sat, 11 Jun 2016 at 11:31 Donald Stufft <donald@stufft.io> wrote:
On Jun 11, 2016, at 1:39 PM, Guido van Rossum <guido@python.org> wrote:
Is the feature detection desire about being able to write code that runs on older Python versions or for platforms that just don't have getrandom()?
My assumption was that nobody would actually use these flags except the secrets module and people writing code that generates long-lived secrets -- and the latter category should be checking platform and versions anyway since they need the whole stack to be secure (if I understand Ted Ts'o's email right).
My assumption is also that the flags should be hints (perhaps only relevant on Linux) -- platforms that can't perform the action desired (because their system's API doesn't support it) would just do their default action, assuming the system API does the best it can.
The problem is that someone writing software that does os.urandom(block=True) or os.urandom(exception=True) which gets some bytes doesn’t know if it got back cryptographically secure random because Python called getrandom() or if it got back cryptographically secure random because it called /dev/urandom and that gave it secure random because it’s on a platform that defines that as always returning secure or because it’s on Linux and the urandom pool is initialized or if it got back some random bytes that are not cryptographically secure because it fell back to reading /dev/urandom on Linux prior to the pool being initialized.
The “silently does the wrong thing, even though I explicitly asked for it to do something different” is something that I would consider to be a footgun, and footguns in security sensitive code make me really worried.
Outside of the security side of things, if someone goes “Ok I need some random bytes and I need to make sure it doesn’t block”, then doing ``os.urandom(block=False, exception=False)`` isn’t going to make sure that it doesn’t block except on Linux.
In other words, it’s basically impossible to ensure you get the behavior you want with these flags which I feel like will make everyone unhappy (both the people who want to ensure non-blocking, and the people who want to ensure cryptographically secure). These flags are an attractive nuisance that look like they do the right thing, but silently don’t.
Meanwhile, if we have os.urandom that reads from /dev/urandom and os.getrandom() which reads from blocking random, then we make it easier to ensure you get the behavior you want by using the function that best suits your needs:
* If you just want the best the OS has to offer, os.getrandom falling back to os.urandom.
* If you want to ensure you get cryptographically secure bytes, os.getrandom, falling back to os.urandom on non-Linux platforms and erroring on Linux.
* If you want to *ensure* that there’s no blocking, then os.urandom on Linux (or os.urandom wrapped with timeout code anywhere else, as that’s the only way to ensure not blocking cross-platform).
* If you just don’t care, YOLO it up with either os.urandom or os.getrandom or random.random.
I'm +1 w/ what Donald is suggesting here and below w/ proper documentation in both the secrets and random modules to explain when to use what (i.e. secrets for crypto-no-matter-what randomness, random for quick-and-dirty randomness). This also includes any appropriate decoupling of the secrets module from the random module so there's no reliance on the random module in the docs of the secrets module beyond "this class has the same interface", and letting the secrets module be the way people generally get crypto randomness. -Brett
I think the problem with making os.urandom() go back to always reading /dev/urandom is that we've come to rely on it on all platforms, so we've passed that station.
Sorry, to be more specific I meant the 3.4 behavior, which was open(“/dev/urandom”).read() on *nix and CryptGenRandom on Windows.
— Donald Stufft
On Sat, Jun 11, 2016 at 11:30 AM, Donald Stufft <donald@stufft.io> wrote:
On Jun 11, 2016, at 1:39 PM, Guido van Rossum <guido@python.org> wrote:
Is the feature detection desire about being able to write code that runs on older Python versions or for platforms that just don't have getrandom()?
My assumption was that nobody would actually use these flags except the secrets module and people writing code that generates long-lived secrets -- and the latter category should be checking platform and versions anyway since they need the whole stack to be secure (if I understand Ted Ts'o's email right).
My assumption is also that the flags should be hints (perhaps only relevant on Linux) -- platforms that can't perform the action desired (because their system's API doesn't support it) would just do their default action, assuming the system API does the best it can.
The problem is that someone writing software that does os.urandom(block=True) or os.urandom(exception=True) and gets some bytes doesn’t know whether it got back cryptographically secure random because Python called getrandom(); or because it read /dev/urandom on a platform that defines that as always returning secure random; or because it’s on Linux and the urandom pool is initialized; or whether it got back bytes that are *not* cryptographically secure because it fell back to reading /dev/urandom on Linux prior to the pool being initialized.
The “silently does the wrong thing, even though I explicitly asked for it to do something different” is something that I would consider to be a footgun, and footguns in security sensitive code make me really worried.
Yeah, but we've already established that there's a lot more upset, rhetoric and worry than warranted by the situation.
Outside of the security side of things, if someone goes “Ok I need some random bytes and I need to make sure it doesn’t block”, then doing ``os.urandom(block=False, exception=False)`` isn’t going to make sure that it doesn’t block except on Linux.
To people who "just want some random bytes" we should recommend the random module.
In other words, it’s basically impossible to ensure you get the behavior you want with these flags which I feel like will make everyone unhappy (both the people who want to ensure non-blocking, and the people who want to ensure cryptographically secure). These flags are an attractive nuisance that look like they do the right thing, but silently don’t.
OK, it looks like the flags just won't make you happy, and I'm happy to give up on them. By default the status quo will win, and that means neither these flags nor os.getrandom(). (But of course you can roll your own using ctypes. :-)
Meanwhile, if we have os.urandom that reads from /dev/urandom and os.getrandom() which reads from blocking random, then we make it easier to ensure you get the behavior you want by using the function that best suits your needs:
* If you just want the best the OS has to offer, os.getrandom falling back to os.urandom.
Actually the proposal for that was the secrets module. And the secrets module would be the only user of os.urandom(blocking=True).
* If you want to ensure you get cryptographically secure bytes, os.getrandom, falling back to os.urandom on non Linux platforms and erroring on Linux.
"Erroring" doesn't sound like it satisfies the "ensure" part of the requirement. And I don't see the advantage of os.getrandom() over the secrets module. (Either way you have to fall back on os.urandom() to suppport Python 3.5 and before.)
* If you want to *ensure* that there’s no blocking, then os.urandom on Linux (or os.urandom wrapped with timeout code anywhere else, as that’s the only way to ensure not blocking cross platform).
That's fine with me.
* If you just don’t care, YOLO it up with either os.urandom or os.getrandom or random.random.
Now you're just taking the mickey.
I think the problem with making os.urandom() go back to always reading /dev/urandom is that we've come to rely on it on all platforms, so we've passed that station.
Sorry, to be more specific I meant the 3.4 behavior, which was open(“/dev/urandom”).read() on *nix and CryptGenRandom on Windows.
I am all for keeping it that way. The secrets module doesn't have to use any of these, it can use an undocumented extension module for all I care. Or it can use os.urandom() and trust Ted Ts'o. -- --Guido van Rossum (python.org/~guido)
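Guido's ctypes aside is workable today; a minimal sketch for x86-64 Linux only (the syscall number and flag value are architecture- and kernel-specific assumptions):

    import ctypes
    import os

    _libc = ctypes.CDLL(None, use_errno=True)
    _NR_getrandom = 318        # x86-64 only; other architectures differ
    GRND_NONBLOCK = 0x0001     # return EAGAIN instead of blocking

    def getrandom(n, flags=0):
        buf = ctypes.create_string_buffer(n)
        got = _libc.syscall(_NR_getrandom, buf,
                            ctypes.c_size_t(n), ctypes.c_int(flags))
        if got < 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
        return buf.raw[:got]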
http://bugs.python.org/issue27288 covers updating the secrets module to use getrandom().

http://bugs.python.org/issue27292 covers documenting the drawbacks of os.urandom().

http://bugs.python.org/issue27293 covers documenting all of the issues pointed out in this discussion.

Only issue I can think of that we're missing is one to track reverting os.urandom() to 3.4 semantics (any doc updates required for the random module?). Am I missing anything?
On Jun 11, 2016, at 4:26 PM, Brett Cannon <brett@python.org> wrote:
Only issue I can think of that we're missing is one to track reverting os.urandom() to 3.4 semantics (any doc updates required for the random module?). Am I missing anything?
It’s already been reverted to 3.4 semantics (well, it will try to use getrandom(GRND_NONBLOCK) but falls back to /dev/urandom if that would have blocked). — Donald Stufft
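In rough Python terms, the reverted 3.5.2 behavior is approximately the following (the real logic is C inside CPython; this illustration reuses the getrandom()/GRND_NONBLOCK names from the ctypes sketch earlier in the thread):

    import errno

    def urandom_3_5_2(n):
        # Try the non-blocking system call first; fall back to the device
        # if the kernel is too old (ENOSYS) or the pool is unseeded (EAGAIN).
        try:
            return getrandom(n, GRND_NONBLOCK)
        except OSError as exc:
            if exc.errno not in (errno.EAGAIN, errno.ENOSYS):
                raise
        with open("/dev/urandom", "rb") as f:   # may return unseeded bytes
            return f.read(n)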
On Jun 11, 2016, at 3:40 PM, Guido van Rossum <guido@python.org> wrote:
On Sat, Jun 11, 2016 at 11:30 AM, Donald Stufft <donald@stufft.io> wrote:
On Jun 11, 2016, at 1:39 PM, Guido van Rossum <guido@python.org> wrote:
Is the feature detection desire about being able to write code that runs on older Python versions or for platforms that just don't have getrandom()?
My assumption was that nobody would actually use these flags except the secrets module and people writing code that generates long-lived secrets -- and the latter category should be checking platform and versions anyway since they need the whole stack to be secure (if I understand Ted Ts'o's email right).
My assumption is also that the flags should be hints (perhaps only relevant on Linux) -- platforms that can't perform the action desired (because their system's API doesn't support it) would just do their default action, assuming the system API does the best it can.
The problem is that someone writing software that does os.urandom(block=True) or os.urandom(exception=True) and gets some bytes doesn’t know whether it got back cryptographically secure random because Python called getrandom(); or because it read /dev/urandom on a platform that defines that as always returning secure random; or because it’s on Linux and the urandom pool is initialized; or whether it got back bytes that are *not* cryptographically secure because it fell back to reading /dev/urandom on Linux prior to the pool being initialized.
The “silently does the wrong thing, even though I explicitly asked for it to do something different” is something that I would consider to be a footgun, and footguns in security sensitive code make me really worried.
Yeah, but we've already established that there's a lot more upset, rhetoric and worry than warranted by the situation.
Have we? There are real, documented security failures in the wild because of /dev/urandom’s behavior. This isn’t just a theoretical problem, it actually has had consequences in real life, and those same consequences could just as easily have happened to Python (in one of the cases that most recently comes to mind it was a C program, but that’s not really relevant, because the same problem would have happened if they had written it in Python using os.urandom in 3.4 but not in 3.5.0 or 3.5.1).
Outside of the security side of things, if someone goes “Ok I need some random bytes and I need to make sure it doesn’t block”, then doing ``os.urandom(block=False, exception=False)`` isn’t going to make sure that it doesn’t block except on Linux.
To people who "just want some random bytes" we should recommend the random module.
In other words, it’s basically impossible to ensure you get the behavior you want with these flags which I feel like will make everyone unhappy (both the people who want to ensure non-blocking, and the people who want to ensure cryptographically secure). These flags are an attractive nuisance that look like they do the right thing, but silently don’t.
OK, it looks like the flags just won't make you happy, and I'm happy to give up on them. By default the status quo will win, and that means neither these flags nor os.getrandom(). (But of course you can roll your own using ctypes. :-)
Meanwhile, if we have os.urandom that reads from /dev/urandom and os.getrandom() which reads from blocking random, then we make it easier to ensure you get the behavior you want by using the function that best suits your needs:
* If you just want the best the OS has to offer, os.getrandom falling back to os.urandom.
Actually the proposal for that was the secrets module. And the secrets module would be the only user of os.urandom(blocking=True).
I’m fine if this lives in the secrets module— Steven asked for it to be an os function so that secrets.py could continue to be pure python.
* If you want to ensure you get cryptographically secure bytes, os.getrandom, falling back to os.urandom on non Linux platforms and erroring on Linux.
"Erroring" doesn't sound like it satisfies the "ensure" part of the requirement. And I don't see the advantage of os.getrandom() over the secrets module. (Either way you have to fall back on os.urandom() to suppport Python 3.5 and before.)
Erroring does satisfy the ensure part, because if it’s not possible to get cryptographically secure bytes then the only option is to error if you want to be ensured of cryptographically secure bytes.

It’s a bit like if you did open(“somefile.txt”): it’s reasonable to say that we should ensure that open(“somefile.txt”) actually opens ./somefile.txt, and doesn’t randomly open a different file if ./somefile.txt doesn’t exist— if it can’t open ./somefile.txt it should error. If I *need* cryptographically secure random bytes, and I’m on a platform that doesn’t provide those, then erroring is oftentimes the correct behavior. This is such an important thing that OS X will flat out kernel panic and refuse to boot if it can’t ensure that it can give people cryptographically secure random bytes.

It’s a fairly simple decision tree. I go “hey, give me cryptographically secure random bytes, and only cryptographically secure random bytes”. If it cannot give them to me because the APIs of the system cannot guarantee they are cryptographically secure, then there are only two options: either A) it is explicit about its inability to do this and raises an error, or B) it does something completely different than what I asked it to do and pretends that it’s what I wanted.
* If you want to *ensure* that there’s no blocking, then os.urandom on Linux (or os.urandom wrapped with timeout code anywhere else, as that’s the only way to ensure not blocking cross platform).
That's fine with me.
* If you just don’t care, YOLO it up with either os.urandom or os.getrandom or random.random.
Now you're just taking the mickey.
No I’m not— random.Random is such a use case where it wants to seed with bytes as secure as it can get its hands on, but it doesn’t care if it falls back to insecure bytes if it’s not possible to get secure bytes. This code even falls back to using time as a seed if all else fails.
I think the problem with making os.urandom() go back to always reading /dev/urandom is that we've come to rely on it on all platforms, so we've passed that station.
Sorry, to be more specific I meant the 3.4 behavior, which was open(“/dev/urandom”).read() on *nix and CryptGenRandom on Windows.
I am all for keeping it that way. The secrets module doesn't have to use any of these, it can use an undocumented extension module for all I care. Or it can use os.urandom() and trust Ted Ts'o.
-- --Guido van Rossum (python.org/~guido)
— Donald Stufft
On Sat, Jun 11, 2016 at 1:48 PM, Donald Stufft <donald@stufft.io> wrote:
On Jun 11, 2016, at 3:40 PM, Guido van Rossum <guido@python.org> wrote:
Yeah, but we've already established that there's a lot more upset, rhetoric and worry than warranted by the situation.
Have we? There are real, documented security failures in the wild because of /dev/urandom’s behavior. This isn’t just a theoretical problem, it actually has had consequences in real life, and those same consequences could just as easily have happened to Python (in one of the cases that most recently comes to mind it was a C program, but that’s not really relevant, because the same problem would have happened if they had written it in Python using os.urandom in 3.4 but not in 3.5.0 or 3.5.1).
Actually it's not clear to me at all that it could have happened to Python. (Wasn't it an embedded system?)
Actually the proposal for that was the secrets module. And the secrets module would be the only user of os.urandom(blocking=True).
I’m fine if this lives in the secrets module— Steven asked for it to be an os function so that secrets.py could continue to be pure python.
The main thing that I want to avoid is that people start cargo-culting whatever the secrets module uses rather than just using the secrets module. Having it redundantly available as os.getrandom() is just begging for people to show off how much they know about writing secure code.
* If you want to ensure you get cryptographically secure bytes, os.getrandom, falling back to os.urandom on non Linux platforms and erroring on Linux.
"Erroring" doesn't sound like it satisfies the "ensure" part of the requirement. And I don't see the advantage of os.getrandom() over the secrets module. (Either way you have to fall back on os.urandom() to suppport Python 3.5 and before.)
Erroring does satisfy the ensure part, because if it’s not possible to get cryptographically secure bytes then the only option is to error if you want to be ensured of cryptographically secure bytes.
It’s a bit like if you did open(“somefile.txt”): it’s reasonable to say that we should ensure that open(“somefile.txt”) actually opens ./somefile.txt, and doesn’t randomly open a different file if ./somefile.txt doesn’t exist— if it can’t open ./somefile.txt it should error. If I *need* cryptographically secure random bytes, and I’m on a platform that doesn’t provide those, then erroring is oftentimes the correct behavior. This is such an important thing that OS X will flat out kernel panic and refuse to boot if it can’t ensure that it can give people cryptographically secure random bytes.
But what is a Python script going to do with that error? IIUC this kind of error would only happen very early during boot time, and rarely, so the most likely outcome is a hard-to-debug mystery failure.
It’s a fairly simple decision tree. I go “hey, give me cryptographically secure random bytes, and only cryptographically secure random bytes”. If it cannot give them to me because the APIs of the system cannot guarantee they are cryptographically secure, then there are only two options: either A) it is explicit about its inability to do this and raises an error, or B) it does something completely different than what I asked it to do and pretends that it’s what I wanted.
I really don't believe that there is only one kind of cryptographically secure random bytes. There are many different applications (use cases) of randomness and they need different behaviors. (If it was simple we wouldn't still be arguing. :-)
* If you want to *ensure* that there’s no blocking, then os.urandom on Linux (or os.urandom wrapped with timeout code anywhere else, as that’s the only way to ensure not blocking cross platform).
That's fine with me.
* If you just don’t care, YOLO it up with either os.urandom or os.getrandom or random.random.
Now you're just taking the mickey.
No I’m not— random.Random is such a use case where it wants to seed with bytes as secure as it can get its hands on, but it doesn’t care if it falls back to insecure bytes if it’s not possible to get secure bytes. This code even falls back to using time as a seed if all else fails.
Fair enough. The hash randomization is the other case I suppose (since not running any Python code at all isn't an option, and neither is waiting indefinitely before the user's code gets control). It does show the point that there are different use cases with different needs. But I think the stdlib should limit the choices. -- --Guido van Rossum (python.org/~guido)
On Jun 11, 2016, at 5:16 PM, Guido van Rossum <guido@python.org> wrote:
On Sat, Jun 11, 2016 at 1:48 PM, Donald Stufft <donald@stufft.io> wrote:
On Jun 11, 2016, at 3:40 PM, Guido van Rossum <guido@python.org> wrote:
Yeah, but we've already established that there's a lot more upset, rhetoric and worry than warranted by the situation.
Have we? There are real, documented security failures in the wild because of /dev/urandom’s behavior. This isn’t just a theoretical problem, it actually has had consequences in real life, and those same consequences could just as easily have happened to Python (in one of the cases that most recently comes to mind it was a C program, but that’s not really relevant, because the same problem would have happened if they had written it in Python using os.urandom in 3.4 but not in 3.5.0 or 3.5.1).
Actually it's not clear to me at all that it could have happened to Python. (Wasn't it an embedded system?)
It was a Raspberry Pi that ran a shell script on boot that called ssh-keygen. That shell script could have just as easily been a Python script that called os.urandom via https://github.com/sybrenstuvel/python-rsa instead of a shell script that called ssh-keygen.
Actually the proposal for that was the secrets module. And the secrets module would be the only user of os.urandom(blocking=True).
I’m fine if this lives in the secrets module— Steven asked for it to be an os function so that secrets.py could continue to be pure python.
The main thing that I want to avoid is that people start cargo-culting whatever the secrets module uses rather than just using the secrets module. Having it redundantly available as os.getrandom() is just begging for people to show off how much they know about writing secure code.
I guess one question would be: what does the secrets module do if it’s on a Linux that is too old to have getrandom(0)? Off the top of my head I can think of:

* Silently fall back to reading os.urandom and hope that it’s been seeded.
* Fall back to os.urandom, hope that it’s been seeded, and add a SecurityWarning or something like it to mention that it’s falling back to os.urandom and may be getting predictable random from /dev/urandom.
* Hard fail because it can’t guarantee secure cryptographic random.

Of the three, I would probably suggest the second one: it doesn’t let the problem happen silently, but it still “works” (where it’s basically just hoping it’s being called late enough that /dev/urandom has been seeded), and people can convert it to the third case using the warnings module to turn the warning into an exception.
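The second option might be sketched as follows (SecurityWarning is hypothetical; nothing of that name exists in the stdlib, and os.getrandom() is still only proposed at this point):

    import os
    import warnings

    class SecurityWarning(Warning):
        """Hypothetical category, as floated above."""

    def token_bytes(n=32):
        if hasattr(os, "getrandom"):
            return os.getrandom(n)    # blocks until the pool is seeded
        warnings.warn("falling back to os.urandom(); early in boot this "
                      "may be predictable", SecurityWarning)
        return os.urandom(n)

    # Donald's third option is then one filter away for the caller:
    # warnings.simplefilter("error", SecurityWarning)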
* If you want to ensure you get cryptographically secure bytes, os.getrandom, falling back to os.urandom on non Linux platforms and erroring on Linux.
"Erroring" doesn't sound like it satisfies the "ensure" part of the requirement. And I don't see the advantage of os.getrandom() over the secrets module. (Either way you have to fall back on os.urandom() to suppport Python 3.5 and before.)
Erroring does satisfy the ensure part, because if it’s not possible to get cryptographically secure bytes then the only option is to error if you want to be ensured of cryptographically secure bytes.
It’s a bit like if you did open(“somefile.txt”): it’s reasonable to say that we should ensure that open(“somefile.txt”) actually opens ./somefile.txt, and doesn’t randomly open a different file if ./somefile.txt doesn’t exist— if it can’t open ./somefile.txt it should error. If I *need* cryptographically secure random bytes, and I’m on a platform that doesn’t provide those, then erroring is oftentimes the correct behavior. This is such an important thing that OS X will flat out kernel panic and refuse to boot if it can’t ensure that it can give people cryptographically secure random bytes.
But what is a Python script going to do with that error? IIUC this kind of error would only happen very early during boot time, and rarely, so the most likely outcome is a hard-to-debug mystery failure.
Depends on why they’re calling it, which is sort of the underlying problem I suspect with why there isn’t agreement about what the right default behavior is. The correct answer for some application might be to hard fail and wait for the operator to fix the environment that it’s running in. It depends on how important the thing that is getting this random is.

One example: If I was writing a communication platform for people who are fighting oppressive regimes or to securely discuss sexual orientation in more dangerous parts of the world, I would want to make this program hard fail if it couldn’t ensure that it was using an interface that ensured cryptographic random, because the alternative is predictable numbers and someone possibly being arrested or executed. I know that’s a bit of an extreme edge case, but it’s also the kind of thing that people might use Python for where the predictability of the CSPRNG it’s using is of the utmost importance. For other things, the importance will fall somewhere between best effort being good enough and predictable random numbers being catastrophic.
It’s a fairly simple decision tree. I go “hey, give me cryptographically secure random bytes, and only cryptographically secure random bytes”. If it cannot give them to me because the APIs of the system cannot guarantee they are cryptographically secure, then there are only two options: either A) it is explicit about its inability to do this and raises an error, or B) it does something completely different than what I asked it to do and pretends that it’s what I wanted.
I really don't believe that there is only one kind of cryptographically secure random bytes. There are many different applications (use cases) of randomness and they need different behaviors. (If it was simple we wouldn't still be arguing. :-)
I mean for a CSPRNG there’s only one real important property: can an attacker predict the next byte? Any other property for a CSPRNG doesn’t really matter. For other, non-cryptographic kinds of PRNGs, people want other behaviors (equidistribution, etc), but those aren’t cryptographically secure (nor do they need to be).
* If you want to *ensure* that there’s no blocking, then os.urandom on Linux (or os.urandom wrapped with timeout code anywhere else, as that’s the only way to ensure not blocking cross platform).
That's fine with me.
* If you just don’t care, YOLO it up with either os.urandom or os.getrandom or random.random.
Now you're just taking the mickey.
No I’m not— random.Random is such a use case where it wants to seed with bytes as secure as it can get its hands on, but it doesn’t care if it falls back to insecure bytes if it’s not possible to get secure bytes. This code even falls back to using time as a seed if all else fails.
Fair enough. The hash randomization is the other case I suppose (since not running any Python code at all isn't an option, and neither is waiting indefinitely before the user's code gets control).
It does show the point that there are different use cases with different needs. But I think the stdlib should limit the choices.
-- --Guido van Rossum (python.org/~guido)
— Donald Stufft
On Sat, Jun 11, 2016 at 05:46:29PM -0400, Donald Stufft wrote:
It was a RaspberryPI that ran a shell script on boot that called ssh-keygen. That shell script could have just as easily been a Python script that called os.urandom via https://github.com/sybrenstuvel/python-rsa instead of a shell script that called ssh-keygen.
So I'm going to argue that the primary bug was in how the systemd init scripts were configured. In general, creating keypairs at boot time is just a bad idea. They should be created lazily, in a just-in-time paradigm.

Consider that if you assume that os.urandom can block, this isn't necessarily going to do the right thing either --- if you use getrandom and it blocks, and it's part of a systemd unit which is blocking further boot progress, then the system will hang for 90 seconds, and while it's hanging, there won't be any interrupts, so the system will be dead in the water, just like the original bug report complaining that Python was hanging when it was using getrandom() to initialize its SipHash. At which point there will be another bug complaining about how python was causing systemd to hang for 90 seconds, and there will be demand to make os.urandom no longer block. (Since by definition, systemd can do no wrong; it's always other programs that have to change to accommodate systemd. :-)

So some people will freak out when the keygen systemd unit hangs, blocking the boot --- and other people will freak out if the systemd unit doesn't hang and you get predictable SSH keys --- and some wiser folks will be asking the question, why the *heck* is it not openssh/systemd's fault for trying to generate keys this early, instead of after the first time sshd needs host ssh keys? If you wait until the first time the host ssh keys are needed, then the system is fully booted, so it's likely that the entropy will be collected -- and even if it isn't, networking will already be brought up, and the system will be in multi-user mode, so entropy will be collected very quickly.

Sometimes, we can't solve the problem at the Python level or at the kernel level. It will require security-savvy userspace/application programmers as well.

Cheers,

- Ted
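The just-in-time paradigm Ted describes is cheap to express in Python; a toy sketch (a real host key would come from a crypto library, not raw urandom bytes):

    import functools
    import os

    @functools.lru_cache(maxsize=None)
    def host_secret():
        # Generated the first time a connection actually needs it, long
        # after boot, instead of eagerly from an init script.
        return os.urandom(32)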
On 12 Jun 2016, at 07:11, Theodore Ts'o <tytso@mit.edu> wrote:
On Sat, Jun 11, 2016 at 05:46:29PM -0400, Donald Stufft wrote:
It was a Raspberry Pi that ran a shell script on boot that called ssh-keygen. That shell script could have just as easily been a Python script that called os.urandom via https://github.com/sybrenstuvel/python-rsa instead of a shell script that called ssh-keygen.
So I'm going to argue that the primary bug was in how the systemd init scripts were configured. In general, creating keypairs at boot time is just a bad idea. They should be created lazily, in a just-in-time paradigm.
Agreed. I hope that if there is only one thing every participant has learned from this (extremely painful for all concerned) discussion, it’s that doing anything that requires really good random numbers should be delayed as long as possible on all systems, and should absolutely not be done during the boot process on Linux. Don’t generate key pairs, don’t make TLS connections, just don’t perform any action that requires really good randomness at all.
So some people will freak out when the keygen systemd unit hangs, blocking the boot --- and other people will freak out if the systemd unit doesn't hang and you get predictable SSH keys --- and some wiser folks will be asking the question, why the *heck* is it not openssh/systemd's fault for trying to generate keys this early, instead of after the first time sshd needs host ssh keys? If you wait until the first time the host ssh keys are needed, then the system is fully booted, so it's likely that the entropy will be collected -- and even if it isn't, networking will already be brought up, and the system will be in multi-user mode, so entropy will be collected very quickly.
As far as I know we still only have three programs that were encountering this problem: Debian’s autopkgtest (which patched with PYTHONHASHSEED=0), systemd-cron (which is moving from Python to Rust anyway), and cloud-init (not formally reported but mentioned to me by a third-party). It remains unclear to me why the systemd-cron service files can’t simply request to be delayed until the kernel CSPRNG is seeded: I guess systemd doesn’t have any way to express that constraint? Perhaps it should.

Of this set, only cloud-init worries me, and it worries me for the *opposite* reason that Guido and Larry are worried. Guido and Larry are worried that programs like cloud-init will be delayed by two minutes while they wait for entropy: that’s an understandable concern. I’m much more worried that programs like cloud-init may attempt to establish TLS connections or create keys during this two minute window, leaving them staring down the possibility of performing “secure” actions with insecure keys.

This is why I advocate, like Donald does, for having *some* tool in Python that allows Python programs to crash if they attempt to generate cryptographically secure random bytes on a system that is incapable of providing them (which, in practice, can only happen on Linux systems). I don’t care how it’s spelled, I just care that programs that want to use a properly-seeded CSPRNG can error out effectively when one is not available. That allows us to ensure that Python programs that want to do TLS or build key pairs correctly refuse to do so when used in this state, *and* that they provide a clearly debuggable reason for why they refused. That allows the savvy application developers that Ted talked about to make their own decisions about whether their rapid startup is sufficiently important to take the risk.

Cory

[0]: https://github.com/systemd-cron/systemd-cron/issues/43#issuecomment-16034398...
On Sun, Jun 12, 2016 at 11:40:58AM +0100, Cory Benfield wrote:
Of this set, only cloud-init worries me, and it worries me for the *opposite* reason that Guido and Larry are worried. Guido and Larry are worried that programs like cloud-init will be delayed by two minutes while they wait for entropy: that’s an understandable concern. I’m much more worried that programs like cloud-init may attempt to establish TLS connections or create keys during this two minute window, leaving them staring down the possibility of performing “secure” actions with insecure keys.
There are patches in the dev branch of: https://git.kernel.org/cgit/linux/kernel/git/tytso/random.git/ which will automatically use virtio-rng (if it is provided by the cloud provider) to initialize /dev/urandom. It also uses a much more aggressive mechanism to initialize the /dev/urandom pool, so that getrandom(2) will block for a much shorter period of time immediately after boot time on real hardware.

I'm confident it's secure for x86 platforms. I'm still thinking about whether I should fall back to something more conservative for crappy embedded processors that don't have a cycle counter or a CPU-provided RDRAND-like instruction. Related to this is whether I should finally make the change so that /dev/urandom will block until it is initialized. (This would make Linux work like FreeBSD, which *will* also block if its entropy pool is not initialized.)
This is why I advocate, like Donald does, for having *some* tool in Python that allows Python programs to crash if they attempt to generate cryptographically secure random bytes on a system that is incapable of providing them (which, in practice, can only happen on Linux systems).
Well, it can only happen on Linux because you insist on falling back to /dev/urandom --- and because other OS's have the good taste not to use systemd and/or Python very early in the boot process. If someone tried to run a python script in early FreeBSD init scripts, it would block just as you were seeing on Linux --- you just haven't seen that yet, because arguably the FreeBSD developers have better taste in their choice of init scripts than Red Hat and Debian. :-)

So the question is whether I should do what FreeBSD did, which will satisfy those people who are freaking out and whinging about how Linux could allow stupidly written or deployed Python scripts to get cryptographically insecure bytes, by removing that option from Python developers. Or should I remove that one line from the changes in the random.git patch series, and allow /dev/urandom to be used even when it might be insecure, so as to satisfy all of the people who are freaking out and whinging about the fact that a stupidly written and/or deployed Python script might block during early boot and hang a system?

Note that I've tried to do what I can to make the time that /dev/urandom might block as small as possible, but at the end of the day, there is still the question of whether I should remove the choice re: blocking from userspace, a la FreeBSD, or not. And either way, some number of people will be whinging and freaking out. Which is why I'm completely sympathetic to how Guido might be getting a little exasperated over this whole thread. :-)

- Ted
On 12 Jun 2016, at 14:43, Theodore Ts'o <tytso@mit.edu> wrote:
Well, it can only happen on Linux because you insist on falling back to /dev/urandom --- and because other OS's have the good taste not to use systemd and/or Python very early in the boot process. If someone tried to run a python script in early FreeBSD init scripts, it would block just as you were seeing on Linux --- you just haven't seen that yet, because arguably the FreeBSD developers have better taste in their choice of init scripts than Red Hat and Debian. :-)
Heh, yes, so to be clear, I said “this can only happen on Linux” because I’m talking about the world that we live in: the one where I lost this debate. =D Certainly right now the codebase as it stands could encounter the same problems on FreeBSD. That’s a problem for Python to deal with.
So the question is whether I should do what FreeBSD did, which will satisfy those people who are freaking out and whinging about how Linux could allow stupidly written or deployed Python scripts to get cryptographically insecure bytes, by removing that option from Python developers. Or should I remove that one line from the changes in the random.git patch series, and allow /dev/urandom to be used even when it might be insecure, so as to satisfy all of the people who are freaking out and whinging about the fact that a stupidly written and/or deployed Python script might block during early boot and hang a system?
Note that I've tried to do what I can to make the time that /dev/urandom might block as small as possible, but at the end of the day, there is still the question of whether I should remove the choice re: blocking from userspace, a la FreeBSD, or not. And either way, some number of people will be whinging and freaking out. Which is why I'm completely sympathetic to how Guido might be getting a little exasperated over this whole thread. :-)
I don’t know that we need to talk about removing the choice. I understand the desire to commit to backwards compatibility, of course I do. My problem with /dev/urandom is not that it *exists*, per se: all kinds of stupid stuff exists for the sake of backward compatibility. My problem with /dev/urandom is that it’s a trap, lying in wait for someone who doesn’t know enough about the problem they’re solving to step into it. And it’s the worst kind of trap: it’s one you don’t know you’ve stepped in. Nothing about the failure mode of /dev/urandom is obvious. Worse, well-written apps that try their best to do the right thing can still step into that failure mode if they’re run in a situation that they weren’t expecting (e.g. on an embedded device without hardware RNG or early in the boot process). So my real problem with /dev/urandom is that the man page doesn’t say, in gigantic letters, “this device has a really nasty failure mode that you cannot possibly detect by just running the code in the dangerous mode”. It’s understandable to have insecure weak stuff available to users: Python has loads of it. But where possible, the documentation marks it as such. It’d be good to have /dev/urandom’s man page say “hey, by the way, you almost certainly don’t want this: try using getrandom() instead”. Anyway, regarding changing the behaviour of /dev/urandom: as you’ve correctly highlighted, at this point you’re damned if you do and damned if you don’t. If you don’t change, you’ll forever have people like me saying that /dev/urandom is dangerous, and that its behaviour in the unseeded/poorly-seeded state is a misfeature. I trust you’ll understand when I tell you that that opinion has nothing to do with *you* or the Linux kernel maintainership. This is all about the way software security evolves: things that used to be ok start to become not ok over time. We learn, we improve. Of course, if you do change the behaviour, you’ll rightly have programmers stumble onto this exact problem. They’ll be unhappy too. And the worst part of all of this is that neither side of that debate is *wrong*: they just prioritise different things. Guido, Larry, and friends aren’t wrong, any more than I am: we just rate the different concerns differently. That’s fine: after all, it’s probably why Guido invented and maintains an extremely popular programming language and I haven’t and never will! I have absolutely no problem with breaking “working” code if I believe that that code is exposing users to risks they aren’t aware of (you can check my OSS record to prove it, and I’m happy to provide references). The best advice I can give anyone in this debate, on either side, is to make decisions that you can live with. Consider the consequences, consider the promises you’ve made to users, and then do what you think is right. Guido and Larry have decided to go with backward-compatibility: fine. They’re responsible, the buck stops with them, they know that. The same is true for you, Ted, with the /dev/urandom device. If it were me, I’d change the behaviour of /dev/urandom in a heartbeat. But then again, I’m not Ted Ts’o, and I suspect that instinct is part of why. For my part, thanks for participating, Ted. It’s good to know you know what the problems are, even if your solution isn’t necessarily the one I’d go for. =) Cory
On Sun, Jun 12, 2016 at 09:01:09PM +0100, Cory Benfield wrote:
My problem with /dev/urandom is that it’s a trap, lying in wait for someone who doesn’t know enough about the problem they’re solving to step into it.
And my answer to that is: absent backwards compatibility concerns, use getrandom(2) on Linux, or getentropy(2) on *BSD, and be happy. Don't use /dev/urandom; use getrandom(2) instead. That way you also solve a number of other problems such as the file descriptor DOS attack issue, etc. The problem with Python is that you *do* have backwards compatibility concerns. At which point you are faced with the same issues that we are in the kernel; except I gather that the commitment to backwards compatibility isn't quite as absolute (although it is strong). Which is why I've been trying very hard not to tell python-dev what to do, but rather to give you folks the best information I can, and then encouraging you to do whatever seems most "Pythony" --- which might or might not be the same as the decisions we've made in the kernel. Cheers, - Ted P.S. BTW, I probably won't change the behaviour of /dev/urandom to make it be blocking. Before I found out about Python Bug #26839, I actually had patches that did make /dev/urandom blocking, and they were planned for the next kernel merge window. But ultimately, the reason why I won't is because there is a set of real users (Debian Stretch users on Amazon AWS and Google GCE) for which if I changed how /dev/urandom worked, then I would be screwing them over, even if Python 3.5.2 falls back to /dev/urandom. It's not a problem for bare metal hardware and cloud systems with virtio-rng; I have patches that will take care of those scenarios. Unfortunately, both AWS and GCE don't support virtio-rng currently, and as much as some people are worried about the hypothetical problems of stupidly written/deployed Python scripts that try to generate long-term secrets during early boot, weighed against the very real prospect of user lossage on two of the most popular Cloud environments out there --- it's simply no contest.
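For concreteness, "use getrandom(2) instead" from today's Python means invoking the syscall yourself. A minimal ctypes sketch, not a definitive implementation: the syscall number 318 assumes x86-64 (other architectures differ), and flags=0 gives the block-until-first-seeded behaviour described above:

    import ctypes, os

    SYS_getrandom = 318                      # x86-64 only; other arches differ
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long

    def getrandom(n, flags=0):
        # flags=0: blocks until the kernel pool is initialized, never afterwards
        buf = ctypes.create_string_buffer(n)
        got = libc.syscall(SYS_getrandom, buf, n, flags)
        if got < 0:
            e = ctypes.get_errno()
            raise OSError(e, os.strerror(e))
        return buf.raw[:got]

    key = getrandom(32)                      # safe for long-term secrets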
On Sun, Jun 12, 2016 at 4:28 PM, Theodore Ts'o <tytso@mit.edu> wrote:
P.S. BTW, I probably won't change the behaviour of /dev/urandom to make it be blocking. Before I found out about Python Bug #26839, I actually had patches that did make /dev/urandom blocking, and they were planned for the next kernel merge window. But ultimately, the reason why I won't is because there is a set of real users (Debian Stretch users on Amazon AWS and Google GCE) for which if I changed how /dev/urandom worked, then I would be screwing them over, even if Python 3.5.2 falls back to /dev/urandom. It's not a problem for bare metal hardware and cloud systems with virtio-rng; I have patches that will take care of those scenarios.
Unfortunately, both AWS and GCE don't support virtio-rng currently, and as much as some people are worried about the hypothetical problems of stupidly written/deployed Python scripts that try to generate long-term secrets during early boot, weighed against the very real prospect of user lossage on two of the most popular Cloud environments out there --- it's simply no contest.
Speaking of full-stack perspectives, would it affect your decision if Debian Stretch were made robust against blocking /dev/urandom on AWS/GCE? Because I think we could find lots of people who would be overjoyed to fix Stretch before the next merge window even opens (AFAICT the quick fix is literally a 1 line patch), if that allowed the blocking /dev/urandom patches to go in upstream... (It looks like Jessie isn't affected, because while Jessie does provide a systemd-cron package for those who decide to install it, Jessie's systemd-cron is still using python2, python2 doesn't have hash randomization so it doesn't touch /dev/urandom at startup, and systemd-cron doesn't have any code that would trigger access to /dev/urandom otherwise. It looks like Xenial *is* affected, because they ship systemd-cron with python3, but their python3 is still unconditionally using getrandom() in blocking mode, so they need to patch that regardless, and could just as easily make it robust against blocking /dev/urandom at the same time. I don't understand the RPM world as well, but I can't find any evidence that Fedora or SuSE ship systemd-cron at all.) -n -- Nathaniel J. Smith -- https://vorpus.org
On Sun, Jun 12, 2016 at 06:53:54PM -0700, Nathaniel Smith wrote:
Speaking of full-stack perspectives, would it affect your decision if Debian Stretch were made robust against blocking /dev/urandom on AWS/GCE? Because I think we could find lots of people who would be overjoyed to fix Stretch before the next merge window even opens (AFAICT the quick fix is literally a 1 line patch), if that allowed the blocking /dev/urandom patches to go in upstream...
Alas, it's not just Debian. Apparently it breaks the boot on Openwrt as well as Ubuntu Quantal: https://lkml.org/lkml/2016/6/13/48 https://lkml.org/lkml/2016/5/31/599 (Yay for an automated test infrastructure that fires off as soon as you push to an externally visible git repository. :-) I haven't investigated to see exactly *why* it's blowing up on these userspace setups, but it's a great reminder for why changing an established interface is something that has to be done very carefully indeed. - Ted
[whew, actually read the whole thread] On 11 June 2016 at 10:28, Terry Reedy <tjreedy@udel.edu> wrote:
On 6/11/2016 11:34 AM, Guido van Rossum wrote:
In terms of API design, I'd prefer a flag to os.urandom() indicating a preference for - blocking - raising an exception - weaker random bits
+100 ;-)
I proposed exactly this 2 days ago, 5 hours after Larry's initial post.
No, this is a bad idea. Asking novice developers to make security decisions they're not yet qualified to make when it's genuinely possible for us to do the right thing by default is the antithesis of good security API design, and os.urandom() *is* a security API (whether we like it or not - third party documentation written by the cryptographic software development community has made it so, since it's part of their guidelines for writing security sensitive code in pure Python). Adding *new* APIs is also a bad idea, since "os.urandom() is the right answer on every OS except Linux, and also the best currently available answer on Linux" has been the standard security advice for generating cryptographic secrets in pure Python code for years now, so we should only change that guidance if we have extraordinarily compelling reasons to do so, and we don't. Instead, we have Ted Ts'o himself chiming in to say: "My preference would be that os.[u]random should block, because the odds that people would be trying to generate long-term cryptographic secrets within seconds after boot is very small, and if you *do* block for a second or two, it's not the end of the world." The *actual bug* that triggered this latest firestorm of commentary (from experts and non-experts alike) had *nothing* to do with user code calling os.urandom, and instead was a combination of: - CPython startup requesting cryptographically secure randomness when it didn't need it - a systemd init script written in Python running before the kernel RNG was fully initialised That created a deadlock between CPython startup and the rest of the Linux init process, so the latter only continued when the systemd watchdog timed out and killed the offending script. As others have noted, this kind of deadlock scenario is generally impossible on other operating systems, as the operating system doesn't provide a way to run Python code before the random number generator is ready. The change Victor made in 3.5.2 to fall back to reading /dev/urandom directly if the getrandom() syscall returns EAGAIN (effectively reverting to the Python 3.4 behaviour) was the simplest possible fix for that problem (and an approach I thoroughly endorse, both for 3.5.2 and for the life of the 3.5 series), but that doesn't make it the right answer for 3.6+. To repeat: the problem encountered was NOT due to user code calling os.urandom(), but rather due to the way CPython initialises its own internal hash algorithm at interpreter startup. However, due to the way CPython is currently implemented, fixing the regression in that code not only changed the behaviour of CPython startup, it *also* changed the behaviour of every call to os.urandom() in Python 3.5.2+. For 3.6+, we can instead make it so that the only things that actually rely on cryptographic quality randomness being available are: - calling a secrets module API - calling a random.SystemRandom method - calling os.urandom directly These are all APIs that were either created specifically for use in security sensitive situations (secrets module), or have long been documented (both within our own documentation, and in third party documentation, books and Q&A sites) as being an appropriate choice for use in security sensitive situations (os.urandom and random.SystemRandom). However, we don't need to make those block waiting for randomness to be available - we can update them to raise BlockingIOError instead (which makes it trivial for people to decide for themselves how they want to handle that case).
Along with that change, we can make it so that starting the interpreter will never block waiting for cryptographic randomness to be available (since it doesn't need it), and importing the random module won't block waiting for it either. To the best of our knowledge, on all operating systems other than Linux, encountering the new exception will still be impossible in practice, as there is no known opportunity to run Python code before the kernel random number generator is ready. On Linux, init scripts may still run before the kernel random number generator is ready, but will now throw an immediate BlockingIOError if they access an API that relies on cryptographic randomness being available, rather than potentially deadlocking the init process. Folks encountering that situation will then need to make an explicit decision: - loop until the exception is no longer thrown - switch to reading from /dev/urandom directly instead of calling os.urandom() - switch to using a cross-platform non-cryptographic API (probably the random module) Victor has some additional technical details written up at http://haypo-notes.readthedocs.io/pep_random.html and I'd be happy to formalise this proposed approach as a PEP (the current reference is http://bugs.python.org/issue27282 ) Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
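To make those three options concrete, here is a short sketch of what user code could look like under the proposed behaviour, assuming os.urandom() raises BlockingIOError as described (function names and the retry interval are illustrative only):

    import os, random, time

    def strong_bytes_blocking(n):
        while True:                              # option 1: loop until the pool is ready
            try:
                return os.urandom(n)
            except BlockingIOError:
                time.sleep(0.1)

    def strong_bytes_best_effort(n):
        try:
            return os.urandom(n)
        except BlockingIOError:                  # option 2: knowingly accept weaker bytes
            with open("/dev/urandom", "rb") as f:
                return f.read(n)

    # option 3: not security sensitive at all -- use the random module instead
    simulation_noise = bytes(random.getrandbits(8) for _ in range(16))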
On 06/15/2016 01:01 PM, Nick Coghlan wrote:
For 3.6+, we can instead make it so that the only things that actually rely on cryptographic quality randomness being available are:
- calling a secrets module API - calling a random.SystemRandom method - calling os.urandom directly
However, we don't need to make those block waiting for randomness to be available - we can update them to raise BlockingIOError instead (which makes it trivial for people to decide for themselves how they want to handle that case).
Along with that change, we can make it so that starting the interpreter will never block waiting for cryptographic randomness to be available (since it doesn't need it), and importing the random module won't block waiting for it either.
+1 -- ~Ethan~
On Wed, Jun 15, 2016 at 1:01 PM, Nick Coghlan <ncoghlan@gmail.com> wrote: [...]
For 3.6+, we can instead make it so that the only things that actually rely on cryptographic quality randomness being available are:
- calling a secrets module API - calling a random.SystemRandom method - calling os.urandom directly
These are all APIs that were either created specifically for use in security sensitive situations (secrets module), or have long been documented (both within our own documentation, and in third party documentation, books and Q&A sites) as being an appropriate choice for use in security sensitive situations (os.urandom and random.SystemRandom).
However, we don't need to make those block waiting for randomness to be available - we can update them to raise BlockingIOError instead (which makes it trivial for people to decide for themselves how they want to handle that case).
Along with that change, we can make it so that starting the interpreter will never block waiting for cryptographic randomness to be available (since it doesn't need it), and importing the random module won't block waiting for it either.
This all seems exactly right to me, to the point that I've been dreading having to find the time to write pretty much this exact email. So thank you :-)
To the best of our knowledge, on all operating systems other than Linux, encountering the new exception will still be impossible in practice, as there is no known opportunity to run Python code before the kernel random number generator is ready.
On Linux, init scripts may still run before the kernel random number generator is ready, but will now throw an immediate BlockingIOError if they access an API that relies on cryptographic randomness being available, rather than potentially deadlocking the init process. Folks encountering that situation will then need to make an explicit decision:
- loop until the exception is no longer thrown - switch to reading from /dev/urandom directly instead of calling os.urandom() - switch to using a cross-platform non-cryptographic API (probably the random module)
Victor has some additional technical details written up at http://haypo-notes.readthedocs.io/pep_random.html and I'd be happy to formalise this proposed approach as a PEP (the current reference is http://bugs.python.org/issue27282 )
I'd make two additional suggestions: - one person did chime in on the thread to say that they've used os.urandom for non-security-sensitive purposes, simply because it provided a convenient "give me a random byte-string" API that is missing from random. I think we should go ahead and add a .randbytes method to random.Random that simply returns a random bytestring using the regular RNG, to give these users a nice drop-in replacement for os.urandom. Rationale: I don't think the existence of these users should block making os.urandom appropriate for generating secrets, because (1) a glance at github shows that this is very unusual -- if you skim through this search you get page after page of functions with names like "generate_secret_key" https://github.com/search?l=python&p=2&q=urandom&ref=searchresults&type=Code&utf8=%E2%9C%93 and (2) for the minority of people who are using os.urandom for non-security-sensitive purposes, if they find os.urandom raising an error, then this is just a regular bug that they will notice immediately and fix, and anyway it's basically never going to happen. (As far as we can tell, this has never yet happened in the wild, even once.) OTOH if os.urandom is allowed to fail silently, then people who are using it to generate secrets will get silent catastrophic failures, plus those users can't assume it will never happen because they have to worry about active attackers trying to drive systems into unusual states. So I'd much rather ask the non-security-sensitive users to switch to using something in random, than force the cryptographic users to switch to using secrets. But it does seem like it would be good to give those non-security-sensitive users something to switch to :-). - It's not exactly true that the Python interpreter doesn't need cryptographic randomness to initialize SipHash -- it's more that *some* Python invocations need unguessable randomness (to first approximation: all those which are exposed to hostile input), and some don't. And since the Python interpreter has no idea which case it's in, and since it's unacceptable for it to break invocations that don't need unguessable hashes, then it has to err on the side of continuing without randomness. All that's fine. But, given that the interpreter doesn't know which state it's in, there's also the possibility that this invocation *will* be exposed to hostile input, and the 3.5.2+ behavior gives absolutely no warning that this is what's happening. So instead of letting this potential error pass silently, I propose that if SipHash fails to acquire real randomness at startup, then it should issue a warning. In practice, this will almost never happen. But in the rare cases it does, it at least gives the user a fighting chance to realize that their system is in a potentially dangerous state. And by using the warnings module, we automatically get quite a bit of flexibility. If some particular invocation (e.g. systemd-cron) has audited their code and decided that they don't care about this issue, they can make the message go away: PYTHONWARNINGS=ignore::NoEntropyAtStartupWarning OTOH if some particular invocation knows that they do process potentially hostile input early on (e.g. 
cloud-init, maybe?), then they can explicitly promote the warning to an error: PYTHONWARNINGS=error::NoEntropyAtStartupWarning (I guess the way to implement this would be for the SipHash initialization code -- which runs very early -- to set some flag, and then we expose that flag in sys._something, and later in the startup sequence check for it after the warnings module is functional. Exposing the flag at the Python level would also make it possible for code like cloud-init to do its own explicit check and respond appropriately.) -n -- Nathaniel J. Smith -- https://vorpus.org
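Both suggestions are straightforward to sketch. NoEntropyAtStartupWarning is the name proposed above; "_hash_seed_fallback" is a hypothetical stand-in for the "sys._something" flag, and the subclass is just a way to show randbytes without patching the stdlib:

    import random, sys, warnings

    class Random2(random.Random):
        def randbytes(self, n):
            # n bytes from the ordinary (non-cryptographic) Mersenne Twister
            return self.getrandbits(8 * n).to_bytes(n, "little")

    noise = Random2().randbytes(16)      # drop-in for non-secret os.urandom uses

    class NoEntropyAtStartupWarning(RuntimeWarning):
        pass

    # Hypothetical flag set by the SipHash init code, checked once the
    # warnings machinery is up:
    if getattr(sys, "_hash_seed_fallback", False):
        warnings.warn("hash randomization was seeded without OS entropy",
                      NoEntropyAtStartupWarning)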
On 15 June 2016 at 16:12, Nathaniel Smith <njs@pobox.com> wrote:
On Wed, Jun 15, 2016 at 1:01 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Victor has some additional technical details written up at http://haypo-notes.readthedocs.io/pep_random.html and I'd be happy to formalise this proposed approach as a PEP (the current reference is http://bugs.python.org/issue27282 )
I'd make two additional suggestions:
- one person did chime in on the thread to say that they've used os.urandom for non-security-sensitive purposes, simply because it provided a convenient "give me a random byte-string" API that is missing from random. I think we should go ahead and add a .randbytes method to random.Random that simply returns a random bytestring using the regular RNG, to give these users a nice drop-in replacement for os.urandom.
That seems reasonable.
- It's not exactly true that the Python interpreter doesn't need cryptographic randomness to initialize SipHash -- it's more that *some* Python invocations need unguessable randomness (to first approximation: all those which are exposed to hostile input), and some don't. And since the Python interpreter has no idea which case it's in, and since it's unacceptable for it to break invocations that don't need unguessable hashes, then it has to err on the side of continuing without randomness. All that's fine.
But, given that the interpreter doesn't know which state it's in, there's also the possibility that this invocation *will* be exposed to hostile input, and the 3.5.2+ behavior gives absolutely no warning that this is what's happening. So instead of letting this potential error pass silently, I propose that if SipHash fails to acquire real randomness at startup, then it should issue a warning. In practice, this will almost never happen. But in the rare cases it does, it at least gives the user a fighting chance to realize that their system is in a potentially dangerous state. And by using the warnings module, we automatically get quite a bit of flexibility.
If some particular invocation (e.g. systemd-cron) has audited their code and decided that they don't care about this issue, they can make the message go away:
PYTHONWARNINGS=ignore::NoEntropyAtStartupWarning
OTOH if some particular invocation knows that they do process potentially hostile input early on (e.g. cloud-init, maybe?), then they can explicitly promote the warning to an error:
PYTHONWARNINGS=error::NoEntropyAtStartupWarning
(I guess the way to implement this would be for the SipHash initialization code -- which runs very early -- to set some flag, and then we expose that flag in sys._something, and later in the startup sequence check for it after the warnings module is functional. Exposing the flag at the Python level would also make it possible for code like cloud-init to do its own explicit check and respond appropriately.)
A Python level warning/flag seems overly elaborate to me, but we can easily emit a warning on stderr when SipHash is initialised via the fallback rather than the operating system's RNG. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Wed, Jun 15, 2016 at 04:12:57PM -0700, Nathaniel Smith wrote:
- It's not exactly true that the Python interpreter doesn't need cryptographic randomness to initialize SipHash -- it's more that *some* Python invocations need unguessable randomness (to first approximation: all those which are exposed to hostile input), and some don't. And since the Python interpreter has no idea which case it's in, and since it's unacceptable for it to break invocations that don't need unguessable hashes, then it has to err on the side of continuing without randomness. All that's fine.
In practice, those Python invocations which are exposed to hostile input are those that are started while the network is up. The vast majority of the time, they are launched by the web browser --- and if this happens after a second or so of the system getting networking interrupts, (a) getrandom won't block, and (b) /dev/urandom and getrandom will be initialized. Also, I wish people wouldn't say that this is only an issue on Linux. Again, FreeBSD's /dev/urandom will block as well if it is uninitialized. It's just that in practice, for both Linux and FreeBSD, we try very hard to make sure /dev/urandom is fully initialized by the time it matters. It's just that so far, it's only on Linux that there was an attempt to use Python in the early init scripts, in a VM, on a system where everything is modularized such that the deadlock became visible.
(I guess the way to implement this would be for the SipHash initialization code -- which runs very early -- to set some flag, and then we expose that flag in sys._something, and later in the startup sequence check for it after the warnings module is functional. Exposing the flag at the Python level would also make it possible for code like cloud-init to do its own explicit check and respond appropriately.)
I really don't think it's that big of a deal in *practice*, but if you really are concerned about the very remote possibility that a Python invocation could start in early boot, and *then* also stick around for the long term, and *then* be exposed to hostile input --- what if you set the flag, and then later on, N minutes later, either automatically or via some trigger such as cloud-init, try and see if /dev/urandom is initialized (even a few seconds later, so long as the init scripts aren't hanging, it should be initialized) and have Python rehash all of its dicts, or maybe just the non-system dicts (since those are presumably the ones most likely to be exposed to hostile input). - Ted
On Wed, Jun 15, 2016 at 10:25 PM, Theodore Ts'o <tytso@mit.edu> wrote:
On Wed, Jun 15, 2016 at 04:12:57PM -0700, Nathaniel Smith wrote:
- It's not exactly true that the Python interpreter doesn't need cryptographic randomness to initialize SipHash -- it's more that *some* Python invocations need unguessable randomness (to first approximation: all those which are exposed to hostile input), and some don't. And since the Python interpreter has no idea which case it's in, and since it's unacceptable for it to break invocations that don't need unguessable hashes, then it has to err on the side of continuing without randomness. All that's fine.
In practice, those Python invocations which are exposed to hostile input are those that are started while the network is up. The vast majority of the time, they are launched by the web browser --- and if this happens after a second or so of the system getting networking interrupts, (a) getrandom won't block, and (b) /dev/urandom and getrandom will be initialized.
Not sure what you mean about the vast majority of Python invocations being launched by the web browser? But anyway, sure, usually this isn't an issue. This is just discussing about what to do in the unlikely case when it actually has become an issue, and it's hard to be certain that this will *never* happen. E.g. it's entirely plausible that someone will write some cloud-init plugin that exposes an HTTP server or something. People do all kinds of weird things in VMs these days... Basically this is a question of whether we should make an (unlikely) error totally invisible to the user, and "errors should never pass silently" is right there in the Zen of Python :-).
Also, I wish people wouldn't say that this is only an issue on Linux. Again, FreeBSD's /dev/urandom will block as well if it is uninitialized. It's just that in practice, for both Linux and FreeBSD, we try very hard to make sure /dev/urandom is fully initialized by the time it matters. It's just that so far, it's only on Linux that there was an attempt to use Python in the early init scripts, in a VM, on a system where everything is modularized such that the deadlock became visible.
(I guess the way to implement this would be for the SipHash initialization code -- which runs very early -- to set some flag, and then we expose that flag in sys._something, and later in the startup sequence check for it after the warnings module is functional. Exposing the flag at the Python level would also make it possible for code like cloud-init to do its own explicit check and respond appropriately.)
I really don't think it's that big of a deal in *practice*, but if you really are concerned about the very remote possibility that a Python invocation could start in early boot, and *then* also stick around for the long term, and *then* be exposed to hostile input --- what if you set the flag, and then later on, N minutes later, either automatically or via some trigger such as cloud-init, try and see if /dev/urandom is initialized (even a few seconds later, so long as the init scripts aren't hanging, it should be initialized) and have Python rehash all of its dicts, or maybe just the non-system dicts (since those are presumably the ones most likely to be exposed to hostile input).
I don't think this is technically doable. There's no global list of hash tables, and Python exposes the actual hash values to user code with some guarantee that they won't change. -n -- Nathaniel J. Smith -- https://vorpus.org
Nathaniel Smith <njs <at> pobox.com> writes:
On Wed, Jun 15, 2016 at 10:25 PM, Theodore Ts'o <tytso <at> mit.edu> wrote:
In practice, those Python invocations which are exposed to hostile input are those that are started while the network is up.
Not sure what you mean about the vast majority of Python invocations being launched by the web browser?
"Python invocations which are exposed to hostile input". ;) Stefan Krah
On Jun 16, 2016, at 12:36 AM, Nathaniel Smith wrote:
Basically this is a question of whether we should make an (unlikely) error totally invisible to the user, and "errors should never pass silently" is right there in the Zen of Python :-).
I'd phrase it differently though. To me, it comes down to hand-holding our users who, for whatever reason, don't use the appropriate APIs for what they're trying to accomplish. We can educate them through documentation, but I don't think it's appropriate to retrofit existing APIs to different behavior based on those faulty assumptions, because that has other negative effects, such as breaking the promises we make to experienced and knowledgeable developers. To me, the better policy is to admit our mistake in 3.5.0 and 3.5.1, restore pre-existing behavior, accurately document the trade-offs, and provide a clear, better upgrade path for our users. We've done this beautifully and effectively via the secrets module in Python 3.6. Cheers, -Barry
On Jun 16, 2016, at 4:46 AM, Barry Warsaw <barry@python.org> wrote:
We can educate them through documentation, but I don't think it's appropriate to retrofit existing APIs to different behavior based on those faulty assumptions, because that has other negative effects, such as breaking the promises we make to experienced and knowledgeable developers.
You can’t document your way out of a usability problem, in the same way that while it was true that urllib was *documented* to not verify certificates by default, that didn’t matter because a large set of users used it as if it did anyway. In my opinion, this is a usability issue as well. You have a ton of third party documentation and effort around “just use urandom” for cryptographic random, which is generally the right (and best!) answer except for this one little niggle on a Linux platform where /dev/urandom *may* produce predictable bytes (but usually doesn’t). That documentation typically doesn’t go into telling people this small niggle because prior to getrandom(0) there wasn’t much they could do about it except use /dev/random, which is bad in every other situation but early boot cryptographic keys. Regardless of what we document it as, people are going to use os.urandom for cryptographic purposes, because everyone who doesn’t keep up on exactly what modules are being added to Python, but who has any idea about cryptography at all, is going to look for a Python interface to urandom. That doesn’t even begin to touch the thousands upon thousands of uses that already exist in the wild that are assuming that os.urandom will always give them cryptographic random, who now *need* to write this as:

    try:
        from secrets import token_bytes
    except ImportError:
        from os import urandom as token_bytes

in order to get the best cryptographic random available to them on their system, which assumes they’re even going to notice at all that there’s a new secrets module, and requires each and every use of os.urandom to change. Honestly, I think that the first sentence in the documentation should most obviously be the most pertinent one, and the first sentence here is "Return a string of n random bytes suitable for cryptographic use.” The bit about how the exact quality depends on the OS and documenting what device it uses is, to my eyes, obviously a hedge to say “Hey, if this gives you bad random it’s your OS's fault, not ours; we can’t produce good random where your OS can’t give us some” and to give people a suggestion of where to look to determine whether they’re going to get good random or not. I do not think “uses /dev/urandom” is, or should be considered, a core part of this API; it already doesn’t use /dev/urandom on Windows, where it doesn’t exist, nor does it use /dev/urandom in 3.5+ if it can help it. Using getrandom(0), or using getrandom(GRND_NONBLOCK) and raising an exception on EAGAIN, is still accessing the urandom CSPRNG with the same general runtime characteristics of /dev/urandom outside of cases where it’s not safe to actually use /dev/urandom. Frankly, I think it’s a disservice to Python developers to leave in this footgun. — Donald Stufft
On Jun 16, 2016, at 06:04 AM, Donald Stufft wrote:
Regardless of what we document it as, people are going to use os.urandom for cryptographic purposes, because everyone who doesn’t keep up on exactly what modules are being added to Python, but who has any idea about cryptography at all, is going to look for a Python interface to urandom. That doesn’t even begin to touch the thousands upon thousands of uses that already exist in the wild that are assuming that os.urandom will always give them cryptographic random, who now *need* to write this as:
[...]
Frankly, I think it’s a disservice to Python developers to leave in this footgun.
This really gets to the core of our responsibility to our users. Let's start by acknowledging that good-willed people can have different opinions on this, and that we all want to do what's best for our users, although we may have different definitions of "what's best". Since this topic comes up over and over again, it's worth exploring in more detail. Here's my take on it in this context. We have a responsibility to provide stable, well-documented, obvious APIs to our users to provide functionality that is useful and appropriate to the best of our abilities. We have a responsibility to provide secure implementations of that functionality wherever possible. It's in the conflict between these two responsibilities that these heated discussions and differences of opinions come up. This conflict is exposed in the os.urandom() debate because the first responsibility informs us that backward compatibility is more important to maintain because it provides stability and predictability. The second responsibility urges us to favor retrofitting increased security into APIs that for practicality purposes are being used counter to our original intent. It's not that you think backward compatibility is unimportant, or that I think improving security has no value. In the messy mudpit of the middle, we can't seem to have both, as much as I'd argue that providing new, better APIs can give us edible cake. Coming down on either side has its consequences, both known and unintended, and I think in these cases consensus can't be reached. It's for these reasons that we have RMs and BDFLs to break the tie. We must lay out our arguments and trust our Larrys, Neds, and Guidos to make the right --or at least *a*-- decision on a case-by-case basis, and if not agree then accept. Cheers, -Barry
On Jun 16, 2016, at 7:07 AM, Barry Warsaw <barry@python.org> wrote:
On Jun 16, 2016, at 06:04 AM, Donald Stufft wrote:
Regardless of what we document it as, people are going to use os.urandom for cryptographic purposes, because everyone who doesn’t keep up on exactly what modules are being added to Python, but who has any idea about cryptography at all, is going to look for a Python interface to urandom. That doesn’t even begin to touch the thousands upon thousands of uses that already exist in the wild that are assuming that os.urandom will always give them cryptographic random, who now *need* to write this as:
[...]
Frankly, I think it’s a disservice to Python developers to leave in this footgun.
This really gets to the core of our responsibility to our users. Let's start by acknowledging that good-willed people can have different opinions on this, and that we all want to do what's best for our users, although we may have different definitions of "what's best”.
Yes, I don’t think anyone is being malicious :) that’s why I qualified my statement with “I think”, because I don’t believe that whether or not this particular choice is a disservice is a fundamental property of the universe, but rather my opinion influenced by my priorities.
Since this topic comes up over and over again, it's worth exploring in more detail. Here's my take on it in this context.
We have a responsibility to provide stable, well-documented, obvious APIs to our users to provide functionality that is useful and appropriate to the best of our abilities.
We have a responsibility to provide secure implementations of that functionality wherever possible.
It's in the conflict between these two responsibilities that these heated discussions and differences of opinions come up. This conflict is exposed in the os.urandom() debate because the first responsibility informs us that backward compatibility is more important to maintain because it provides stability and predictability. The second responsibility urges us to favor retrofitting increased security into APIs that for practicality purposes are being used counter to our original intent.
Well, I don’t think that for os.urandom someone using it for security is running “counter to its original intent”, given that in general urandom’s purpose is for cryptographic random. Someone *may* be using it for something other than that, but it’s pretty explicitly there for security sensitive applications.
It's not that you think backward compatibility is unimportant, or that I think improving security has no value. In the messy mudpit of the middle, we can't seem to have both, as much as I'd argue that providing new, better APIs can give us edible cake.
Right. I personally often fall towards securing the *existing* APIs and adding new, insecure APIs that are obviously so in cases where we can reasonably do that. That’s largely because given an API that’s both being used in security sensitive applications and ones that’s not, the “failure” to be properly secure is almost always a silent failure, while the “failure” to applications that don’t need that security is almost always obvious and immediate. Taking os.urandom as an example, the failure case here for the security side is that you get some bytes that are, to some degree, predictable. There is nobody alive who can look at some bytes and go “oh yep, those bytes are predictable we’re using the wrong API”, thus basically anyone “incorrectly” [1] using this API for security sensitive applications is going to have it just silently doing the wrong thing. On the flip side, if someone is using this API and what they care about is it not blocking, ever, and always giving them some sort of random-ish number no matter how predictable it is, then both of the proposed failure cases are fairly noticeable (to varying degrees), either it blocks long enough for it to matter for those people and they notice and dig in, or it raises an exception and they notice and dig in. In both cases they get some indication that something is wrong.
Coming down on either side has its consequences, both known and unintended, and I think in these cases consensus can't be reached. It's for these reasons that we have RMs and BDFLs to break the tie. We must lay out our arguments and trust our Larrys, Neds, and Guidos to make the right --or at least *a*-- decision on a case-by-case basis, and if not agree then accept.
Right. I’ve personally tried not to be the one who keeps pushing for this even after a decree, partially because it’s draining to me to argue for the security side with python-dev [2] and partially because it was ruled on and I lost. However if there continues to be discussion I’ll continue to advocate for what I think is right :) [1] I don’t think it is incorrect to use os.urandom for security sensitive applications, and I think it’s a losing battle for Python to try and fight the rest of the world that urandom is not the right answer here. [2] python-dev tends to favor not breaking “working” code over securing existing APIs, even if “working” is silently doing the wrong thing in a security context. This is particularly frustrating when it comes to security because security is by its nature the act of taking code that would otherwise execute and making it error, ideally only in bad situations, but this “security’s purpose is to make things break” nature clashes with python-dev’s default of not breaking “working” code in a way that is personally draining to me. — Donald Stufft
On Jun 16, 2016, at 07:34 AM, Donald Stufft wrote:
Well, I don’t think that for os.urandom someone using it for security is running “counter to its original intent”, given that in general urandom’s purpose is for cryptographic random. Someone *may* be using it for something other than that, but it’s pretty explicitly there for security sensitive applications.
Except that I disagree. I think os.urandom's original intent, as documented in Python 3.4, is to provide a thin layer over /dev/urandom, with all that implies, and with the documented quality caveats. I know as a Linux developer that if I need to know the details of that, I can `man urandom` and read the gory details. In Python 3.5, I can't do that any more.
Right. I personally often fall towards securing the *existing* APIs and adding new, insecure APIs that are obviously so in cases where we can reasonably do that.
Sure, and I personally fall on the side of maintaining stable, backward compatible APIs, adding new, better, more secure APIs to address deficiencies in real-world use cases. That's because when we break APIs, even with the best of intentions, it breaks people's code in ways and places that we can't predict, and which are very often very difficult to discover. I guess it all comes down to who's yelling at you. ;) Cheers, -Barry P.S. These discussions do not always end in despair. Witness PEP 493.
On Thu, Jun 16, 2016 at 03:24:33PM +0300, Barry Warsaw wrote:
Except that I disagree. I think os.urandom's original intent, as documented in Python 3.4, is to provide a thin layer over /dev/urandom, with all that implies, and with the documented quality caveats. I know as a Linux developer that if I need to know the details of that, I can `man urandom` and read the gory details. In Python 3.5, I can't do that any more.
If Python were to document os.urandom as providing a thin wrapper over /dev/urandom as implemented on Linux, and also document os.getrandom as providing a thin wrapper over getrandom(2) as implemented on Linux, and then say that the best emulation of those two interfaces will be provided on other operating systems, and that today the best practice is to call getrandom with the flags set to zero (or defaulted out), that would certainly make me very happy. I could imagine that some people might complain that it is too Linux-centric, or that it is not adhering to Python's design principles, but it makes a lot of sense to me as a Linux person. :-) Cheers, - Ted
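For what it's worth, that is roughly the shape the Linux-only os.getrandom() wrapper eventually took for 3.6, with flags defaulting to zero:

    import os

    key = os.getrandom(32)                        # flags omitted -> 0: blocks only
                                                  # until the pool is first seeded
    try:
        key = os.getrandom(32, os.GRND_NONBLOCK)  # non-blocking variant
    except BlockingIOError:
        pass                                      # EAGAIN: pool not yet initialized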
On 16 June 2016 at 12:34, Donald Stufft <donald@stufft.io> wrote:
[1] I don’t think it is incorrect to use os.urandom for security sensitive applications, and I think it’s a losing battle for Python to try and fight the rest of the world that urandom is not the right answer here.
[2] python-dev tends to favor not breaking “working” code over securing existing APIs, even if “working” is silently doing the wrong thing in a security context. This is particularly frustrating when it comes to security because security is by its nature the act of taking code that would otherwise execute and making it error, ideally only in bad situations, but this “security’s purpose is to make things break” nature clashes with python-dev’s default of not breaking “working” code in a way that is personally draining to me.
Should I take it from these two statements that you do not believe that providing *new* APIs that provide better security compared to a backward compatible but flawed existing implementation is a reasonable approach? And specifically that you don't agree with the decision to provide the new "secrets" module as the recommended interface for getting secure random numbers from Python? One of the aspects of this debate that I'm unclear about is what role the people arguing that os.urandom must change see for the new secrets module. Paul
On Jun 16, 2016, at 8:50 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 16 June 2016 at 12:34, Donald Stufft <donald@stufft.io> wrote:
[1] I don’t think it is incorrect to use os.urandom for security sensitive applications, and I think it’s a losing battle for Python to try and fight the rest of the world that urandom is not the right answer here.
[2] python-dev tends to favor not breaking “working” code over securing existing APIs, even if “working” is silently doing the wrong thing in a security context. This is particularly frustrating when it comes to security because security is by its nature the act of taking code that would otherwise execute and making it error, ideally only in bad situations, but this “security’s purpose is to make things break” nature clashes with python-dev’s default of not breaking “working” code in a way that is personally draining to me.
Should I take it from these two statements that you do not believe that providing *new* APIs that provide better security compared to a backward compatible but flawed existing implementation is a reasonable approach? And specifically that you don't agree with the decision to provide the new "secrets" module as the recommended interface for getting secure random numbers from Python?
One of the aspects of this debate that I'm unclear about is what role the people arguing that os.urandom must change see for the new secrets module.
Paul
I think the new secrets module is great, particularly for functions other than secrets.token_bytes. If that’s all the secrets module was then I’d argue it shouldn’t exist, because we already have os.urandom. IOW I think it solves a different problem than os.urandom. If all you need is cryptographically random bytes, I think that os.urandom is the most obvious thing that someone will reach for, given: * Pages upon pages of documentation both inside the Python community and outside saying “use urandom”. * The sheer bulk of existing code that is already out there using os.urandom for its cryptographic properties. I also think it’s a great module for providing defaults that we can’t provide in os.urandom, like the number of bytes that are considered “secure” [1]. What I don’t think is that the secrets module means that all of a sudden os.urandom is no longer an API that is primarily used in a security sensitive context [2], and thus that we should willfully choose to use a subpar interface to the same CSPRNG when the OS provides us a better one [3], because one small edge case *might* break in a loud and obvious way for the minority of people using this API in a non security sensitive context, while leaving the majority of people using this API possibly getting silently insecure behavior from it. [1] Of course, what is considered secure is going to be application dependent, but secrets can give a pretty good approximation for the general case. [2] This is one of the things that really gets me about this: it’s not like folks on my side are saying we need to break the pickle module because it’s possible to use it insecurely. That would be silly, because one of the primary use cases for that module is using it in a context that is not security sensitive. However, os.urandom is, to the best of my ability to determine and reason, almost always used in a security sensitive context, and thus should make security sensitive trade-offs in its API. [3] Thus it’s still a small wrapper around OS provided APIs, so we’re not asking for os.py to implement some great big functionality, we’re just asking for it to provide a thin shim over a better interface to the same thing. — Donald Stufft
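As a usage sketch of that division of labour against the 3.6 secrets API (token_bytes being the thin os.urandom-equivalent, the rest being the added value Donald describes):

    import secrets

    key = secrets.token_bytes()              # defaults to a length chosen to be secure
    reset_token = secrets.token_urlsafe(32)  # URL-safe text token, e.g. password resets
    pin = secrets.randbelow(10**4)           # uniform int in [0, 10000)

    supplied = reset_token                   # e.g. a value echoed back by a client
    assert secrets.compare_digest(supplied, reset_token)  # constant-time compare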
I also think it’s a great module for providing defaults that we can’t provide in os.urandom, like the number of bytes that are considered “secure” [1].
What I don’t think is that the secrets module means that all of a sudden os.urandom is no longer an API that is primarily used in a security sensitive context
Not all of a sudden. However, I guess things will change in the future. If we want the secrets module to be the first and only place where crypto goes, we should work towards that goal. It needs proper communication, marketing etc. Deprecation periods can be years long. This change (whatever form it will take) can be carried out over 3 or 4 releases when the ultimate goal is made clear to everybody reading the docs. OTOH I don't know whether long deprecation periods are necessary here at all. Other industries are very sensitive to fast changes. Furthermore, next generations will be taught using the new way, so the Python community should not be afraid of some changes because most of them are for the better. On 16.06.2016 15:02, Donald Stufft wrote:
I think that os.urandom is the most obvious thing that someone will reach for given:
* Pages upon pages of documentation both inside the Python community and outside saying “use urandom”. * The sheer bulk of existing code that is already out there using os.urandom for its cryptographic properties.
That's maybe you. However, as stated before, I am not an expert in this field. So, when I need to, I would first start researching the current state of the art in Python. If the docs say: use the secrets module (e.g. near os.urandom), I would happily comply -- especially when there's a reasonable explanation. That's from a newbie's point of view. Best, Sven
On 16 June 2016 at 05:50, Paul Moore <p.f.moore@gmail.com> wrote:
On 16 June 2016 at 12:34, Donald Stufft <donald@stufft.io> wrote:
[1] I don’t think it is incorrect to use os.urandom for security sensitive applications, and I think it’s a losing battle for Python to try and fight the rest of the world that urandom is not the right answer here.
[2] python-dev tends to favor not breaking “working” code over securing existing APIs, even if “working” is silently doing the wrong thing in a security context. This is particularly frustrating when it comes to security because security is by its nature the act of taking code that would otherwise execute and making it error, ideally only in bad situations, but this “security’s purpose is to make things break” nature clashes with python-dev’s default of not breaking “working” code in a way that is personally draining to me.
Should I take it from these two statements that you do not believe that providing *new* APIs that provide better security compared to a backward compatible but flawed existing implementation is a reasonable approach? And specifically that you don't agree with the decision to provide the new "secrets" module as the recommended interface for getting secure random numbers from Python?
One of the aspects of this debate that I'm unclear about is what role the people arguing that os.urandom must change see for the new secrets module.
The secrets module is great for new code that gets to ignore any version of Python older than 3.6 - it's the "solve this problem for the next generation of developers" answer. All of the complicated "this API is safe for that purpose, this API isn't" discussions get replaced by "do the obvious thing" (i.e. use random for simulations, secrets for security). The os.urandom() debate is about taking the current obvious (because that's what the entire security community is telling you to do) low level way to do it and categorically eliminating any and all caveats on its correctness. Not "it's correct if you use these new flags that are incompatible with older Python versions". Not "it's not correct anymore, use a different API". Just "it's correct, and the newer your Python runtime, the more correct it is". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Thu, Jun 16, 2016, at 07:34, Donald Stufft wrote:
python-dev tends to favor not breaking “working” code over securing existing APIs, even if “working” is silently doing the wrong thing in a security context. This is particularly frustrating when it comes to security because security is by its nature the act of taking code that would otherwise execute and making it error, ideally only in bad situations, but this “security’s purpose is to make things break” nature clashes with python-dev’s default of not breaking “working” code in a way that is personally draining to me.
I was almost about to reply with "Maybe what we need is a new zen of python", then I checked. It turns out we already have "Errors should never pass silently" which fits *perfectly* in this situation. So what's needed is a change to the attitude that if an error passes silently, that making it no longer pass silently is a backward compatibility break. This isn't Java, where the exceptions not thrown by an API are part of that API's contract. We're free to throw new exceptions in a new version of Python.
On Thu, Jun 16, 2016 at 1:04 PM, Donald Stufft <donald@stufft.io> wrote:
In my opinion, this is a usability issue as well. You have a ton of third party documentation and effort around “just use urandom” for cryptographic random, which is generally the right (and best!) answer except for this one little niggle on a Linux platform where /dev/urandom *may* produce predictable bytes (but usually doesn’t).
Why not consider opt-out behavior with environment variables? E.g. people that don't care about crypto mumbo-jumbo and want fast interpreter startup could just use PYTHONWEAKURANDOM=y or PYTHONFASTURANDOM=y. That way there's no need to change the API of os.urandom() and users have a clear and easy path to get the old behavior. Thanks, -- Ionel Cristian Mărieș, http://blog.ionelmc.ro
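As a user-level sketch only (PYTHONWEAKURANDOM is the hypothetical variable proposed above, not a real interpreter option, and interpreter startup itself couldn't be handled from Python code like this):

    import os, random

    def best_effort_random_bytes(n):
        # Honour the proposed opt-out; the "y" check is illustrative only
        if os.environ.get("PYTHONWEAKURANDOM") == "y":
            return bytes(random.getrandbits(8) for _ in range(n))  # fast, not crypto
        return os.urandom(n)      # may block (or raise) on an unseeded system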
On Jun 15, 2016, at 01:01 PM, Nick Coghlan wrote:
No, this is a bad idea. Asking novice developers to make security decisions they're not yet qualified to make when it's genuinely possible for us to do the right thing by default is the antithesis of good security API design, and os.urandom() *is* a security API (whether we like it or not - third party documentation written by the cryptographic software development community has made it so, since it's part of their guidelines for writing security sensitive code in pure Python).
Regardless of what third parties have said about os.urandom(), let's look at what *we* have said about it. Going back to pre-churn 3.4 documentation: os.urandom(n) Return a string of n random bytes suitable for cryptographic use. This function returns random bytes from an OS-specific randomness source. The returned data should be unpredictable enough for cryptographic applications, though its exact quality depends on the OS implementation. On a Unix-like system this will query /dev/urandom, and on Windows it will use CryptGenRandom(). If a randomness source is not found, NotImplementedError will be raised. For an easy-to-use interface to the random number generator provided by your platform, please see random.SystemRandom. So we very clearly provided platform-dependent caveats on the cryptographic quality of os.urandom(). We also made a strong claim that there's a direct connection between os.urandom() and /dev/urandom on "Unix-like system(s)". We broke that particular promise in 3.5 and semi-fixed it in 3.5.2.
Adding *new* APIs is also a bad idea, since "os.urandom() is the right answer on every OS except Linux, and also the best currently available answer on Linux" has been the standard security advice for generating cryptographic secrets in pure Python code for years now, so we should only change that guidance if we have extraordinarily compelling reasons to do so, and we don't.
Disagree. We have broken one long-term promise on os.urandom() ("On a Unix-like system this will query /dev/urandom") and changed another ("should be unpredictable enough for cryptographic applications, though its exact quality depends on OS implementations"). We broke the experienced Linux developer's natural and long-standing link between the API called os.urandom() and /dev/urandom. This breaks pre-3.5 code that assumes read-from-/dev/urandom semantics for os.urandom(). We have introduced churn. Predicting a future SO question such as "Can os.urandom() block on Linux?" the answer is "No in Python 3.4 and earlier, yes possibly in Python 3.5.0 and 3.5.1, no in Python 3.5.2 and the rest of the 3.5.x series, and yes possibly in Python 3.6 and beyond". We have a better answer for "cryptographically appropriate" use cases in Python 3.6 - the secrets module. Trying to make os.urandom() "the right answer on every OS" weakens the promotion of secrets as *the* module to use for cryptographically appropriate use cases. IMHO it would be better to leave os.urandom() well enough alone, except for the documentation which should effectively say, a la 3.4: os.urandom(n) Return a string of n random bytes suitable for cryptographic use. This function returns random bytes from an OS-specific randomness source. The returned data should be unpredictable enough for cryptographic applications, though its exact quality depends on the OS implementation. On a Unix-like system this will query /dev/urandom, and on Windows it will use CryptGenRandom(). If a randomness source is not found, NotImplementedError will be raised. Cryptographic applications should use the secrets module for stronger guaranteed sources of randomness. For an easy-to-use interface to the random number generator provided by your platform, please see random.SystemRandom. Cheers, -Barry
On 06/15/2016 11:45 PM, Barry Warsaw wrote:
So we very clearly provided platform-dependent caveats on the cryptographic quality of os.urandom(). We also made a strong claim that there's a direct connection between os.urandom() and /dev/urandom on "Unix-like system(s)".
We broke that particular promise in 3.5 and semi-fixed it in 3.5.2.
Well, 3.5.2 hasn't happened yet. So if you see it as still being broken, please speak up now. Why do you call it only "semi-fixed"? As far as I understand it, the semantics of os.urandom() in 3.5.2rc1 are indistinguishable from reading from /dev/urandom directly, except it may not need to use a file handle. //arry/
On Jun 15, 2016, at 11:52 PM, Larry Hastings wrote:
Well, 3.5.2 hasn't happened yet. So if you see it as still being broken, please speak up now.
In discussion with other Ubuntu developers, several salient points were raised. The documentation for os.urandom() in 3.5.2rc1 doesn't make sense: On Linux, getrandom() syscall is used if available and the urandom entropy pool is initialized (getrandom() does not block). On a Unix-like system this will query /dev/urandom. Perhaps better would be: Where available, the getrandom() syscall is used (with the GRND_NONBLOCK flag) when the urandom entropy pool is initialized. When getrandom() returns EAGAIN because of insufficient entropy, fall back to reading from /dev/urandom. On other Unix-like systems, where the getrandom() syscall is unavailable, this will query /dev/urandom. It's actually a rather twisty maze of code to verify these claims, and I'm nearly certain we don't have any tests to guarantee this is what actually happens in those cases, so there are many caveats. This means that an experienced developer can no longer just `man urandom` to understand the unique operational behavior of os.urandom() on their platform, but instead would be forced to actually read our code to find out what's actually happening when/if things break. It is unacceptable if any new exceptions are raised when insufficient entropy is available. Python 3.4 essentially promises that "if only crap entropy is available, you'll get crap, but at least it won't block and no exceptions are raised". Proper backward compatibility requires the same in 3.5 and beyond. Are we sure that's still the case? Using the system call *may* be faster in the we-have-good-entropy-case, but it will definitely be slower in the we-don't-have-good-entropy-case (because of the fallback logic). Maybe that doesn't matter in practice but it's worth noting.
Why do you call it only "semi-fixed"? As far as I understand it, the semantics of os.urandom() in 3.5.2rc1 are indistinguishable from reading from /dev/urandom directly, except it may not need to use a file handle.
Semi-fixed because os.urandom() will still not be strictly backward compatible between Python 3.5.2 and 3.4. *If* we can guarantee that os.urandom() will never block or raise an exception when only poor entropy is available, then it may be indeed indistinguishably backward compatible for most if not all cases. Cheers, -Barry
On 06/16/2016 01:03 AM, Barry Warsaw wrote:
*If* we can guarantee that os.urandom() will never block or raise an exception when only poor entropy is available, then it may be indeed indistinguishably backward compatible for most if not all cases.
I stepped through the code that shipped in 3.5.2rc1. It only ever calls getrandom() with the GRND_NONBLOCK flag. If getrandom() returns -1 and errno is EAGAIN it falls back to /dev/urandom--I actually simulated this condition in gdb and watched it open /dev/urandom. I didn't see any code for raising an exception or blocking when only poor entropy is available. As Robert Collins points out, this does change the behavior ever-so-slightly from 3.4; if urandom is initialized, and the kernel has the getrandom system call, getrandom() will give us the bytes we asked for and we won't open and read from /dev/urandom. In this state os.urandom() behaves ever-so-slightly differently:

* os.urandom() will now work in chroot environments where /dev/urandom doesn't exist.
* If Python runs in a chroot environment with a fake /dev/urandom, we'll ignore that and use the kernel's urandom device.
* If the sysadmin changed what the systemwide /dev/urandom points to, we'll ignore that and use the kernel's urandom device.

But os.urandom() is documented as calling getrandom() when available in 3.5... though it doesn't detail how it calls it or what it uses the result for. Anyway, I feel these differences were minor, and covered by the documented change in 3.5, so I thought it was reasonable and un-broken. If this isn't backwards-compatible enough to suit you, please speak up now!
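In rough Python-level pseudocode, the logic I stepped through amounts to the sketch below. This is illustrative only: the real implementation is C, and the os.getrandom()/os.GRND_NONBLOCK names shown here only exist at the Python level in 3.6.

    import errno, os

    def urandom_3_5_2(n):
        # First choice: ask the kernel directly, but never block.
        try:
            return os.getrandom(n, os.GRND_NONBLOCK)
        except AttributeError:
            pass        # no getrandom() syscall on this platform
        except OSError as e:
            # EAGAIN means the entropy pool isn't initialized yet;
            # anything else is a genuine error.
            if e.errno != errno.EAGAIN:
                raise
        # Fallback: read /dev/urandom, which never blocks (but may
        # return output from a not-yet-seeded PRNG).
        with open("/dev/urandom", "rb") as f:
            return f.read(n)

(A real implementation also has to loop, since both getrandom() and read() may return fewer than n bytes.)

//arry/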
On Jun 16, 2016, at 01:40 AM, Larry Hastings wrote:
As Robert Collins points out, this does change the behavior ever-so-slightly from 3.4;
Ah yes, I misunderstood Robert's point.
if urandom is initialized, and the kernel has the getrandom system call, getrandom() will give us the bytes we asked for and we won't open and read from /dev/urandom. In this state os.urandom() behaves ever-so-slightly differently:
* os.urandom() will now work in chroot environments where /dev/urandom doesn't exist.
* If Python runs in a chroot environment with a fake /dev/urandom, we'll ignore that and use the kernel's urandom device.
* If the sysadmin changed what the systemwide /dev/urandom points to, we'll ignore that and use the kernel's urandom device.
But os.urandom() is documented as calling getrandom() when available in 3.5... though it doesn't detail how it calls it or what it uses the result for. Anyway, I feel these differences were minor, and covered by the documented change in 3.5, so I thought it was reasonable and un-broken.
If this isn't backwards-compatible enough to suit you, please speak up now!
It does seem like a narrow corner case, which of course means *someone* will be affected by it <wink>. I'll leave it up to you, though it should at least be clearly documented. Let's hope the googles will also help our hypothetical future head-scratcher. Cheers, -Barry
On Thu, Jun 16, 2016, at 04:03, Barry Warsaw wrote:
*If* we can guarantee that os.urandom() will never block or raise an exception when only poor entropy is available, then it may be indeed indistinguishably backward compatible for most if not all cases.
Why can't we exclude cases when only poor entropy is available from "most if not all cases"?
On Jun 16, 2016, at 09:51 AM, Random832 wrote:
On Thu, Jun 16, 2016, at 04:03, Barry Warsaw wrote:
*If* we can guarantee that os.urandom() will never block or raise an exception when only poor entropy is available, then it may be indeed indistinguishably backward compatible for most if not all cases.
Why can't we exclude cases when only poor entropy is available from "most if not all cases"?
Because if it blocks or raises a new exception on poor entropy it's an API break. Cheers, -Barry
On Thu, Jun 16, 2016, at 10:04, Barry Warsaw wrote:
On Jun 16, 2016, at 09:51 AM, Random832 wrote:
On Thu, Jun 16, 2016, at 04:03, Barry Warsaw wrote:
*If* we can guarantee that os.urandom() will never block or raise an exception when only poor entropy is available, then it may be indeed indistinguishably backward compatible for most if not all cases.
Why can't we exclude cases when only poor entropy is available from "most if not all cases"?
Because if it blocks or raises a new exception on poor entropy it's an API break.
Yes, but in only very rare cases. Which as I *just said* makes it backwards compatible for "most" cases.
On Wed, Jun 15, 2016 at 11:45 PM, Barry Warsaw <barry@python.org> wrote:
On Jun 15, 2016, at 01:01 PM, Nick Coghlan wrote:
No, this is a bad idea. Asking novice developers to make security decisions they're not yet qualified to make when it's genuinely possible for us to do the right thing by default is the antithesis of good security API design, and os.urandom() *is* a security API (whether we like it or not - third party documentation written by the cryptographic software development community has made it so, since it's part of their guidelines for writing security sensitive code in pure Python).
Regardless of what third parties have said about os.urandom(), let's look at what *we* have said about it. Going back to pre-churn 3.4 documentation:
os.urandom(n) Return a string of n random bytes suitable for cryptographic use.
This function returns random bytes from an OS-specific randomness source. The returned data should be unpredictable enough for cryptographic applications, though its exact quality depends on the OS implementation. On a Unix-like system this will query /dev/urandom, and on Windows it will use CryptGenRandom(). If a randomness source is not found, NotImplementedError will be raised.
For an easy-to-use interface to the random number generator provided by your platform, please see random.SystemRandom.
So we very clearly provided platform-dependent caveats on the cryptographic quality of os.urandom(). We also made a strong claim that there's a direct connection between os.urandom() and /dev/urandom on "Unix-like system(s)".
We broke that particular promise in 3.5 and semi-fixed it in 3.5.2.
Adding *new* APIs is also a bad idea, since "os.urandom() is the right answer on every OS except Linux, and also the best currently available answer on Linux" has been the standard security advice for generating cryptographic secrets in pure Python code for years now, so we should only change that guidance if we have extraordinarily compelling reasons to do so, and we don't.
Disagree.
We have broken one long-term promise on os.urandom() ("On a Unix-like system this will query /dev/urandom") and changed another ("should be unpredictable enough for cryptographic applications, though its exact quality depends on OS implementations").
We broke the experienced Linux developer's natural and long-standing link between the API called os.urandom() and /dev/urandom. This breaks pre-3.5 code that assumes read-from-/dev/urandom semantics for os.urandom().
We have introduced churn. Predicting a future SO question such as "Can os.urandom() block on Linux?" the answer is "No in Python 3.4 and earlier, yes possibly in Python 3.5.0 and 3.5.1, no in Python 3.5.2 and the rest of the 3.5.x series, and yes possibly in Python 3.6 and beyond".
It also depends on the kernel version, since it will never block on old kernels that are missing getrandom(), but it might block on future kernels if Linux's /dev/urandom ever becomes blocking. (Ted's said that this is not going to happen now, but the only reason it isn't is that he tried to make the change and it broke some distros that are still in use -- so it seems entirely possible that it will happen a few years from now.)
We have a better answer for "cryptographically appropriate" use cases in Python 3.6 - the secrets module. Trying to make os.urandom() "the right answer on every OS" weakens the promotion of secrets as *the* module to use for cryptographically appropriate use cases.
IMHO it would be better to leave os.urandom() well enough alone, except for the documentation which should effectively say, a la 3.4:
os.urandom(n) Return a string of n random bytes suitable for cryptographic use.
This function returns random bytes from an OS-specific randomness source. The returned data should be unpredictable enough for cryptographic applications, though its exact quality depends on the OS implementation. On a Unix-like system this will query /dev/urandom, and on Windows it will use CryptGenRandom(). If a randomness source is not found, NotImplementedError will be raised.
Cryptographic applications should use the secrets module for stronger guaranteed sources of randomness.
For an easy-to-use interface to the random number generator provided by your platform, please see random.SystemRandom.
This is not an accurate docstring, though. The more accurate docstring for your proposed behavior would be: os.urandom(n) Return a string of n bytes that will usually, but not always, be suitable for cryptographic use. This function returns random bytes from an OS-specific randomness source. On non-Linux OSes, this uses the best available source of randomness, e.g. CryptGenRandom() on Windows and /dev/urandom on OS X, and thus will be strong enough for cryptographic use. However, on Linux it uses a deprecated API (/dev/urandom) which in rare cases is known to return bytes that look random, but aren't. There is no way to know when this has happened; your code will just silently stop being secure. In some unusual configurations, where Python is not configured with any source of randomness, it will raise NotImplementedError. You should never use this function. If you need unguessable random bytes, then the 'secrets' module is always a strictly better choice -- unlike this function, it always uses the best available source of cryptographic randomness, even on Linux. Alternatively, if you need random bytes but it doesn't matter whether other people can guess them, then the 'random' module is always a strictly better choice -- it will be faster, as well as providing useful features like deterministic seeding. --- In practice, your proposal means that ~all existing code that uses os.urandom becomes incorrect and should be switched to either secrets or random. This is *far* more churn for end-users than Nick's proposal. ...Anyway, since there's clearly going to be at least one PEP about this, maybe we should stop rehashing bits and pieces of the argument in these long threads that most people end up skipping and then rehashing again later? -n -- Nathaniel J. Smith -- https://vorpus.org
Nathaniel Smith <njs <at> pobox.com> writes:
In practice, your proposal means that ~all existing code that uses os.urandom becomes incorrect and should be switched to either secrets or random. This is *far* more churn for end-users than Nick's proposal.
This should only concern code that a) was specifically written for 3.5.0/3.5.1 and b) implements a serious cryptographic application in Python. I think b) is not a good idea anyway due to timing and side channel attacks and the lack of secure wiping of memory. Such applications should be written in C, where one does not have to predict the behavior of multiple layers of abstractions. Stefan Krah
On 16 Jun 2016, at 09:19, Stefan Krah <stefan@bytereef.org> wrote:
This should only concern code that a) was specifically written for 3.5.0/3.5.1 and b) implements a serious cryptographic application in Python.
I think b) is not a good idea anyway due to timing and side channel attacks and the lack of secure wiping of memory. Such applications should be written in C, where one does not have to predict the behavior of multiple layers of abstractions.
No, it concerns code that generates its random numbers from Python. For example, you may want to use AES GCM to encrypt a file at rest. AES GCM requires the use of a nonce, and has only one rule about this nonce: you MUST NOT, under any circumstances, re-use a nonce/key combination. If you do, AES GCM fails catastrophically (I cannot emphasise this enough: re-using a nonce/key combination in AES GCM totally destroys all the properties the algorithm provides)[0]. You can use a C implementation of all of the AES logic, including offload to your x86 CPU with its fancy AES GCM instruction set. However, you *need* to provide a nonce: AES GCM can’t magically guess what it is, and it needs to be communicated in some way for the decryption[1]. In situations where you do not have an easily available nonce (you do have it for TLS, for example), you will need to provide one, and the logical and obvious thing to do is to use a random number. Your Python application needs to obtain that random number, and the safest way to do it is via os.urandom(). This is the problem with this argument: we cannot wave our hands and say “os.urandom can be as unsafe as we want because crypto code must not be written in Python”. Even if we never implement an algorithm in Python (and I agree with you that crypto primitives in general should not be implemented in Python for the exact reasons you suggest), most algorithms require the ability to be provided with good random numbers by their callers. As long as crypto algorithms require good nonces, Python needs access to a secure CSPRNG. Kernel CSPRNGs are *strongly* favoured for many reasons that I won’t go into here, so os.urandom is our winner. python-dev cannot wash its hands of the security decision here. As I’ve said many times, I’m pleased to see the decision makers have not done that: while I don’t agree with their decision, I totally respect that it was theirs to make, and they made it with all of the facts. Cory [0]: Someone will *inevitably* point out that other algorithms resist nonce misuse somewhat better than this. While that’s true, it’s a) not relevant, because some standards require use of the non-NMR algorithms, and b) unhelpful, because even if we could switch, we’d need access to the better primitives, which we don’t have. [1]: Again, to head off some questions at the pass: the reason nonces are usually provided by the user of the algorithm is that sometimes they’re generated semi-deterministically. For example, TLS generates a unique key for each session (again, requiring randomness, but that’s neither here nor there), and so TLS can use deterministic *but non-repeated* nonces, which in practice it derives from record numbers. Because you have two options (re-use keys with random nonces, or random keys with deterministic nonces), a generic algorithm implementation does not constrain your choice of nonce.
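To make the file-at-rest example concrete, here is a sketch using the third-party cryptography package's AESGCM interface. The library choice is illustrative, not something the thread prescribes; the point is only that the caller -- Python code -- supplies the nonce.

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=256)
    aesgcm = AESGCM(key)

    # All the AES work happens in C (or in the CPU's AES instructions),
    # but the nonce is ours to provide -- and it must never repeat for
    # this key. The obvious source is the kernel CSPRNG:
    nonce = os.urandom(12)   # 96 bits, the recommended GCM nonce size

    ciphertext = aesgcm.encrypt(nonce, b"file contents at rest", None)

    # The nonce is not secret; store it alongside the ciphertext so the
    # file can be decrypted later.
    plaintext = aesgcm.decrypt(nonce, ciphertext, None)

If os.urandom() ever hands back predictable bytes here, nothing fails visibly; the nonce/key uniqueness rule is simply, silently, at risk.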
Cory Benfield <cory <at> lukasa.co.uk> writes:
python-dev cannot wash its hands of the security decision here. As I’ve said many times, I’m pleased to see the decision makers have not done that: while I don’t agree with their decision, I totally respect that it was theirs to make, and they made it with all of the facts.
I think the sysadmin's responsibility still plays a major role here. If a Linux system crucially relies on the quality of /dev/urandom, it should be possible to insert a small C program (call it ensure_random) into the boot sequence that does *exactly* what Python did in the bug report: block until entropy is available. Well, it *was* possible with SysVinit ... :) Python is not the only application that needs a secure /dev/urandom. Stefan Krah
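Sketched in Python rather than C for brevity, such an ensure_random program is essentially a single blocking read, relying on the traditional /dev/random semantics where reads block until the kernel has gathered entropy:

    #!/usr/bin/env python3
    # ensure_random: run once, early in the boot sequence. It exits
    # only after the kernel can satisfy a blocking entropy read, so
    # everything started after it can trust /dev/urandom.
    with open("/dev/random", "rb") as f:
        f.read(1)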
On Jun 16, 2016 1:23 AM, "Stefan Krah" <stefan@bytereef.org> wrote:
Nathaniel Smith <njs <at> pobox.com> writes:
In practice, your proposal means that ~all existing code that uses os.urandom becomes incorrect and should be switched to either secrets or random. This is *far* more churn for end-users than Nick's proposal.
This should only concern code that a) was specifically written for 3.5.0/3.5.1 and b) implements a serious cryptographic application in Python.
I think b) is not a good idea anyway due to timing and side channel attacks and the lack of secure wiping of memory. Such applications should be written in C, where one does not have to predict the behavior of multiple layers of abstractions.
This is completely unhelpful. Firstly because it's an argument that os.urandom and the secrets module shouldn't exist, which doesn't tell us much about what their behavior should be given that they do exist, and secondly because it fundamentally misunderstands why they exist. The word "cryptographic" here is a bit of a red herring. The guarantee that a CSPRNG makes is that the output should be *unguessable by third parties*. There are plenty of times when this is what you need even when you aren't using actual cryptography. For example, when someone logs into a web app, I may want to send back a session cookie so that I can recognize this person later without making them reauthenticate all the time. For this to work securely, it's extremely important that no one else be able to predict what session cookie I sent, because if you can guess the cookie then you can impersonate the user. In python 2.3-3.5, the most correct way to write this code is to use os.urandom. The question in this thread is whether we should break that in 3.6, so that conscientious users are forced to switch existing code over to using the secrets module if they want to continue to get the most correct available behavior, or whether we should preserve that in 3.6, so that code like my hypothetical web app that was correct on 2.3-3.5 remains correct on 3.6 (with the secrets module being a more friendly wrapper that we recommend for new code, but with no urgency about porting existing code to it).
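The pattern at stake is short enough to show in full -- a minimal sketch of the session-cookie case described above:

    import binascii, os

    def new_session_cookie():
        # 16 unguessable bytes -> 32 hex characters. This is exactly as
        # secure as os.urandom() is unguessable -- which is the property
        # under discussion.
        return binascii.hexlify(os.urandom(16)).decode("ascii")

-n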
On 16 June 2016 at 16:58, Nathaniel Smith <njs@pobox.com> wrote:
The word "cryptographic" here is a bit of a red herring. The guarantee that a CSPRNG makes is that the output should be *unguessable by third parties*. There are plenty of times when this is what you need even when you aren't using actual cryptography. For example, when someone logs into a web app, I may want to send back a session cookie so that I can recognize this person later without making then reauthenticate all the time. For this to work securely, it's extremely important that no one else be able to predict what session cookie I sent, because if you can guess the cookie then you can impersonate the user.
In python 2.3-3.5, the most correct way to write this code is to use os.urandom. The question in this thread is whether we should break that in 3.6, so that conscientious users are forced to switch existing code over to using the secrets module if they want to continue to get the most correct available behavior, or whether we should preserve that in 3.6, so that code like my hypothetical web app that was correct on 2.3-3.5 remains correct on 3.6 (with the secrets module being a more friendly wrapper that we recommend for new code, but with no urgency about porting existing code to it).
While your example is understandable and clear, it's also a bit of a red herring as well. Nobody's setting up a web session cookie during the first moments of Linux boot (are they?), so os.urandom is perfectly OK in all cases here. We have a new API in 3.6 that might better express the *intent* of generating a secret token, but (cryptographic) correctness is the same either way for this example. As someone who isn't experienced in crypto, I genuinely don't have the slightest idea of what sort of program we're talking about that is written in Python, runs in the early stages of OS startup, and needs crypto-strength random numbers. So I can't reason about whether the proposed solutions are sensible. Would such programs be used in a variety of environments with different Python versions? Would the developers be non-specialists? Which of the mistakes being made that result in a vulnerability is the easiest to solve (move the code to run later, modify the Python code, require a fixed version of Python)? How severe is the security hole compared to others (for example, users with weak passwords)? What attacks are possible, and what damage could be done? (I know that in principle, any security hole needs to be plugged, but I work in an environment where production services with a password of "password" exist, and applying system security patches is treated as a "think about it when things are quiet" activity - so forgive me if I don't immediately understand why obscure vulnerabilities are important). I'm willing to accept the view of the security experts that there's a problem here. But without a clear explanation of the problem, how can a non-specialist like myself have an opinion? (And I hope the security POV isn't "you don't need an opinion, just do as we say"). Paul
On 16 June 2016 at 09:39, Paul Moore <p.f.moore@gmail.com> wrote:
I'm willing to accept the view of the security experts that there's a problem here. But without a clear explanation of the problem, how can a non-specialist like myself have an opinion? (And I hope the security POV isn't "you don't need an opinion, just do as we say").
If you're not writing Linux (and presumably *BSD) scripts and applications that run during system initialisation or on embedded ARM hardware with no good sources of randomness, then there's zero chance of any change made in relation to this affecting you (Windows and Mac OS X are completely immune, since they don't allow Python scripts to run early enough in the boot sequence for there to ever be a problem). The only question at hand is what CPython should do in the case where the operating system *does* let Python scripts run before the system random number generator is ready, and the application calls a security sensitive API that relies on that RNG:

- throw BlockingIOError (so the script developer knows they have a potential problem to fix)
- block (so the script developer has a system hang to debug)
- return low quality random data (so the script developer doesn't even know they have a potential problem)

The last option is the status quo, and has a remarkable number of vocal defenders. The second option is what we changed the behaviour to in 3.5 as a side effect of switching to a syscall to save a file descriptor (and *also* inadvertently made it a gating requirement for CPython starting at all, without which I'd be very surprised if anyone actually noticed the potentially blocking behaviour in os.urandom itself). The first option is the one I'm currently writing a PEP for, since it makes the longstanding advice to use os.urandom() as the low level random data API for security sensitive operations unequivocally correct (as it will either do the right thing, or throw an exception which the developer can handle as appropriate for their particular application). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Jun 16 2016, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 16 June 2016 at 09:39, Paul Moore <p.f.moore@gmail.com> wrote:
I'm willing to accept the view of the security experts that there's a problem here. But without a clear explanation of the problem, how can a non-specialist like myself have an opinion? (And I hope the security POV isn't "you don't need an opinion, just do as we say").
If you're not writing Linux (and presumably *BSD) scripts and applications that run during system initialisation or on embedded ARM hardware with no good sources of randomness, then there's zero chance of any change made in relation to this affecting you (Windows and Mac OS X are completely immune, since they don't allow Python scripts to run early enough in the boot sequence for there to ever be a problem).
The only question at hand is what CPython should do in the case where the operating system *does* let Python scripts run before the system random number generator is ready, and the application calls a security sensitive API that relies on that RNG:
- throw BlockingIOError (so the script developer knows they have a potential problem to fix)
- block (so the script developer has a system hang to debug)
- return low quality random data (so the script developer doesn't even know they have a potential problem)
The last option is the status quo, and has a remarkable number of vocal defenders.
*applaud* Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«
On 16 June 2016 at 18:03, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 16 June 2016 at 09:39, Paul Moore <p.f.moore@gmail.com> wrote:
I'm willing to accept the view of the security experts that there's a problem here. But without a clear explanation of the problem, how can a non-specialist like myself have an opinion? (And I hope the security POV isn't "you don't need an opinion, just do as we say").
If you're not writing Linux (and presumably *BSD) scripts and applications that run during system initialisation or on embedded ARM hardware with no good sources of randomness, then there's zero chance of any change made in relation to this affecting you (Windows and Mac OS X are completely immune, since they don't allow Python scripts to run early enough in the boot sequence for there to ever be a problem).
Understood. I could quite happily ignore this thread for all the impact it will have on me. However, I've seen enough of these debates (and witnessed the frustration of the security advocates) that I want to try to understand the issues better - as much as anything so that I don't end up adding uninformed opposition to these threads (in my day job, unfortunately, security is generally the excuse for all sorts of counter-productive rules, and never offers any practical benefits that I am aware of, so I'm predisposed to rejecting arguments based on security - that background isn't accurate in this environment and I'm actively trying to counter it).
The only question at hand is what CPython should do in the case where the operating system *does* let Python scripts run before the system random number generator is ready, and the application calls a security sensitive API that relies on that RNG:
- throw BlockingIOError (so the script developer knows they have a potential problem to fix)
- block (so the script developer has a system hang to debug)
- return low quality random data (so the script developer doesn't even know they have a potential problem)
The last option is the status quo, and has a remarkable number of vocal defenders.
Understood. It seems to me that there are two arguments here - backward compatibility (which is always a pressure, but sometimes applied too vigorously and not always consistently) and "we've always done it that way" (aka "people will have to consider what happens when they run under 3.4 anyway, so how will changing help?"). Judging backward compatibility is always a matter of trade-offs, hence my interest in the actual benefits.
The second option is what we changed the behaviour to in 3.5 as a side effect of switching to a syscall to save a file descriptor (and *also* inadvertently made a gating requirement for CPython starting at all, without which I'd be very surprised if anyone actually noticed the potentially blocking behaviour in os.urandom itself)
OK, so (given that the issue of CPython starting at all was an accidental, and now corrected, side effect) why is this so bad? Maybe not in a minor release, but at least for 3.6? How come this has caused such a fuss? I genuinely don't understand why people see blocking as such an issue (and as far as I can tell, Ted Ts'o seems to agree). The one case where this had an impact was a quickly fixed bug - so as far as I can tell, the risk of problems caused by blocking is purely hypothetical.
The first option is the one I'm currently writing a PEP for, since it makes the longstanding advice to use os.urandom() as the low level random data API for security sensitive operations unequivocally correct (as it will either do the right thing, or throw an exception which the developer can handle as appropriate for their particular application)
In my code, I typically prefer Python to make detailed decisions for me (e.g. requests follows redirects by default, it doesn't expect me to do so manually). Now certainly this is a low-level interface so the rules are different, but I don't see why blocking by default isn't "unequivocally correct" in the same way that it is on other platforms, rather than raising an exception and requiring the developer to do the wait manually. (What else would they do - fall back to insecure data? I thought the point here was that that's the wrong thing to do?) Having a blocking default with a non-blocking version seems just as arguable, and has the advantage that naive users (I don't even know if we're allowing for naive users here) won't get an unexpected exception and handle it badly because they don't know what to do (a sadly common practice in my experience). OK. Guido has pronounced, you're writing a PEP. None of this debate is really constructive any more. But I still don't understand the trade-offs, which frustrates me. Surely security isn't so hard that it can't be explained in a way that an interested layman like myself can follow? :-( Paul
On Thu, Jun 16, 2016 at 11:58 AM, Nathaniel Smith <njs@pobox.com> wrote:
[...] no one else be able to predict what session cookie I sent [...] In python 2.3-3.5, the most correct way to write this code is to use os.urandom. The question in this thread is whether we should break that in 3.6, so that conscientious users are forced to switch existing code over to using the secrets module if they want to continue to get the most correct available behavior, or whether we should preserve that in 3.6, so that code like my hypothetical web app that was correct on 2.3-3.5 remains correct on 3.6
This is kinda silly. Unless you specifically wrote your code for Python 3.5.1, and NOT for 2.3.x through 3.4.x, your code is NO WORSE in 3.5.2 than it has been under all those prior versions. The cases where the behavior in everything other than 3.5.0-3.5.1 is suboptimal are *extremely limited*, as you understand (things that run in Python very early in the boot process, and only on recent versions of Linux, no other OS). This does not even remotely describe the web-server-with-cookies example that you outline. Python 3.6 is introducing a NEW MODULE, with new APIs. The 'secrets' module is the very first time that Python has ever really explicitly addressed cryptography in the standard library. Yes, there have been third-party modules and libraries, but any cryptographic application of Python prior to 'secrets' is very much roll-your-own and know-what-you-are-doing. Yes, there has been a history of telling people to "use os.urandom()" on StackOverflow and places like that. That's about the best advice that was available prior to 3.6. Adding a new module and API is specifically designed to allow for a better answer, otherwise there'd be no reason to include it. And that advice that's been on StackOverflow and wherever has been subject to the narrow, edge-case flaw we've discussed here for at least a decade without anyone noticing or caring. It seems to me that backporting 'secrets' and putting it on Warehouse would be a lot more productive than complaining about 3.5.2 reverting to (almost) the behavior of 2.3-3.4. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
On 16 June 2016 at 10:01, David Mertz <mertz@gnosis.cx> wrote:
It seems to me that backporting 'secrets' and putting it on Warehouse would be a lot more productive than complaining about 3.5.2 reverting to (almost) the behavior of 2.3-3.4.
"Let Flask/Django/passlib/cryptography/whatever handle the problem rather than rolling your own" is already the higher level meta-guidance. However, there are multiple levels of improvement being pursued here, since developer ignorance of security concerns and problematic defaults at the language level is a chronic problem rather than an acute one (and one that affects all languages, not just Python). In that context, the main benefit of the secrets module is as a deterrent against people reaching for the reproducible simulation focused random module to implement security sensitive operations. By offering both secrets and random in the standard library, we help make it clear that secrecy and simulation are *not the same problem*, even though they both involve random numbers. Folks that learn Python 3.6 first and then later start supporting earlier versions are likely to be more aware of the difference, and hence go looking for "What's the equivalent of the secrets module on earlier Python versions?" (at which point they can just copy whichever one-liner they actually need into their particular application - just as not every 3 line function needs to be a builtin, not every 3 line function needs to be a module on PyPI) The os.urandom proposal is aimed more at removing any remaining equivocation from the longstanding "Use os.urandom() for security sensitive operations in Python" advice - it's for the benefit of folks that are *already* attempting to do the right thing given the tools they have available. The sole source of that equivocation is that in some cases, at least on Linux, and potentially on *BSD (although we haven't seen a confirmed reproducer there), os.urandom() may return results that are sufficiently predictable to be inappropriate for use in security sensitive applications. At the moment, determining whether or not you're risking exposure to that problem requires that you know a whole lot about Linux (and *BSD, where even we haven't been able to determine the level of exposure on embedded systems), and also about how ``os.urandom()`` is implemented on different platforms. My proposal is that we do away with the requirement for all that assumed knowledge and instead say "Are you using os.urandom(), random.SystemRandom(), or an API in the secrets module? Are you using Python 3.6+? Did it raise BlockingIOError? No? Then you're fine". The vast majority of Python developers will thus be free to remain entirely ignorant of these platform specific idiosyncracies, while those that have a potential need to know will get an exception from the interpreter that they can then feed into a search engine and get pointed in the right direction. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Thu, Jun 16, 2016 at 10:26:22AM -0700, Nick Coghlan wrote:
meta-guidance. However, there are multiple levels of improvement being pursued here, since developer ignorance of security concerns and problematic defaults at the language level is a chronic problem rather than an acute one (and one that affects all languages, not just Python).
For a while Christian Heimes has speculated on Twitter about writing a Secure Programming HOWTO. At the last language summit in Montreal, I told him I'd be happy to do the actual writing and editing if given a detailed outline. (I've missed having an ongoing writing project since ceasing to write the "What's New", but have no ideas for anything to write about.) That offer is still open, if Christian or someone else wants to produce an outline. --amk
On Jun 16, 2016 10:01 AM, "David Mertz" <mertz@gnosis.cx> wrote:
Python 3.6 is introducing a NEW MODULE, with new APIs. The 'secrets' module is the very first time that Python has ever really explicitly addressed cryptography in the standard library.
This is completely, objectively untrue. If you look up os.urandom in the official manual for the standard library, then it has always stated explicitly, as the very first line, that os.urandom returns "a string of n random bytes suitable for cryptographic use." This is *exactly* the same explicit guarantee that the secrets module makes. The motivation for adding the secrets module was to make this functionality easier to find and more convenient to use (e.g. by providing convenience functions for getting random strings of ASCII characters), not to suddenly start addressing cryptographic concerns for the first time. (Will try to address other more nuanced points later.) -n
On 16 June 2016 at 10:40, Nathaniel Smith <njs@pobox.com> wrote:
On Jun 16, 2016 10:01 AM, "David Mertz" <mertz@gnosis.cx> wrote:
Python 3.6 is introducing a NEW MODULE, with new APIs. The 'secrets' module is the very first time that Python has ever really explicitly addressed cryptography in the standard library.
This is completely, objectively untrue. If you look up os.urandom in the official manual for the standard library, then it have always stated explicitly, as the very first line, that os.urandom returns "a string of n random bytes suitable for cryptographic use." This is *exactly* the same explicit guarantee that the secrets module makes. The motivation for adding the secrets module was to make this functionality easier to find and more convenient to use (e.g. by providing convenience functions for getting random strings of ASCII characters), not to suddenly start addressing cryptographic concerns for the first time.
An analogy that occurred to me that may help some folks: secrets is a higher level API around os.urandom and some other standard library features (like base64 and binascii.hexlify) in the same way that shutil and pathlib are higher level APIs that aggregate other os module functions with other parts of the standard library. The existence of those higher level APIs doesn't make the lower level building blocks redundant. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Jun 16, 2016, at 01:01 PM, David Mertz wrote:
It seems to me that backporting 'secrets' and putting it on Warehouse would be a lot more productive than complaining about 3.5.2 reverting to (almost) the behavior of 2.3-3.4.
Very wise suggestion indeed. We have all kinds of stdlib modules backported and released as third party packages. Why not secrets too? If such were on PyPI, I'd happily package it up for the Debian ecosystem. Problem solved <wink>. But I'm *really* going to try to disengage from this discussion until Nick's PEP is posted. Cheers, -Barry
On 16 June 2016 at 13:09, Barry Warsaw <barry@python.org> wrote:
On Jun 16, 2016, at 01:01 PM, David Mertz wrote:
It seems to me that backporting 'secrets' and putting it on Warehouse would be a lot more productive than complaining about 3.5.2 reverting to (almost) the behavior of 2.3-3.4.
Very wise suggestion indeed. We have all kinds of stdlib modules backported and released as third party packages. Why not secrets too? If such were on PyPI, I'd happily package it up for the Debian ecosystem. Problem solved <wink>.
The secrets module is just a collection of one liners pulling together other stdlib components that have been around for years - the main problem it aims to address is one of discoverability (rather than one of code complexity), while also eliminating the "simulation is in the standard library, secrecy requires a third party module" discrepancy in the long term. Once you're aware the problem exists, the easiest way to use it in a version independent manner is to just copy the relevant snippet into your own project's utility library - adding an entire new dependency to your project just for those utility functions would be overkill. If you *do* add a dependency, you'd typically be better off with something more comprehensive and tailored to the particular problem domain you're dealing with, like passlib or cryptography or itsdangerous. Cheers, Nick. P.S. Having the secrets module available on PyPI wouldn't *hurt*, I just don't think it would help much. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Yes, 'secrets' is one-liners. However, it might grow a few more lines around the blocking in getrandom() on Linux. But still, not more than a few. But the reason it should be on PyPI is so that programs can have a uniform API across various Python versions. There's no real reason that someone stuck on Python 2.7 or 3.3 shouldn't be able to include the future-style:

    import secrets
    Answer = secrets.token_bytes(42)
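Such a backport would be little more than the following shim (hypothetical, but mirroring the one-liners the 3.6 stdlib module wraps):

    # Hypothetical pre-3.6 "secrets" shim.
    import base64, binascii, os

    def token_bytes(nbytes=32):
        return os.urandom(nbytes)

    def token_hex(nbytes=32):
        return binascii.hexlify(os.urandom(nbytes)).decode("ascii")

    def token_urlsafe(nbytes=32):
        tok = base64.urlsafe_b64encode(os.urandom(nbytes))
        return tok.rstrip(b"=").decode("ascii")

On Jun 16, 2016 4:53 PM, "Nick Coghlan" <ncoghlan@gmail.com> wrote: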
On 16 June 2016 at 13:09, Barry Warsaw <barry@python.org> wrote:
On Jun 16, 2016, at 01:01 PM, David Mertz wrote:
It seems to me that backporting 'secrets' and putting it on Warehouse would be a lot more productive than complaining about 3.5.2 reverting to (almost) the behavior of 2.3-3.4.
Very wise suggestion indeed. We have all kinds of stdlib modules backported and released as third party packages. Why not secrets too? If such were on PyPI, I'd happily package it up for the Debian ecosystem. Problem solved <wink>.
The secrets module is just a collection of one liners pulling together other stdlib components that have been around for years - the main problem it aims to address is one of discoverability (rather than one of code complexity), while also eliminating the "simulation is in the standard library, secrecy requires a third party module" discrepancy in the long term.
Once you're aware the problem exists, the easiest way to use it in a version independent manner is to just copy the relevant snippet into your own project's utility library - adding an entire new dependency to your project just for those utility functions would be overkill.
If you *do* add a dependency, you'd typically be better off with something more comprehensive and tailored to the particular problem domain you're dealing with, like passlib or cryptography or itsdangerous.
Cheers, Nick.
P.S. Having the secrets module available on PyPI wouldn't *hurt*, I just don't think it would help much.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Jun 16, 2016, at 12:53 AM, Nathaniel Smith wrote:
We have introduced churn. Predicting a future SO question such as "Can os.urandom() block on Linux?" the answer is "No in Python 3.4 and earlier, yes possibly in Python 3.5.0 and 3.5.1, no in Python 3.5.2 and the rest of the 3.5.x series, and yes possibly in Python 3.6 and beyond".
It also depends on the kernel version, since it will never block on old kernels that are missing getrandom(), but it might block on future kernels if Linux's /dev/urandom ever becomes blocking. (Ted's said that this is not going to happen now, but the only reason it isn't is that he tried to make the change and it broke some distros that are still in use -- so it seems entirely possible that it will happen a few years from now.)
Right; I noticed this and had it in my copious notes for my follow up but forgot to mention it. Thanks!
This is not an accurate docstring, though. The more accurate docstring for your proposed behavior would be:
[...]
You should never use this function. If you need unguessable random bytes, then the 'secrets' module is always a strictly better choice -- unlike this function, it always uses the best available source of cryptographic randomness, even on Linux. Alternatively, if you need random bytes but it doesn't matter whether other people can guess them, then the 'random' module is always a strictly better choice -- it will be faster, as well as providing useful features like deterministic seeding.
Note that I was talking about 3.5.x, where we don't have the secrets module. I'd quibble with the admonition never to use the function. It *can* be useful if the trade-offs are appropriate for your application (e.g. "almost always random enough, but maybe not, though at least you won't block and you'll get back something"). Getting the words right is useful, but I agree that we should be strongly recommending crypto applications use the secrets module in Python 3.6.
In practice, your proposal means that ~all existing code that uses os.urandom becomes incorrect and should be switched to either secrets or random. This is *far* more churn for end-users than Nick's proposal.
I disagree. We have a clear upgrade path for end-users. If you're using os.urandom() in pre-3.5 and understand what you're getting (or not getting as the case may be), you will continue to get or not get exactly the same bits in 3.5.x (where x >= 2). No changes to your code are necessary. This is also the case in 3.6 but there you can do much better by porting your code to the new secrets module. Go do that!
...Anyway, since there's clearly going to be at least one PEP about this, maybe we should stop rehashing bits and pieces of the argument in these long threads that most people end up skipping and then rehashing again later?
Sure, I'll try. ;) Cheers, -Barry
On Jun 11, 2016 11:13 PM, "Theodore Ts'o" <tytso@mit.edu> wrote:
On Sat, Jun 11, 2016 at 05:46:29PM -0400, Donald Stufft wrote:
It was a RaspberryPI that ran a shell script on boot that called ssh-keygen. That shell script could have just as easily been a Python script that called os.urandom via https://github.com/sybrenstuvel/python-rsa instead of a shell script that called ssh-keygen.
So I'm going to argue that the primary bug was in how the systemd init scripts were configured. In general, creating keypairs at boot time is just a bad idea. They should be created lazily, in a just-in-time paradigm.
Consider that if you assume that os.urandom can block, this isn't necessarily going to do the right thing either --- if you use getrandom and it blocks, and it's part of a systemd unit which is blocking further boot progress, then the system will hang for 90 seconds, and while it's hanging, there won't be any interrupts, so the system will be dead in the water, just like the original bug report complaining that Python was hanging when it was using getrandom() to initialize its SipHash.
Hi Ted,
From another perspective, I guess one could also argue that the best place to fix this is in the kernel: if a process is blocked waiting for entropy, then the kernel probably shouldn't take that as its cue to turn off all the entropy generation mechanisms, just like how if a process is blocked waiting for disk I/O then we probably shouldn't power down the disk controller. Obviously this is a weird case because the kernel is architected in a way that makes the dependency between the disk controller and the I/O request obvious, while the dependency between the random pool and... well... everything else, more or less, is much more subtle and goes outside the usual channels, and we wouldn't want to rearchitect everything just for this. But for example, if a process is actively blocked waiting for the initial entropy, one could spawn a kernel thread that keeps the system from quiescing by attempting to scrounge up entropy as fast as possible, via whatever mechanisms are locally appropriate (e.g. doing a busy-loop racing two clocks against each other, or just scheduling lots of interrupts -- which I guess is the same thing, more or less). And the thread would go away again as soon as userspace wasn't blocked on entropy. That way this deadlock wouldn't be possible. I guess someone *might* complain about the idea of the entropy pool actually spending resources instead of being quietly parasitic, because this is the kernel and someone will always complain about everything :-). But complaining about this makes about as much sense as complaining about the idea of spending resources trying to service I/O when a process is blocked on that ("maybe if we wait long enough then some other part of the system will just kind of accidentally page in the data we need as a side effect of whatever it's doing, and then this thread will be able to proceed"). Is this an approach that you've considered?
At which point there will be another bug complaining about how python was causing systemd to hang for 90 seconds, and there will be demand to make os.urandom no longer block. (Since by definition, systemd can do no wrong; it's always other programs that have to change to accommodate systemd. :-)
FWIW, the systemd thing is a red herring -- this was debian's configuration of a particular daemon that is not maintained by the systemd project, and the exact same thing would have happened with sysvinit if debian had tried using python 3.5 early in their rcS. -n
On Sun, Jun 12, 2016 at 11:07:22AM -0700, Nathaniel Smith wrote:
But for example, if a process is actively blocked waiting for the initial entropy, one could spawn a kernel thread that keeps the system from quiescing by attempting to scrounge up entropy as fast as possible, via whatever mechanisms are locally appropriate (e.g. doing a busy-loop racing two clocks against each other, or just scheduling lots of interrupts -- which I guess is the same thing, more or less).
There's a lot of snake oil, or at least, hand waving, that goes on with respect to what will actually work to gather randomness. One of the worst possible choices is a standard, kernel-defined workload that tries to just busy loop two clocks against each other. For one thing, on many embedded systems, all of your clocks are generated off of a single master oscillator anyway. And in early boot, it's not realistic for the kernel to be able to measure network interrupt timings and radio strength indicators from the WiFi, which ultimately is going to be much more likely to be unpredictable by an outside attacker sitting in Fort Meade than pretending that you can just "schedule lots of interrupts". Again, part of the problem here is that if you really want to be secure, it needs to be a full stack perspective, where the hardware designers, the OS developers, and the application level developers are all working together. If one side tries to exert a strong "somebody else's problem field", it's very likely the end solution isn't going to be secure. Because in many cases this is simply not practical, we all have to make assumptions at the OS and C-Python interpreter level, and hope that the assumptions that we make are conservative enough.
Is this an approach that you've considered?
Ultimately, the arguments made by approaches such as Jitterbug are, to put it succinctly and perhaps a little unfairly, "gee whillikers, the Intel L1/L2 cache hierarchy is really complicated and it's a closed hardware implementation so no one can understand it, and besides, the statistical analysis of the output looks good". To which I would say, "the first argument is an argument of security through ignorance", and "AES(NSA_KEY, COUNTER++)" also has really great statistical results, and if you don't know the NSA_KEY, it will look very strong and as far as we know, we wouldn't be able to distinguish it from a truly secure random number generator --- but it really isn't secure. So yeah, I don't buy it. In order for it to be secure, we need to be grabbing measurements which can't be replicated or determined by a remote attacker. So having the kernel kick off a kernel thread is not going to be useful unless we can mix in entropy from the user, or the workload, or the local configuration, or from the local environment. (Using RSSI is helpful because the remote attacker might not know whether your mobile handset is in the knapsack under the table, or on the desk, and that will change the RSSI numbers.) Remember, the whole *point* of modern CPU designs is that huge amounts of engineering effort are put into making the CPU be predictable, and so spawning a kernel thread in isolation isn't going to perform magic in terms of getting guaranteed unpredictability.
FWIW, the systemd thing is a red herring -- this was debian's configuration of a particular daemon that is not maintained by the systemd project, and the exact same thing would have happened with sysvinit if debian had tried using python 3.5 early in their rcS.
It's not a daemon. It's the script in /lib/systemd/system-generators/systemd-crontab-generator, and it's needed because systemd subsumed the cron daemon, and developers who wanted to not break users' existing crontab files turned to it. I suppose you are technically correct that it is not maintained by systemd, but the need for it was generated out of systemd's lack of concern for backwards compatibility. Because FreeBSD and Mac OS are not using systemd, they are not likely to run into this problem. I will grant that if they decided to try to run a python script out of their /etc/rc script, they would run into the same problem. - Ted
Donald Stufft writes:
I guess one question would be, what does the secrets module do if it’s on a Linux that is too old to have getrandom(0), off the top of my head I can think of:
* Silently fall back to reading os.urandom and hope that it’s been seeded.
* Fall back to os.urandom and hope that it’s been seeded, and add a SecurityWarning or something like it to mention that it’s falling back to os.urandom and it may be getting predictable random from /dev/urandom.
* Hard fail because it can’t guarantee secure cryptographic random.
I'm going to hide behind the Linux manpage (which actually suggests saving the data in a file to speed initialization at boot) in mentioning this:

    if random_initialized_timestamp_pre_boot():
        r = open("/dev/random", "rb")
        u = open("/dev/urandom", "wb")
        u.write(r.read(enough_bytes))
        set_random_initialized_timestamp()
        # in theory, secrets can now use os.urandom
On 11 June 2016 at 22:46, Donald Stufft <donald@stufft.io> wrote:
I guess one question would be, what does the secrets module do if it’s on a Linux that is too old to have getrandom(0), off the top of my head I can think of:
* Silently fall back to reading os.urandom and hope that it’s been seeded.
* Fall back to os.urandom and hope that it’s been seeded, and add a SecurityWarning or something like it to mention that it’s falling back to os.urandom and it may be getting predictable random from /dev/urandom.
* Hard fail because it can’t guarantee secure cryptographic random.
Of the three, I would probably suggest the second one, it doesn’t let the problem happen silently, but it still “works” (where it’s basically just hoping it’s being called late enough that /dev/urandom has been seeded), and people can convert it to the third case using the warnings module to turn the warning into an exception.
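A minimal sketch of that second option, for illustration only: the SecurityWarning class and the strong_bytes name are hypothetical (the stdlib defines no such warning class), and os.getrandom() only exists from Python 3.6 on:

    import os
    import warnings

    class SecurityWarning(Warning):
        # hypothetical -- the stdlib has no such class
        pass

    def strong_bytes(n):
        # Use the blocking getrandom() where the os module exposes it
        # (3.6+ on Linux >= 3.17); otherwise warn and fall back to
        # os.urandom(), hoping the pool has already been seeded.
        if hasattr(os, "getrandom"):
            return os.getrandom(n)
        warnings.warn(
            "falling back to os.urandom(); early in boot on Linux this "
            "may return predictable bytes", SecurityWarning)
        return os.urandom(n)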
I have kept out of this discussion as I don't know enough about security to comment, but in this instance I think the answer is clear - there is no requirement for Python to protect the user against security bugs in the underlying OS (sure, it's nice if it can, but it's not necessary) so falling back to os.urandom (with no warning) is fine. A warning, or even worse a hard fail, that 99.99% of the time should be ignored (because you're *not* writing a boot script) seems like a very bad idea. By all means document "if your OS provides no means of getting guaranteed secure random numbers (e.g., older versions of Linux very early in the boot sequence) then the secrets module cannot give you results that are any better than the OS provides". It seems self-evident to me that this would be the case, but I see no reason to object if the experts feel it's worth adding. Paul
On Sat, Jun 11, 2016 at 02:16:21PM -0700, Guido van Rossum wrote: [on the real-world consequences of degraded randomness from /dev/urandom]
Actually it's not clear to me at all that it could have happened to Python. (Wasn't it an embedded system?)
A Raspberry Pi. But don't people run Python on at least some embedded systems? The wiki thinks so: https://wiki.python.org/moin/EmbeddedPython And I thought that was the purpose of µPython.
Actually the proposal for that was the secrets module. And the secrets module would be the only user of os.urandom(blocking=True).
I’m fine if this lives in the secrets module— Steven asked for it to be an os function so that secrets.py could continue to be pure python.
The main thing that I want to avoid is that people start cargo-culting whatever the secrets module uses rather than just using the secrets module. Having it redundantly available as os.getrandom() is just begging for people to show off how much they know about writing secure code.
That makes sense. I'm happy for getrandom to be an implementation detail of secrets, but I'll need help with that part.
* If you want to ensure you get cryptographically secure bytes, use os.getrandom, falling back to os.urandom on non-Linux platforms and erroring on Linux. [...]
But what is a Python script going to do with that error? IIUC this kind of error would only happen very early during boot time, and rarely, so the most likely outcome is a hard-to-debug mystery failure.
In my day job, I work for a Linux sys admin consulting company, and I can tell you from our experience that debugging a process that occasionally hangs mysteriously during boot is much harder than debugging a process that occasionally fails with an explicit error in the logs, especially if the error message is explicit about the cause:

    OSError: entropy pool has not been initialized yet

At that point, you can take whatever action is appropriate for your script:

- fail altogether, just as it might fail if it requires a writable file system and can't find one;
- sleep for three seconds and try again;
- log the error and proceed with degraded randomness or functionality;
- change it so the script runs later in the boot process.

-- Steve
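A sketch of the "sleep and try again" option from the list above -- assuming the hypothetical OSError-raising os.urandom() Steven describes, which no released Python actually has:

    import os
    import time

    def urandom_with_retries(n, attempts=3, delay=3.0):
        # Retry a few times, on the theory that the entropy pool will
        # finish initializing within a few seconds of boot.
        last_exc = None
        for _ in range(attempts):
            try:
                return os.urandom(n)
            except OSError as exc:
                last_exc = exc
                time.sleep(delay)
        raise last_exc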
On 06/11/2016 11:30 AM, Donald Stufft wrote:
The problem is that someone writing software that does os.urandom(block=True) or os.urandom(exception=True) which gets some bytes doesn’t know if it got back cryptographically secure random because Python called getrandom() or if it got back cryptographically secure random because it called /dev/urandom and that gave it secure random because it’s on a platform that defines that as always returning secure or because it’s on Linux and the urandom pool is initialized or if it got back some random bytes that are not cryptographically secure because it fell back to reading /dev/urandom on Linux prior to the pool being initialized.
Let me jump in tangentially to say: I think os.urandom(block=True) is simply a bad API. On FreeBSD and OpenBSD, /dev/urandom may block, and you don't have a choice. On OS X, /dev/urandom will never block, and you don't have a choice. In Victor's initial patch where he proposed it, the flag was accepted on all platforms but only affected its behavior on Linux and possibly Solaris. I think it's bad API design to have a flag that seems like it would be meaningful on multiple platforms, but in practice is useful only in very limited circumstances. If this were old code, or behavior we inherited from the platform and we were making the best of a bad situation, that'd be one thing. But this is a proposed new API and I definitely think we can do better.

As I understand the proposed semantics for os.urandom(exception=True), I feel it falls into the same trap, though not to the same degree. Of course, both flags break backwards-compatibility if they default to True, and I strongly disagree with doing that.

It's far better in my opinion to keep the os module as a thin shell over platform functionality. That makes Python's behavior more predictable on a platform-by-platform basis. So I think the best approach here is to add os.getrandom() as a thin shell over the local getrandom() (if any).

//arry/
On Sat, Jun 11, 2016 at 12:53 PM, Larry Hastings <larry@hastings.org> wrote:
On 06/11/2016 11:30 AM, Donald Stufft wrote:
The problem is that someone writing software that does os.urandom(block=True) or os.urandom(exception=True) which gets some bytes doesn’t know if it got back cryptographically secure random because Python called getrandom() or if it got back cryptographically secure random because it called /dev/urandom and that gave it secure random because it’s on a platform that defines that as always returning secure or because it’s on Linux and the urandom pool is initialized or if it got back some random bytes that are not cryptographically secure because it fell back to reading /dev/urandom on Linux prior to the pool being initialized.
Let me jump in tangentially to say: I think os.urandom(block=True) is simply a bad API. On FreeBSD and OpenBSD, /dev/urandom may block, and you don't have a choice. On OS X, /dev/urandom will never block, and you don't have a choice. In Victor's initial patch where he proposed it, the flag was accepted on all platforms but only affected its behavior on Linux and possibly Solaris. I think it's bad API design to have a flag that seems like it would be meaningful on multiple platforms, but in practice is useful only in very limited circumstances. If this were old code, or behavior we inherited from the platform and we were making the best of a bad situation, that'd be one thing. But this is a proposed new API and I definitely think we can do better.
As I understand the proposed semantics for os.urandom(exception=True), I feel it falls into the same trap, though not to the same degree.
Of course, both flags break backwards-compatibility if they default to True, and I strongly disagree with doing that.
It's far better in my opinion to keep the os module as a thin shell over platform functionality. That makes Python's behavior more predictable on a platform-by-platform basis. So I think the best approach here is to add os.getrandom() as a thin shell over the local getrandom() (if any).
OK, the flags are unpopular, so let's forget about them. But I find an os.getrandom() that only exists on those (few?) platforms that support it a nuisance too -- this just encourages cargo cult code that's unnecessarily complicated and believed to be secure without anybody ever verifying.

I'd like to consider what people freak out about.

- You could freak out about blocking
- You could freak out about getting slightly less random bits
- You could freak out about supporting Python 3.5 and earlier
- You could freak out about supporting all platforms

You could also freak out about combinations of the above, but that gets complicated and you should probably consider that you're over-constraining matters. If you freak out about all of these at once (or about both the first and the second bullet) you should consider a career change.

If you don't freak out about any of these (meaning you're happy with Python 3.6+) you should use the secrets module. If you freak out about support for older Python versions, try the secrets module first and fall back to os.urandom() -- there really isn't any other choice. If you freak out about getting slightly less random bits you should probably do a complete security assessment of your entire stack and fix the OS and Python version, and use the best you can get for that combination. You may not want to rely on the standard library at all. If you freak out about blocking you're probably on a specific platform, and if that platform is Linux, you're in luck: use os.urandom() and avoid Python 3.5.0 and 3.5.1. On other platforms you're out of luck.

So I still don't see why we need os.getrandom() -- it has nothing to recommend it over the secrets module (since both won't happen before 3.6). So what should the secrets module use? Let's make that part an extension module.

-- --Guido van Rossum (python.org/~guido)
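A sketch of that "older Python versions" fallback -- try secrets (3.6+) and drop back to os.urandom() where it doesn't exist:

    try:
        import secrets

        def random_bytes(n):
            return secrets.token_bytes(n)
    except ImportError:
        import os

        def random_bytes(n):
            return os.urandom(n)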
On Jun 11, 2016, at 4:48 PM, Guido van Rossum <guido@python.org> wrote:
But I find an os.getrandom() that only exists on those (few?) platforms that support it a nuisance too -- this just encourages cargo cult code that's unnecessarily complicated and believed to be secure without anybody ever verifying.
Well, new enough Linux has getrandom(0), OpenBSD has getentropy(), Solaris has getrandom(), and Windows has CryptGenRandom, all of which make it possible (or are the only way to invoke it) to either get cryptographically secure random bytes or block, with no in-between. So it’d likely be possible to have os.getrandom() with blocking semantics and no FD on all of the most popular platforms we support. If we relax the no-FD requirement, then FreeBSD and OS X also have /dev/random (or /dev/urandom, it’s the same thing) which will ensure that you get cryptographically secure random bytes. — Donald Stufft
On Sat, Jun 11, 2016 at 2:16 PM, Donald Stufft <donald@stufft.io> wrote:
On Jun 11, 2016, at 4:48 PM, Guido van Rossum <guido@python.org> wrote:
But I find an os.getrandom() that only exists on those (few?) platforms that support it a nuisance too -- this just encourages cargo cult code that's unnecessarily complicated and believed to be secure without anybody ever verifying.
Well, new enough Linux has getrandom(0), OpenBSD has getentropy(), Solaris has getrandom(), and Windows has CryptGenRandom, all of which make it possible (or are the only way to invoke it) to either get cryptographically secure random bytes or block, with no in-between. So it’d likely be possible to have os.getrandom() with blocking semantics and no FD on all of the most popular platforms we support.
If we relax the no-FD requirement, then FreeBSD and OS X also have /dev/random (or /dev/urandom, it’s the same thing) which will ensure that you get cryptographically secure random bytes.
OK, so we should implement the best we can do for the secrets module, and leave os.urandom() alone. I think the requirement that the secrets module remain pure Python has to be dropped. I'm not sure what it should do if even blocking can't give it sufficiently strong random bytes, but I care much less -- it's a new API and it doesn't resemble any OS function, so as long as it is documented it should be fine.

An alternative would be to keep the secrets module linked to SystemRandom, and improve the latter. Its link with os.urandom() is AFAIK undocumented. Its API is clumsy but for code that needs some form of secret-ish bytes and requires platform and Python version independence it might be better than anything else. Then the secrets module is just what we recommend new users on Python 3.6.

-- --Guido van Rossum (python.org/~guido)
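For reference, the SystemRandom usage Guido describes has been available since Python 2.4; it exposes the full random.Random API on top of os.urandom():

    import random

    sysrand = random.SystemRandom()
    key = sysrand.getrandbits(128)            # a 128-bit secret, as an int
    pick = sysrand.choice(["red", "green"])   # any Random method works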
[Guido]
... An alternative would be to keep the secrets module linked to SystemRandom, and improve the latter. Its link with os.urandom() is AFAIK undocumented. Its API is clumsy but for code that needs some form of secret-ish bytes and requires platform and Python version independence it might be better than anything else. Then the secrets module is just what we recommend new users on Python 3.6.
There's an issue currently open about this: http://bugs.python.org/issue27288 The docs for SystemRandom are very brief, so people may have actually noticed ;-) the first sentence: Class that uses the os.urandom() function for generating random numbers ... IOW, "uses os.urandom()" has been one of its only advertised qualities.
On 06/11/2016 01:48 PM, Guido van Rossum wrote:
So I still don't see why we need os.getrandom() -- it has nothing to recommend it over the secrets module (since both won't happen before 3.6).
I have two reasons, neither of which I think are necessarily all that persuasive. Don't consider this an argument--merely some observations.

First, simply as a practical matter: the secrets module is currently pure Python. ISTM that the os module is where we put miscellaneous bits of os functionality; getrandom() definitely falls into that category. Rather than adding a new _secrets module or whatever, it seemed easiest just to add it there.

Second, I'd put this under the "consenting adults" rule. Clearly cryptography is a contentious subject with sharply differing opinions. There are many, many cryptography libraries available on PyPI; perhaps those libraries would like to use getrandom(), or /dev/urandom, or even getentropy(), in a way different than how secrets does it. My thinking is, the os module should provide platform support, the secrets module should be our codified best practices, and we encourage everyone to use secrets. I'd go so far as to add that recommendation to the doc *and* the docstrings of os.urandom(), random.SystemRandom, and os.getrandom() (and os.getentropy()) if we add it. But by providing the OS functionality in a neutral way we allow external cryptographers to write what *they* view as best-practices code without wading into implementation details of secrets, or using ctypes, or whatnot.

But like I said I don't have a strong opinion. As long as we're not adding mysterious flags to os.urandom() I'll probably sit the rest of this one out.

//arry/
Fortunately, 3.6 feature freeze isn't until September, so we can all cool off and figure out the best way forward. I'm going on vacation for a week, and after sending this I'm going to mute the thread so I won't be pulled into it while I'm supposed to be relaxing. -- --Guido van Rossum (python.org/~guido)
On 6/11/2016 11:34 AM, Guido van Rossum wrote:
In terms of API design, I'd prefer a flag to os.urandom() indicating a preference for
- blocking
- raising an exception
- weaker random bits
+100 ;-)

I proposed exactly this 2 days ago, 5 hours after Larry's initial post.

''' I think the 'new API' should be a parameter, not a new function. With just two choices, 'wait' = True/False could work. If 'raise an exception' were added, then 'action' (when good bits are not immediately available) = 'return (best possible)' or 'wait (until have good bits)' or 'raise (CryptBitsNotAvailable)'

In either case, there would then be the question of whether the default should match 3.5.0/1 or 3.4 and before. '''

Deciding on this then might have saved some hurt feelings, to the point where two contributors feel like disappearing, and a release manager must feel the same. In any case, Guido already picked 3.4 behavior as the default. Can we agree and move on?

-- Terry Jan Reedy
You can add me to the list of people who feel like disappearing. On Sat, Jun 11, 2016 at 10:28 AM, Terry Reedy <tjreedy@udel.edu> wrote:
On 6/11/2016 11:34 AM, Guido van Rossum wrote:
In terms of API design, I'd prefer a flag to os.urandom() indicating a preference for
- blocking
- raising an exception
- weaker random bits
+100 ;-)
I proposed exactly this 2 days ago, 5 hours after Larry's initial post.
''' I think the 'new API' should be a parameter, not a new function. With just two choices, 'wait' = True/False could work. If 'raise an exception' were added, then 'action' (when good bits are not immediately available) = 'return (best possible)' or 'wait (until have good bits)' or 'raise (CryptBitsNotAvailable)'
In either case, there would then be the question of whether the default should match 3.5.0/1 or 3.4 and before. '''
Deciding on this then might have saved some hurt feelings, to the point where two contributors feel like disappearing, and a release manager must feel the same. In any case, Guido already picked 3.4 behavior as the default. Can we agree and move on?
-- Terry Jan Reedy
-- --Guido van Rossum (python.org/~guido)
Nathaniel Smith <njs@pobox.com> wrote:
(This is based on the assumption that the only time that explicitly calling os.urandom is the best option is when one cares about the cryptographic strength of the result -- I'm explicitly distinguishing here between the hash seeding issue that triggered the original bug report and explicit calls to os.urandom.)
I disagree with that assumption. I've often found myself using os.urandom for non-secure random data and seen it as the best option simply because it directly returns the type I wanted: bytes. The last time I looked the random module didn't have a function to directly give me bytes, so I would have to wrap it in something like:

    bytearray(random.getrandbits(8) for _ in range(size))

Or maybe the function exists, but then it doesn't seem very discoverable. Ideally I would only want to use the random module for non-secure and (in 3.6) the secrets module (which could block) for secure random data and never bother with os.urandom (and knowing how it behaves). But then those modules should probably get new functions to directly return bytes. Sebastian
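For what it's worth, a more direct spelling of that wrapper, still using only the documented random API (and still not suitable for secrets):

    import random

    size = 16
    # One getrandbits() call instead of a per-byte loop; the resulting
    # int is then serialized into exactly `size` bytes.
    data = random.getrandbits(8 * size).to_bytes(size, "little")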
[Sebastian Krause]
... Ideally I would only want to use the random module for non-secure and (in 3.6) the secrets module (which could block) for secure random data and never bother with os.urandom (and knowing how it behaves). But then those modules should probably get new functions to directly return bytes.
`secrets.token_bytes()` does just that, and other token_XXX() functions return bytes too but with different spellings (e.g., if you want, with the byte values represented as ASCII hex digits). I believe everyone agrees token_bytes() will potentially block in 3.6 (along with all the other `secrets` facilities) on platforms supporting getrandom(). You're right that `random` doesn't expose such a function, and that the closest it gets is .getrandbits() (which returns a potentially giant int). So far, nobody has proposed adding new functions to `random`.
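For reference, the 3.6 token_XXX() spellings Tim mentions (sample outputs are illustrative, of course):

    import secrets

    secrets.token_bytes(16)    # b'\x9aQ...'               raw bytes
    secrets.token_hex(16)      # '3d5a0e...'               same entropy, hex digits
    secrets.token_urlsafe(16)  # 'Drmhze6EPcv0fN_81Bj-nA'  URL-safe text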
On Jun 09, 2016, at 03:22 PM, Larry Hastings wrote:
On 06/09/2016 08:52 AM, Guido van Rossum wrote:
That leaves direct calls to os.urandom(). I don't think this should block either.
Then it's you and me against the rest of the world ;-)
FWIW, I agree with you and Guido. I'm also not opposed to adding a more direct exposure of getrandom(), but in Python 3.6 only. Like it or not, that's the right approach for our backward compatibility policies. Cheers, -Barry
On Thu, Jun 9, 2016 at 6:53 PM, Barry Warsaw <barry@python.org> wrote:
FWIW, I agree with you and Guido. I'm also not opposed to adding a more direct exposure of getrandom(), but in Python 3.6 only. Like it or not, that's the right approach for our backward compatibility policies.
I suspect the crypto folks would be okay with pushing this back to 3.6, so long as the final resolution is that os.urandom remains the standard interface for, as the docstring says, "Return[ing] a string of n random bytes suitable for cryptographic use" using the OS-recommended method, and they don't have to go change all their code. After all, 3.4 and 2.7 will still have this subtle brokenness for some time. But I'm a little uncertain what you think would need to happen to satisfy the backwards compatibility policies. If we can change it in 3.6 without having a warning in 3.5, then presumably we can also change it in 3.5 without a warning in 3.4, which is what already happened accidentally :-). Would it be acceptable for 3.5.2 to start raising a warning "urandom returning non-random bytes -- in 3.6 this will be an error", and then make it an error in 3.6? (And it would probably be good even in the long run to issue a prominent warning if hash seeding fails.) -n -- Nathaniel J. Smith -- https://vorpus.org
On 06/09/2016 07:58 PM, Nathaniel Smith wrote:
I suspect the crypto folks would be okay with pushing this back to 3.6, so long as the final resolution is that os.urandom remains the standard interface for, as the docstring says, "Return[ing] a string of n random bytes suitable for cryptographic use" using the OS-recommended method, and they don't have to go change all their code.
The Linux core devs didn't like the behavior of /dev/urandom. But they couldn't change its behavior without breaking userspace code. Linux takes backwards-compatibility very seriously, so they left /dev/urandom exactly the way it was and added new functionality (the getrandom() system call) that had the semantics they felt were best.

I don't understand why so many people seem to think it's okay to break old code in new versions of Python, when Python's history has shown a similarly strong commitment to backwards-compatibility. os.urandom() was added in Python 2.4, in 2004, and remained unchanged for about twelve years. That's twelve years of people calling it and assuming its semantics were identical to the local "urandom" man page, which was correct.

I don't think we should change os.urandom() to block on Linux even in 3.6. Happily, that's no longer my fight, as I'm not 3.6 RM.
Would it be acceptable for 3.5.2 to start raising a warning "urandom returning non-random bytes -- in 3.6 this will be an error", and then make it an error in 3.6?
No. In 3.5.2 and the remaining 3.5 releases, os.urandom() must behave identically to how it behaved in 3.4 and the previous releases. //arry/
On Jun 9, 2016, at 11:11 PM, Larry Hastings <larry@hastings.org> wrote:
I don't understand why so many people seem to think it's okay to break old code in new versions of Python, when Python's history has shown a similarly strong commitment to backwards-compatibility.
Python *regularly* breaks compatibility in X.Y+1 releases, and does it on purpose. An example from Python 3.5 would be PEP 479. I think breaking compatibility is a good thing from time to time, as long as it’s not done so with wanton disregard and as long as the cost is carefully weighed against the benefits.

One of the more frustrating aspects of trying to discuss security-sensitive topics on python-dev is a feeling (at least from my end) that whenever someone wants to make something more secure [1], folks come in and try to anchor the discussion by treating backwards compatibility as some sort of sacred duty that can never be broken, and the discussion ends up feeling (from the security side that I’m typically on) like having to justify the idea of ever breaking backwards compatibility, instead of weighing the cost/benefit of a particular change. On the flip side, when a different kind of change that breaks compatibility, say to make some behavior less confusing, gets brought up, it feels like the discussion instead focuses on whether or not breaking compatibility is worth it in that particular instance.

I’m perfectly happy to accept that Python has decided to make a trade-off differently than what I would prefer, but the rhetoric that is employed makes trying to improve Python’s security an extremely frustrating experience for myself and others [2]. Feeling like you have to litigate that it’s *ever* OK to break compatibility before you can even get to the point of discussing if it makes sense in any particular instance, while watching other kinds of proposals not have to do that, is a pretty disheartening experience.

[1] Making code more secure pretty much by definition means taking some code that previously executed and making it so it no longer executes, ideally only in degenerate and dangerous conditions, but in general, that’s always the case.

[2] I don’t want to name names, as they didn’t give me permission to do so, but these discussions have caused more than one person who tends to fall on the security side of things to consider avoiding contributing to Python at all, because of this kind of rhetoric.

— Donald Stufft
On 2016-06-10 05:48, Donald Stufft wrote:
On Jun 9, 2016, at 11:11 PM, Larry Hastings <larry@hastings.org> wrote:
I don't understand why so many people seem to think it's okay to break old code in new versions of Python, when Python's history has shown a similarly strong commitment to backwards-compatibility.
Python *regularly* breaks compatibility in X.Y+1 releases, and does it on purpose. An example from Python 3.5 would be PEP 479. I think breaking compatibility is a good thing from time to time, as long as it’s not done so with wanton disregard and as long as the cost is carefully weighed against the benefits.
One of the more frustrating aspects of trying to discuss security-sensitive topics on python-dev is a feeling (at least from my end) that whenever someone wants to make something more secure [1], folks come in and try to anchor the discussion by treating backwards compatibility as some sort of sacred duty that can never be broken, and the discussion ends up feeling (from the security side that I’m typically on) like having to justify the idea of ever breaking backwards compatibility, instead of weighing the cost/benefit of a particular change. On the flip side, when a different kind of change that breaks compatibility, say to make some behavior less confusing, gets brought up, it feels like the discussion instead focuses on whether or not breaking compatibility is worth it in that particular instance.
I’m perfectly happy to accept that Python has decided to make a trade-off differently than what I would prefer, but the rhetoric that is employed makes trying to improve Python’s security an extremely frustrating experience for myself and others [2]. Feeling like you have to litigate that it’s *ever* OK to break compatibility before you can even get to the point of discussing if it makes sense in any particular instance, while watching other kinds of proposals not have to do that, is a pretty disheartening experience.
[1] Making code more secure pretty much by definition means taking some code that previously executed and making it so it no longer executes, ideally only in degenerate and dangerous conditions, but in general, that’s always the case.
[2] I don’t want to name names, as they didn’t give me permission to do so, but these discussions have caused more than one person who tends to fall on the security side of things to consider avoiding contributing to Python at all, because of this kind of rhetoric.
Donald, feel free to name me. I'm mentally exhausted and frustrated by the discussions over the last days and weeks. As of now I'm considering stepping down from PSRT and taking a long break from Python core development.

My frustration is mostly rooted in Dunning-Kruger effects. If you still think that a CSPRNG can run out of entropy or that it is a good idea to implement a crypto hash function in pure Python, then please go back to the children table and let the grown-ups talk. You are still struggling with basic addition and multiplication, while we discuss Laplace transformation for linear ODEs and consult experts, who do quantum Fourier transformation to solve a hidden subgroup problem by converting it from finite Abelian groups to Shor's quantum algorithm [1]. Quoting Larry: "You must be this tall to ride the security train." </rant>

I'm well aware that I'm not a trained and studied cryptographer. Cory and Donald repeatedly stated the same. However we are aware of our shortcomings, know our limits and constantly follow the advice of trusted experts. At least we combine enough experience to recognize bad ideas.

Please, please don't add unnecessary noise to security discussions. os.urandom() is not about the concrete foundation of a bike shed. It's the f...reaking core catcher [2] of a nuclear power plant. You want to have a secure core catcher when the nuclear reactor goes BOOOM and spills hot molten, extremely radioactive Corium.

Christian

[1] Yes, that is a real thing. It will break all current asymmetric ciphers like RSA and EC.
[2] https://en.wikipedia.org/wiki/Core_catcher
I somehow feel compelled to clarify that (perhaps unlike Larry) my concern is not the strict rules of backwards compatibility (if that was the case I would have objected to changing this in 3.5.2). I just don't like the potentially blocking behavior, and experts' opinions seem to widely vary on how insecure the fallback bits really are, how likely you are to find yourself in that situation, and how probable an exploit would be. -- --Guido van Rossum (python.org/~guido)
Guido van Rossum <guido@python.org> wrote:
I just don't like the potentially blocking behavior, and experts' opinions seem to widely vary on how insecure the fallback bits really are, how likely you are to find yourself in that situation, and how probable an exploit would be.
This is not just a theoretical problem being discussed by security experts that *could* be exploited; there have already been multiple real-life cases of devices (mostly embedded Linux machines) generating predictable SSH keys because they read from an uninitialized /dev/urandom at first boot. Most recently in the Raspbian distribution for the Raspberry Pi: https://www.raspberrypi.org/forums/viewtopic.php?f=66&t=126892

At least in 3.6 there should be an obvious way to get random data that is *always* guaranteed to be secure and either fails or blocks if it can't guarantee that. Sebastian
On Friday, June 10, 2016 1:01 PM, Sebastian Krause wrote to python-dev:
Guido van Rossum <guido@python.org> wrote:
I just don't like the potentially blocking behavior, and experts' opinions seem to widely vary on how insecure the fallback bits really are, how likely you are to find yourself in that situation, and how probable an exploit would be.
This is not just a theoretical problem being discussed by security experts that *could* be exploited; there have already been multiple real-life cases of devices (mostly embedded Linux machines) generating predictable SSH keys because they read from an uninitialized /dev/urandom at first boot. Most recently in the Raspbian distribution for the Raspberry Pi: https://www.raspberrypi.org/forums/viewtopic.php?f=66&t=126892
At least in 3.6 there should be an obvious way to get random data that is *always* guaranteed to be secure and either fails or blocks if it can't guarantee that.
Sebastian
And that should live in the secrets module.
This is fairly academic, since I do not anticipate needing to do this myself, but I have a specific question. I'll assume that Python 3.5.2 will go back to the 2.6-3.4 behavior in which os.urandom() never blocks on Linux. Moreover, I understand that the cases where insecure bits might be returned are limited to Python scripts that run on system initialization on Linux.

If I *were* someone who needed to write a Linux system initialization script using Python 3.5.2, what would the code look like? I think for this use case, requiring something with a little bit of "code smell" is fine, but I kinda hope it exists at all.

-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
On Jun 10, 2016, at 2:29 PM, David Mertz <mertz@gnosis.cx> wrote:
If I *were* someone who needed to write a Linux system initialization script using Python 3.5.2, what would the code look like? I think for this use case, requiring something with a little bit of "code smell" is fine, but I kinda hope it exists at all.
Do you mean if os.urandom blocked and you wanted to call os.urandom from your boot script? Or if os.urandom doesn’t block and you wanted to ensure you got good random numbers on boot? — Donald Stufft
My hypothetical is "Ensure good random bits (on Python 3.5.2 and Linux), and block rather than allow bad bits." I'm not quite sure I understand all of your question, Donald. On Python 3.4—and by BDFL declaration on 3.5.2—os.urandom() *will not* block, although it might on 3.5.1. On Fri, Jun 10, 2016 at 11:33 AM, Donald Stufft <donald@stufft.io> wrote:
On Jun 10, 2016, at 2:29 PM, David Mertz <mertz@gnosis.cx> wrote:
If I *were* someone who needed to write a Linux system initialization script using Python 3.5.2, what would the code look like? I think for this use case, requiring something with a little bit of "code smell" is fine, but I kinda hope it exists at all.
Do you mean if os.urandom blocked and you wanted to call os.urandom from your boot script? Or if os.urandom doesn’t block and you wanted to ensure you got good random numbers on boot?
— Donald Stufft
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
Ok, so you’re looking for how you would replicate the blocking behavior of os.urandom that exists in 3.5.0 and 3.5.1?

In that case, it’s hard. I don’t think linux provides any way to externally determine if /dev/urandom has been initialized or not. Probably the easiest thing to do would be to interface with the getrandom() function using a c-ext, CFFI, or ctypes. If you’re looking for a way of doing this without calling the getrandom() function... I believe the answer is you can’t.

The closest thing you can get is checking the /proc/sys/kernel/random/entropy_avail file, but that tells you how much entropy the system currently thinks it has (which will go up and down over time) and corresponds to /dev/random on Linux, not /dev/urandom.

You could read from /dev/random, but that’s going to randomly block outside of the pool initialization whenever the kernel thinks it doesn’t have enough entropy. Cryptographers and security experts alike consider this to be pretty stupid behavior and don’t recommend using it because of this “randomly block throughout the use of your application” behavior.

So really, out of the recommended solutions you really only have two options: find a way to interface with the getrandom() function, or just consume /dev/urandom and hope it’s been initialized.
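A sketch of the ctypes route Donald mentions. The syscall number below is for x86-64 Linux only (other architectures use different numbers), so treat this as illustrative rather than portable:

    import ctypes
    import os

    SYS_getrandom = 318  # x86-64 only; arch-specific (assumption)

    def getrandom(n, flags=0):
        # Call getrandom(2) directly; with flags=0 it blocks until the
        # urandom pool has been initialized, and never again after that.
        libc = ctypes.CDLL(None, use_errno=True)
        buf = ctypes.create_string_buffer(n)
        ret = libc.syscall(SYS_getrandom, buf, ctypes.c_size_t(n),
                           ctypes.c_uint(flags))
        if ret < 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
        return buf.raw[:ret]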
On Jun 10, 2016, at 2:43 PM, David Mertz <mertz@gnosis.cx> wrote:
My hypothetical is "Ensure good random bits (on Python 3.5.2 and Linux), and block rather than allow bad bits."
I'm not quite sure I understand all of your question, Donald. On Python 3.4—and by BDFL declaration on 3.5.2—os.urandom() *will not* block, although it might on 3.5.1.
— Donald Stufft
On Jun 10, 2016, at 2:55 PM, Donald Stufft <donald@stufft.io> wrote:
So really, out of the recommended solutions you really only have find a way to interface with the getrandom() function, or just consume /dev/urandom and hope it’s been initialized.
I’d note, this is one of the reasons why I felt like blocking (or raising an exception) on os.urandom was the right solution— because it’s hard to get that behavior on Linux otherwise. However, if we instead kept the blocking (or exception) behavior, getting the old behavior back on Linux is trivial, since it would only require open(“/dev/urandom”).read(…). — Donald Stufft
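The "trivial" spelling Donald refers to is literally just reading the device:

    # Old-style, never-blocking behavior on Linux: read /dev/urandom
    # directly, whether or not the pool has been initialized.
    with open("/dev/urandom", "rb") as f:
        data = f.read(16)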
OK. My understanding is that Guido ruled out introducing an os.getrandom() API in 3.5.2. But would you be happy if that interface is added to 3.6? It feels to me like the correct spelling in 3.6 should probably be secrets.getrandom() or something related to that. On Fri, Jun 10, 2016 at 11:55 AM, Donald Stufft <donald@stufft.io> wrote:
Ok, so you’re looking for how would you replicate the blocking behavior of os.urandom that exists in 3.5.0 and 3.5.1?
In that case, it’s hard. I don’t think linux provides any way to externally determine if /dev/urandom has been initialized or not. Probably the easiest thing to do would be to interface with the getrandom() function using a c-ext, CFFI, or ctypes. If you’re looking for a way of doing this without calling the getrandom() function.. I believe the answer is you can’t.
The closest thing you can get is checking the /proc/sys/kernel/random/entropy_avail file, but that tells you how much entropy the system currently thinks it has (which will go up and down over time) and corresponds to /dev/random on Linux not /dev/urandom.
You could read from /dev/random, but that’s going to randomly block outside of the pool initialization whenever the kernel thinks it doesn’t have enough entropy. Cryptographers and security experts alike consider this to be pretty stupid behavior and don’t recommend using it because of this “randomly block throughout the use of your application” behavior.
So really, out of the recommended solutions you really only have two options: find a way to interface with the getrandom() function, or just consume /dev/urandom and hope it’s been initialized.
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
[David Mertz]
OK. My understanding is that Guido ruled out introducing an os.getrandom() API in 3.5.2. But would you be happy if that interface is added to 3.6?
It feels to me like the correct spelling in 3.6 should probably be secrets.getrandom() or something related to that.
secrets.token_bytes() is already the way to spell "get a string of messed-up bytes", and that's the dead obvious (according to me) place to add the potentially blocking implementation. Indeed, everything in the `secrets` module should block when the OS thinks that's needed.
Tim Peters <tim.peters@gmail.com> wrote:
secrets.token_bytes() is already the way to spell "get a string of messed-up bytes", and that's the dead obvious (according to me) place to add the potentially blocking implementation.
I honestly didn't think that this was the dead obvious function to use. To me the naming kind of suggested that it would do some special magic that tokens needed, instead of just returning random bytes (even though the best token is probably just perfectly random data). If you want to provide a general function for secure random bytes I would suggest at least a better naming. Sebastian
[Tim]
secrets.token_bytes() is already the way to spell "get a string of messed-up bytes", and that's the dead obvious (according to me) place to add the potentially blocking implementation.
[Sebastian Krause]
I honestly didn't think that this was the dead obvious function to use. To me the naming kind of suggested that it would do some special magic that tokens needed, instead of just returning random bytes (even though the best token is probably just perfectly random data). If you want to provide a general function for secure random bytes I would suggest at least a better naming.
There was ample bikeshedding over the names of `secrets` functions at the time. If token_bytes wasn't the obvious function to you, I suspect you have scant idea what _is_ in the `secrets` module. The naming is logical in context, where various "token_xxx" functions supply random-ish bytes in different formats. In that context, xxx=bytes is the obvious way to get raw bytes.
On Jun 10, 2016, at 3:05 PM, David Mertz <mertz@gnosis.cx> wrote:
OK. My understanding is that Guido ruled out introducing an os.getrandom() API in 3.5.2. But would you be happy if that interface is added to 3.6?
It feels to me like the correct spelling in 3.6 should probably be secrets.getrandom() or something related to that.
Well we have https://docs.python.org/dev/library/secrets.html#secrets.token_bytes so adding a getrandom() function to secrets would largely be the same as that function.

The problem of course is that the secrets library in 3.6 uses os.urandom under the covers, so its security rests on the security of os.urandom. To ensure that the secrets library is actually safe even in early boot it’ll need to stop using os.urandom on Linux and use the getrandom() function.

That same library exposes random.SystemRandom as secrets.SystemRandom [1], and of course SystemRandom uses os.urandom too. So if we want people to treat secrets.SystemRandom as “always secure” then it would need to stop using os.urandom and start using the getrandom() function on Linux as well.

[1] This is actually documented as "using the highest-quality sources provided by the operating system” in the secrets documentation, and I’d argue that it is not using the highest-quality source if it’s reading from /dev/urandom or getrandom(GRND_NONBLOCK) on Linux systems where getrandom() is available. Of course, it’s just an alias for random.SystemRandom, and that is documented as using os.urandom.

— Donald Stufft
I believe that secrets.token_bytes() and secrets.SystemRandom() should be changed even for 3.5.1 to use getrandom() on Linux. Thanks for fixing my spelling of the secrets API, Donald. :-) On Fri, Jun 10, 2016 at 12:17 PM, Donald Stufft <donald@stufft.io> wrote:
On Jun 10, 2016, at 3:05 PM, David Mertz <mertz@gnosis.cx> wrote:
OK. My understanding is that Guido ruled out introducing an os.getrandom() API in 3.5.2. But would you be happy if that interface is added to 3.6?
It feels to me like the correct spelling in 3.6 should probably be secrets.getrandom() or something related to that.
Well we have https://docs.python.org/dev/library/secrets.html#secrets.token_bytes so adding a getrandom() function to secrets would largely be the same as that function.
The problem of course is that the secrets library in 3.6 uses os.urandom under the covers, so its security rests on the security of os.urandom. To ensure that the secrets library is actually safe even in early boot it’ll need to stop using os.urandom on Linux and use the getrandom() function.
That same library exposes random.SystemRandom as secrets.SystemRandom [1], and of course SystemRandom uses os.urandom too. So if we want people to treat secrets.SystemRandom as “always secure” then it would need to stop using os.urandom and start using the getrandom() function on Linux as well.
[1] This is actually documented as "using the highest-quality sources provided by the operating system” in the secrets documentation, and I’d argue that it is not using the highest-quality source if it’s reading from /dev/urandom or getrandom(GRND_NONBLOCK) on Linux systems where getrandom() is available. Of course, it’s just an alias for random.SystemRandom, and that is documented as using os.urandom.
— Donald Stufft
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
Ooops.... thinko there! Of course `secrets` won't exist in 3.5.1, so that's a 3.6 matter instead. On Fri, Jun 10, 2016 at 12:29 PM, David Mertz <mertz@gnosis.cx> wrote:
I believe that secrets.token_bytes() and secrets.SystemRandom() should be changed even for 3.5.1 to use getrandom() on Linux.
Thanks for fixing my spelling of the secrets API, Donald. :-)
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
On Fri, Jun 10, 2016 at 12:55 PM, Larry Hastings <larry@hastings.org> wrote:
On 06/10/2016 12:29 PM, David Mertz wrote:
I believe that secrets.token_bytes() and secrets.SystemRandom() should be changed even for 3.5.1 to use getrandom() on Linux.
Surely you meant 3.5.2? 3.5.1 shipped last December.
Yeah, that combines a couple thinkos even. I had intended to write "for 3.5.2" ... but that is also an error, since the secrets module doesn't exist until 3.6.

So yes, I think 3.5.2 should restore the 2.6-3.4 behavior of os.urandom(), and the NEW APIs in secrets should use the "best available randomness (even if it blocks)".

Donald is correct that we have the spelling secrets.token_bytes() available in 3.6a1, so the spellings secrets.getrandom() or secrets.randbytes() are not needed. However, Sebastian's (adapted) suggestion to allow secrets.token_bytes(k, *, nonblock=False) as the signature makes sense to me (i.e. it's a choice of "block or raise exception", not an option to get non-crypto bytes).

-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
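A sketch of the proposed signature, expressed in terms of the os.getrandom() and os.GRND_NONBLOCK that landed in 3.6; the body here is illustrative, not the actual secrets implementation:

    import os

    def token_bytes(nbytes, *, nonblock=False):
        # Block by default; with nonblock=True, getrandom() raises
        # BlockingIOError if the pool is not yet initialized, so the
        # caller gets an error rather than weak bytes.
        if hasattr(os, "getrandom"):
            flags = os.GRND_NONBLOCK if nonblock else 0
            return os.getrandom(nbytes, flags)
        return os.urandom(nbytes)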
On 06/10/2016 01:01 PM, David Mertz wrote:
So yes, I think 3.5.2 should restore the 2.6-3.4 behavior of os.urandom(),
That makes... five of us I think ;-) (Larry Guido Barry Tim David)
and the NEW APIs in secrets should use the "best available randomness (even if it blocks)"
I'm not particular about how the new API is spelled. However, I do think os.getrandom() should be exposed as a thin wrapper over getrandom() in 3.6. That would permit Python programmers to take maximal advantage of the features offered by their platform. It would also permit the secrets module to continue to be written in pure Python. //arry/
On Fri, Jun 10, 2016 at 01:06:45PM -0700, Larry Hastings wrote:
On 06/10/2016 01:01 PM, David Mertz wrote:
So yes, I think 3.5.2 should restore the 2.6-3.4 behavior of os.urandom(),
That makes... five of us I think ;-) (Larry Guido Barry Tim David)
and the NEW APIs in secrets should use the "best available randomness (even if it blocks)"
I'm not particular about how the new API is spelled. However, I do think os.getrandom() should be exposed as a thin wrapper over getrandom() in 3.6. That would permit Python programmers to take maximal advantage of the features offered by their platform. It would also permit the secrets module to continue to be written in pure Python.
A big +1 for that. Will there be platforms where os.getrandom doesn't exist? If not, then secrets can just rely on it, otherwise what should it do?

    if hasattr(os, 'getrandom'):
        return os.getrandom(n)
    else:
        # Fail? Fall back on os.urandom?

-- Steve
On 06/11/2016 12:49 AM, Steven D'Aprano wrote:
Will there be platforms where os.getrandom doesn't exist? If not, then secrets can just rely on it, otherwise what should it do?
    if hasattr(os, 'getrandom'):
        return os.getrandom(n)
    else:
        # Fail? Fall back on os.urandom?
AFAIK:

* Only Linux and Solaris have getrandom() right now. IIUC Solaris duplicated Linux's API, but I don't know that for certain, and I don't know in particular what GRND_RANDOM does on Solaris. (Of course, you don't need GRND_RANDOM for secrets.token_bytes().)

* Only Linux and OS X have never-blocking /dev/urandom. On Linux, you can choose to block by calling getrandom(). On OS X you have no choice, you can only use the never-blocking /dev/urandom. (OS X also has a /dev/random but it behaves identically to /dev/urandom.) OS X's man page reassuringly claims blocking is never necessary; the blogosphere disagrees.

If I were writing the function for the secrets module, I'd write it like you have above: call os.getrandom() if it's present, and os.urandom() if it isn't. I believe that achieves current-best-practice everywhere: it does the right thing on Linux, it does the right thing on Solaris, it does the right thing on all the other OSes where reading from /dev/urandom can block, and it uses the only facility available to us on OS X.

//arry/
On 11 Jun 2016, at 09:24, Larry Hastings <larry@hastings.org> wrote:

Only Linux and OS X have never-blocking /dev/urandom. On Linux, you can choose to block by calling getrandom(). On OS X you have no choice, you can only use the never-blocking /dev/urandom. (OS X also has a /dev/random but it behaves identically to /dev/urandom.) OS X's man page reassuringly claims blocking is never necessary; the blogosphere disagrees. If I were writing the function for the secrets module, I'd write it like you have above: call os.getrandom() if it's present, and os.urandom() if it isn't. I believe that achieves current-best-practice everywhere: it does the right thing on Linux, it does the right thing on Solaris, it does the right thing on all the other OSes where reading from /dev/urandom can block, and it uses the only facility available to us on OS X.
Sorry Larry, but as far as I know this is misleading (it's not *wrong*, but it suggests that OS X's /dev/urandom is the same as Linux's, which is emphatically not true).

I've found the discussion around OS X's random devices to be weirdly abstract, given that the source code for it is public, so I went and took a look. My initial reading of it (and, to be clear, this is a high-level read of a codebase I don't know well, so please take this with the grain of salt that is intended) is that the operating system literally will not boot without at least 128 bits of entropy to read from the EFI boot loader. In the absence of 128 bits of entropy the kernel will panic, rather than continue to boot.

Generally speaking that entropy will come from RDRAND, given the restrictions on where OS X can be run (Intel CPUs for real OS X, virtualised on top of OS X, and so on top of Intel CPUs, for VMs), which imposes a baseline on the quality of the entropy you can get. Assuming that OS X is being run in a manner that is acceptable from the perspective of its license agreement (and we can all agree that no-one would violate the terms of OS X's license agreement, right?), I think it's reasonable to assume that OS X, either virtualised or not, is getting 128 bits of somewhat sensible entropy from the boot loader/CPU before it boots.

That means we can say this about OS X's /dev/urandom: the reason it never blocks is because the situation of "not enough entropy to generate good random numbers" is synonymous with "not enough entropy to boot the OS". So maybe we can stop casting aspersions on OS X's RNG now.

Cory
On Fri, 10 Jun 2016 at 12:20 Donald Stufft <donald@stufft.io> wrote:
On Jun 10, 2016, at 3:05 PM, David Mertz <mertz@gnosis.cx> wrote:
OK. My understanding is that Guido ruled out introducing an os.getrandom() API in 3.5.2. But would you be happy if that interface is added to 3.6?
It feels to me like the correct spelling in 3.6 should probably be secrets.getrandom() or something related to that.
Well we have https://docs.python.org/dev/library/secrets.html#secrets.token_bytes so adding a getrandom() function to secrets would largely be the same as that function.
The problem of course is that the secrets library in 3.6 uses os.urandom under the covers, so its security rests on the security of os.urandom. To ensure that the secrets library is actually safe even in early boot, it'll need to stop using os.urandom on Linux and use the getrandom() function.
That same library exposes random.SystemRandom as secrets.SystemRandom [1], and of course SystemRandom uses os.urandom too. So if we want people to treat secrets.SystemRandom as “always secure” then it would need to stop using os.urandom and start using the getrandom() function on Linux as well.
[1] This is actually documented as "using the highest-quality sources provided by the operating system" in the secrets documentation, and I'd argue that it is not using the highest-quality source if it's reading from /dev/urandom or getrandom(GRND_NONBLOCK) on Linux systems where getrandom() is available. Of course, it's just an alias for random.SystemRandom, and that is documented as using os.urandom.
If that's the case then we should file a bug so we are sure this is the case and we need to decouple the secrets documentation from random so that they can operate independently with secrets always doing whatever is required to be as secure as possible.
On Jun 10, 2016, at 3:33 PM, Brett Cannon <brett@python.org> wrote:
If that's the case then we should file a bug so we are sure this is the case and we need to decouple the secrets documentation from random so that they can operate independently with secrets always doing whatever is required to be as secure as possible.
https://bugs.python.org/issue27288 — Donald Stufft
On 10.06.2016 21:17, Donald Stufft wrote:
On Jun 10, 2016, at 3:05 PM, David Mertz <mertz@gnosis.cx> wrote:
OK. My understanding is that Guido ruled out introducing an os.getrandom() API in 3.5.2. But would you be happy if that interface is added to 3.6?
It feels to me like the correct spelling in 3.6 should probably be secrets.getrandom() or something related to that.
I am not a security expert, but your reply makes it clear to me. So, for me this makes:

os      -> OS-dependent, and because of this it varies from OS to OS (also quality-wise)
random  -> pseudo-random, but works for most non-critical use cases
secrets -> that's for crypto

If you don't need crypto, secrets would be a waste of resources, but if you need crypto, then os and random are unsafe. I think that's simple enough. At least, I would understand it.

Just my 2 cents: if I need crypto, I would pay the price of blocking rather than get an exception (what are my alternatives? I need those bits!) or get insecure bits.

Sven
David Mertz <mertz@gnosis.cx> wrote:
It feels to me like the correct spelling in 3.6 should probably be secrets.getrandom() or something related to that.
Since there already is a secrets.randbits(k), I would keep the naming similar and suggest something like:

    secrets.randbytes(k, *, nonblock=False)

With the argument "nonblock" you can control what happens when not enough entropy is available: it either blocks or (if nonblock=True) raises an exception. The third option, getting insecure random data, is simply not available in this function. Then you can keep os.urandom() as it was in Python 3.4 and earlier, but update the documentation to better warn about its behavior and point developers to the secrets module.

Sebastian
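For illustration, a minimal sketch of Sebastian's proposed function, assuming the os.getrandom()/os.GRND_NONBLOCK interface that later landed in Python 3.6 (the name randbytes and its keyword are his proposal, not an actual stdlib API):

    import os

    def randbytes(k, *, nonblock=False):
        # Never silently return weak bytes: either block until the
        # kernel pool is seeded, or (nonblock=True) fail fast with
        # BlockingIOError.
        if not hasattr(os, 'getrandom'):
            # On most non-Linux platforms os.urandom already refuses
            # to return data before the OS pool is seeded.
            return os.urandom(k)
        flags = os.GRND_NONBLOCK if nonblock else 0
        return os.getrandom(k, flags)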
On Jun 10, 2016, at 12:05 PM, David Mertz wrote:
OK. My understanding is that Guido ruled out introducing an os.getrandom() API in 3.5.2. But would you be happy if that interface is added to 3.6?
I would.
It feels to me like the correct spelling in 3.6 should probably be secrets.getrandom() or something related to that.
ISTM that secrets is a somewhat higher-level API, while it makes sense that a fairly simple plumbing of the underlying C call should go in os. But I wouldn't argue much if folks had strong opinions to the contrary. Cheers, -Barry
On 06/10/2016 11:55 AM, Donald Stufft wrote:
Ok, so you’re looking for how would you replicate the blocking behavior of os.urandom that exists in 3.5.0 and 3.5.1?
In that case, it’s hard. I don’t think linux provides any way to externally determine if /dev/urandom has been initialized or not. Probably the easiest thing to do would be to interface with the getrandom() function using a c-ext, CFFI, or ctypes. If you’re looking for a way of doing this without calling the getrandom() function.. I believe the answer is you can’t.
I'm certain you're correct: you can't perform any operation on /dev/urandom to determine whether or not the urandom device has been initialized. That's one of the reasons why Mr. Ts'o added getrandom()--you can use it to test exactly that (getrandom(GRND_NONBLOCK)). That's also why I proposed adding os.getrandom() in 3.5.2, to make it possible to block until urandom was initialized (without using ctypes etc as you suggest). However, none of the cryptography guys jumped up and said they wanted it, and in any case it was overruled by Guido, so we're not adding it to 3.5.2. //arry/
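For the record, the ctypes route Donald mentions might look something like this rough sketch (Linux-only; it assumes a glibc new enough, 2.25+, to expose getrandom(), and on older systems you would have to invoke the raw syscall instead):

    import ctypes
    import errno
    import os

    GRND_NONBLOCK = 0x0001

    def urandom_pool_initialized():
        # getrandom(GRND_NONBLOCK) fails with EAGAIN if and only if
        # the kernel's urandom pool is not yet initialized.
        libc = ctypes.CDLL("libc.so.6", use_errno=True)
        buf = ctypes.create_string_buffer(1)
        if libc.getrandom(buf, len(buf), GRND_NONBLOCK) >= 0:
            return True
        err = ctypes.get_errno()
        if err == errno.EAGAIN:
            return False
        raise OSError(err, os.strerror(err))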
On 10.06.2016 20:55, Donald Stufft wrote:
Ok, so you’re looking for how would you replicate the blocking behavior of os.urandom that exists in 3.5.0 and 3.5.1?
In that case, it’s hard. I don’t think linux provides any way to externally determine if /dev/urandom has been initialized or not. Probably the easiest thing to do would be to interface with the getrandom() function using a c-ext, CFFI, or ctypes. If you’re looking for a way of doing this without calling the getrandom() function.. I believe the answer is you can’t.
Well, you can see the effect by running Python early in the boot process. See e.g. http://bugs.python.org/issue26839#msg267749 and if you look at the system log file, you'll find a notice entry "random: %s pool is initialized" which gets written once the pool is initialized: http://lxr.free-electrons.com/source/drivers/char/random.c#L684

-- Marc-Andre Lemburg
On Fri, Jun 10, 2016 at 11:29 AM, David Mertz <mertz@gnosis.cx> wrote:
This is fairly academic, since I do not anticipate needing to do this myself, but I have a specific question. I'll assume that Python 3.5.2 will go back to the 2.6-3.4 behavior in which os.urandom() never blocks on Linux. Moreover, I understand that the case where insecure bits might be returned is limited to Python scripts that run on system initialization on Linux.
If I *were* someone who needed to write a Linux system initialization script using Python 3.5.2, what would the code look like? I think for this use case, requiring something with a little bit of "code smell" is fine, but I kinda hope it exists at all.
Good question. And going back to Larry's original e-mail, where he said-- On Thu, Jun 9, 2016 at 4:25 AM, Larry Hastings <larry@hastings.org> wrote:
THE PROBLEM ... The issue author had already identified the cause: CPython was blocking on getrandom() in order to initialize hash randomization. On this fresh virtual machine the entropy pool started out uninitialized. And since the only thing running on the machine was CPython, and since CPython was blocked on initialization, the entropy pool was initializing very, very slowly.
it seems to me that you'd want such a solution to have code that causes the initialization of the entropy pool to be sped up so that it happens as quickly as possible (if that is even possible). Is it possible? (E.g. by causing the machine to start doing things other than just CPython?) --Chris
On Fri, Jun 10, 2016 at 11:42:40AM -0700, Chris Jerdonek wrote:
And going back to Larry's original e-mail, where he said--
On Thu, Jun 9, 2016 at 4:25 AM, Larry Hastings <larry@hastings.org> wrote:
THE PROBLEM ... The issue author had already identified the cause: CPython was blocking on getrandom() in order to initialize hash randomization. On this fresh virtual machine the entropy pool started out uninitialized. And since the only thing running on the machine was CPython, and since CPython was blocked on initialization, the entropy pool was initializing very, very slowly.
it seems to me that you'd want such a solution to have code that causes the initialization of the entropy pool to be sped up so that it happens as quickly as possible (if that is even possible). Is it possible? (E.g. by causing the machine to start doing things other than just CPython?)
I don't think that's something which the Python interpreter ought to do for you, but you can write to /dev/urandom or /dev/random (both keep their own, separate, entropy pools):

    open("/dev/urandom", "w").write("hello world")

But of course there's the question of where you're going to get a source of noise to write to the file. While it's (probably?) harmless to write a hard-coded string to it, I don't think it's going to give you much entropy.

-- Steve
Steven D'Aprano <steve@pearwood.info> wrote:
it seems to me that you'd want such a solution to have code that causes the initialization of the entropy pool to be sped up so that it happens as quickly as possible (if that is even possible). Is it possible? (E.g. by causing the machine to start doing things other than just CPython?)
I don't think that's something which the Python interpreter ought to do for you, but you can write to /dev/urandom or /dev/random (both keep their own, separate, entropy pools):
There are projects like http://www.issihosts.com/haveged/ that use some tiny timing fluctuations in CPUs to feed the entropy pool and which are available in most Linux distributions. But as you said, that is something completely outside of Python's scope.
This is related to David Mertz's request for backward-compatible initialization, not to the BDFL decision. Steven D'Aprano writes:
I don't think that's something which the Python interpreter ought to do for you, but you can write to /dev/urandom or /dev/random (both keep their own, separate, entropy pools):
open("/dev/urandom", "w").write("hello world")
This fails for unprivileged users on Mac. I'm not sure what happens on Linux; it appears to succeed, but the result wasn't what I expected. Also, when entropy gets low, it's not clear how additional entropy is allocated between the /dev/random and /dev/urandom pools.
But of course there's the question of where you're going to get a source of noise to write to the file. While it's (probably?) harmless to write a hard-coded string to it, I don't think its going to give you much entropy.
Use a Raspberry-Pi, or other advanced expensive<wink/> hardware. There's no real excuse for not having a hardware generator if the Pi has one! I would guess you can probably get something with a USB interface for $20 or so. http://scruss.com/blog/2013/06/07/well-that-was-unexpected-the-raspberry-pis...
On 06/11/2016 05:16 PM, Stephen J. Turnbull wrote:
Use a Raspberry-Pi, or other advanced expensive<wink/> hardware. There's no real excuse for not having a hardware generator if the Pi has one!
Intel CPUs added the RDRAND instruction as of Ivy Bridge, although there's an ongoing debate as to whether or not it's a suitable source of entropy to use for seeding urandom. https://en.wikipedia.org/wiki/RdRand#Reception Wikipedia goes on to describe the very-new RDSEED instruction which might be more suitable. //arry/
On Jun 11, 2016, at 8:16 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
This fails for unprivileged users on Mac. I'm not sure what happens on Linux; it appears to succeed, but the result wasn't what I expected.
I think that on Linux it will mix whatever you write into the entropy pool, but it won't credit the entropy counter for it. — Donald Stufft
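For completeness, the privileged way to both mix data in and have it credited is the RNDADDENTROPY ioctl. A hedged sketch (root-only; the ioctl number is Linux's _IOW('R', 0x03, int[2]) value and really ought to be derived from <linux/random.h> rather than hard-coded as it is here):

    import fcntl
    import struct

    RNDADDENTROPY = 0x40085203  # Linux _IOW('R', 0x03, int[2])

    def add_entropy(data, entropy_bits=None):
        # Unlike a plain write() to /dev/urandom, this credits the
        # entropy counter as well as mixing `data` into the pool.
        if entropy_bits is None:
            entropy_bits = 8 * len(data)  # only honest if data is truly random
        req = struct.pack('ii%ds' % len(data), entropy_bits, len(data), data)
        with open('/dev/urandom', 'wb') as f:
            fcntl.ioctl(f, RNDADDENTROPY, req)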
On 2016-06-10 20:42, Chris Jerdonek wrote:
On Fri, Jun 10, 2016 at 11:29 AM, David Mertz <mertz@gnosis.cx> wrote:
This is fairly academic, since I do not anticipate needing to do this myself, but I have a specific question. I'll assume that Python 3.5.2 will go back to the 2.6-3.4 behavior in which os.urandom() never blocks on Linux. Moreover, I understand that the case where insecure bits might be returned is limited to Python scripts that run on system initialization on Linux.
If I *were* someone who needed to write a Linux system initialization script using Python 3.5.2, what would the code look like? I think for this use case, requiring something with a little bit of "code smell" is fine, but I kinda hope it exists at all.
Good question. And going back to Larry's original e-mail, where he said--
On Thu, Jun 9, 2016 at 4:25 AM, Larry Hastings <larry@hastings.org> wrote:
THE PROBLEM ... The issue author had already identified the cause: CPython was blocking on getrandom() in order to initialize hash randomization. On this fresh virtual machine the entropy pool started out uninitialized. And since the only thing running on the machine was CPython, and since CPython was blocked on initialization, the entropy pool was initializing very, very slowly.
I repeat for like the fifth time: os.urandom() and Python startup are totally unrelated. They just happen to use the same internal function to set the hash randomization state. The startup problem can be solved without f... up the security properties of os.urandom(). The correct questions to ask are:

1) Does hash randomization for bytes, text and XML always require cryptographically strong random values from a potentially blocking CSPRNG?
2) Does the initial state of the Mersenne Twister of the default random.Random instance really need cryptographically strong values?
3) Should os.urandom() always use the best CSPRNG source available and make sure it never returns weak, predictable values (when possible)?

The answers are:

1) No
2) No
3) HELL YES!

If you think that the answer to 3 is "No" and that a CSPRNG is permitted to return predictable values, then you are *by definition* ineligible to vote on security issues. Christian
On 2016-06-11 16:37, Victor Stinner wrote:
I repeat for like the fifth time:
So, is there a candidate to write a PEP?
I didn't read the thread. As expected, the discussion restarted for the 3rd time, there are almost 100 emails in this thread.
Sorry, I'm out. I simply lack the necessary strength and mental energy to pursue the issue any further. Donald Stufft just forwarded a quote that resonates with my current state of mind (replace 'lists' with 'current topic'): "I feel I no longer possess either the necessary strength or perhaps the necessary faith to continue rolling the stone of Sisyphus against the forces of reaction which are triumphing everywhere. I am therefore retiring from the lists, and ask of my dear contemporaries only one thing — oblivion." Christian
I will observe that feelings have gotten a little heated, so without making any suggestions as to how the python-dev community should decide things, let me offer some observations that might perhaps shed a little light, and perhaps dispel a little bit of the heat.

As someone who has been working in security for a long time --- before I started getting paid to hack Linux full-time, I worked on Kerberos and was on the Security Area Directorate of the IETF, where among other things I was one of the working group chairs for the IP Security (ipsec) working group --- I tend to cringe a bit when people talk about security in terms of absolutes. For example, the phrase "improving Python's security". Security is something that is best talked about given a specific threat environment, where the value of what you are trying to protect, the capabilities and resources of the attackers, etc., are all well known. This gets hard for those of us who work on infrastructure which can get used in many different arenas, and so that's something that applies both to the Linux kernel and to C-Python, because how people will use the tools that we spend so much of our passion crafting is largely out of our control, and we may not even know how they are using it.

As far as /dev/urandom is concerned, it's true that it doesn't block before it has been initialized. If you are a security academic who likes to write papers about how great you are at finding defects in other people's work, this is definitely a weakness. Is it a fatal weakness? Well, first of all, on most server and desktop deployments, we save 1 kilobyte or so of /dev/urandom output during the shutdown sequence, and immediately after the init scripts are completed. This saved entropy is then piped back into the /dev/random infrastructure and used to initialize /dev/random and /dev/urandom very early in the init scripts. On a freshly installed machine this won't help, true, but in practice, on most systems, /dev/urandom will get initialized from interrupt timing sampling within a few seconds after boot. For example, on a sample Google Compute Engine VM which is booted into Debian and then left idle, /dev/urandom was initialized within 2.8 seconds after boot, while the root file system was remounted read-only 1.6 seconds after boot.

So even on Python pre-3.5.0, realistically speaking, the "weakness" of os.random would only be an issue (a) if it is run within the first few seconds of boot, and (b) os.random is used to directly generate a long-term cryptographic secret. If you fork openssl or ssh-keygen to generate a public/private keypair, then you aren't using os.random. Furthermore, if you are running on a modern x86 system with RDRAND, you'll also be fine, because we mix in randomness from the CPU chip via the RDRAND instruction.

So this whole question of whether os.random should block *is* important in certain very specific cases, and if you are generating long-term cryptographic secrets in Python, maybe you should be worrying about that. But to be honest, there are lots of other things you should be worrying about as well, and I would hope that people writing cryptographic code would be asking questions of how the random number stack is working, not just at the C-Python interpreter level, but also at the OS level.
My preference would be that os.random should block, because the odds that people would be trying to generate long-term cryptographic secrets within seconds after boot are very small, and if you *do* block for a second or two, it's not the end of the world. The problem that triggered this was specifically because systemd was trying to use C-Python very early in the boot process to initialize the SIPHASH used for the dictionary, and it's not clear that really needed to be extremely strong because it wasn't a long-term cryptographic secret --- certainly not in how systemd was using that specific script!

The reason why I think blocking is better is that once you've solved the "don't hang the VM for 90 seconds until python has started up" problem, someone who is using os.random will almost certainly not be on the blocking path of the system boot sequence, and so blocking for 2 seconds before generating a long-term cryptographic secret is not the end of the world. And if it does block by accident, in a security-critical scenario it will hopefully force the programmer to think, and in a non-security-critical scenario it should be easy to switch to either a totally non-blocking interface, or to a pseudo-random interface which is more efficient.

*HOWEVER*, on the flip side, if os.random *doesn't* block, in 99.999% of cases the Python script that is directly generating a long-term secret will not be started 1.2 seconds after the root file system is remounted read/write, so it is *also* not the end of the world. Realistically speaking, we do know which processes are likely to be generating long-term cryptographic secrets immediately after boot, and they'll most likely be using programs like openssl or ssh-keygen to actually generate the cryptographic key, and in both of those places (a) it's their problem to get it right, and (b) blocking for two seconds is a completely reasonable thing to do, and they will probably do it, so we're fine.

So either way, I think it will be fine. I may have a preference, but if Python chooses another path, all will be well. There is an old saying that academic politics are often so passionate because the stakes are so small. It may be that one of the reasons why this topic has been so passionate is precisely because of Sayre's Law.

Peace,

- Ted
On 06/10/2016 12:54 PM, Theodore Ts'o wrote:
So even on Python pre-3.5.0, realistically speaking, the "weakness" of os.random would only be an issue (a) if it is run within the first few seconds of boot, and (b) os.random is used to directly generate a long-term cryptographic secret. If you fork openssl or ssh-keygen to generate a public/private keypair, then you aren't using os.random.
Just a gentle correction: wherever Mr. Ts'o says "os.random", he means "os.urandom()". We don't have an "os.random" in Python. My thanks to today's celebrity guest correspondent, Mr. Theodore Ts'o! //arry/
On Fri, Jun 10, 2016, at 15:54, Theodore Ts'o wrote:
So even on Python pre-3.5.0, realistically speaking, the "weakness" of os.random would only be an issue (a) if it is run within the first few seconds of boot, and (b) os.random is used to directly generate a long-term cryptographic secret. If you fork openssl or ssh-keygen to generate a public/private keypair, then you aren't using os.random.
So, I have a question. If this "weakness" in /dev/urandom is so unimportant to 99% of situations... why isn't there a flag that can be passed to getrandom() to allow the same behavior?
[Random832]
So, I have a question. If this "weakness" in /dev/urandom is so unimportant to 99% of situations... why isn't there a flag that can be passed to getrandom() to allow the same behavior?
Isn't that precisely the purpose of the GRND_NONBLOCK flag? http://man7.org/linux/man-pages/man2/getrandom.2.html
On Jun 10, 2016, at 5:21 PM, Tim Peters <tim.peters@gmail.com> wrote:
Isn't that precisely the purpose of the GRND_NONBLOCK flag?
It doesn’t behave exactly the same as /dev/urandom. If the pool hasn’t been initialized yet /dev/urandom will return possibly predictable data whereas getrandom(GRND_NONBLOCK) will EAGAIN. — Donald Stufft
On Fri, Jun 10, 2016 at 05:14:50PM -0400, Random832 wrote:
On Fri, Jun 10, 2016, at 15:54, Theodore Ts'o wrote:
So even on Python pre-3.5.0, realistically speaking, the "weakness" of os.random would only be an issue (a) if it is run within the first few seconds of boot, and (b) os.random is used to directly generate a long-term cryptographic secret. If you fork openssl or ssh-keygen to generate a public/private keypair, then you aren't using os.random.
So, I have a question. If this "weakness" in /dev/urandom is so unimportant to 99% of situations... why isn't there a flag that can be passed to getrandom() to allow the same behavior?
The intention behind getrandom() is that it is intended *only* for cryptographic purposes. For that use case, there's no point having a "return a potentially unseeded cryptographic secret" option. This makes it much like FreeBSD's /dev/random and getentropy system calls.

(BTW, I've seen an assertion on this thread that FreeBSD's getentropy(2) never blocks. As far as I know, this is **not** true. FreeBSD's getentropy(2) works like its /dev/random device, in that if it is not fully seeded, it will block. The only reason why OpenBSD's getentropy(2) and /dev/random devices will never block is because they only support architectures where they can make sure that entropy is passed from a previous boot session to the next, given specialized bootloader support. Linux can't do this because we support a very large number of bootloaders, and the bootloaders are not under the kernel developers' control. Fundamentally, you can't guarantee both (a) that your RNG will never block, and (b) that it will always be of high cryptographic quality, in a completely general sense. You can if you make caveats about your hardware or when the code runs, but that's fundamentally the problem with the documentation of os.urandom(); it's making promises which can't be true 100% of the time, for all hardware, operating environments, etc.)

Anyway, if you don't need cryptographic guarantees, you don't need getrandom(2) or getentropy(2); something like this will do just fine:

    long getrand()
    {
            static int initialized = 0;
            struct timeval tv;

            if (!initialized) {
                    gettimeofday(&tv, NULL);
                    srandom(tv.tv_sec ^ tv.tv_usec ^ getpid());
                    initialized++;
            }
            return random();
    }

So this is why I did what I did. If Python decides to go down this same path, you could define a new interface a la getrandom(2), which is specifically designed for cryptographic purposes, and perhaps a new, more efficient interface for those people who don't need cryptographic guarantees --- and then keep the behavior of os.urandom consistent with Python 3.4, but update the documentation to reflect reality. Alternatively, you could keep the implementation of os.urandom consistent with Python 3.5, and then document that under some circumstances it will block.

Both approaches have certain tradeoffs, but it's not going to be the end of the world regardless of which way you decide to go. I'd suggest that you use your existing mechanisms to decide on which approach is more Pythonic, and then make sure you communicate and over-communicate it to your user/developer base. And then --- relax. It may seem like a big deal today, but in a year or so people will have gotten used to whatever interface or documentation changes you decide to make, and it will be all fine. As Dame Julian of Norwich once said, "All shall be well, and all shall be well, and all manner of things shall be well."

Cheers,

- Ted
On Sat, Jun 11, 2016, at 22:37, Theodore Ts'o wrote:
On Fri, Jun 10, 2016 at 05:14:50PM -0400, Random832 wrote:
So, I have a question. If this "weakness" in /dev/urandom is so unimportant to 99% of situations... why isn't there a flag that can be passed to getrandom() to allow the same behavior?
The intention behind getrandom() is that it is intended *only* for cryptographic purposes.
I'm somewhat confused now because if that's the case it seems to accomplish multiple unrelated things. Why was this implemented as a system call rather than a device (or an ioctl on the existing ones)? If there's a benefit in not going through the non-atomic (and possibly resource limited) procedure of acquiring a file descriptor, reading from it, and closing it, why is that benefit not also extended to non-cryptographic users of urandom via allowing the system call to be used in that way?
Anyway, if you don't need cryptographic guarantees, you don't need getrandom(2) or getentropy(2); something like this will do just fine:
Then what's /dev/urandom *for*, anyway?
On Sun, Jun 12, 2016 at 01:49:34AM -0400, Random832 wrote:
The intention behind getrandom() is that it is intended *only* for cryptographic purposes.
I'm somewhat confused now because if that's the case it seems to accomplish multiple unrelated things. Why was this implemented as a system call rather than a device (or an ioctl on the existing ones)? If there's a benefit in not going through the non-atomic (and possibly resource limited) procedure of acquiring a file descriptor, reading from it, and closing it, why is that benefit not also extended to non-cryptographic users of urandom via allowing the system call to be used in that way?
This design was taken from OpenBSD, and the goal with getentropy(2) (which is also designed only for cryptographic use cases) was to ensure that a denial of service attack (fd exhaustion) could not force an application to fall back to a weaker -- in some cases, very weak or non-existent -- source of randomness. Non-cryptographic users don't need to use this interface at all. They can just use srandom(3)/random(3) and be happy.
Anyway, if you don't need cryptographic guarantees, you don't need getrandom(2) or getentropy(2); something like this will do just fine:
Then what's /dev/urandom *for*, anyway?
/dev/urandom is a legacy interface. It was intended originally for cryptographic use cases, but it dates from the days when very few programs needed a secure cryptographic random generator, and it was assumed that application programmers would be very careful in checking error codes, etc. It also dates back to a time when the NSA was still pushing very hard for cryptographic export controls (hence the use of SHA-1 versus an encryption algorithm) and when many people questioned whether or not the SHA-1 algorithm, as designed by the NSA, had a backdoor in it. (As it turns out, the NSA put a back door into DUAL-EC, so in retrospect this concern really wasn't that unreasonable.)

Because of those concerns, the assumption was that the few applications which really wanted to get security right (e.g., PGP, which still uses /dev/random for long-term key generation) would want to use /dev/random and deal with entropy accounting, and with asking the user to type randomness on the keyboard and move their mouse around while generating a random key.

But times change, and these days people are much more likely to believe that SHA-1 is in fact cryptographically secure, and future crypto hash algorithms are designed by teams from all over the world, with NIST/NSA merely reviewing the submissions (along with everyone else). So for example, SHA-3 was *not* designed by the NSA, and it was evaluated using a much more open process than SHA-1. Also, we have a much larger set of people writing code which is sensitive to cryptographic issues (back when I wrote /dev/random, I had probably met, or at least electronically corresponded with, a large number of the folks who were working on network security protocols, at least in the non-classified world), and these days there is much less trust that people writing code to use /dev/[u]random are in fact careful and competent security engineers.

Whether or not this is a fair concern, it is true that there has been a change in API design ethos, away from the Unix "let's make things as general as possible, in case someone clever comes up with a use case we didn't think of" toward "idiots are ingenious, so they will come up with ways to misuse an idiot-proof interface, so we need to lock it down as much as possible." OpenBSD's getentropy(2) interface is a strong example of this new attitude towards API design, and getrandom(2) is not quite so doctrinaire (I added a flags field when getentropy(2) didn't even give those options to programmers), but it is following in the same tradition.

Cheers,

- Ted
On Thu, Jun 9, 2016 at 8:11 PM, Larry Hastings <larry@hastings.org> wrote:
On 06/09/2016 07:58 PM, Nathaniel Smith wrote:
I suspect the crypto folks would be okay with pushing this back to 3.6, so long as the final resolution is that os.urandom remains the standard interface for, as the docstring says, "Return[ing] a string of n random bytes suitable for cryptographic use" using the OS-recommended method, and they don't have to go change all their code.
The Linux core devs didn't like the behavior of /dev/urandom. But they couldn't change its behavior without breaking userspace code. Linux takes backwards-compatibility very seriously, so they left /dev/urandom exactly the way it was and added new functionality (the getrandom() system call) that had the semantics they felt were best.
I don't understand why so many people seem to think it's okay to break old code in new versions of Python, when Python's history has shown a similarly strong commitment to backwards-compatibility. os.urandom() was added in Python 2.4, in 2004, and remained unchanged for about thirteen years. That's thirteen years of people calling it and assuming its semantics were identical to the local "urandom" man page, which was correct.
I can only speak for myself, but the reason it doesn't bother me is that the documentation for os.urandom has always been very clear that it is an abstraction over multiple OS-specific sources of cryptographic randomness -- even in the 2.4 docs [1] we read that its output "depends on the OS implementation", and that it might be /dev/urandom, it might be CryptGenRandom, and it might even raise an exception if "a randomness source is not found". So as a user I've always expected that it will make a best-effort attempt to use whatever the best source of cryptographic randomness is in a given environment, or else make a best-effort attempt to raise an error if it's determined that it can't give me cryptographic randomness, and it's been doing that unchanged for thirteen years too.

But now Linux has moved forward and provided an improved OS-specific source of cryptographic randomness, and in particular one that actually signals to userspace when it doesn't have randomness available. So we have a choice: either we have to break the guarantee that os.urandom is identical to /dev/urandom, or we have to break the guarantee that os.urandom uses the best OS-specific source of cryptographic randomness. Either way we're breaking some guarantee we used to make. And AFAICT so far 100% of the people who actually maintain libraries that call os.urandom are asking python-dev to break the identical-to-/dev/urandom guarantee and preserve the uses-the-best-OS-specific-cryptographic-randomness guarantee. Disrupting working code is a bad thing, but in the long run, no-one is actually asking for an os.urandom that silently falls back on the xkcd #221 PRNG [2].

All that said, the eve of the 3.5.2 release is a terrible time to be trying to decide this, and it makes perfect sense to me that maybe 3.5 should kick this can down the road. Your efforts as RM are appreciated and I'm glad I'm not in your spot :-).

-n

[1] https://docs.python.org/2.4/lib/os-miscfunc.html
[2] https://xkcd.com/221/

-- Nathaniel J. Smith -- https://vorpus.org
On 06/09/2016 08:52 AM, Guido van Rossum wrote: That leaves direct calls to os.urandom(). I don't think this should block either.
On 9 June 2016 at 22:22, Larry Hastings <larry@hastings.org> wrote:
Then it's you and me against the rest of the world ;-)
Okay, it's decided: os.urandom() must be changed for 3.5.2 to never block on a getrandom() call. It's permissible to take advantage of getrandom(GRND_NONBLOCK), but if it returns EAGAIN we must read from /dev/urandom.
So assuming this is the “final” decision, where to from here? I think the latest change by Colm and committed by Victor already implements this decision: https://hg.python.org/cpython/rev/9de508dc4837 Getrandom() is still called, but if it would block, we fall back to trying the less-secure Linux /dev/urandom, or fail if /dev/urandom is missing. The Python hash seed is still set using this code. And os.urandom() calls this code. Random.seed() and SystemRandom still use os.urandom(), as documented. So I suggest we close the original mega bug thread <https://bugs.python.org/issue26839> as fixed. Unless people think they can change Larry or Guido’s mind, we should focus further discussion on any changes for 3.6.
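For anyone trying to picture the committed behaviour without reading the C patch, here is a rough Python rendering (illustrative only: os.getrandom() only appeared in 3.6, and in 3.5.2 this logic lives in C inside the interpreter):

    import os

    def urandom_3_5_2_style(n):
        # Try getrandom() without blocking; if the kernel pool is not
        # yet initialized, fall back to the possibly-unseeded
        # /dev/urandom device, matching the patch described above.
        if hasattr(os, 'getrandom'):
            try:
                return os.getrandom(n, os.GRND_NONBLOCK)
            except BlockingIOError:
                pass  # pool not initialized yet; fall back
        with open('/dev/urandom', 'rb') as f:
            return f.read(n)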
On 9 Jun 2016, at 14:48, Doug Hellmann <doug@doughellmann.com> wrote:
I agree those are the two options. I want the application developer to make the choice, not us.
Right, but right now those two options aren't available: only one of them is. And one way or another we're taking an action here: either we're leaving os.urandom() as it stands now, or reverting it back to the way it was in 3.4.0. This means that you *do* want python-dev to make a choice: specifically, you want python-dev to make the choice that was made in 3.4.0, rather than the one that was made in 3.5.0. That's fine, but we shouldn't be pretending that either side is arguing for inaction or the status quo. For Python 3.5 a choice was made with insufficient knowledge of the outcomes, and now we're arguing about whether we can revert that choice. The difference is, now we *do* know about both outcomes, which means we are consciously choosing between them.
All of which fails to be backwards compatible (new exceptions and hanging behavior), which means you’re breaking apps.
Backwards compatible with what? Python 3.5.0 and 3.5.1 both have this behaviour, so I assume you mean “backward compatible with 3.4”. However, part of the point of a major release is that it doesn’t have to be backward compatible in this manner: Python breaks backward compatibility all the time in major releases. I should point out that as far as I'm aware there are exactly two applications that suffer from this problem. One of them is Debian’s autopkgtest, which has resolved this problem by invoking Python with PYTHONHASHSEED=0. The other is systemd-cron, and frankly it does not seem at all unreasonable to suggest that perhaps systemd-cron should *maybe* hold off until the system’s CSPRNG gets seeded before it starts executing. Cory
On 6/9/2016 9:48 AM, Doug Hellmann wrote:
On Jun 9, 2016, at 9:27 AM, Cory Benfield <cory@lukasa.co.uk> wrote:
The problem here is that both definitions of ‘broken’ are unclear. If we leave os.urandom() as it is, there is a small-but-nonzero chance that your program will hang, potentially indefinitely. If we change it back, there is a small-but-nonzero chance your program will generate bad random numbers.
If we assume, for a moment, that os.urandom() doesn’t get called during Python startup (that is that we adopt Christian’s approach to deal with random and SipHash as separate concerns), what we’ve boiled down to is: your application called os.urandom() so early that you’ve got weak random numbers, does it hang or proceed? Those are literally our two options.
I agree those are the two options. I want the application developer to make the choice, not us.
I think the 'new API' should be a parameter, not a new function. With just two choices, 'wait' = True/False could work. If 'raise an exception' were added, then 'action' (when good bits are not immediately available) could be 'return' (best possible), 'wait' (until we have good bits), or 'raise' (CryptBitsNotAvailable). In either case, there would then be the question of whether the default should match 3.5.0/1 or 3.4 and before. -- Terry Jan Reedy
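A sketch of how Terry's parameter could look, again assuming the os.getrandom() that later landed in 3.6 (the parameter name, its values, and CryptBitsNotAvailable are taken from his message; none of this is a real API):

    import os

    class CryptBitsNotAvailable(Exception):
        """Good bits were requested but are not immediately available."""

    def urandom(n, action='wait'):
        # action='return': best-possible bytes immediately (3.4 behaviour)
        # action='wait':   block until the pool is seeded (3.5.0/1 behaviour)
        # action='raise':  fail fast instead of blocking
        if action == 'return' or not hasattr(os, 'getrandom'):
            return os.urandom(n)
        if action == 'wait':
            return os.getrandom(n)
        try:
            return os.getrandom(n, os.GRND_NONBLOCK)
        except BlockingIOError:
            raise CryptBitsNotAvailable(n) from None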
On Jun 9, 2016, at 7:25 AM, Larry Hastings <larry@hastings.org> wrote:
A problem has surfaced just this week in 3.5.1. Obviously this is a good time to fix it for 3.5.2. But there's a big argument over what is "broken" and what is an appropriate "fix".
Couple clarifications:

random.py
---------

In the abstract it doesn't hurt to seed MT with a CSPRNG, it just doesn't provide much (if any) benefit, and in this case it is hurting us because of the cost on import (which will exist on other platforms as well no matter what we do here for Linux). There are a couple solutions to this problem:

* Use getrandom(GRND_NONBLOCK) for random.Random since it doesn't matter if we get cryptographically secure random numbers or not.
* Switch it to use something other than a CSPRNG by default since it doesn't need that.
* Instead of seeding itself from os.urandom on import, have it lazily do that the first time one of the random.rand* functions is called.
* Do nothing, and say that ``import random`` relies on having the kernel's urandom pool initialized.

Between these options, I have a slight preference for switching it to use a non-CSPRNG, but I really don't care that much which of these options we pick. Using random.Random is not secure, and none of the above options meaningfully changes the security posture of something that accidentally uses it.

SipHash and the Interpreter Startup
-----------------------------------

I have complicated thoughts on what SipHash should do. For something like a Django process, we never want it to be initialized with "bad" entropy; however, reading straight from /dev/urandom, or getrandom(GRND_NONBLOCK), means that we might get that if we start the process early enough in the boot process. The rub here is that I cannot think of a situation where, by the time you're at the point you're starting up something like Django, you're even remotely likely to not have an initialized random pool.

The other side of this issue is that we have Python scripts which do not need a secure random being passed to SipHash running early enough in the boot process with systemd that we need to be able to have SipHash initialization not block on waiting for /dev/urandom. So I'm torn between the "Practicality beats Purity" mindset, which says we should just let SipHash seed itself with whatever quality of random from the urandom pool is currently available, and the "Special cases aren't special enough to break the rules" mindset, which says that we should just make it easier for scripts in this edge case to declare they don't care about hash randomization, to remove the need for it (in other words, a CLI flag that matches PYTHONHASHSEED in functionality).

An additional wrinkle in the mix is that we cannot get non-blocking random on many (any?) modern OS besides Linux, so we're going to run into this same problem if, say, FreeBSD decides to put a Python script early enough in the boot sequence. In the end, both of these choices make me happy and unhappy in different ways, but I would lean towards adding a CLI flag for the special case and letting the systemd script that caused this problem invoke its Python with that flag. I think this because:

* It leaves the interpreter so that it is secure by default, but provides the relevant knobs to turn off this default in cases where a user doesn't need or want it.
* It solves the problem in a cross-platform way that doesn't rely on the nuances of the CSPRNG interface on one particular supported platform.

os.urandom
----------

There have been a lot of proposals thrown around, and people pointing to different sections of the documentation to justify different opinions. This is easily the most contentious question we have here.
It is my belief that reading from urandom is the right thing to do for generating cryptographically secure random numbers. This is a viewpoint held by every major security expert and cryptographer that I'm aware of. Most (all?) major platforms besides Linux do not allow reading from their equivalent of /dev/urandom until it has been successfully initialized, and it is widely held by all security experts and cryptographers that I'm aware of that this property is a good one, that the Linux behavior of /dev/urandom is a wart/footgun, and that prior to getrandom() there simply wasn't a better option on Linux.

With that in mind, I think that we should, to the best of our ability given the platform we're on, ensure that os.urandom does not return bytes that the OS does not think is cryptographically secure. In practice this means that os.urandom should do one of two things in the very early boot process on Linux:

* Block waiting on the kernel to initialize the urandom pool, and then return the now-secure random bytes given to us.
* Raise an exception saying that the pool has not been initialized and thus os.urandom is not ready yet.

The key point in both of these options is that os.urandom never [1] returns bytes prior to the OS believing that it can give us cryptographically secure random bytes. I believe I have a preference for blocking while waiting for the kernel to initialize the urandom pool, because that makes Linux behave similarly to the other platforms that I'm aware of.

I do not believe that adding additional public functions, as some other people have proposed, is a good option. I think they muddy the waters, and it forces us to try to convince people that "no really, yes everyone says you should use urandom, but you actually want getrandom", particularly since the outcome of these two functions would be exactly the same in all but a very narrow edge case on Linux.

Larry has suggested that os.py should only ever be a thin shell around OS-provided functionality, and thus os.urandom should simply mimic whatever the behavior of /dev/urandom is on that OS. For os.urandom in particular this is already not the case, since it calls CryptGenRandom on Windows, but putting that aside (since that's a Windows vs POSIX difference), we're not talking about adding a great amount of functionality around something provided by the OS. We're only talking about using a different interface to access the same underlying functionality; in this case, an interface that better suits the actual use of os.urandom in the wild and provides better properties all around.

He's also pointed out that the documentation does not guarantee that the result of os.urandom will be cryptographically strong, in the following quote:

    This function returns random bytes from an OS-specific randomness
    source. The returned data should be unpredictable enough for
    cryptographic applications, though its exact quality depends on the
    OS implementation.

My read of this quote is that it is a hedge against operating systems that have implemented their urandom pool in such a way that it does not return cryptographically secure random numbers, so that you don't come back and yell at Python for it. In other words, it's a hedge against /dev/urandom being https://xkcd.com/221/. I do not think this documentation excuses us from using a weaker interface to the OS-specific randomness source simply because its name happens to match the name of the function.
Particularly since earlier on in that documentation it states:

    Return a string of n random bytes suitable for cryptographic use.

and the Python standard library, and the entire ecosystem as I know it, as well as all security experts and crypto experts, believe you should treat it as such. This is largely because if your urandom pool is implemented in a way that, in the general case, provides insecure random values, then you're beyond the pale and there's nothing that Python, or anyone but your OS vendor, can do to help you.

Furthermore, I think that the behavior I want (that os.urandom is secure by default to the best of our abilities) is trickier to get right, and requires interfacing with C code. However, getting the exact semantics of /dev/urandom on Linux is trivial to do with a few lines of Python code:

    def urandom(amt):
        return open("/dev/urandom", "rb").read(amt)

So if you're someone who is depending on the Linux urandom behavior in an edge case that almost nobody is going to hit, you can trivially get the old behavior back. Even better, if you're someone depending on this, you're going to get an *obvious* failure rather than silently getting insecure bytes.

On top of all of that, this only matters in a small edge case, most likely to only ever be hit by OS vendors themselves, who are in the best position to make informed decisions about how to work around the fact that the urandom entropy pool hasn't already been initialized, rather than expecting every other user to have to try to ensure that they don't start their Python script too early.

[1] To the best of our ability, given the interfaces and implementation provided to us by the OS.

— Donald Stufft
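Going back to the random.py options near the top of Donald's message, the "seed lazily on first use" variant is easy to picture. A toy sketch (not CPython's implementation, and only covering the random() entry point):

    import os
    import random

    class LazySeededRandom(random.Random):
        # Defer the os.urandom() read from import time to first use,
        # so that merely importing the module can never block on the
        # kernel's entropy pool.
        def __init__(self):
            self._seeded = False
            super().__init__(0)  # cheap placeholder seed

        def random(self):
            if not self._seeded:
                self._seeded = True
                super().seed(os.urandom(32))  # the real seed, on demand
            return super().random()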
On Thu, Jun 09, 2016 at 08:26:20AM -0400, Donald Stufft wrote:
random.py ---------
In the abstract it doesn't hurt to seed MT with a CSPRNG, it just doesn't provide much (if any) benefit and in this case it is hurting us because of the cost on import (which will exist on other platforms as well no matter what we do here for Linux). There are a couple solutions to this problem:
* Use getrandom(GRND_NONBLOCK) for random.Random since it doesn't matter if we get cryptographically secure random numbers or not.
+1 on this option (see below for rationale).
* Switch it to use something other than a CSPRNG by default since it doesn't need that. [...] Between these options, I have a slight preference for switching it to use a non-CSPRNG, but I really don't care that much which of these options we pick. Using random.Random is not secure and none of the above options meaningfully changes the security posture of something that accidentally uses it.
I don't think that is quite right, although it will depend on your definition of "meaningful". PEP 506 says:

    Demonstrated attacks against MT are typically against PHP
    applications. It is believed that PHP's version of MT is a
    significantly softer target than Python's version, due to a poor
    seeding technique [17].

Reference [17] (https://www.python.org/dev/peps/pep-0506/#id17) is specifically that PHP seeds the MT with the time, while we use the output of a CSPRNG. Now, we all agree that MT is completely the wrong thing to use for secrets, good seeding or not, but *bad* seeding could make it a PHP-level soft target. The point of PEP 506 is to move people away from using random.Random for their secrets, but we should expect that whatever we do, there will be some late adopters who are slow to get the message and continue to use it. I would not like us to weaken the seeding technique to the point that those folks become an attractive target.

I think that using getrandom(GRND_NONBLOCK) will be okay, provided that when the entropy pool is too low and getrandom falls back to something cryptographically weak, it's still better (hopefully significantly better) than seeding with the time. My reasoning is that the sort of applications that could be targets of attacks against MT are unlikely to be started up early in the boot process, so they're almost always going to get good crypto seeds. On the rare occasion that they don't, well, there's only so far that I'm prepared to stand up for a developer's right to be ignorant of security concerns in 2016, and that's where I draw the line.
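As a toy illustration of why the PHP-style time seed is so much softer a target: anyone who can observe a few outputs and knows roughly when the process started can simply brute-force the seed (the function below is hypothetical, not from any library):

    import random

    def guess_time_seed(observed, approx_start, window=120):
        # If the target did random.Random(int(time.time())), the seed
        # lies in a tiny search space around the known start time.
        base = int(approx_start)
        for seed in range(base - window, base + window + 1):
            rng = random.Random(seed)
            if [rng.random() for _ in observed] == observed:
                return seed
        return None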
SipHash and the Interpreter Startup ----------------------------------- [...] In the end, both of these choices make me happy and unhappy in different ways but I would lean towards adding a CLI flag for the special case and letting the systemd script that caused this problem invoke their Python with that flag. I think this because:
* It leaves the interpreter so that it is secure by default, but provides the relevant knobs to turn off this default in cases where a user doesn't need or want it. * It solves the problem in a cross platform way, that doesn't rely on the nuances of the CSPRNG interface on one particular supported platform.
Makes sense to me. +1
os.urandom ---------- [...] With that in mind, I think that we should, to the best of our ability given the platform we're on, ensure that os.urandom does not return bytes that the OS does not think is cryptographically secure.
Just to be clear, you're talking about having it block rather than raise an exception, right? If so, that makes sense to me. That's already the behaviour on all major platforms except Linux, so you're just bringing Linux into line with the others. Those who want the non-blocking behaviour on Linux can just read from /dev/urandom. +1 -- Steve
On Jun 9, 2016, at 12:30 PM, Steven D'Aprano <steve@pearwood.info> wrote:
os.urandom ----------
[...]
With that in mind, I think that we should, to the best of our ability given the platform we're on, ensure that os.urandom does not return bytes that the OS does not think is cryptographically secure.
Just to be clear, you're talking about having it block rather than raise an exception, right?
If so, that makes sense to me. That's already the behaviour on all major platforms except Linux, so you're just bringing Linux into line with the others. Those who want the non-blocking behaviour on Linux can just read from /dev/urandom.
There are three options for what to do with os.urandom by default:

* Allow it to silently return data that may or may not be cryptographically secure based on what the state of the urandom pool initialization looks like.
* Raise an exception if we determine that the pool isn't initialized enough to get secure random from it.
* Block until the pool is initialized.

Historically Python has done the first option on Linux (but not on other OSs) because that was simply the only interface that Linux offered at all. In 3.5.0 Victor changed the way os.urandom worked in a way that made it use the third option (he wasn't attempting to change the security properties, just avoid using an FD, but it improved the security properties as well).

My opinion is that blocking is slightly better than raising an exception because it matches what other OSs do, but that both blocking and raising an exception are better than silently giving data that may or may not be cryptographically secure.

— Donald Stufft
On 9 June 2016 at 12:39, Donald Stufft <donald@stufft.io> wrote:
On Jun 9, 2016, at 12:30 PM, Steven D'Aprano <steve@pearwood.info> wrote:
os.urandom ----------
[...]
With that in mind, I think that we should, to the best of our ability given the platform we're on, ensure that os.urandom does not return bytes that the OS does not think is cryptographically secure.
Just to be clear, you're talking about having it block rather than raise an exception, right?
If so, that makes sense to me. That's already the behaviour on all major platforms except Linux, so you're just bringing Linux into line with the others. Those who want the non-blocking behaviour on Linux can just read from /dev/urandom.
There are three options for what to do with os.urandom by default:
* Allow it to silently return data that may or may not be cryptographically secure based on what the state of the urandom pool initialization looks like.
* Raise an exception if we determine that the pool isn’t initialized enough to get secure random from it.
* Block until the pool is initialized.
Historically Python has done the first option on Linux (but not on other OSs) because that was simply the only interface that Linux offered at all. In 3.5.0 Victor changed the way os.urandom worked in a way that made it use the third option (he wasn’t attempting to change the security properties, just avoid using an FD, but it improved the security properties as well).
My opinion is that blocking is slightly better than raising an exception because it matches what other OSs do, but that both blocking and raising an exception is better than silently giving data that may or may not be cryptographically secure.
I think an exception is much easier for a user to deal with from a practical point of view. Trying to work out why a process has hung is obviously possible, but not necessarily easy. Having a process crash due to an exception is very easy to diagnose by comparison. Cheers, Ben
On 9 June 2016 at 17:54, Ben Leslie <benno@benno.id.au> wrote:
My opinion is that blocking is slightly better than raising an exception because it matches what other OSs do, but that both blocking and raising an exception is better than silently giving data that may or may not be cryptographically secure.
I think an exception is much easier for a user to deal with from a practical point of view. Trying to work out why a process has hung is obviously possible, but not necessarily easy.
If we put the specific issue of applications that run very early in system startup to one side, is there a possibility of running out of entropy during normal system use? Even for a tiny duration? An exception may be better than a hanging process, but a random process crash in place of a wait of a few microseconds for the entropy buffer to fill up again, not so much.

If we could predict whether the call was going to block for a microsecond, or for 20 minutes, I'd be OK with an exception for the latter case. But we can't predict the future, so unless the system call is guaranteed not to block except at system startup, then I prefer blocking over an exception.

As for blocking vs returning less random results, I defer to others on that.

On 9 June 2016 at 18:14, Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Jun 09, 2016 at 12:39:00PM -0400, Donald Stufft wrote:
There are three options for what to do with os.urandom by default:
* Allow it to silently return data that may or may not be cryptographically secure based on what the state of the urandom pool initialization looks like.
Just to be clear, this is only an option on Linux, right? All the other major platforms block, whatever we decide to do on Linux. Including Windows?
That's what I understood, certainly. But the place where this was an issue in real life was a Python program being run during the startup sequence of the OS. That's never going to be possible on Windows, so I'd be cautious about drawing parallels with Windows in this situation (blocking on Windows may be fine because Python can never run when Windows could possibly have low entropy available).

Paul
On Jun 9, 2016, at 1:21 PM, Paul Moore <p.f.moore@gmail.com> wrote:
On 9 June 2016 at 17:54, Ben Leslie <benno@benno.id.au> wrote:
My opinion is that blocking is slightly better than raising an exception because it matches what other OSs do, but that both blocking and raising an exception are better than silently giving data that may or may not be cryptographically secure.
I think an exception is much easier for a user to deal with from a practical point of view. Trying to work out why a process has hung is obviously possible, but not necessarily easy.
If we put the specific issue of applications that run very early in system startup to one side, is there a possibility of running out of entropy during normal system use? Even for a tiny duration? An exception may be better than a hanging process, but a random process crash in place of a wait of a few microseconds for the entropy buffer to fill up again, not so much.
If we could predict whether the call was going to block for a microsecond, or for 20 minutes, I'd be OK with an exception for the latter case. But we can't predict the future, so unless the system call is guaranteed not to block except at system startup, then I prefer blocking over an exception.
/dev/urandom (and getrandom() on Linux) will never block once the pool has been initialized. The concept of “running out of entropy” doesn’t apply to it. Once it has entropy it’s good to go.
As for blocking vs returning less random results, I defer to others on that.
On 9 June 2016 at 18:14, Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Jun 09, 2016 at 12:39:00PM -0400, Donald Stufft wrote:
There are three options for what to do with os.urandom by default:
* Allow it to silently return data that may or may not be cryptographically secure based on what the state of the urandom pool initialization looks like.
Just to be clear, this is only an option on Linux, right? All the other major platforms block, whatever we decide to do on Linux. Including Windows?
That's what I understood, certainly. But the place where this was an issue in real life was a Python program being run during the startup sequence of the OS. That's never going to be possible on Windows, so I'd be cautious about drawing parallels with Windows in this situation (blocking on Windows may be fine because Python can never run when Windows could possibly have low entropy available).
Paul
— Donald Stufft
On Thu, Jun 09, 2016 at 06:21:32PM +0100, Paul Moore wrote:
If we put the specific issue of applications that run very early in system startup to one side, is there a possibility of running out of entropy during normal system use? Even for a tiny duration?
With /dev/urandom, I believe the answer to that is no. On most platforms other than Linux, /dev/urandom is exactly the same as /dev/random, and both can only block straight after the machine has booted up, before enough entropy has been collected. Then they will run forever without blocking. (Or at least until you reboot.)

On Linux, /dev/random *will* block, at unpredictable times, but fortunately we're not using /dev/random; we're using urandom. Apart from just after boot-up, /dev/urandom on Linux will also run forever without blocking, just like on the other platforms. The critical difference is just after booting up:

- Linux /dev/urandom doesn't block, but it might return predictable, poor-quality pseudo-random bytes (i.e. a potential exploit);
- Other OSes may block for potentially many minutes (i.e. a potential DoS).

Two links which may help explain what's happening:

http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/
http://security.stackexchange.com/a/42955

-- Steve
Steven D'Aprano wrote:
- Linux /dev/urandom doesn't block, but it might return predictable, poor-quality pseudo-random bytes (i.e. a potential exploit);
- Other OSes may block for potentially many minutes (i.e. a potential DoS).
It's even possible that it could block *forever*. There was a case here recently in the cosc dept where students were running Clojure programs in a virtual machine environment. When they updated to a newer version of Clojure, everyone's programs started hanging on startup. It turned out the Clojure library was initialising its RNG from /dev/random, and the VM didn't have any real spinning disks or other devices to provide entropy.

-- Greg
On Thu, Jun 09, 2016 at 12:54:31PM -0400, Ben Leslie wrote:
I think an exception is much easier for a user to deal with from a practical point of view. Trying to work out why a process has hung is obviously possible, but not necessarily easy.
Having a process crash due to an exception is very easy to diagnose by comparison.
That only makes sense if the application is going to block for (say) five or ten minutes. If it's going to block for three seconds, you might not even notice. At least not on a server.

But what are you going to do when you catch that exception?

- Sleep for a few seconds, and try again? That's just blocking.
- Stop waiting on secure randomness, and use something low quality and insecure? That's how you get exploits.
- Crash?

-- Steve
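For what it's worth, that first option really is just blocking re-implemented in userspace. A minimal sketch, assuming the os.getrandom()/os.GRND_NONBLOCK interface that Python 3.6 eventually added:

    import os
    import time

    def urandom_retry(n, delay=0.1):
        # Poll until the kernel pool is initialized: semantically the
        # same as a blocking call, just with a sleep/retry loop bolted on.
        while True:
            try:
                return os.getrandom(n, os.GRND_NONBLOCK)
            except BlockingIOError:
                time.sleep(delay)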
On 9 June 2016 at 13:29, Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Jun 09, 2016 at 12:54:31PM -0400, Ben Leslie wrote:
I think an exception is much easier for a user to deal with from a practical point of view. Trying to work out why a process has hung is obviously possible, but not necessarily easy.
Having a process crash due to an exception is very easy to diagnose by comparison.
That only makes sense if the application is going to block for (say) five or ten minutes. If it's going to block for three seconds, you might not even notice. At least not on a server.
But what are you going to do when you catch that exception?
- Sleep for a few seconds, and try again? That's just blocking.
- Stop waiting on secure randomness, and use something low quality and insecure? That's how you get exploits.
- Crash?
What does a program do on any exception? It really depends on the program and the circumstances in which it is running. But I would think that in most circumstances 'crash' is the answer. In the circumstances where this is most likely going to occur (server startup) you are almost certainly going to have some type of supervisory program restarting the failed process, and it will almost certainly be logging the failure. Having logs filled with process restarts due to this error until there is finally entropy is better than it just hanging. At least that is what I'd prefer to diagnose.

I think the real solution here lies outside of Python; starting a process that needs entropy when the system isn't ready yet is just as silly as running 'mount' on a disk whose driver is still loading, or 'ifconfig' on a network interface whose driver isn't yet loaded. But that isn't really a problem that can be solved in the context of Python.

Cheers,
Ben
On Thu, Jun 09, 2016 at 12:39:00PM -0400, Donald Stufft wrote:
There are three options for what to do with os.urandom by default:
* Allow it to silently return data that may or may not be cryptographically secure based on what the state of the urandom pool initialization looks like.
Just to be clear, this is only an option on Linux, right? All the other major platforms block, whatever we decide to do on Linux. Including Windows?

-- Steve
On Jun 9, 2016, at 1:14 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Jun 09, 2016 at 12:39:00PM -0400, Donald Stufft wrote:
There are three options for what to do with os.urandom by default:
* Allow it to silently return data that may or may not be cryptographically secure based on what the state of the urandom pool initialization looks like.
Just to be clear, this is only an option on Linux, right? All the other major platforms block, whatever we decide to do on Linux. Including Windows?
To my knowledge, all other major platforms block or otherwise ensure that /dev/urandom can never return anything but cryptographically secure random. [1]
-- Steve
[1] I believe OpenBSD cannot block, but they inject randomness via the boot loader so that the system is never in a state where the kernel doesn’t have enough entropy.

— Donald Stufft
On 06/09/2016 10:22 AM, Donald Stufft wrote:
On Jun 9, 2016, at 1:14 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Just to be clear, this is only an option on Linux, right? All the other major platforms block, whatever we decide to do on Linux. Including Windows?

To my knowledge, all other major platforms block or otherwise ensure that /dev/urandom can never return anything but cryptographically secure random. [1]
I've done some research into this over the past couple of days. To the best of my knowledge:

* Linux: /dev/urandom will never block. If the entropy pool isn't initialized yet, it will return poor-quality random bits from what is effectively an unseeded PRNG. (Yes: it uses a custom algorithm which isn't considered CSPRNG-strength; it is merely a PRNG seeded with entropy.)

* OS X: AFAICT, /dev/urandom guarantees it will never block. It uses an old CSPRNG, 160-bit Yarrow. The documentation states that if the entropy pool is "drained", it won't block; instead it'll degrade ("output quality will suffer over time without any explicit indication from the random device itself"). It isn't clear how initialization of the entropy pool during early startup might affect this. http://www.manpages.info/macosx/random.4.html

* FreeBSD: /dev/urandom may block. It also uses Yarrow (but maybe with more bits? and possibly switching soon to Yarrow's successor, Fortuna?). Both devices guarantee high-quality random bits, and will block if they feel like they're running low on entropy.

* OpenBSD 5.1 is like FreeBSD, except the algorithm used is ARC4. In OpenBSD 5.5 they changed to using ChaCha20.

On all of those platforms *except* Linux, /dev/random and /dev/urandom are exactly the same.

Also, regarding Windows: Victor Stinner did some experiments with a VM, and even in early startup he was able to get random bits from os.urandom(). But it's hard to have a "fresh" Windows VM, so it's possible it had residual entropy from a previous boot, so this isn't conclusive.

//arry/
On 2016-06-09 19:14, Steven D'Aprano wrote:
On Thu, Jun 09, 2016 at 12:39:00PM -0400, Donald Stufft wrote:
There are three options for what do with os.urandom by default:
* Allow it to silently return data that may or may not be cryptographically secure based on what the state of the urandom pool initialization looks like.
Just to be clear, this is only an option on Linux, right? All the other major platforms block, whatever we decide to do on Linux. Including Windows?
To the best of my knowledge, Windows and OSX are already initialized when Python is started. On other BSD platforms it is possible to get the seeding state through the proc file system.
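On Linux, similarly, the kernel exposes a rough entropy estimate through procfs; a minimal sketch (note this is the kernel's estimate in bits, not a hard guarantee that the pool has been initialized):

    def entropy_estimate_bits():
        # Linux-only: the kernel's current entropy estimate, in bits.
        with open("/proc/sys/kernel/random/entropy_avail") as f:
            return int(f.read())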
On Jun 9, 2016, at 7:25 AM, Larry Hastings <larry@hastings.org> wrote:
6) Guido and Tim Peters already decided once that os.urandom() should behave like /dev/urandom.
Issue #25003: http://bugs.python.org/issue25003

To be exceedingly clear, in this issue the problem wasn’t that os.urandom was blocking once, early on in the boot process, before the kernel had initialized its urandom pool. The problem was that the getentropy() function on Solaris behaves more like /dev/random does on Linux. This behavior is something that I, and most security experts/cryptographers that I know of, think is bad behavior (and indeed, most OSs have gotten rid of this behavior of /dev/random and made /dev/random and /dev/urandom behave the same... except, again, for Linux).
The ask here isn't to make Linux behave like Solaris did in that issue; it's to use the newer, better interface to make Linux use the more secure behavior that most (all?) of the other modern OSs have already adopted.

— Donald Stufft
On 06/09/2016 04:25 AM, Larry Hastings wrote:
A problem has surfaced just this week in 3.5.1. Obviously this is a good time to fix it for 3.5.2. But there's a big argument over what is "broken" and what is an appropriate "fix".
Having read the thread thus far, here is my take on fixing it:

- Modify os.urandom() to raise an exception instead of blocking. Everyone seems to agree that this is a rare corner case, and being rare it would be easier (at least for me) to troubleshoot an exception instead of a VM (or whatever) hanging and then being killed.

- Add a CLI knob to not raise, but instead wait for initialization. I think this should be under the control of the user, who knows (or should know) the environment that Python is running under, and not the developer, who may have never dreamed his/her little script would be called first thing during bootup. Maybe we just continue to use the hash seed parameter for this.

- Modify the functions that don't need cryptographically strong random bits to use the old style (reading directly from /dev/urandom?).

This seems like it should appease the security folks, yet still allow those in the trenches to (more) easily diagnose and work around the problem.

--
~Ethan~
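A user-level version of that knob could be as small as the sketch below. PYTHON_URANDOM_WAIT is a hypothetical environment variable used only for illustration (it was never a real CPython option), and os.getrandom()/os.GRND_NONBLOCK assume the interface Python 3.6 eventually added:

    import os

    def get_random_bytes(n):
        # Hypothetical knob: let the *user*, via the environment, choose
        # between waiting for pool initialization and failing fast.
        if os.environ.get("PYTHON_URANDOM_WAIT") == "1":
            return os.getrandom(n)  # block until the pool is initialized
        return os.getrandom(n, os.GRND_NONBLOCK)  # raise BlockingIOError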
To expand on my idea of printing a warning: in 3.6 we could add a new Warning exception for this purpose, so you'd have command-line control over the behavior of os.urandom() by specifying -WXXX on your Python command line. For 3.5.2 that's too fancy though -- we can't add a new exception.

--
--Guido van Rossum (python.org/~guido)
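Mechanically, the warning idea might look like the sketch below. The RandomSeedWarning name is purely hypothetical (no such category was ever added), and os.getrandom()/os.GRND_NONBLOCK assume the interface that Python 3.6 later grew:

    import os
    import warnings

    class RandomSeedWarning(Warning):
        """Hypothetical category for 'urandom pool not initialized'."""

    def urandom_with_warning(n):
        try:
            # If the pool is already initialized this succeeds immediately.
            return os.getrandom(n, os.GRND_NONBLOCK)
        except BlockingIOError:
            # Fall back to possibly-unseeded bytes, but say so out loud.
            warnings.warn("entropy pool not initialized; result may be "
                          "predictable", RandomSeedWarning)
            with open("/dev/urandom", "rb") as f:
                return f.read(n)

With an appropriate warnings filter (e.g. warnings.simplefilter("error", RandomSeedWarning), or the equivalent -W option), the warning becomes an exception, giving exactly the kind of command-line control described above.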
On 9 June 2016 at 04:25, Larry Hastings <larry@hastings.org> wrote:
A user reports that when starting CPython soon after startup on a fresh virtual machine, the process would hang for a long time. Someone on the issue reported observed delays of over 90 seconds. Later we found out: it wasn't 90 seconds before CPython became usable, these 90 seconds delays were before systemd timed out and simply killed the process. It's not clear what the upper bound on the delay might be.
The issue author had already identified the cause: CPython was blocking on getrandom() in order to initialize hash randomization. On this fresh virtual machine the entropy pool started out uninitialized. And since the only thing running on the machine was CPython, and since CPython was blocked on initialization, the entropy pool was initializing very, very slowly.
Further analysis (mentioned later in the original Python-3.5-on-Linux bug report) suggested that this wasn't actually a generic "waiting for the entropy pool to initialise" problem. Instead, the problem appeared to be specifically that the Python script was being invoked *before the Linux kernel had initialised the entropy pool*, and the boot process was waiting for that script to run before continuing on with other tasks (like initialising the entropy pool). That meant os.urandom() had nothing to do with it (since the affected script wasn't generating random numbers), and the entire problem was that we were blocking trying to initialise CPython's internal hashing.

Born from Victor's proposal to add a "wait for entropy?" flag to os.urandom [1], the simplest proposal for a long-term fix [2] posted so far has been to:

1. make os.urandom raise BlockingIOError if kernel entropy is not available
2. not rely on os.urandom for internal hash initialisation
3. not rely on os.urandom for MT seeding in the random module

Linux is currently the only OS we know of where the BlockingIOError would be a possible result, and the only known scenarios where it could be raised are Linux init system scripts and some embedded systems where the kernel doesn't have any good sources of entropy. In both of those cases, the lack of entropy is potentially a real problem, and an exception lets the software author make an informed decision: either wait for entropy (e.g. by polling os.urandom() until it succeeds, or selecting on /dev/random, as sketched below), or else read directly from /dev/urandom (potentially getting non-cryptographically-secure bits).

The virtue of this approach is that it's entirely invisible for almost all users, and the users it does affect will start getting an exception in Python 3.6+ rather than silently being handed cryptographically non-secure random data.

Cheers,
Nick.

[1] http://bugs.python.org/issue27266
[2] http://bugs.python.org/issue27282

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
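A minimal sketch of that "wait, then read" pattern on Linux, using only interfaces that existed at the time: select() on /dev/random (which becomes readable once the kernel has gathered entropy), then read the actual bytes from the never-blocking /dev/urandom. The helper names and the timeout handling are illustrative, not part of either tracked proposal:

    import select

    def wait_for_entropy(timeout=None):
        # /dev/random becomes readable once entropy is available, so
        # select() lets us wait (optionally with a timeout) without
        # performing a blocking read ourselves.
        with open("/dev/random", "rb") as f:
            ready, _, _ = select.select([f], [], [], timeout)
            return bool(ready)

    def urandom_when_ready(n, timeout=None):
        # Wait for the pool, then read from the non-blocking device.
        if not wait_for_entropy(timeout):
            raise TimeoutError("entropy pool not initialized in time")
        with open("/dev/urandom", "rb") as f:
            return f.read(n)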
participants (35)

- A.M. Kuchling
- Alex Walters
- Barry Warsaw
- Ben Leslie
- Brett Cannon
- Chris Jerdonek
- Christian Heimes
- Cory Benfield
- David Mertz
- Donald Stufft
- Doug Hellmann
- Ethan Furman
- Greg Ewing
- Guido van Rossum
- Ionel Cristian Mărieș
- Larry Hastings
- M.-A. Lemburg
- Martin Panter
- Nathaniel Smith
- Nick Coghlan
- Nikolaus Rath
- Paul Moore
- R. David Murray
- Random832
- Sebastian Krause
- Stefan Krah
- Stephen J. Turnbull
- Stephen J. Turnbull
- Steve Dower
- Steven D'Aprano
- Sven R. Kunze
- Terry Reedy
- Theodore Ts'o
- Tim Peters
- Victor Stinner