PEP 504: Using the system RNG by default

Hi folks,

Based on the feedback in the recent threads, I've written a draft PEP that dispenses with the userspace CSPRNG idea, and instead proposes:

* defaulting to using the system RNG for the module level random API in Python 3.6+
* implicitly switching to the deterministic PRNG if you call random.seed(), random.getstate() or random.setstate() (this implicit fallback would trigger a silent-by-default deprecation warning in 3.6, and a visible-by-default runtime warning after 2.7 goes EOL)
* providing random.system and random.seedable submodules so you can explicitly opt in to using the one you want without having to manage your own RNG instances

That approach would provide a definite security improvement over the status quo, while restricting the compatibility break to a performance regression in applications that use the module level API without calling seed(), getstate() or setstate(). It would also allow the current security warning in the random module documentation to be moved towards the end of the module, in a section dedicated to determinism and reproducibility.

The full PEP should be up shortly at https://www.python.org/dev/peps/pep-0504/, but caching is still a problem when uploading new PEPs, so if that 404s, try http://legacy.python.org/dev/peps/pep-0504/

Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 15/09/15 16:47, Nick Coghlan wrote:
I do not think these names are helpful. The purpose was to increase security, not confuse the user even more. What does "seedable" mean? Secure as in ChaCha20? Insecure as in MT19937? Something else? A name like "seedable" does not convey any useful information about the security to an un(der)informed web developer. A name like "random.system" does not convey any information about numerical applicability to an un(der)informed researcher.

The module names should rather indicate how the generators are intended to be used. I suggest:

    random.crypto.*  (os.urandom, ChaCha20, Arc4Random)
    random.numeric.* (Mersenne Twister, PCG, XorShift)

Deprecate random.random et al. with a visible warning. That should convey the message.

Sturla

I had to check out of the mega-threads, but I really don't like the outcome (unless this PEP is just the first of several competing proposals).

The random module provides a useful interface – a random() function and a large variety of derived functionality useful for statistics programming (e.g. uniform(), choice(), normalvariate(), etc.). Many of these have significant mathematical finesse in their implementation. They are all accessing shared state that is kept in a global variable in the module, and that is a desirable feature (nobody wants to have to pass an extra variable just so you can share the state of the random number generator with some other code).

I don't want to change this API and I don't want to introduce deprecation warnings – the API is fine, and the warnings will be as ineffective as the warnings in the documentation.

I am fine with adding more secure ways of generating random numbers. But we already have random.SystemRandom(), so there doesn't seem to be a hurry?

How about we make one small change instead: a way to change the default instance used by the top-level functions in the random module. Say,

    random.set_random_generator(<instance>)

This would require the global functions to use an extra level of indirection, e.g. instead of

    random = _inst.random

we'd change that code to say

    def random():
        return _inst.random()

(and similar for all related functions). I am not worried about the cost of the indirection (and if it turns out too expensive we can reimplement the module in C).

Then we could implement

    def set_random_generator(instance):
        global _inst
        _inst = instance

We could also have a function random.use_secure_random() that calls set_random_generator() with an instance of a secure random number generator (maybe just SystemRandom()).
We could rig things so that once use_secure_random() has been called, set_random_generator() will throw an exception (to avoid situations where a library module attempts to make the shared random generator insecure in a program that has declared that it wants secure random).

It would also be fine for SystemRandom (or at least whatever is used by use_secure_random(), if SystemRandom cannot change for backward compatibility reasons) to raise an exception when seed(), setstate() or getstate() are called.

Of course modules are still free to use their own instances of the Random class. But I don't see a reason to mess with the existing interface.

--
--Guido van Rossum (python.org/~guido)

On September 15, 2015 at 1:34:56 PM, Guido van Rossum (guido@python.org) wrote:
The problem isn't so much that there isn't a way of securely generating random numbers, but that the module, as it is right now, guides you towards using an insecure source of random numbers rather than a secure one. This means that unless you're familiar with the random module or reading the online documentation, you don't really have any idea that ``random.random()`` isn't secure.

This is an attractive nuisance for anyone who *doesn't* need deterministic output from their random numbers, and it leads to situations where people are incorrectly using MT when they should be using SystemRandom because they don't know any better.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Tue, Sep 15, 2015 at 10:50 AM, Donald Stufft <donald@stufft.io> wrote:
That feels condescending, as does the assumption that (almost) every naive use of randomness is somehow a security vulnerability. The concept of secure vs. insecure sources of randomness isn't *that* hard to grasp. -- --Guido van Rossum (python.org/~guido)

Guido van Rossum writes:
It is, but it's also accurate: there's plenty of anecdotal evidence that this actually happens, specifically that most of the recipes for password generation on SO silently fall back to a deterministic PRNG if SystemRandom is unavailable, and the rest happily start with random.random. Not only are people apparently doing a wrong thing here, they are eagerly teaching others to do the same. (There's also the possibility that the bad guys are seeding SO with backdoors in this way, I guess.)
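The fallback pattern Stephen describes can be made concrete. The following is an illustrative reconstruction, not code from any particular recipe: the "unsafe" version silently degrades to the deterministic MT generator, while the fail-closed version simply lets the error propagate:

```python
import random
import string

def password_unsafe(length=12):
    """The anti-pattern: silently fall back to a deterministic PRNG."""
    try:
        rng = random.SystemRandom()
        rng.random()  # probe: os.urandom raises NotImplementedError if absent
    except NotImplementedError:
        rng = random.Random()  # silent fallback -- the whole problem
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))

def password_safe(length=12):
    """Fail closed: with no OS entropy source, refuse to make a password."""
    rng = random.SystemRandom()  # its methods raise if os.urandom is missing
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))
```

The two functions behave identically on any platform with a working os.urandom(); they differ only in what happens when the secure source is unavailable, which is precisely the case the SO recipes get wrong.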
as does the assumption that (almost) every naive use of randomness is somehow a security vulnerability.
This is a strawman. None of the advocates of this change makes that assumption. The advocates proceed from the (basically unimpeachable) assumptions that (1) the attacker only has to win once, and (2) they are out there knocking on a lot of doors. Then the questionable assumption is that (3) the attackers are knocking on *this* door.

RC4 was at one time one of the best crypto algorithms available, but it also induced the WEP fiasco, and a scramble for a new standard. The question is whether we wait for a "Python security fiasco" to do something about this situation.

Waiting *is* an option; the arguments that RNGs won't be a "Python security fiasco" before Python 4 is released are very plausible[1], and the overhead of a compatibility break is not negligible (though Paul Moore himself admits it's probably not huge, either). But the risk of a security fiasco (probably in a scenario not mentioned in this thread) is real. The arguments of the opponents of the change amount to "I have confirmed that the probability it will happen to me is very small, therefore the probability it will happen to anyone is small", which is, of course, a fallacy.
The concept of secure vs. insecure sources of randomness isn't *that* hard to grasp.
Once one *tries*. Read some of Paul Moore's posts, and you will discover that the very mention of some practice "improving security" immediately induces a non-trivial subset of his colleagues to start thinking about how to avoid doing it. I am almost not kidding; according to his descriptions, the situation in the trenches is very nearly that bad. Security is evidently hated almost as much as spam.

If random.random were to default to an unseedable nondeterministic RNG, the scientific users would very quickly discover that (if not on their own, when their papers get rejected). On the other hand, inappropriate uses are nowhere near so lucky. In the current situation, the programs Just Work Fine (they produce passwords that no human would choose for themselves, for example), and no one is the wiser unless they deliberately seek the information.

It seems to me that, given the "in your face" level of discoverability that removing the state-access methods would provide, backward compatibility with existing programs is the only real reason not to move to "secure" randomness by default. In fact "secure" randomness is *higher*-quality for any purpose, including science.

Footnotes:
[1] Cf. Tim Peters' posts especially, they're few and where the information content is low the humor content is high. ;-)

Sorry for the self-followup; premature send. Stephen J. Turnbull writes:
In fact "secure" randomness is *higher*-quality for any purpose, including science.
It does need to be acknowledged that scientists need replicability for unscientific reasons: (1) some "scientists" lie (cf. the STAP cell controversy), and (2) as a regression test for their simulation software. But an exact replication of an "honest" simulation is scientifically useless!

On 16 September 2015 at 11:16, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Yep, hence things like http://stopdisablingselinux.com/

SELinux in enforcing mode operates on a very simple principle: we should know what system resources we expect our applications to access, and we should write that down in a form the computer understands so it can protect us against attackers trying to use that application to do something unintended (like steal user information).

However, what we've realised as an industry is that effective security systems have to be *transparent* and they have to be *natural*. So in a containerised world, SELinux isolates containers from each other, but if you're writing code that runs *in* the container, you don't need to worry about it - from inside the container, it looks like SELinux isn't running.

The traditional security engineering approach of telling people "You're doing it wrong" just encourages them to avoid talking to security people [1], rather than encouraging them to improve their practices [2].

Hence the proposal in PEP 504 - my goal is to make the default behaviour of the random module cryptographically secure, *without* unduly affecting the use cases that need reproducibility rather than cryptographic security, while still providing at least a nudge in the direction of promoting security awareness. Changing the default matters more to me than the nudge, so I'd be prepared to drop that part.

Regards,
Nick.

[1] http://sobersecurity.blogspot.com.au/2015/09/everyone-is-afraid-of-us.html
[2] http://sobersecurity.blogspot.com.au/2015/09/being-nice-security-person.html

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 16 September 2015 at 05:00, Nick Coghlan <ncoghlan@gmail.com> wrote:
I don't know if it's still true, but most Oracle database installation instructions state "disable SELinux" as a basic pre-requisite. This is a special case of a more general issue, which is that the "assign only those privileges that you need" principle is impossible to implement when you are working with proprietary software that contains no documentation on what privileges it needs, other than "admin rights". (Actually, I just checked - it looks like the official Oracle docs no longer suggest disabling SELinux. But I bet they still don't give you all the information you need to implement a tight security policy without a fair amount of "try it and see what breaks"...)

Even in open source, people routinely run "sudo pip install". Not "make the Python site-packages read/write", which is still wrong, but which at least adheres to the principle of least privilege, but "give me root access".

How many people get an app for their phone, see "this app needs <long list of permissions>" and have any option other than to click "yes" or discard the app? Who does anything with UAC on Windows other than blindly click "yes" or disable it altogether? Not because they don't understand the issues (certainly, many don't, but some do) but rather because there's really no other option? In these contexts, "security" is the name for "those things I have to work around to do what I'm trying to do" - by disabling it, or blindly clicking "yes", or insisting I need admin rights.

Or another example. Due to a password expiry policy combined with a lack of viable single sign on, I have to change upwards of 50 passwords at least once every 4 weeks in order to be able to do my job. And the time to do so is considered "overhead" and therefore challenged regularly. So I spend a lot of time looking to see if I can automate password changes (which is *definitely* not good practice). I'm sure others do things like using weak passwords or reusing passwords, because the best practice simply isn't practical in that context.

Nobody in the open source or security good practices communities even has an avenue to communicate with the groups involved in this sort of thing. At least as far as I know. I do what I can to raise awareness, but it's a "grass roots" exercise that typically doesn't reach the people with the means to actually change anything.

Of course, nobody in this environment uses Python to build internet-facing web applications, either. So I'm not trying to argue that this should drive the question of the RNG used in Python. But at the same time, I am trying to sell Python as a good tool for automating business processes, writing administrative scripts and internal applications, etc. So there is a certain link...

Sorry - but it's nice to vent sometimes :-)
Paul

On 16 September 2015 at 19:42, Paul Moore <p.f.moore@gmail.com> wrote:
Fortunately, that's no longer the case. Open source based development models are going mainstream, and while there's still a lot of work to do, cases like the US Federal government requiring the creation of open source prototypes as part of a bidding process are incredibly heartening (https://18f.gsa.gov/2015/08/28/announcing-the-agile-BPA-awards/). On the security side, folks are realising that the "You can't do that, it's a security risk" model is a bad one, and hence favoring switching to a model more like "We can help you to minimise your risk exposure while still enabling you to do what you want to do". So while it's going to take time for practices like those described in https://playbook.cio.gov/ to become a description of "the way the IT industry typically works", the benefits are so remarkable that it's a question of "when" rather than "if".
Right, helping Red Hat's Python maintenance team to maintain that kind of balance is one aspect of my day job, hence my interest in https://www.python.org/dev/peps/pep-0493/ as a nicer migration path when backporting the change to verify HTTPS certificates by default. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

There's still way too much chatter, and a lot that seems just rhetoric. This is not the republican primaries. Yes, lots of companies got hacked. What's the evidence that a language's default RNG was involved?

IIUC the best practice for password hashing (to make cracking using a large word list harder) is something called bcrypt; maybe next year something else will become popular, but the default RNG seems an unlikely candidate. I know that in the past the randomness of certain protocols was compromised because the seeding used a timestamp that an attacker could influence or guess. But random.py seeds MT from os.urandom(2500). So what's the class of vulnerabilities where the default RNG is implicated?

Tim's proposal is simple: create a new module, e.g. saferandom, with the same API as random (less seed/state). That's it. Then it's a simple import change away to do the right thing, and we have years to seed StackOverflow with better information before that code even hits the road. (But a backport to Python 2.7 could be on PyPI tomorrow!)

--
--Guido van Rossum (python.org/~guido)

[Guido]
There's still way too much chatter, and a lot that seems just rhetoric. This is not the republican primaries.
Which is a shame, since the chatter here is of much higher quality than in the actual primaries ;-)
Yes lots of companies got hacked. What's the evidence that a language's default RNG was involved?
Nobody cares whether there's evidence of actual harm. Just that there _might_ be, and even if none is identifiable now, then maybe in the future. There is evidence of actual harm from RNGs doing poor _seeding_ by default, but Python already fixed that (I know, you already know that ;-) ).

And this paper, from a few years ago, studying RNG vulnerabilities in PHP apps, is really good: https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_...

An interesting thing is that several of the apps already had a history of trying to fix security-related holes related to RNG (largely due to PHP's poor default seeding), but remained easily cracked. The primary recommendation there wasn't to make PHP's various PRNGs "crypto by magic", but for core PHP to supply "a standard" crypto RNG for people to use instead. As above, some of the app developers already knew darned well they had a history of RNG-related holes, but simply had no standard way to address it, and didn't have the _major_ expertise needed to roll their own.
1. Users doing their own poor seeding.

2. A hypothetical MT state-deducer (seemingly needing to be considerably more sophisticated than the already mondo sophisticated one in the paper above) to be of general use against Python.

3. "Prove there can't be any in the future. Ha! You can't." ;-)
Which would obviously be fine by me: make the distinction obvious at import time, make "the safe way" dead easy and convenient to use, give it a new name engineered to nudge newbies away from the "unsafe" (by contrast) `random`, and a new name easily discoverable by web search.

There's something else here: some of these messages gave pointers to web pages where "security wonks" conceded that specific uses of SystemRandom were fine, but they couldn't recommend it anyway because it's too hard to explain what is or isn't "safe". "Therefore" users should only use urandom() directly. Which is insane, if for no other reason than that users would then invent their own algorithms to convert urandom() results into floats and ints, etc. Then they'll screw up _that_ part.

But if "saferandom" were its own module, then over time it could implement its own "security wonk certified" higher level (than raw bytes) methods. I suspect it would never need to change anything from what the SystemRandom class does, but I'm not a security wonk, so I know nothing. Regardless, _whatever_ changes certified wonks deemed necessary in the future could be confined to the new module, where incompatibilities would only annoy apps using that module. Ditto whatever doc changes were needed. Also gone would be the inherent confusion from needing to draw distinctions between "safe" and "unsafe" in a single module's docs (which any by-magic scheme would only make worse).

However, supplying a powerful and dead-simple-to-use new module would indeed do nothing to help old code entirely by magic. That's a non-goal to me, but appears to be the _only_ deal-breaker goal for the advocates. Which is why none of us is the BDFL ;-)

On September 16, 2015 at 11:48:12 AM, Tim Peters (tim.peters@gmail.com) wrote:
That was the documentation for PyCA's cryptography module, where the only use of random we needed was for an IV (for which you can use the output of os.urandom directly) and for an integer, for which you could just use int.from_bytes and the output of os.urandom (i.e. int.from_bytes(os.urandom(20), byteorder="big")).

It wasn't so much a general recommendation against random.SystemRandom, just that for our particular use case os.urandom is either fine by itself, or fine with a tiny bit of code on top of it, and that's easier to explain than to try to explain how to use the random module safely, so we just warn against it entirely.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

[Guido, on "saferandom"]
So if you or someone else (Chris?) wrote that up in PEP form I'd accept it.
I like Steven D'Aprano's "secrets" idea better, so it won't be me ;-) Indeed, if PHP had a secure "token generator" function built in, the paper in question would have found almost nothing of practical interest to write about.
I'd even accept adding a warning on calling seed() (but not setstate()).
Yield the width of an electron, and the universe itself will be too small to contain the eventual consequences ;-)

On 17 September 2015 at 00:26, Guido van Rossum <guido@python.org> wrote:
There's still way too much chatter, and a lot that seems just rhetoric. This is not the republican primaries.
There was still a fair bit of useful feedback in there, so I pushed a new version of the PEP that addresses it:

* the submodule idea is gone
* the module level API still delegates to random._inst at call time rather than import time
* random._inst is a SystemRandom() instance by default
* there's a new ensure_repeatable() API to switch it back to random.Random()
* seed(), getstate() and setstate() all implicitly call ensure_repeatable()
* the latter issue a warning recommending calling ensure_repeatable() explicitly

The key user experience difference from the status quo is that this allows the "not suitable for security purposes" warning to be moved to a section specifically covering ensure_repeatable(), seed(), getstate() and setstate() rather than automatically applying to the entire random module. The reason it becomes reasonable to move the warning is that it changes the failure mode from "any use of the module API for security sensitive purposes is a problem" to "any use of the module API for security sensitive purposes is a problem if the application also calls random.ensure_repeatable()".
Reducing the search space for brute force attacks on things like:

* randomly generated default passwords
* password reset tokens
* session IDs

The PHP paper covered an attack on password reset tokens. Python's seeding is indeed much better, and Tim's mathematical skills are infinitely better than mine, so I'm never personally going to win a war of equations with him.

If you considered a conclusive proof of a break specifically targeting *CPython's* PRNG essential before considering changing the default behaviour (even given the almost entirely backwards compatible approach I'm now proposing), I'd defer the PEP with a note suggesting that practical attacks on security tokens generated with CPython's PRNG may be a topic of potential interest to the security community. The PEP would then stay deferred until someone actually did the research and demonstrated a practical attack.
If folks are reaching for a third party library anyway, we'd be better off pointing them at one of the higher level ones like passlib or cryptography.

There's also the aspect that something I'd now like to achieve is to eliminate the security warning that is one of the first things people currently see when they open up the random module documentation: https://docs.python.org/3/library/random.html

While I think that warning is valuable given the current default behaviour, it's also inherently user hostile for beginners that actually *do* read the docs, as it raises questions they don't know how to answer:

    "The pseudo-random generators of this module should not be used for security purposes. Use os.urandom() or SystemRandom if you require a cryptographically secure pseudo-random number generator."

Switching the default means that the question to be asked is instead "Do you need repeatability?", which is a *much* easier question, and we only need to ask it in the documentation for ensure_repeatable() and the related functions that call that implicitly.

Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

[Guido]
[Nick Coghlan <ncoghlan@gmail.com>]
Note that, in context, "saferandom" _would_ be a standard module in a future Python 3 feature release. But it _could_ be used literally tomorrow by anyone who wanted a head start, whether in a current Python 2 or Python 3.

And if pieces of `passlib` and/or `cryptography` are thought to be essential for best practice, cool, then `saferandom` could also become a natural home for workalikes. Would you really want to _ever_ put such functions in the catch-all "random" module? The docs would become an incomprehensible mess.

On 17 September 2015 at 02:09, Tim Peters <tim.peters@gmail.com> wrote:
My main objection here was the name, so Steven's suggestion of calling such a module "secrets" with a suitably crafted higher level API rather than replicating the entire random module API made a big difference. We may even be able to finally give hmac.compare_digest a more obvious home as something like "secrets.equal". I'll leave PEP 504 as Draft for now, but I currently expect I'll end up withdrawing it in favour of Steven's idea. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, 16 Sep 2015 at 09:10 Tim Peters <tim.peters@gmail.com> wrote:
+1 on the overall idea, although I would rather the module be named random.safe in the stdlib ("namespaces are one honking great idea" and it helps keep the "safer" version of random near the "unsafe" version in the module index which makes discovery easier). And as long as the version on PyPI stays Python 2/3 compatible people can just rely on the saferandom name until they drop Python 2 support and then just update their imports.
So, a PEP for this to propose which random algorithm to use (I have at least heard ChaCha/arc4random and some AES thing bandied about as being fast)? And if yes to a PEP, who's writing it? And then who is writing the implementation in the end?

On Sep 16, 2015 9:54 AM, "Brett Cannon" <brett@python.org> wrote:
Without repeating my somewhat satirical long name, I think "safe" is a terrible name because it makes a false promise. However, the name "secrets" is a great name. I think a top-level module is better than "random.secrets" because not everything related to secrets is related to randomness. But that detail is minor. Letting the documentation of "secrets" discuss the current state of cryptanalysis on the algorithms and protocols contained therein is the right place for it. With prominent dates attached to those discussions.

[Tim]
[Brett Cannon <brett@python.org>]
+1 on the overall idea, although I would rather the module be named random.safe in the stdlib ("namespaces are one honking great idea"
Ah, grasshopper, there's a reason that one is last in PEP 20. "Flat is better than nested" is the one - and only one - that _obviously_ applies here ;-)
I'd much rather see Steven D'Aprano's "secrets" idea pursued: solve "the problems" on their own terms directly.
os.urandom() is the obvious thing to build on, and it's already there. If alternatives are desired (which they may well be - .urandom() is sloooooooow on many systems), that can be addressed later. Before then, speed probably doesn't matter for most plausibly appropriate uses.
And if yes to a PEP, who's writing it? And then who is writing the implementation in the end?
Did you just volunteer? Great! Thanks ;-) OK, Steven already volunteered to write a PEP for his proposal.

On 17 September 2015 at 04:55, Tim Peters <tim.peters@gmail.com> wrote:
As far as implementation goes, based on a separate discussion at https://github.com/pyca/cryptography/issues/2347, I believe the essential cases can all be covered by:

    def random_bits(bits):
        return os.urandom(bits//8)

    def random_int(bits):
        return int.from_bytes(random_bits(bits), byteorder="big")

    def random_token(bits):
        return base64.urlsafe_b64encode(random_bits(bits)).decode("ascii")

    def random_hex_digits(bits):
        return binascii.hexlify(random_bits(bits)).decode("ascii")

So if you want a 128 bit (16 bytes) IV, you can just write "secrets.random_bits(128)". Examples of all four in action:
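The examples themselves don't appear to have survived the archive. Restating Nick's four helpers with the imports they need, something like the following shows the kind of output each produces (the comments about output sizes assume a 128 bit request):

```python
import base64
import binascii
import os

def random_bits(bits):
    return os.urandom(bits // 8)

def random_int(bits):
    return int.from_bytes(random_bits(bits), byteorder="big")

def random_token(bits):
    return base64.urlsafe_b64encode(random_bits(bits)).decode("ascii")

def random_hex_digits(bits):
    return binascii.hexlify(random_bits(bits)).decode("ascii")

# A 128 bit request as raw bytes, an integer, a URL-safe token, hex digits:
iv = random_bits(128)            # 16 bytes
n = random_int(128)              # 0 <= n < 2**128
token = random_token(128)        # 24 character URL-safe string (with padding)
digits = random_hex_digits(128)  # 32 hex characters
```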
Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

[Nick Coghlan <ncoghlan@gmail.com>]
Probably better to wait until Steven starts a new thread about his PEP (nobody is ever gonna look at _this_ thread again ;-) ). Just two things to note:

1. Whatever task-appropriate higher-level functions people want, as you've shown "secure" implementations are easy to write for someone who knows what's available to build on. It will take 10000 times longer for people to bikeshed what "secrets" should offer than to implement it ;-)

2. I'd personally be surprised if a function taking a "number of bits" argument silently replaced argument `bits` with `bits - bits % 8`. If the app-level programmers at issue can't think in terms of bytes instead (and use functions with a `bytes` argument), then, e.g., better to raise an exception if `bits % 8 != 0` to begin with. Or to round up, taking "bits" as meaning "a number of bytes covering _at least_ the number of bits asked for".
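Tim's two alternatives in point 2 can be written out directly. The function names here are illustrative, not proposals from the thread:

```python
import os

def random_bits_strict(bits):
    """Refuse to silently drop bits: require a whole number of bytes."""
    if bits % 8:
        raise ValueError("bits must be a multiple of 8, got %d" % bits)
    return os.urandom(bits // 8)

def random_bits_rounded(bits):
    """Round up: return at least `bits` bits of randomness."""
    return os.urandom((bits + 7) // 8)
```

Either behaviour avoids the surprise Tim objects to, where random_bits(12) would quietly return only 8 bits of randomness.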

On 18 September 2015 at 01:11, Tim Peters <tim.peters@gmail.com> wrote:
Agreed, although the 4 I listed are fairly well-credentialed - the implementations of the first two (raw bytes and integers) are the patterns cryptography.io uses, the token generator is comparable to the Django one (with a couple of extra punctuation characters in the alphabet), and the hex digit generator is the Pyramid one. You can get more exotic with full arbitrary alphabet password and passphrase generators, but I think we're getting beyond stdlib level functionality at that point - it's getting into the realm of password managers and attack software.
Yeah, I took a shortcut to keep them all as pretty one liners. A proper rand_bits with that API would look something like:

    def rand_bits(bits):
        num_bytes, add_byte = divmod(bits, 8)
        if add_byte:
            num_bytes += 1
        return os.urandom(bits)

Compared to the os.urandom() call itself, the bits -> bytes calculation should disappear into the noise from a speed perspective (and a JIT compiled runtime like PyPy could likely optimise it away entirely).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, Sep 17, 2015, at 12:36, Nick Coghlan wrote:
I think it's important to at least have a way to get a random number in a range that isn't a power of two, since that's so easy to get wrong. Even the libc arc4random API has that in arc4random_uniform. At that point people can build their own arbitrary alphabet password generators as one-liners.
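The technique Random832 is pointing at is rejection sampling, which is what arc4random_uniform() does in libc and, to the best of my knowledge, what CPython's own SystemRandom._randbelow does internally. A self-contained sketch:

```python
import os

def randbelow(n):
    """Uniform integer in [0, n), in the spirit of arc4random_uniform():
    draw just enough random bits and reject values >= n, which avoids
    the modulo bias of the naive urandom-then-% approach."""
    if n <= 0:
        raise ValueError("n must be positive")
    k = n.bit_length()              # bits needed to represent n - 1
    nbytes = (k + 7) // 8
    while True:
        r = int.from_bytes(os.urandom(nbytes), "big") >> (nbytes * 8 - k)
        if r < n:
            return r
```

Given this, a custom-alphabet password generator really is a one-liner: "".join(alphabet[randbelow(len(alphabet))] for _ in range(length)).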

[Tim]
[Nick Coghlan <ncoghlan@gmail.com>]
I will immodestly claim that nobody needs to be a crypto-wonk to see that these implementations are exactly as secure (or insecure) as the platform urandom(): in each case, it's trivial to invert the output to recover the exact bytes urandom() returned. So if there's any attack against the outputs, that's also an attack against what urandom() returned. The outputs just spell what urandom returned using a different alphabet.

For the same reason, e.g., it would be fine to replace each 0 bit in urandom's result with the string "egg", and each 1 bit with the string "turtle". An attack on the output of that is exactly as hard (or easy) as an attack on the output of urandom. Obvious, right? It's only a little harder to see that the same is true of even the fanciest of your 4 functions.

Where you _may_ get in trouble is creating a non-invertible output. Like:

    def secure_int(nbytes):
        n = int.from_bytes(os.urandom(nbytes), "big")
        return n - n

That's not likely to be useful ;-)
I'll leave that for the discussion of Steven's PEP. I think he was on the right track to, e.g., suggest a secure choice() as one his few base building blocks. It _does_ take some expertise to implement a secure choice() correctly, but not so much from the crypto view as from the free-from-statistical-bias view. SystemRandom.choice() already gets both right.
You should really be calling that with "num_bytes" now ;-)
Goodness - "premature optimization" already?! ;-) Fastest in pure Python is likely

    num_bytes = (bits + 7) >> 3

But if I were bikeshedding I'd question why the function wasn't:

    def rand_bytes(nbytes):
        return os.urandom(nbytes)

instead. A rand_bits(nbits) that meant what it said would likely also be useful:
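A rand_bits() that "meant what it said" might look something like the following (a sketch under the assumptions in this thread, not a settled API - it returns an integer carrying exactly nbits of randomness rather than whole bytes):

```python
import os

def rand_bits(nbits):
    """Return a nonnegative int carrying exactly `nbits` random bits."""
    if nbits <= 0:
        raise ValueError("nbits must be positive")
    num_bytes = (nbits + 7) >> 3          # the fast pure-Python bits->bytes
    n = int.from_bytes(os.urandom(num_bytes), "big")
    return n >> (num_bytes * 8 - nbits)   # drop the excess high bits
```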

Nick Coghlan <ncoghlan@...> writes:
I think you want a little bit more flexibility than that, because the allowed characters may depend on the specific protocol (of course, people can use the hex digits version, but the output is longer). (quite a good idea, that "secrets" library - I wonder why nobody proposed it before ;-)) Regards Antoine.

On Wed, Sep 16, 2015, at 11:54, Nick Coghlan wrote:
* random._inst is a SystemRandom() instance by default
He has a point on the performance issue. The difference between Random and SystemRandom on my machine is significantly more than an order of magnitude. (Calling libc's arc4random with ctypes was roughly in the middle, though I *suspect* a lot of that was due to ctypes overhead).
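The gap is easy to measure directly; on a typical CPython build something like the following shows SystemRandom() well over an order of magnitude behind MT, since every call goes through os.urandom() (absolute numbers are machine-dependent; this is a quick sketch, not the benchmark linked below):

```python
import random
import timeit

mt = random.Random()           # Mersenne Twister, pure userspace
sr = random.SystemRandom()     # os.urandom()-backed, syscall per draw

N = 50000
t_mt = timeit.timeit(mt.random, number=N)
t_sr = timeit.timeit(sr.random, number=N)
print("MT: %.3fs  SystemRandom: %.3fs  ratio: %.1fx" % (t_mt, t_sr, t_sr / t_mt))
```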

On September 16, 2015 at 1:10:09 PM, Random832 (random832@fastmail.com) wrote:
I did the benchmark already: https://bpaste.net/show/79cc134a12b1 Using this code: https://github.com/dstufft/randtest However, using anything except for urandom is warned against by most security experts I’ve talked to. Most of them are only OK with it, if it means we’re using a CSPRNG by default in the random.py module, but not if it’s not the default. Even then, one of them thought that using a userspace CSPRNG instead of urandom was a bad idea (The rest thought it was better than the status quo). They all agreed that splitting the namespace was a good idea. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Tue, Sep 15, 2015, at 13:33, Guido van Rossum wrote:
The output of random.random today when it's not seeded / seeded with None isn't _really_ deterministic - you can't reproduce it, after all, without modifying the code (though in principle you could do seed(None)/getstate the first time and then setstate on subsequent executions - it may be worth supporting this use case?) - so changing it isn't likely to affect anyone - anyone needing MT is likely to also be using the seed functions.
random.set_random_generator(<instance>)
What do you think of having calls to seed/setstate(/getstate?) implicitly switch (by whatever mechanism) to MT? This could be done without a deprecation warning, and would allow existing code that relies on reproducible values to continue working without modification? [indirection in global functions]...
(and similar for all related functions).
global getstate/setstate should also save/replace the _inst or its type; at least if it's a different type than it was at the time the state was saved. For backwards compatibility in case these are pickled it could use the existing format when _inst is the current MT implementation, and accept these in setstate.
SystemRandom already raises an exception when getstate and setstate are called.
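The implicit-switch mechanism being discussed could be prototyped as a thin wrapper around the default instance (a sketch only - the class name is made up, and the real proposal would do this inside the module-level functions rather than a class; switching before delegating also sidesteps SystemRandom's getstate/setstate exceptions):

```python
import random

class AutoSwitchingRandom:
    """Serve the system RNG by default; fall back to the deterministic
    MT generator the first time seed()/getstate()/setstate() is called."""

    def __init__(self):
        self._inst = random.SystemRandom()

    def _ensure_deterministic(self):
        # One-way, idempotent switch: SystemRandom -> seedable Random.
        if isinstance(self._inst, random.SystemRandom):
            self._inst = random.Random()

    def seed(self, *args):
        self._ensure_deterministic()
        self._inst.seed(*args)

    def getstate(self):
        self._ensure_deterministic()
        return self._inst.getstate()

    def setstate(self, state):
        self._ensure_deterministic()
        self._inst.setstate(state)

    def random(self):
        return self._inst.random()
```

Until one of the state-related methods is called, callers cannot distinguish this from the plain MT-backed module, which is the backwards-compatibility argument in a nutshell.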

On Tue, Sep 15, 2015 at 11:25 AM, Random832 <random832@fastmail.com> wrote:
Yes, that's how I would do it (better than using a weak seed).
Or they could just make a lot of random() calls and find their performance down the drain (like what happened in the tracker issue that started all this: http://bugs.python.org/issue25003).
I happen to believe that MT's performance is a feature of the (default) API, and this would still be considered breakage (again, as in that issue). [indirection in global functions]...
Great! -- --Guido van Rossum (python.org/~guido)

I commonly use random.some_distribution() as a quick source of "randomness" knowing full well that it's not cryptographic. Moreover, I usually do so initially without setting a seed. The first question I want to answer is "does this random process behave roughly as I expect?" But in the back of my mind is always the thought, "If/when I want to reuse this I'll add a seed for reproducibility". It would never occur to me to reach for the random module if I want to do cryptography. It's a good and well established API that currently exists. Sure, add a submodule random.crypto (or whatever name), but I'm -1 on changing anything whatsoever on the module functions that are well known. On Sep 15, 2015 11:26 AM, "Random832" <random832@fastmail.com> wrote:

How about the following. We add a fast secure random generator to the stdlib as an option, and when it has proven its worth a few releases from now we consider again whether the default random() can be made secure without breaking anything. On Tue, Sep 15, 2015 at 12:43 PM, David Mertz <mertz@gnosis.cx> wrote:
-- --Guido van Rossum (python.org/~guido)

On Sep 15, 2015 1:19 PM, "Guido van Rossum" <guido@python.org> wrote:
How about the following. We add a fast secure random generator to the
stdlib as an option, and when it has proven its worth a few releases from now we consider again whether the default random() can be made secure without breaking anything. If we have a fast secure RNG, then the standard Random object might as well at least use it by default until someone actually sets or reads the state (and then switch to MT at that point). Until one of these events happens, the two RNGs are indistinguishable, and this would be a 100% backwards compatible change. (It might even make sense to backport to 2.7.) The limitation is that if library A uses the global random object without seeding in a security sensitive context, and library B uses seeding, then a program that just uses library A will be secure, but if it then starts using library B it will become insecure. But this is still better than the current situation where library A is always insecure. The only case where this would actually have a downside compared to status quo (assuming arc4random lives up to its reputation for speed etc) is if people start assuming that the default random object is in fact secure and intentionally choosing to use it in security sensitive situations. But hopefully people who know enough to realize that this is a decision they need to make will also read the docs where it clearly states that this is only a best-effort kind of hardening mechanism and that using random.Random/the global methods for cryptographic purposes is still a bug. -n

A pseudo-randomly selected recent quote:
It would never occur to me to reach for the random module if I want to do cryptography.
It's sad that so many of the opponents of this change make this kind of comment sooner or later. Security is (rarely) about *any* of *us*! Most of *us* don't need it (if we do, our physical or corporate security has already been compromised), most of *us* understand it, a somewhat smaller fraction of *us* behave in habitually secure ways (at the level we practice oral hygiene, say). That doesn't mean that security has to be #1 always and everywhere in designing Python, but I find it pretty distressing that apparently a lot of people either don't understand or don't care about what's at stake in these kinds of decisions *for the rest of the world*. The reality is that security that is not on by default is not secure. Any break in a dike can flood a whole town. The flip side is that security has costs, specifically the compatibility break, and since security needs to be on by default, the aggregate burden should be *presumed large* (even a small burden is spread over many users). Nevertheless, I think that the arguments to justify this change are pretty good: (1) The cost of adapting per program seems small, and seems to restricted to a class of users (software engineers doing regression testing and scientists doing simulations) who probably can easily make the change locally. Nick's proto-PEP is specifically designed so that there will be no cost to naive users (kids writing games etc) who don't need access to state. Caveat: there may be a performance hit for some naive users. That can probably be avoided with an appropriate choice of secure RNG, but that hasn't actually been benchmarked AFAIK. (2) ISTM there are no likely attack vectors due to choice of default RNG in random.random, based on Tim's analysis, but AFAICS he's unwilling to say it's implausible that they exist. (Sorry for the double negative!) I take this to mean that there may be real risk. 
(3) The anecdotal evidence that the module's current default is frequently misused is strong (the StackOverflow recipes for password generation). Two out of three ain't bad. YMMV, of course.

[Stephen J. Turnbull <stephen@xemacs.org>]
Oh, _many_ attacks are possible. Many are even plausible. For example, while Python's _default_ seeding is based on urandom() setting MT's entire massive state (no more secure way exists), a user making up their own seed is quite likely to do so in a way vulnerable to a "poor seeding" attack. "Password generators" should be the least of our worries. Best I can tell, the PHP paper's highly technical MT attack against those has scant chance of working in Python except when random.choice(x) is known to have len(x) a power of 2. Then it's a very powerful attack. But in PHP's idiomatic way of spelling random.choice(x) ("by hand", spelled out in the paper), it's _always_ a very powerful attack. In general, the more technical the attack, the more details matter. It's just no _fun_ to drone on about simple universally applicable brute-force attacks, so I'll continue to drone on about the PHP paper's sophisticated MT state-deducer ;-)
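The "simple universally applicable brute-force" attack on poor seeding needs nothing sophisticated: if the effective seed space is small enough to enumerate, a handful of observed outputs pins down the whole generator state (an illustrative sketch - the seed value and output sizes here are made up):

```python
import random

# A victim who seeds from something guessable (a PID, a truncated
# timestamp, ...) has a tiny effective seed space.
victim = random.Random(12345)
observed = [victim.randrange(1000) for _ in range(5)]

def recover_seed(outputs, seed_space):
    """Replay every candidate seed until one reproduces the outputs."""
    for s in seed_space:
        cand = random.Random(s)
        if all(cand.randrange(1000) == o for o in outputs):
            return s
    return None

recovered = recover_seed(observed, range(100000))
assert recovered is not None
# The recovered seed now predicts everything the victim generates.
predictor = random.Random(recovered)
assert [predictor.randrange(1000) for _ in range(5)] == observed
```

Note that no weakness in MT itself is exploited here - the same attack works against any seedable generator, which is why default seeding from urandom() matters.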

Tim Peters writes:
I'm not sure what you mean to say, but I don't count that as "due to choice of default RNG". That's foot-shooting of the kind we can't do anything about anyway, and if *that* is what Nick is worried about, I'm worried about Nick. ;-) *I* am more worried about attacks we don't know about yet (or at least haven't been mentioned in this thread), and maybe even haven't been invented yet. I presume Nick is, too.
That's genuinely comforting to read (even though it's the second or third time I've read it ;-). But I'm still nervous about the unknown.

[Stephen J. Turnbull <stephen@xemacs.org>]
[Tim]
[Stephen]
I'm not sure what you mean to say,
That the most obvious and easiest of RNG attacks remain possible regardless of anything that may be done, short of refusing to provide a seedable generator.
Oh no, _nobody_ is worried enough to "do something" about it. Not really. Note that in the PHP paper, 10 of the 16 apps scored "full attack" via pure brute force against poor seeding (figure 13, column 4.3). That's probably mostly because the versions of PHP tested inflicted poor _default_ seeding on users. I hope so. But there's no accounting of which apps did and didn't set their own seeds. They did note that "Joomla" attempted to repair a security bug by _removing_ its own seeding, in 2010. Which left it open to PHP's poor default seeding instead - which was nevertheless an improvement.
Fundamentally, I just don't see the sense in saying that someone who does their own seeding deserves whatever they get, while someone who uses an inappropriate generator in a security context should be saved from themself. I know, I read all the posts about why I'm wrong. I just don't buy it. There's no real substitute for understanding what you're doing, regardless of field. Yes, incompetence can cause great damage. But I'm not sure it does the world a real favor to possibly help a programmer incompetent to do a task keep working in the field a little longer. This isn't the only damage they can cause, and the longer they keep working in an area they don't understand the more damage they can do. The alternative? Learn how to use frickin' SystemRandom. It's not hard. Or get work for which they are competent.
That's genuinely comforting to read (even though it's the second or third time I've read it ;-)
If you read everything I ever wrote, it's the second. Although you may have _inferred_ it before I ever wrote it, from Nathaniel's "if I use the base64 or hex alphabets", instinctively leaping from "hmm ... 2**6 and ... 2**4" to "power of 2". In which case it could feel like the third time. And I used the phrase "power of 2" in a reply to you before, but in a context wholly unrelated to the PHP paper. That may even make it feel like the fourth time. Always happy to clarify ;-)
But I'm still nervous about the unknown.
Huh! I've heard humans are prone to that. In which case, there will always be something to be nervous about :-)

On 16 September 2015 at 08:23, Tim Peters <tim.peters@gmail.com> wrote:
Because that's never how these things go. You usually don't write a password generator that uses a non-CS PRNG in a security context, get discovered in the short term, and fired/reprimanded/whatever. Instead, one of the following things happens: - you get code review from a reviewer who knows the problem space and spots the problem. It gets fixed, you get educated, you're better prepared for the field. - you get code review from a reviewer who knows the problem space but *doesn't* spot the problem because Python isn't their first language. It doesn't get fixed and no-one notices for ten years until the problem is exploited, but you left the company 8 years ago and are now Head of Security Engineering at CoolStartupInc. - you don't get code review, or your reviewer is no better informed on this topic than you are. The problem doesn't get fixed and no-one notices ever because your program isn't exploited, or is only exploited in ways you never find out about because the rest of your security process sucked too, but you never find out about this. This is the ongoing problem with incompetence when it comes to security: the feedback loop is long and the negative event fires rarely, so most programmers never experience it. Most engineers have *never* experienced a security vulnerability in their own project, let alone had one exploited. Thus, most engineers never get the negative feedback loop that tells them that they don't know enough to do the work they're doing. Look at all the people who get this wrong. Consider haveibeenpwned.com for a minute. They list a fraction of the website databases that have been exposed due to security errors. 
At last count, that list includes (I removed more than half for the sake of length):

- Adobe
- Ashley Madison
- Snapchat
- Gawker
- NextGenUpdate
- Yandex
- Forbes
- Stratfor
- Domino's
- Yahoo
- Telecom Regulatory Authority of India
- Vodafone
- Sony
- HackingTeam
- Bell
- Minecraft Forum
- UN Internet Governance Forum
- Tesco

Are you telling me that every engineer responsible for these is not working in the industry any more? I doubt it. In fact, I think most of these places can't even account for which engineer is responsible, and if they can, odds are good they left long before the problem was exploited. So you're right, there is no real substitute for knowing what you're doing. But we cannot prevent programmers who don't know this stuff from writing the code that does it. We don't get to set the bar. We cannot throw GoReadABookOrTwo exceptions when inexperienced programmers type random.random, much as we would like to. With that said, we *can* construct an environment where a programmer has to have actually tried to hurt themselves. They have to have taken the gun off the desk, loaded it, disabled the safety, pointed it at their foot, and pulled the trigger. At that point we can say that we took all reasonable precautions to stop you doing what you did and you did it anyway: that's entirely on you. If you disable the safety settings, then frankly you are taking on the mantle of an expert: you are claiming you knew more than the person who developed the system, and if you don't then the consequences are on you. But if you use the defaults then you're just doing the most obvious thing, and from my perspective that should not be a punishable offence.

Tim Peters writes:
Strawman, or imprecise quotation if you prefer. Nobody said they *deserve* it AFAICR; I said we can't stop them. Strictly speaking, yes, we could. We could (and *I* think we *should*) make it much less obvious how to do it by removing the seed method and the seed argument to __init__. The problem there is backward compatibility. I don't see that Guido would stand for it. Dis here homeboy not a-gonna stick mah neck out heeya, suh. I suspect we might also want to provide helper functions to construct a state from a seed as used by some other simulation package, such as Python 3.4. ;-) Name them and document them as for use in replicating simulations done from those seeds. Nice self-documenting names like "construct_rng_internal_state_from_python_3_4_compatible_seed". There should be one for each version of Python, too (sssh! don't confuse the users with abstractions like "identical implementation").
"Think of it as evolution in action." Yeah, I sympathize. But realistically, Darwinian selection will take geological time, no? That is, in almost all cases where disaster strikes, the culprit has long since moved on[1]. Whoever gets the sack is unlikely to be him or her. More likely it will be whoever has been telling the shop that their product is an accident waiting to happen. :-( The way I think about it, though, is a variation on a theme by Nick. Specifically, the more attractive nuisances we can eliminate, the fewer things the uninitiated need to learn. Footnotes: [1] That's especially true in Japan, where I live. "Whodunnit" also gets fuzzed up by the tendency to group work and group think, and a value system that promotes "getting along with others" more than expertise. Child-proof caps are a GoodThang[tm]. ;-)

[Tim]
Ha! That's actually its worst case, although everyone missed that. I wrote a solver, and bumped into this while testing it. The rub is this line in _randbelow(): k = n.bit_length() # don't use (n-1) here because n can be 1 If n == 2**i, k is i+1 then, and ._randbelow() goes on to throw away half of all 32-bit MT outputs. Everyone before assumed it wouldn't throw any away. The best case for this kind of solver is when .choice(x) has len(x) one less than a power of 2, say 2**i - 1. Then k = i, and ._randbelow() throws away 1 of each of 2**i MT outputs (on average). For small i (say, len(x) == 63), every time I tried then the solver (which can only record bits from MT outputs it _knows_ were produced) found itself stuck with inconsistent equations. If len(x) = 2**20 - 1, _then_ it has a great chance of succeeding. There's about a chance in a million then that a single .choice() call will consume 2 32-bit MT outputs. It takes around 1,250 consecutive observations (of .choice() results) to deduce the starting state then, assuming .choice() never skips an MT output. The chance that no output was in fact skipped is about:
    >>> (1 - 1./2**20) ** 1250
    0.9988086167972104
So that attack is very likely to succeed. So, until the "secrets" module is released, and you're too dense to use os.urandom(), don't pick passwords from a million-character alphabet ;-)
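The rejection rates Tim is describing fall straight out of _randbelow()'s draw size k = n.bit_length() (arithmetic sketch only; the helper name is mine):

```python
def reject_probability(n):
    """Chance a single getrandbits(k) draw inside Random._randbelow(n)
    is rejected, where k = n.bit_length()."""
    k = n.bit_length()
    return 1 - n / 2 ** k

# n a power of two: k = i+1, so half of all draws are rejected
# (the worst case for the solver).
assert reject_probability(2 ** 20) == 0.5

# n one less than a power of two: k = i, only 1 in 2**i draws rejected
# (the best case for the solver).
assert abs(reject_probability(2 ** 20 - 1) - 1 / 2 ** 20) < 1e-12

# Chance that 1250 consecutive .choice() calls over a (2**20 - 1)-element
# alphabet never skip an MT output:
p = (1 - 1 / 2 ** 20) ** 1250
assert 0.998 < p < 0.999
```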

On Sep 15, 2015 7:23 PM, "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
This feels somewhere between disingenuous and dishonest. Just like I don't use the random module for cryptography, I also don't use the socket module or the threading module for cryptography. Could a program dealing with sockets have security issues?! Very likely! Could a multithreaded one expose vulnerabilities? Certainly! Should we try to "secure" these modules for users who don't need to or don't know to think about security? Absolutely not!

On 16 September 2015 at 14:27, David Mertz <mertz@gnosis.cx> wrote:
That's great that you already know not to use the random module for cryptography. Unfortunately, this is a lesson that needs to be taught developer by developer: "don't use the random module for security sensitive tasks". When they ask "Why not?", they get hit with a wall of confusing arcana about brute force search spaces, and cryptographically secure random number generators, and get left with a feeling of dissatisfaction with the explanation because cryptography is one of the areas of computing where our intuitions break down so it takes years to retrain our brains to adopt the relevant mindset. Beginners don't even get that far, as they have to ask "What's a security sensitive task?" while they're still at a stage where they're trying to grasp the basic concept of computer generated random numbers (this is a concrete problem with the current situation, as a warning that says "Don't use this for <X>" is equivalent to "Don't use this" if you don't yet know how to identify "<X>"). It's instinctive for humans to avoid additional work when it provides no immediate benefit to us personally. This is a sensible time management strategy, but it's proved to be a serious problem in the context of computer security. An analogy that came up in one of the earlier threads is this: * as an individual lottery ticket holder, assuming you're going to win is a bad assumption * as a lottery operator, assuming someone, somewhere, is going to win is a good assumption Infrastructure security engineers are lottery operators - with millions of software development projects, millions of businesses demanding online web presences, and tens of millions of developers worldwide (with many, many more on the way as computing becomes a compulsory part of schooling), any potential mistake is going to be made and exploited eventually, we just have no way of predicting when or where. 
Unlike lottery operators (who get to set their prize levels), we also have no way of predicting the severity of the consequences. The *problem* we have is that individual developers are lottery ticket holders - the probability of *our* particular component being the one that gets compromised is vanishingly small, so the incentive to inflict additional work on ourselves to mitigate security concerns is similarly small (although some folks do it anyway out of sheer interest, and some have professional incentives to do so). So let's assume any given component has a 1 in 10000 chance of being compromised (0.01%). We only have to get to 100k components before the aggregate chance of at least one component being compromised rises to almost 100% (around 99.54%). It's at this point the sheer scale of the internet starts working against us - while it's estimated that there are currently only around 30 million developers (both professionals and hobbyists) worldwide, it's further estimated that there are 3 *billion* people with access to the internet. Neither of those numbers is going to suddenly start getting smaller, so we start getting interested in security risks with a lower and lower probability of being exploited. Accordingly, we want "don't worry about security" to be the *right answer* in as many cases as possible - there's always going to be plenty of unavoidable security risks in any software development project, so eliminating the avoidable ones by default makes it easier to focus attention on other areas of potential concern. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 16 September 2015 at 03:33, Guido van Rossum <guido@python.org> wrote:
The proposed runtime warnings are just an additional, harder-to-avoid nudge for folks that don't read the documentation, so I'd be OK with dropping them from the proposal. However, it also occurs to me there may be a better solution to eliminating them than getting people to change their imports: add a "random.ensure_seedable()" API that flips the default instance to the deterministic RNG without triggering the warning. For applications that genuinely want the determinism, warnings-free 3.6+ compatibility would then look like:

    if hasattr(random, "ensure_seedable"):
        random.ensure_seedable()
That was my previous proposal. The problem with it is that it's much harder to test and support, as you have to allow for the global instance changing multiple times, and in multiple different directions. With the proposal in the PEP, there's only a single idempotent change that's possible: from the system RNG (used by default to eliminate the silent security failure) to the seedable RNG (needed for reproducibility). Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Sep 15, 2015 at 8:40 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Good, because I really don't want the warnings, nor the hack based on whether you call any of the seed/state-related methods.
I don't believe that seedability is the only thing that matters. MT is also over an order of magnitude faster than os.urandom() or SystemRandom.
Actually part of my proposal was a use_secure_random() that was also a one-way flag flip, just in the opposite direction. :-) With the proposal in the PEP, there's only a single idempotent change
I'd be much more comfortable if in 3.6 we only introduced a new way to generate secure random numbers that was as fast as MT. Once that has been in use for a few releases we may have a discussion about whether it's time to make it the default. Security isn't served well by panicky over-reaction. -- --Guido van Rossum (python.org/~guido)

On 16 September 2015 at 14:12, Guido van Rossum <guido@python.org> wrote:
Security isn't served well by panicky over-reaction.
Proposing a change in 2015 that wouldn't be released to the public until early 2017 or so isn't exactly panicking. (And the thing that changed for me that prompted me to write the PEP was finally figuring out a remotely plausible migration plan to address the backwards compatibility concerns, rather than anything on the security side) As I wrote in the PEP, this kind of problem is a chronic one, not an acute one, where security engineers currently waste a *lot* of their (and other people's) time on remedial firefighting - a security audit (or a breach investigation) detects a vulnerability, high priority issues get filed with affected projects, nobody goes home happy. Accordingly, my proposal is aimed as much at eliminating the perennial "But *why* can't I use the random module for security sensitive tasks?" argument as it is at anything else. I'd like the answer to that question to eventually be "Sure, you can use the random module for security sensitive tasks, so let's talk about something more important, like why you're collecting and storing all this sensitive personally identifiable information in the first place". Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sep 15, 2015 11:00 PM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
I believe this attitude makes overall security WORSE, not better. Giving a false assurance that simply using a certain cryptographic building block makes your application secure makes it more likely applications will fail to undergo genuine security analysis. Hence I affirmatively PREFER a random module that explicitly proclaims that it is non-cryptographic. Someone who figures out enough to use random.SystemRandom, or a future crypto.random, or the like is more likely to think about why they are doing so, and what doing so does and does NOT assure them of.

The point here is that the closest we can come to PROTECTING users is to avoid making false promises to them. All this talk of "maybe, possibly, secure RNGs" (until they've been analyzed longer) is just building a house on sand. Maybe ChaCha20 is completely free of all exploits... It's new-ish, and no one has found any. The API we really owe users is to create a class random.BelievedSecureIn2015, and let users utilize that if they like. All the rest of the proposals are just invitations to create more security breaches... The specific thing that random.random and MT DOES NOT do. On Sep 16, 2015 1:29 AM, "Cory Benfield" <cory@lukasa.co.uk> wrote:

On 16 September 2015 at 17:43, David Mertz <mertz@gnosis.cx> wrote:
You're *describing the status quo*. This isn't a new concept, as it's the way our industry has worked since forever: 1. All the security features are off by default 2. The onus is on individual developers to "just know" when the work they're doing is security sensitive 3. Once they realise what they're doing is security sensitive (probably because a security engineer pointed it out), the onus is *still* on them to educate themselves as to what to do about it Meanwhile, their manager is pointing at the project schedule demanding to know why the new feature hasn't shipped yet, and they're in turn pointing fingers at the information security team blaming them for blocking the release until the security vulnerabilities have been addressed. And that's the *good* scenario, since the only people it upsets are the people working on the project. In the more typical cases where the security team doesn't exist, gets overruled, or simply has too many fires to try to put out, we get http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-bre... and http://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/ On the community project side, we take the manager, the product schedule and the information security team out of the picture, so folks never even get to find out that there are any problems with the approach they're taking - they just ship and deploy software, and are mostly protected by the lack of money involved (companies and governments are far more interesting as targets than online communities, so open source projects mainly need to worry about protecting the software distribution infrastructure that provides an indirect attack vector on more profitable targets). Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Sep 16, 2015 at 03:59:04PM +1000, Nick Coghlan wrote: [...]
The answer to that question is *already* "sure you can use the random module". You just have to use it correctly. [Aside: do you think that, having given companies and people a "secure by default" solution that will hopefully prevent data breaches, that they will be *more* or *less* open to the idea that they shouldn't be collecting this sensitive information?] We've spent a long time talking about random() as regards to security, but nobody exposes the output of random directly. They use it as a building block to generate tokens and passwords, and *that's* where the breach is occurring. We shouldn't care so much about the building blocks and care more about the high-level tools: the batteries included. Look at the example given by Nathaniel: https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_... What was remarkable about this is how many individual factors were involved in the attacks. It wasn't merely an attack on Mersenne Twister, and it is quite possible that had any of the other factors been changed, the attacks would have failed. E.g. the applications used MD5 hashes. What if they had used SHA-1? They leaked sensitive information such as PIDs and exposed the time that the random numbers were generated. They allowed the attackers to get as many connections as they wanted. Someone might argue that none of those other problems would matter if the PRNG was more secure. That's true, up to a point: you never know when somebody will come up with an attack on the CSPRNG. Previous generations of CSPRNGs, including RC4, have been "retired", and we must expect that the current generation will be too. It is a good habit to avoid leaking this sort of information (times, PIDs etc) even if you don't have a concrete attack in place, because you don't know when a concrete attack will be discovered. Today's CSPRNG is tomorrow's hopelessly insecure PRNG, but defence in depth is always useful. 
I propose that instead of focusing on changing the building blocks that people will use by default, we provide them with ready made batteries for the most common tasks, and provide a clear example of acceptable practices for making their own batteries. (As usual, the standard lib will provide batteries, and third-party frameworks or libraries can provide heavy-duty nuclear reactors.) I propose: - The random module's API is left as-is, including the default PRNG. Backwards compatibility is important, code-churn is bad, and there are good use-cases for a non-CSPRNG. - We add at least one CSPRNG. I leave it to the crypto-wonks to decide which. - We add a new module, which I'm calling "secrets" (for lack of a better name) to hold best-practice security-related functions. To start with, it would have at least these three functions: one battery, and two building blocks: + secrets.token to create password recovery tokens or similar; + secrets.random calls the CSPRNG; it just returns a random number (integer?). There is no API for getting or setting the state, setting the seed, or returning values from non-uniform distributions; + secrets.choice similarly uses the CSPRNG. Developers will still have to make a choice: "do I use secrets, or random?" If they're looking for a random token (or password?), the answer is obvious: use secrets, because the battery is already there. For reasons that I will go into below, I don't think that requiring this choice is a bad thing. I think it is a *good* thing. secrets becomes the go-to module for things you want to keep secret. random remains the module you use for games and simulations. If there is interest in this proposed secrets module, I'll write up a proto-PEP over the weekend, and start a new thread for the benefit of those who have muted this one. You can stop reading now. The rest is motivational rather than part of the concrete proposal. Still here? Okay. 
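For the sake of argument, the one battery and two building blocks proposed above could be sketched on top of SystemRandom like this (the names and signatures here are my guesses at the proposal, not a settled API):

```python
import base64
import os
import random

_sysrand = random.SystemRandom()   # backed by os.urandom()

def token(nbytes=32):
    """The battery: a URL-safe text token for password-recovery links."""
    raw = os.urandom(nbytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def random_below(n):
    """Building block: uniform int in [0, n) from the CSPRNG.
    Deliberately no seed/getstate/setstate and no distributions."""
    return _sysrand.randrange(n)

def choice(seq):
    """Building block: bias-free selection via the CSPRNG."""
    return _sysrand.choice(seq)
```

The point of the narrow surface area is exactly the one made above: there is nothing here to misuse for simulations, and nothing in random to misuse for secrets.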
I think that it is a good thing to have developers explicitly make a choice between random and secrets. I think it is important that we continue to involve developers in security thinking. I don't believe that "secure by default" is possible at the application level, and that's what really matters. It doesn't matter if the developer uses a "secure by default" CSPRNG if the application leaks information some other way. We cannot possibly hope to solve application security from the bottom-up (although providing good secure tools is part of the solution).

I believe that computer security is to the IT world what occupational health and safety is to the farming, building and manufacturing industries (and others). The thing about security is that, like safety, it is not a product. There is no function you can call to turn security on, no secure=True setting. It is a process and a mind-set, and everyone involved needs to think about it, at least a little bit.

It took a long time for the blue collar industries to accept that OH&S was something that *everyone* has to be involved in, from the government setting standards to individual workers who have to keep their eyes open while on the job. Like the IT industry today, management's attitude was that safety was a cost that just ate into profits and made projects late (sound familiar?), and the workers' attitude was all too often "it won't happen to us". It takes experience and training and education to recognise dangerous situations on the job, and people die when they don't get that training. It is part of every person's job to think about what they are doing. I don't believe that it is possible to have "zero-thought security" any more than it is possible to have "zero-thought safety".
The security professionals can help by providing ready-to-use tools, but the individual developers still have to use those tools correctly, and cultivate a properly careful mindset: "If I wanted to break into this application, what information would I look for? How can I stop providing it? Am I using the right tool for this job? How can I check? Where's the security rep?" Until the IT industry treats security as the building industry treats OH&S, trying to bail out the Titanic with a teacup by way of bottom-up "safe by default" functions will just encourage a false sense of security. -- Steve

On 17 September 2015 at 01:54, Steven D'Aprano <steve@pearwood.info> wrote:
Oh, *this* I like (minus the idea of introducing a CSPRNG - random.SystemRandom will be a good choice for this task). "Is it an important secret?" is a question anyone can answer, so simply changing the proposed name addresses all my concerns regarding having to ask people to learn how to answer a difficult question that isn't directly related to what they're trying to do. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On September 16, 2015 at 11:55:48 AM, Steven D'Aprano (steve@pearwood.info) wrote:
- We add at least one CSPRNG. I leave it to the crypto-wonks to decide which.
We already have a CSPRNG via os.urandom, and importantly we don't have to decide which implementation it is, because the OS provides it and is responsible for it. I am against adding a userspace CSPRNG as anything but a possible implementation detail of making a CSPRNG the default for random.py. If we're not going to change the default, then I think adding a userspace CSPRNG is just adding a different footgun. That's OK though, because os.urandom is a pretty great CSPRNG.
Forcing the user to make a choice isn't a bad option from a security point of view. Most people will prefer to use the secure one even if they don't know better. The problem right now is that there *is* a default, and that default is unsafe, so people aren't forced to make a choice up front; they are handed an unsafe default with the option to go and make a choice later. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On 16 September 2015 at 16:54, Steven D'Aprano <steve@pearwood.info> wrote:
I love this idea. The name is perfect, and your motivational discussion fits exactly how I think we should be approaching security. Would it also be worth having secrets.password(alphabet, length) - generate a random password of length "length" from alphabet "alphabet". It's not going to cover every use case, but it immediately becomes the obvious answer to all those "how do I generate a password" SO questions people keep pointing at. Also, a backport version could be made available via PyPI. I don't see why the module couldn't use random.SystemRandom as its CSPRNG (and as a result be pure Python) but that can be an implementation detail the security specialists can argue over if they want. No need to expose it here (although if it's useful, republishing (some more of) its API without exposing the implementation, just like the proposed secrets.choice, would be fine). Paul.
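Paul's suggested battery is essentially a one-liner on top of SystemRandom. A sketch, using the signature from the suggestion above (the name and signature are the proposal's, not an existing API):

```python
# Sketch of the suggested secrets.password(alphabet, length), drawing
# every character independently from the OS CSPRNG via SystemRandom.
import random

_sysrand = random.SystemRandom()

def password(alphabet, length):
    """Return a random password of `length` characters from `alphabet`."""
    return "".join(_sysrand.choice(alphabet) for _ in range(length))

# e.g. password("abcdefghjkmnpqrstuvwxyz23456789", 12)
```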

On 16.09.2015 17:54, Steven D'Aprano wrote:
+1 on the idea (not sure about the name, though :-)) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 16 2015)

[Steven D'Aprano <steve@pearwood.info>, on "secrets"] +1 on everything. Glad _that's_ finally over ;-) One tech point:
The OpenBSD arc4random() has a very sparse API, but gets this part exactly right:

    uint32_t arc4random_uniform(uint32_t upper_bound);

    arc4random_uniform() will return a single 32-bit value, uniformly
    distributed but less than upper_bound. This is recommended over
    constructions like "arc4random() % upper_bound" as it avoids "modulo
    bias" when the upper bound is not a power of two. In the worst case,
    this function may consume multiple iterations to ensure uniformity;
    see the source code to understand the problem and solution.

In Python, there's no point to the uint32_t restrictions, and the function is already implemented for arbitrary bigints via the current (but private) Random._randbelow() method, whose implementation could be simplified for this specific use. That in turn relies on the .getrandbits(number_of_bits) method, which SystemRandom overrides. So getrandbits() is the fundamental primitive, and SystemRandom already implements that based on .urandom() results. An OpenBSD-ish random_uniform(upper_bound) would be a "nice to have", but not essential.
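For the curious, the rejection-sampling technique that arc4random_uniform() and Random._randbelow() share can be sketched in a few lines, using SystemRandom.getrandbits() as the primitive:

```python
# Rejection sampling to avoid modulo bias: draw k random bits, and if the
# result is >= upper_bound, throw it away and redraw rather than reducing
# modulo upper_bound. This mirrors the private Random._randbelow().
import random

_sysrand = random.SystemRandom()

def random_uniform(upper_bound):
    """Uniformly distributed int in [0, upper_bound), no modulo bias."""
    if upper_bound <= 0:
        raise ValueError("upper_bound must be positive")
    k = upper_bound.bit_length()   # getrandbits(k) yields values in [0, 2**k)
    r = _sysrand.getrandbits(k)
    while r >= upper_bound:        # reject and redraw; expected < 2 draws
        r = _sysrand.getrandbits(k)
    return r
```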
+ secrets.choice similarly uses the CSPRNG.
Apart from error checking, that's just:

    def choice(seq):
        return seq[random_uniform(len(seq))]

random.Random already does that (and SystemRandom inherits it), although spelled with _randbelow().

On 15/09/15 16:47, Nick Coghlan wrote:
I do not think these names are helpful. The purpose was to increase security, not confuse the user even more. What does "seedable" mean? Secure as in ChaCha20? Insecure as in MT19937? Something else? A name like "seedable" does not convey any useful information about the security to an un(der)informed web developer. A name like "random.system" does not convey any information about numerical applicability to an un(der)informed researcher. The module names should rather indicate how the generators are intended to be used. I suggest: random.crypto.* (os.urandom, ChaCha20, Arc4Random) random.numeric.* (Mersenne Twister, PCG, XorShift) Deprecate random.random et al. with a visible warning. That should convey the message. Sturla

I had to check out of the mega-threads, but I really don't like the outcome (unless this PEP is just the first of several competing proposals).

The random module provides a useful interface – a random() function and a large variety of derived functionality useful for statistics programming (e.g. uniform(), choice(), normalvariate(), etc.). Many of these have significant mathematical finesse in their implementation. They are all accessing shared state that is kept in a global variable in the module, and that is a desirable feature (nobody wants to have to pass an extra variable around just so you can share the state of the random number generator with some other code). I don't want to change this API and I don't want to introduce deprecation warnings – the API is fine, and the warnings will be as ineffective as the warnings in the documentation.

I am fine with adding more secure ways of generating random numbers. But we already have random.SystemRandom(), so there doesn't seem to be a hurry?

How about we make one small change instead: a way to change the default instance used by the top-level functions in the random module. Say,

    random.set_random_generator(<instance>)

This would require the global functions to use an extra level of indirection, e.g. instead of

    random = _inst.random

we'd change that code to say

    def random():
        return _inst.random()

(and similar for all related functions). I am not worried about the cost of the indirection (and if it turns out too expensive we can reimplement the module in C). Then we could implement

    def set_random_generator(instance):
        global _inst
        _inst = instance

We could also have a function random.use_secure_random() that calls set_random_generator() with an instance of a secure random number generator (maybe just SystemRandom()).
We could rig things so that once use_secure_random() has been called, set_random_generator() will throw an exception (to avoid situations where a library module attempts to make the shared random generator insecure in a program that has declared that it wants secure random). It would also be fine for SystemRandom (or at least whatever is used by use_secure_random(), if SystemRandom cannot change for backward compatibility reasons) to raise an exception when seed(), setstate() or getstate() are called. Of course modules are still free to use their own instances of the Random class. But I don't see a reason to mess with the existing interface. -- --Guido van Rossum (python.org/~guido)

On September 15, 2015 at 1:34:56 PM, Guido van Rossum (guido@python.org) wrote:
The problem isn't so much that there isn't a way of securely generating random numbers, but that the module, as it is right now, guides you towards using an insecure source of random numbers rather than a secure one. This means that unless you're familiar with the random module or reading the online documentation you don't really have any idea that ``random.random()`` isn't secure. This is an attractive nuisance for anyone who *doesn't* need deterministic output from their random numbers and leads to situations where people are incorrectly using MT when they should be using SystemRandom because they don't know any better. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Tue, Sep 15, 2015 at 10:50 AM, Donald Stufft <donald@stufft.io> wrote:
That feels condescending, as does the assumption that (almost) every naive use of randomness is somehow a security vulnerability. The concept of secure vs. insecure sources of randomness isn't *that* hard to grasp. -- --Guido van Rossum (python.org/~guido)

Guido van Rossum writes:
It is, but it's also accurate: there's plenty of anecdotal evidence that this actually happens, specifically that most of the recipes for password generation on SO silently fall back to a deterministic PRNG if SystemRandom is unavailable, and the rest happily start with random.random. Not only are people apparently doing a wrong thing here, they are eagerly teaching others to do the same. (There's also the possibility that the bad guys are seeding SO with backdoors in this way, I guess.)
as does the assumption that (almost) every naive use of randomness is somehow a security vulnerability.
This is a strawman. None of the advocates of this change makes that assumption. The advocates proceed from the (basically unimpeachable) assumptions that (1) the attacker only has to win once, and (2) they are out there knocking on a lot of doors. Then the questionable assumption is that (3) the attackers are knocking on *this* door. RC4 was at one time one of the best crypto algorithms available, but it also induced the WEP fiasco, and a scramble for a new standard. The question is whether we wait for a "Python security fiasco" to do something about this situation. Waiting *is* an option; the arguments that RNGs won't be a "Python security fiasco" before Python 4 is released are very plausible[1], and the overhead of a compatibility break is not negligible (though Paul Moore himself admits it's probably not huge, either). But the risk of a security fiasco (probably in a scenario not mentioned in this thread) is real. The arguments of the opponents of the change amount to "I have confirmed that the probability it will happen to me is very small, therefore the probability it will happen to anyone is small", which is, of course, a fallacy.
The concept of secure vs. insecure sources of randomness isn't *that* hard to grasp.
Once one *tries*. Read some of Paul Moore's posts, and you will discover that the very mention of some practice "improving security" immediately induces a non-trivial subset of his colleagues to start thinking about how to avoid doing it. I am almost not kidding; according to his descriptions, the situation in the trenches is very nearly that bad. Security is evidently hated almost as much as spam.

If random.random were to default to an unseedable nondeterministic RNG, the scientific users would very quickly discover that (if not on their own, then when their papers get rejected). On the other hand, inappropriate uses are nowhere near so lucky. In the current situation, the programs Just Work Fine (they produce passwords that no human would choose for themselves, for example), and no one is the wiser unless they deliberately seek the information.

It seems to me that, given the "in your face" level of discoverability that removing the state-access methods would provide, backward compatibility with existing programs is the only real reason not to move to "secure" randomness by default. In fact "secure" randomness is *higher*-quality for any purpose, including science.

Footnotes: [1] Cf. Tim Peters' posts especially; they're few, and where the information content is low the humor content is high. ;-)

Sorry for the self-followup; premature send. Stephen J. Turnbull writes:
In fact "secure" randomness is *higher*-quality for any purpose, including science.
It does need to be acknowledged that scientists need replicability for unscientific reasons: (1) some "scientists" lie (cf. the STAP cell controversy), and (2) as a regression test for their simulation software. But an exact replication of an "honest" simulation is scientifically useless!

On 16 September 2015 at 11:16, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Yep, hence things like http://stopdisablingselinux.com/ SELinux in enforcing mode operates on a very simple principle: we should know what system resources we expect our applications to access, and we should write that down in a form the computer understands so it can protect us against attackers trying to use that application to do something unintended (like steal user information). However, what we've realised as an industry is that effective security systems have to be *transparent* and they have to be *natural*. So in a containerised world, SELinux isolates containers from each other, but if you're writing code that runs *in* the container, you don't need to worry about it - from inside the container, it looks like SELinux isn't running. The traditional security engineering approach of telling people "You're doing it wrong" just encourages them to avoid talking to security people [1], rather than encouraging them to improve their practices [2]. Hence the proposal in PEP 504 - my goal is to make the default behaviour of the random module cryptographically secure, *without* unduly affecting the use cases that need reproducibility rather than cryptographic security, while still providing at least a nudge in the direction of promoting security awareness. Changing the default matters more to me than the nudge, so I'd be prepared to drop that part. Regards, Nick. [1] http://sobersecurity.blogspot.com.au/2015/09/everyone-is-afraid-of-us.html [2] http://sobersecurity.blogspot.com.au/2015/09/being-nice-security-person.html -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 16 September 2015 at 05:00, Nick Coghlan <ncoghlan@gmail.com> wrote:
I don't know if it's still true, but most Oracle database installation instructions state "disable SELinux" as a basic pre-requisite. This is a special case of a more general issue, which is that the "assign only those privileges that you need" principle is impossible to implement when you are working with proprietary software that contains no documentation on what privileges it needs, other than "admin rights". (Actually, I just checked - it looks like the official Oracle docs no longer suggest disabling SELinux. But I bet they still don't give you all the information you need to implement a tight security policy without a fair amount of "try it and see what breaks"...) Even in open source, people routinely run "sudo pip install". Not "make the Python site-packages read/write", which is still wrong, but which at least adheres to the principle of least privilege, but "give me root access". How many people get an app for their phone, see "this app needs <long list of permissions>" and has any option other than to click "yes" or discard the app? Who does anything with UAC on Windows other than blindly click "yes" or disable it altogether? Not because they don't understand the issues (certainly, many don't, but some do) but rather because there's really no other option? In these contexts, "security" is the name for "those things I have to work around to do what I'm trying to do" - by disabling it, or blindly clicking "yes", or insisting I need admin rights. Or another example. Due to a password expiry policy combined with a lack of viable single sign on, I have to change upwards of 50 passwords at least once every 4 weeks in order to be able to do my job. And the time to do so is considered "overhead" and therefore challenged regularly. So I spend a lot of time looking to see if I can automate password changes (which is *definitely* not good practice). I'm sure others do things like using weak passwords or reusing passwords. 
Because the best practice simply isn't practical in that context. Nobody in the open source or security good practices communities even has an avenue to communicate with the groups involved in this sort of thing. At least as far as I know. I do what I can to raise awareness, but it's a "grass roots" exercise that typically doesn't reach the people with the means to actually change anything. Of course, nobody in this environment uses Python to build internet-facing web applications, either. So I'm not trying to argue that this should drive the question of the RNG used in Python. But at the same time, I am trying to sell Python as a good tool for automating business processes, writing administrative scripts and internal applications, etc. So there is a certain link... Sorry - but it's nice to vent sometimes :-) Paul

On 16 September 2015 at 19:42, Paul Moore <p.f.moore@gmail.com> wrote:
Fortunately, that's no longer the case. Open source based development models are going mainstream, and while there's still a lot of work to do, cases like the US Federal government requiring the creation of open source prototypes as part of a bidding process are incredibly heartening (https://18f.gsa.gov/2015/08/28/announcing-the-agile-BPA-awards/). On the security side, folks are realising that the "You can't do that, it's a security risk" model is a bad one, and hence favoring switching to a model more like "We can help you to minimise your risk exposure while still enabling you to do what you want to do". So while it's going to take time for practices like those described in https://playbook.cio.gov/ to become a description of "the way the IT industry typically works", the benefits are so remarkable that it's a question of "when" rather than "if".
Right, helping Red Hat's Python maintenance team to maintain that kind of balance is one aspect of my day job, hence my interest in https://www.python.org/dev/peps/pep-0493/ as a nicer migration path when backporting the change to verify HTTPS certificates by default. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

There's still way too much chatter, and a lot that seems just rhetoric. This is not the republican primaries.

Yes, lots of companies got hacked. What's the evidence that a language's default RNG was involved? IIUC the best practice for password encryption (to make cracking using a large word list harder) is something called bcrypt; maybe next year something else will become popular, but the default RNG seems an unlikely candidate. I know that in the past the randomness of certain protocols was compromised because the seeding used a timestamp that an attacker could influence or guess. But random.py seeds MT from os.urandom(2500). So what's the class of vulnerabilities where the default RNG is implicated?

Tim's proposal is simple: create a new module, e.g. saferandom, with the same API as random (less seed/state). That's it. Then it's a simple import change away to do the right thing, and we have years to seed StackOverflow with better information before that code even hits the road. (But a backport to Python 2.7 could be on PyPI tomorrow!) -- --Guido van Rossum (python.org/~guido)
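The proposal is small enough to sketch in full: the random API minus the state-management functions, backed by SystemRandom. The module name comes from the discussion; the choice to raise NotImplementedError on seed()/getstate()/setstate() (rather than omitting them entirely) is one possible design, shown here for illustration:

```python
# A miniature "saferandom": re-export SystemRandom's generators and
# refuse the deterministic state-management API outright. Sketch only.
import random as _random

_inst = _random.SystemRandom()

# The useful generators, bound straight to the system RNG instance.
random = _inst.random
uniform = _inst.uniform
randrange = _inst.randrange
randint = _inst.randint
choice = _inst.choice

def seed(*args, **kwargs):
    raise NotImplementedError("saferandom cannot be seeded")

# getstate()/setstate() make no sense for a system RNG either.
getstate = setstate = seed
```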

[Guido]
There's still way too much chatter, and a lot that seems just rhetoric. This is not the republican primaries.
Which is a shame, since the chatter here is of much higher quality than in the actual primaries ;-)
Yes lots of companies got hacked. What's the evidence that a language's default RNG was involved?
Nobody cares whether there's evidence of actual harm. Just that there _might_ be, and even if none is identifiable now, then maybe in the future. There is evidence of actual harm from RNGs doing poor _seeding_ by default, but Python already fixed that (I know, you already know that ;-) ).

And this paper, from a few years ago, studying RNG vulnerabilities in PHP apps, is really good: https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_...

An interesting thing is that several of the apps already had a history of trying to fix security-related holes related to RNG (largely due to PHP's poor default seeding), but remained easily cracked. The primary recommendation there wasn't to make PHP's various PRNGs "crypto by magic", but for core PHP to supply "a standard" crypto RNG for people to use instead. As above, some of the app developers already knew darned well they had a history of RNG-related holes, but simply had no standard way to address it, and didn't have the _major_ expertise needed to roll their own.
1. Users doing their own poor seeding.

2. A hypothetical MT state-deducer (seemingly needing to be considerably more sophisticated than the already mondo sophisticated one in the paper above) to be of general use against Python.

3. "Prove there can't be any in the future. Ha! You can't." ;-)
Which would obviously be fine by me: make the distinction obvious at import time, make "the safe way" dead easy and convenient to use, give it a new name engineered to nudge newbies away from the "unsafe" (by contrast) `random`, and a new name easily discoverable by web search.

There's something else here: some of these messages gave pointers to web pages where "security wonks" conceded that specific uses of SystemRandom were fine, but they couldn't recommend it anyway because it's too hard to explain what is or isn't "safe". "Therefore" users should only use urandom() directly. Which is insane, if for no other reason than that users would then invent their own algorithms to convert urandom() results into floats and ints, etc. Then they'll screw up _that_ part.

But if "saferandom" were its own module, then over time it could implement its own "security wonk certified" higher-level (than raw bytes) methods. I suspect it would never need to change anything from what the SystemRandom class does, but I'm not a security wonk, so I know nothing. Regardless, _whatever_ changes certified wonks deemed necessary in the future could be confined to the new module, where incompatibilities would only annoy apps using that module. Ditto whatever doc changes were needed. Also gone would be the inherent confusion from needing to draw distinctions between "safe" and "unsafe" in a single module's docs (which any by-magic scheme would only make worse).

However, supplying a powerful and dead-simple-to-use new module would indeed do nothing to help old code entirely by magic. That's a non-goal to me, but appears to be the _only_ deal-breaker goal for the advocates. Which is why none of us is the BDFL ;-)

On September 16, 2015 at 11:48:12 AM, Tim Peters (tim.peters@gmail.com) wrote:
That was the documentation for PyCA's cryptography module, where the only use of random we needed was for an IV (which you can use the output of os.urandom directly) and for an integer, which you could just use int.from_bytes and the output of os.urandom (i.e. int.from_bytes(os.urandom(20), byteorder="big")). It wasn't so much a general recommendation against random.SystemRandom, just that for our particular use case os.urandom is either by itself fine, or with a tiny bit of code on top of it fine and that's easier to explain than to try to explain how to use the random module safely and just warn against it entirely. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
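Spelled out, the two uses described here need no new machinery at all:

```python
# The two cases PyCA's docs cover with os.urandom directly: raw bytes
# for an IV, and a large random integer via int.from_bytes.
import os

iv = os.urandom(16)                                  # 128-bit IV
n = int.from_bytes(os.urandom(20), byteorder="big")  # 160-bit integer
```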

[Guido, on "saferandom"]
So if you or someone else (Chris?) wrote that up in PEP form I'd accept it.
I like Steven D'Aprano's "secrets" idea better, so it won't be me ;-) Indeed, if PHP had a secure "token generator" function built in, the paper in question would have found almost nothing of practical interest to write about.
I'd even accept adding a warning on calling seed() (but not setstate()).
Yield the width of an electron, and the universe itself will be too small to contain the eventual consequences ;-)

On 17 September 2015 at 00:26, Guido van Rossum <guido@python.org> wrote:
There's still way too much chatter, and a lot that seems just rhetoric. This is not the republican primaries.
There was still a fair bit of useful feedback in there, so I pushed a new version of the PEP that addresses it:

* the submodule idea is gone
* the module level API still delegates to random._inst at call time rather than import time
* random._inst is a SystemRandom() instance by default
* there's a new ensure_repeatable() API to switch it back to random.Random()
* seed(), getstate() and setstate() all implicitly call ensure_repeatable()
* the implicit calls issue a warning recommending calling ensure_repeatable() explicitly

The key user experience difference from the status quo is that this allows the "not suitable for security purposes" warning to be moved to a section specifically covering ensure_repeatable(), seed(), getstate() and setstate(), rather than automatically applying to the entire random module. The reason it becomes reasonable to move the warning is that it changes the failure mode from "any use of the module API for security sensitive purposes is a problem" to "any use of the module API for security sensitive purposes is a problem if the application also calls random.ensure_repeatable()".
Reducing the search space for brute force attacks on things like:

* randomly generated default passwords
* password reset tokens
* session IDs

The PHP paper covered an attack on password reset tokens. Python's seeding is indeed much better, and Tim's mathematical skills are infinitely better than mine so I'm never personally going to win a war of equations with him. If you considered a conclusive proof of a break specifically targeting *CPython's* PRNG essential before considering changing the default behaviour (even given the almost entirely backwards compatible approach I'm now proposing), I'd defer the PEP with a note suggesting that practical attacks on security tokens generated with CPython's PRNG may be a topic of potential interest to the security community. The PEP would then stay deferred until someone actually did the research and demonstrated a practical attack.
If folks are reaching for a third party library anyway, we'd be better off pointing them at one of the higher level ones like passlib or cryptography.

There's also the aspect that something I'd now like to achieve is to eliminate the security warning that is one of the first things people currently see when they open up the random module documentation: https://docs.python.org/3/library/random.html

While I think that warning is valuable given the current default behaviour, it's also inherently user hostile for beginners that actually *do* read the docs, as it raises questions they don't know how to answer: "The pseudo-random generators of this module should not be used for security purposes. Use os.urandom() or SystemRandom if you require a cryptographically secure pseudo-random number generator."

Switching the default means that the question to be asked is instead "Do you need repeatability?", which is a *much* easier question, and we only need to ask it in the documentation for ensure_repeatable() and the related functions that call it implicitly. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
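The revised PEP 504 behaviour described above can be sketched as follows. The function names come from the PEP; the implementation details here are guesses, not the actual patch:

```python
# Sketch of revised PEP 504: SystemRandom by default, with
# ensure_repeatable() swapping in the deterministic PRNG, and seed()
# calling it implicitly (with a warning). Illustrative only.
import random as _random
import warnings

_inst = _random.SystemRandom()

def random():
    return _inst.random()   # delegates at call time, not import time

def ensure_repeatable():
    """Explicitly opt back in to the deterministic random.Random()."""
    global _inst
    if isinstance(_inst, _random.SystemRandom):
        _inst = _random.Random()

def seed(a=None):
    # getstate()/setstate() would trigger the same implicit switch.
    warnings.warn("seed() implies a deterministic PRNG; call "
                  "ensure_repeatable() explicitly", DeprecationWarning)
    ensure_repeatable()
    _inst.seed(a)
```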

[Guido]
[Nick Coghlan <ncoghlan@gmail.com>]
Note that, in context, "saferandom" _would_ be a standard module in a future Python 3 feature release. But it _could_ be used literally tomorrow by anyone who wanted a head start, whether in a current Python 2 or Python 3. And if pieces of `passlib` and/or `cryptography` are thought to be essential for best practice, cool, then `saferandom` could also become a natural home for workalikes. Would you really want to _ever_ put such functions in the catch-all "random" module? The docs would become an incomprehensible mess.

On 17 September 2015 at 02:09, Tim Peters <tim.peters@gmail.com> wrote:
My main objection here was the name, so Steven's suggestion of calling such a module "secrets" with a suitably crafted higher level API rather than replicating the entire random module API made a big difference. We may even be able to finally give hmac.compare_digest a more obvious home as something like "secrets.equal". I'll leave PEP 504 as Draft for now, but I currently expect I'll end up withdrawing it in favour of Steven's idea. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, 16 Sep 2015 at 09:10 Tim Peters <tim.peters@gmail.com> wrote:
+1 on the overall idea, although I would rather the module be named random.safe in the stdlib ("namespaces are one honking great idea" and it helps keep the "safer" version of random near the "unsafe" version in the module index which makes discovery easier). And as long as the version on PyPI stays Python 2/3 compatible people can just rely on the saferandom name until they drop Python 2 support and then just update their imports.
So, a PEP for this to propose which random algorithm to use (I have at least heard ChaCha20/arc4random and some AES thing bandied about as being fast)? And if yes to a PEP, who's writing it? And then who is writing the implementation in the end?

On Sep 16, 2015 9:54 AM, "Brett Cannon" <brett@python.org> wrote:
Without repeating my somewhat satirical long name, I think "safe" is a terrible name because it makes a false promise. However, the name "secrets" is a great name. I think a top-level module is better than "random.secrets" because not everything related to secrets is related to randomness. But that detail is minor. Letting the documentation of "secrets" discuss the current state of cryptanalysis on the algorithms and protocols contained therein is the right place for it. With prominent dates attached to those discussions.

[Tim]
[Brett Cannon <brett@python.org>]
+1 on the overall idea, although I would rather the module be named random.safe in the stdlib ("namespaces are one honking great idea"
Ah, grasshopper, there's a reason that one is last in PEP 20. "Flat is better than nested" is the one - and only one - that _obviously_ applies here ;-)
I'd much rather see Steven D'Aprano's "secrets" idea pursued: solve "the problems" on their own terms directly.
os.urandom() is the obvious thing to build on, and it's already there. If alternatives are desired (which they may well be - .urandom() is sloooooooow on many systems), that can be addressed later. Before then, speed probably doesn't matter for most plausibly appropriate uses.
And if yes to a PEP, who's writing it? And then who is writing the implementation in the end?
Did you just volunteer? Great! Thanks ;-) OK, Steven already volunteered to write a PEP for his proposal.

On 17 September 2015 at 04:55, Tim Peters <tim.peters@gmail.com> wrote:
As far as implementation goes, based on a separate discussion at https://github.com/pyca/cryptography/issues/2347, I believe the essential cases can all be covered by:

    def random_bits(bits):
        return os.urandom(bits // 8)

    def random_int(bits):
        return int.from_bytes(random_bits(bits), byteorder="big")

    def random_token(bits):
        return base64.urlsafe_b64encode(random_bits(bits)).decode("ascii")

    def random_hex_digits(bits):
        return binascii.hexlify(random_bits(bits)).decode("ascii")

So if you want a 128 bit (16 byte) IV, you can just write "secrets.random_bits(128)". Examples of all four in action:
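[Editor's note: the example session mentioned above was lost from the archive. The four helpers are repeated below as a self-contained sketch; the checks only verify lengths and bounds, since the random values differ on every run.]

```python
import base64
import binascii
import os

def random_bits(bits):
    return os.urandom(bits // 8)

def random_int(bits):
    return int.from_bytes(random_bits(bits), byteorder="big")

def random_token(bits):
    return base64.urlsafe_b64encode(random_bits(bits)).decode("ascii")

def random_hex_digits(bits):
    return binascii.hexlify(random_bits(bits)).decode("ascii")

# A 128-bit IV is 16 raw bytes
print(len(random_bits(128)))        # 16

# A 128-bit integer is below 2**128
print(random_int(128) < 2 ** 128)   # True

# base64 turns 16 bytes into 24 characters (including padding)
print(len(random_token(128)))       # 24

# hexlify doubles the byte count
print(len(random_hex_digits(128)))  # 32
```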
Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

[Nick Coghlan <ncoghlan@gmail.com>]
Probably better to wait until Steven starts a new thread about his PEP (nobody is ever gonna look at _this_ thread again ;-) ). Just two things to note: 1. Whatever task-appropriate higher-level functions people want, as you've shown "secure" implementations are easy to write for someone who knows what's available to build on. It will take 10000 times longer for people to bikeshed what "secrets" should offer than to implement it ;-) 2. I'd personally be surprised if a function taking a "number of bits" argument silently replaced argument `bits` with `bits - bits % 8`. If the app-level programmers at issue can't think in terms of bytes instead (and use functions with a `bytes` argument), then, e.g., better to raise an exception if `bits % 8 != 0` to begin with. Or to round up, taking "bits" as meaning "a number of bytes covering _at least_ the number of bits asked for".

On 18 September 2015 at 01:11, Tim Peters <tim.peters@gmail.com> wrote:
Agreed, although the 4 I listed are fairly well-credentialed - the implementations of the first two (raw bytes and integers) are the patterns cryptography.io uses, the token generator is comparable to the Django one (with a couple of extra punctuation characters in the alphabet), and the hex digit generator is the Pyramid one. You can get more exotic with full arbitrary alphabet password and passphrase generators, but I think we're getting beyond stdlib level functionality at that point - it's getting into the realm of password managers and attack software.
Yeah, I took a shortcut to keep them all as pretty one liners. A proper rand_bits with that API would look something like:

    def rand_bits(bits):
        num_bytes, add_byte = divmod(bits, 8)
        if add_byte:
            num_bytes += 1
        return os.urandom(bits)

Compared to the os.urandom() call itself, the bits -> bytes calculation should disappear into the noise from a speed perspective (and a JIT compiled runtime like PyPy could likely optimise it away entirely). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, Sep 17, 2015, at 12:36, Nick Coghlan wrote:
I think it's important to at least have a way to get a random number in a range that isn't a power of two, since that's so easy to get wrong. Even the libc arc4random API has that in arc4random_uniform. At that point people can build their own arbitrary alphabet password generators as one-liners.
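[Editor's note: the non-power-of-two concern above is the modulo-bias problem that arc4random_uniform solves via rejection sampling. A minimal Python sketch of the same technique over os.urandom, with hypothetical names — illustrative only, not a vetted implementation:]

```python
import os

def randbelow(n):
    """Return a uniformly distributed int in [0, n), using rejection
    sampling over os.urandom so every value is equally likely
    (no modulo bias)."""
    if n <= 0:
        raise ValueError("upper bound must be positive")
    k = (n - 1).bit_length()             # bits needed to represent n - 1
    num_bytes = (k + 7) // 8
    while True:
        # Draw k random bits, discarding the excess high bits
        r = int.from_bytes(os.urandom(num_bytes), "big") >> (num_bytes * 8 - k)
        if r < n:                        # reject out-of-range draws and retry
            return r

# With this in hand, an arbitrary-alphabet password generator is a one-liner:
alphabet = "abcdefghjkmnpqrstuvwxyz23456789"
password = "".join(alphabet[randbelow(len(alphabet))] for _ in range(12))
print(len(password))  # 12
```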

[Tim]
[Nick Coghlan <ncoghlan@gmail.com>]
I will immodestly claim that nobody needs to be a crypto-wonk to see that these implementations are exactly as secure (or insecure) as the platform urandom(): in each case, it's trivial to invert the output to recover the exact bytes urandom() returned. So if there's any attack against the outputs, that's also an attack against what urandom() returned. The outputs just spell what urandom returned using a different alphabet. For the same reason, e.g., it would be fine to replace each 0 bit in urandom's result with the string "egg", and each 1 bit with the string "turtle". An attack on the output of that is exactly as hard (or easy) as an attack on the output of urandom. Obvious, right? It's only a little harder to see that the same is true of even the fanciest of your 4 functions. Where you _may_ get in trouble is creating a non-invertible output. Like:

    def secure_int(nbytes):
        n = int.from_bytes(os.urandom(nbytes), "big")
        return n - n

That's not likely to be useful ;-)
I'll leave that for the discussion of Steven's PEP. I think he was on the right track to, e.g., suggest a secure choice() as one his few base building blocks. It _does_ take some expertise to implement a secure choice() correctly, but not so much from the crypto view as from the free-from-statistical-bias view. SystemRandom.choice() already gets both right.
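[Editor's note: the bias-free building block mentioned above is already usable today; a short usage sketch, not part of any proposal:]

```python
import random
import string

# SystemRandom draws from os.urandom, and its choice() already performs
# the bias-free range reduction discussed above.
sysrand = random.SystemRandom()
token = "".join(sysrand.choice(string.ascii_letters + string.digits)
                for _ in range(20))
print(len(token))  # 20
```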
You should really be calling that with "num_bytes" now ;-)
Goodness - "premature optimization" already?! ;-) Fastest in pure Python is likely:

    num_bytes = (bits + 7) >> 3

But if I were bikeshedding I'd question why the function weren't:

    def rand_bytes(nbytes):
        return os.urandom(nbytes)

instead. A rand_bits(nbits) that meant what it said would likely also be useful:
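[Editor's note: the message breaks off at the colon in the archive. One plausible reading of a rand_bits(nbits) "that meant what it said" — an interpolation, not Tim's actual code:]

```python
import os

def rand_bits(nbits):
    """Return an int uniformly distributed in [0, 2**nbits), i.e.
    exactly `nbits` random bits (hypothetical sketch; the truncated
    original may have differed)."""
    if nbits <= 0:
        raise ValueError("number of bits must be positive")
    num_bytes = (nbits + 7) >> 3          # round up to whole bytes
    excess = num_bytes * 8 - nbits        # surplus high bits to discard
    return int.from_bytes(os.urandom(num_bytes), "big") >> excess

print(rand_bits(20) < 2 ** 20)  # True
```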

Nick Coghlan <ncoghlan@...> writes:
I think you want a little bit more flexibility than that, because the allowed characters may depend on the specific protocol (of course, people can use the hex digits version, but the output is longer). (quite a good idea, that "secrets" library - I wonder why nobody proposed it before ;-)) Regards Antoine.

On Wed, Sep 16, 2015, at 11:54, Nick Coghlan wrote:
* random._inst is a SystemRandom() instance by default
He has a point on the performance issue. The difference between Random and SystemRandom on my machine is significantly more than an order of magnitude. (Calling libc's arc4random with ctypes was roughly in the middle, though I *suspect* a lot of that was due to ctypes overhead).

On September 16, 2015 at 1:10:09 PM, Random832 (random832@fastmail.com) wrote:
I did the benchmark already: https://bpaste.net/show/79cc134a12b1 Using this code: https://github.com/dstufft/randtest However, using anything except for urandom is warned against by most security experts I’ve talked to. Most of them are only OK with it, if it means we’re using a CSPRNG by default in the random.py module, but not if it’s not the default. Even then, one of them thought that using a userspace CSPRNG instead of urandom was a bad idea (The rest thought it was better than the status quo). They all agreed that splitting the namespace was a good idea. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Tue, Sep 15, 2015, at 13:33, Guido van Rossum wrote:
The output of random.random today when it's not seeded / seeded with None isn't _really_ deterministic - you can't reproduce it, after all, without modifying the code (though in principle you could do seed(None)/getstate the first time and then setstate on subsequent executions - it may be worth supporting this use case?) - so changing it isn't likely to affect anyone - anyone needing MT is likely to also be using the seed functions.
random.set_random_generator(<instance>)
What do you think of having calls to seed/setstate(/getstate?) implicitly switch (by whatever mechanism) to MT? This could be done without a deprecation warning, and would allow existing code that relies on reproducible values to continue working without modification? [indirection in global functions]...
(and similar for all related functions).
global getstate/setstate should also save/replace the _inst or its type; at least if it's a different type than it was at the time the state was saved. For backwards compatibility in case these are pickled it could use the existing format when _inst is the current MT implementation, and accept these in setstate.
SystemRandom already raises an exception when getstate and setstate are called.
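[Editor's note: that behaviour is easy to demonstrate — SystemRandom has no reproducible state to expose:]

```python
import random

sysrand = random.SystemRandom()
try:
    sysrand.getstate()
    supported = True
except NotImplementedError:
    supported = False
print(supported)  # False: there is no state to save or restore
```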

On Tue, Sep 15, 2015 at 11:25 AM, Random832 <random832@fastmail.com> wrote:
Yes, that's how I would do it (better than using a weak seed).
Or they could just make a lot of random() calls and find their performance down the drain (like what happened in the tracker issue that started all this: http://bugs.python.org/issue25003).
I happen to believe that MT's performance is a feature of the (default) API, and this would still be considered breakage (again, as in that issue). [indirection in global functions]...
Great! -- --Guido van Rossum (python.org/~guido)

I commonly use random.some_distribution() as a quick source of "randomness" knowing full well that it's not cryptographic. Moreover, I usually do so initially without setting a seed. The first question I want to answer is "does this random process behave roughly as I expect?" But in the back of my mind is always the thought, "If/when I want to reuse this I'll add a seed for reproducibility". It would never occur to me to reach for the random module if I want to do cryptography. It's a good and well established API that currently exists. Sure, add a submodule random.crypto (or whatever name), but I'm -1 on changing anything whatsoever on the module functions that are well known. On Sep 15, 2015 11:26 AM, "Random832" <random832@fastmail.com> wrote:

How about the following. We add a fast secure random generator to the stdlib as an option, and when it has proven its worth a few releases from now we consider again whether the default random() can be made secure without breaking anything. On Tue, Sep 15, 2015 at 12:43 PM, David Mertz <mertz@gnosis.cx> wrote:
-- --Guido van Rossum (python.org/~guido)

On Sep 15, 2015 1:19 PM, "Guido van Rossum" <guido@python.org> wrote:
How about the following. We add a fast secure random generator to the
stdlib as an option, and when it has proven its worth a few releases from now we consider again whether the default random() can be made secure without breaking anything. If we have a fast secure RNG, then the standard Random object might as well at least use it by default until someone actually sets or reads the state (and then switch to MT at that point). Until one of these events happens, the two RNGs are indistinguishable, and this would be a 100% backwards compatible change. (It might even make sense to backport to 2.7.) The limitation is that if library A uses the global random object without seeding in a security sensitive context, and library B uses seeding, then a program that just uses library A will be secure, but if it then starts using library B it will become insecure. But this is still better than the current situation where library A is always insecure. The only case where this would actually have a downside compared to status quo (assuming arc4random lives up to its reputation for speed etc) is if people start assuming that the default random object is in fact secure and intentionally choosing to use it in security sensitive situations. But hopefully people who know enough to realize that this is a decision they need to make will also read the docs where it clearly states that this is only a best-effort kind of hardening mechanism and that using random.Random/the global methods for cryptographic purposes is still a bug. -n

A pseudo-randomly selected recent quote:
It would never occur to me to reach for the random module if I want to do cryptography.
It's sad that so many of the opponents of this change make this kind of comment sooner or later. Security is (rarely) about *any* of *us*! Most of *us* don't need it (if we do, our physical or corporate security has already been compromised), most of *us* understand it, a somewhat smaller fraction of *us* behave in habitually secure ways (at the level we practice oral hygiene, say). That doesn't mean that security has to be #1 always and everywhere in designing Python, but I find it pretty distressing that apparently a lot of people either don't understand or don't care about what's at stake in these kinds of decisions *for the rest of the world*. The reality is that security that is not on by default is not secure. Any break in a dike can flood a whole town. The flip side is that security has costs, specifically the compatibility break, and since security needs to be on by default, the aggregate burden should be *presumed large* (even a small burden is spread over many users). Nevertheless, I think that the arguments to justify this change are pretty good: (1) The cost of adapting per program seems small, and seems to restricted to a class of users (software engineers doing regression testing and scientists doing simulations) who probably can easily make the change locally. Nick's proto-PEP is specifically designed so that there will be no cost to naive users (kids writing games etc) who don't need access to state. Caveat: there may be a performance hit for some naive users. That can probably be avoided with an appropriate choice of secure RNG, but that hasn't actually been benchmarked AFAIK. (2) ISTM there are no likely attack vectors due to choice of default RNG in random.random, based on Tim's analysis, but AFAICS he's unwilling to say it's implausible that they exist. (Sorry for the double negative!) I take this to mean that there may be real risk. 
(3) The anecdotal evidence that the module's current default is frequently misused is strong (the StackOverflow recipes for password generation). Two out of three ain't bad. YMMV, of course.

[Stephen J. Turnbull <stephen@xemacs.org>]
Oh, _many_ attacks are possible. Many are even plausible. For example, while Python's _default_ seeding is based on urandom() setting MT's entire massive state (no more secure way exists), a user making up their own seed is quite likely to do so in a way vulnerable to a "poor seeding" attack. "Password generators" should be the least of our worries. Best I can tell, the PHP paper's highly technical MT attack against those has scant chance of working in Python except when random.choice(x) is known to have len(x) a power of 2. Then it's a very powerful attack. But in PHP's idiomatic way of spelling random.choice(x) ("by hand", spelled out in the paper), it's _always_ a very powerful attack. In general, the more technical the attack, the more details matter. It's just no _fun_ to drone on about simple universally applicable brute-force attacks, so I'll continue to drone on about the PHP paper's sophisticated MT state-deducer ;-)

Tim Peters writes:
I'm not sure what you mean to say, but I don't count that as "due to choice of default RNG". That's foot-shooting of the kind we can't do anything about anyway, and if *that* is what Nick is worried about, I'm worried about Nick. ;-) *I* am more worried about attacks we don't know about yet (or at least haven't been mentioned in this thread), and maybe even haven't been invented yet. I presume Nick is, too.
That's genuinely comforting to read (even though it's the second or third time I've read it ;-). But I'm still nervous about the unknown.

[Stephen J. Turnbull <stephen@xemacs.org>]
[Tim]
[Stephen]
I'm not sure what you mean to say,
That the most obvious and easiest of RNG attacks remain possible regardless of anything that may be done, short of refusing to provide a seedable generator.
Oh no, _nobody_ is worried enough to "do something" about it. Not really. Note that in the PHP paper, 10 of the 16 apps scored "full attack" via pure brute force against poor seeding (figure 13, column 4.3). That's probably mostly due to the versions of PHP tested inflicting poor _default_ seeding on users. I hope so. But there's no accounting of which apps did and didn't set their own seeds. They did note that "Joomla" attempted to repair a security bug by _removing_ its own seeding, in 2010. Which left it open to PHP's poor default seeding instead - which was nevertheless an improvement.
Fundamentally, I just don't see the sense in saying that someone who does their own seeding deserves whatever they get, while someone who uses an inappropriate generator in a security context should be saved from themself. I know, I read all the posts about why I'm wrong. I just don't buy it. There's no real substitute for understanding what you're doing, regardless of field. Yes, incompetence can cause great damage. But I'm not sure it does the world a real favor to possibly help a programmer incompetent to do a task keep working in the field a little longer. This isn't the only damage they can cause, and the longer they keep working in an area they don't understand the more damage they can do. The alternative? Learn how to use frickin' SystemRandom. It's not hard. Or get work for which they are competent.
That's genuinely comforting to read (even though it's the second or third time I've read it ;-)
If you read everything I ever wrote, it's the second. Although you may have _inferred_ it before I ever wrote it, from Nathaniel's "if I use the base64 or hex alphabets", instinctively leaping from "hmm ... 2**6 and ... 2**4" to "power of 2". In which case it could feel like the third time. And I used the phrase "power of 2" in a reply to you before, but in a context wholly unrelated to the PHP paper. That may even make it feel like the fourth time. Always happy to clarify ;-)
But I'm still nervous about the unknown.
Huh! I've heard humans are prone to that. In which case, there will always be something to be nervous about :-)

On 16 September 2015 at 08:23, Tim Peters <tim.peters@gmail.com> wrote:
Because that's never how these things go. You usually don't write a password generator that uses a non-CS PRNG in a security context, get discovered in the short term, and fired/reprimanded/whatever. Instead, one of the following things happens: - you get code review from a reviewer who knows the problem space and spots the problem. It gets fixed, you get educated, you're better prepared for the field. - you get code review from a reviewer who knows the problem space but *doesn't* spot the problem because Python isn't their first language. It doesn't get fixed and no-one notices for ten years until the problem is exploited, but you left the company 8 years ago and are now Head of Security Engineering at CoolStartupInc. - you don't get code review, or your reviewer is no better informed on this topic than you are. The problem doesn't get fixed and no-one notices ever because your program isn't exploited, or is only exploited in ways you never find out about because the rest of your security process sucked too, but you never find out about this. This is the ongoing problem with incompetence when it comes to security: the feedback loop is long and the negative event fires rarely, so most programmers never experience it. Most engineers have *never* experienced a security vulnerability in their own project, let alone had one exploited. Thus, most engineers never get the negative feedback loop that tells them that they don't know enough to do the work they're doing. Look at all the people who get this wrong. Consider haveibeenpwned.com for a minute. They list a fraction of the website databases that have been exposed due to security errors. 
At last count, that list includes (I removed more than half for the sake of length):

- Adobe
- Ashley Madison
- Snapchat
- Gawker
- NextGenUpdate
- Yandex
- Forbes
- Stratfor
- Domino's
- Yahoo
- Telecom Regulatory Authority of India
- Vodafone
- Sony
- HackingTeam
- Bell
- Minecraft Forum
- UN Internet Governance Forum
- Tesco

Are you telling me that every engineer responsible for these is not working in the industry any more? I doubt it. In fact, I think most of these places can't even account for which engineer is responsible, and if they can, odds are good they left long before the problem was exploited. So you're right, there is no real substitute for knowing what you're doing. But we cannot prevent programmers who don't know this stuff from writing the code that does it. We don't get to set the bar. We cannot throw GoReadABookOrTwo exceptions when inexperienced programmers type random.random, much as we would like to. With that said, we *can* construct an environment where a programmer has to have actually tried to hurt themselves. They have to have taken the gun off the desk, loaded it, disabled the safety, pointed it at their foot, and pulled the trigger. At that point we can say that we took all reasonable precautions to stop you doing what you did and you did it anyway: that's entirely on you. If you disable the safety settings, then frankly you are taking on the mantle of an expert: you are claiming you knew more than the person who developed the system, and if you don't then the consequences are on you. But if you use the defaults then you're just doing the most obvious thing, and from my perspective that should not be a punishable offence.

Tim Peters writes:
Strawman, or imprecise quotation if you prefer. Nobody said they *deserve* it AFAICR; I said we can't stop them. Strictly speaking, yes, we could. We could (and *I* think we *should*) make it much less obvious how to do it by removing the seed method and the seed argument to __init__. The problem there is backward compatibility. I don't see that Guido would stand for it. Dis here homeboy not a-gonna stick mah neck out heeya, suh. I suspect we might also want to provide helper functions to construct a state from a seed as used by some other simulation package, such as Python 3.4. ;-) Name them and document them as for use in replicating simulations done from those seeds. Nice self-documenting names like "construct_rng_internal_state_from_python_3_4_compatible_seed". There should be one for each version of Python, too (sssh! don't confuse the users with abstractions like "identical implementation").
"Think of it as evolution in action." Yeah, I sympathize. But realistically, Darwinian selection will take geological time, no? That is, in almost all cases where disaster strikes, the culprit has long since moved on[1]. Whoever gets the sack is unlikely to be him or her. More likely it will be whoever has been telling the shop that their product is an accident waiting to happen. :-( The way I think about it, though, is a variation on a theme by Nick. Specifically, the more attractive nuisances we can eliminate, the fewer things the uninitiated need to learn. Footnotes: [1] That's especially true in Japan, where I live. "Whodunnit" also gets fuzzed up by the tendency to group work and group think, and a value system that promotes "getting along with others" more than expertise. Child-proof caps are a GoodThang[tm]. ;-)

[Tim]
Ha! That's actually its worst case, although everyone missed that. I wrote a solver, and bumped into this while testing it. The rub is this line in _randbelow():

    k = n.bit_length()  # don't use (n-1) here because n can be 1

If n == 2**i, k is i+1 then, and ._randbelow() goes on to throw away half of all 32-bit MT outputs. Everyone before assumed it wouldn't throw any away. The best case for this kind of solver is when .choice(x) has len(x) one less than a power of 2, say 2**i - 1. Then k = i, and ._randbelow() throws away 1 out of each 2**i MT outputs (on average). For small i (say, len(x) == 63), every time I tried, the solver (which can only record bits from MT outputs it _knows_ were produced) found itself stuck with inconsistent equations. If len(x) == 2**20 - 1, _then_ it has a great chance of succeeding. There's about one chance in a million then that a single .choice() call will consume 2 32-bit MT outputs. It takes around 1,250 consecutive observations (of .choice() results) to deduce the starting state then, assuming .choice() never skips an MT output. The chance that no output was in fact skipped is about:
    >>> (1 - 1./2**20) ** 1250
    0.9988086167972104
So that attack is very likely to succeed. So, until the "secrets" module is released, if you're too dense to use os.urandom(), don't pick passwords from a million-character alphabet ;-)
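[Editor's note: the acceptance-rate arithmetic above is easy to check with a tiny model — a sketch of the analysis, not the actual ._randbelow() code, and the function name is made up:]

```python
def discard_fraction(n):
    """Expected fraction of k-bit draws that Random._randbelow(n)
    rejects, given the quoted k = n.bit_length() line."""
    k = n.bit_length()      # uses n, not n - 1, as the quoted line says
    return 1 - n / 2 ** k   # draws land uniformly in [0, 2**k); those >= n retry

print(discard_fraction(2 ** 20))      # 0.5      -- power of two: worst case
print(discard_fraction(2 ** 20 - 1))  # ~9.5e-07 -- best case for the solver
```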

On Sep 15, 2015 7:23 PM, "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
This feels somewhere between disingenuous and dishonest. Just like I don't use the random module for cryptography, I also don't use the socket module or the threading module for cryptography. Could a program dealing with sockets have security issues?! Very likely! Could a multithreaded one expose vulnerabilities? Certainly! Should we try to "secure" these modules for users who don't need to or don't know to think about security? Absolutely not!

On 16 September 2015 at 14:27, David Mertz <mertz@gnosis.cx> wrote:
That's great that you already know not to use the random module for cryptography. Unfortunately, this is a lesson that needs to be taught developer by developer: "don't use the random module for security sensitive tasks". When they ask "Why not?", they get hit with a wall of confusing arcana about brute force search spaces, and cryptographically secure random number generators, and get left with a feeling of dissatisfaction with the explanation, because cryptography is one of the areas of computing where our intuitions break down, so it takes years to retrain our brains to adopt the relevant mindset. Beginners don't even get that far, as they have to ask "What's a security sensitive task?" while they're still at a stage where they're trying to grasp the basic concept of computer generated random numbers (this is a concrete problem with the current situation, as a warning that says "Don't use this for <X>" is equivalent to "Don't use this" if you don't yet know how to identify "<X>"). It's instinctive for humans to avoid additional work when it provides no immediate benefit to us personally. This is a sensible time management strategy, but it's proved to be a serious problem in the context of computer security. An analogy that came up in one of the earlier threads is this:

* as an individual lottery ticket holder, assuming you're going to win is a bad assumption
* as a lottery operator, assuming someone, somewhere, is going to win is a good assumption

Infrastructure security engineers are lottery operators - with millions of software development projects, millions of businesses demanding online web presences, and tens of millions of developers worldwide (with many, many more on the way as computing becomes a compulsory part of schooling), any potential mistake is going to be made and exploited eventually, we just have no way of predicting when or where.
Unlike lottery operators (who get to set their prize levels), we also have no way of predicting the severity of the consequences. The *problem* we have is that individual developers are lottery ticket holders - the probability of *our* particular component being the one that gets compromised is vanishingly small, so the incentive to inflict additional work on ourselves to mitigate security concerns is similarly small (although some folks do it anyway out of sheer interest, and some have professional incentives to do so). So let's assume any given component has a 1 in 10000 chance of being compromised (0.01%). We only have to get to 100k components before the aggregate chance of at least one component being compromised rises to almost 100% (around 99.995%). It's at this point the sheer scale of the internet starts working against us - while it's currently estimated that there are only around 30 million developers (both professionals and hobbyists) worldwide, it's further estimated that there are 3 *billion* people with access to the internet. Neither of those numbers is going to suddenly start getting smaller, so we start getting interested in security risks with a lower and lower probability of being exploited. Accordingly, we want "don't worry about security" to be the *right answer* in as many cases as possible - there's always going to be plenty of unavoidable security risks in any software development project, so eliminating the avoidable ones by default makes it easier to focus attention on other areas of potential concern. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 16 September 2015 at 03:33, Guido van Rossum <guido@python.org> wrote:
The proposed runtime warnings are just an additional harder to avoid nudge for folks that don't read the documentation, so I'd be OK with dropping them from the proposal. However, it also occurs to me there may be a better solution to eliminating them than getting people to change their imports: add a "random.ensure_seedable()" API that flips the default instance to the deterministic RNG without triggering the warning. For applications that genuinely want the determinism, warnings-free 3.6+ compatibility would then look like:

    if hasattr(random, "ensure_seedable"):
        random.ensure_seedable()
That was my previous proposal. The problem with it is that it's much harder to test and support, as you have to allow for the global instance changing multiple times, and in multiple different directions. With the proposal in the PEP, there's only a single idempotent change that's possible: from the system RNG (used by default to eliminate the silent security failure) to the seedable RNG (needed for reproducibility). Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Sep 15, 2015 at 8:40 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Good, because I really don't want the warnings, nor the hack based on whether you call any of the seed/state-related methods.
I don't believe that seedability is the only thing that matters. MT is also over an order of magnitude faster than os.urandom() or SystemRandom.
Actually part of my proposal was a use_secure_random() that was also a one-way flag flip, just in the opposite direction. :-) With the proposal in the PEP, there's only a single idempotent change
I'd be much more comfortable if in 3.6 we only introduced a new way to generate secure random numbers that was as fast as MT. Once that has been in use for a few releases we may have a discussion about whether it's time to make it the default. Security isn't served well by panicky over-reaction. -- --Guido van Rossum (python.org/~guido)

On 16 September 2015 at 14:12, Guido van Rossum <guido@python.org> wrote:
Security isn't served well by panicky over-reaction.
Proposing a change in 2015 that wouldn't be released to the public until early 2017 or so isn't exactly panicking. (And the thing that changed for me that prompted me to write the PEP was finally figuring out a remotely plausible migration plan to address the backwards compatibility concerns, rather than anything on the security side) As I wrote in the PEP, this kind of problem is a chronic one, not an acute one, where security engineers currently waste a *lot* of their (and other people's) time on remedial firefighting - a security audit (or a breach investigation) detects a vulnerability, high priority issues get filed with affected projects, nobody goes home happy. Accordingly, my proposal is aimed as much at eliminating the perennial "But *why* can't I use the random module for security sensitive tasks?" argument as it is at anything else. I'd like the answer to that question to eventually be "Sure, you can use the random module for security sensitive tasks, so let's talk about something more important, like why you're collecting and storing all this sensitive personally identifiable information in the first place". Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sep 15, 2015 11:00 PM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
I believe this attitude makes overall security WORSE, not better. Giving a false assurance that simply using a certain cryptographic building block makes your application secure makes it less likely applications will undergo genuine security analysis. Hence I affirmatively PREFER a random module that explicitly proclaims that it is non-cryptographic. Someone who figures out enough to use random.SystemRandom, or a future crypto.random, or the like is more likely to think about why they are doing so, and what doing so does and does NOT assure them of.

The point here is that the closest we can come to PROTECTING users is to avoid making false promises to them. All this talk of "maybe, possibly, secure RNGs" (until they've been analyzed longer) is just building a house on sand. Maybe ChaCha20 is completely free of all exploits... It's new-ish, and no one has found any. The API we really owe users is to create a class random.BelievedSecureIn2015, and let users utilize that if they like. All the rest of the proposals are just invitations to create more security breaches - the specific thing that random.random and MT do NOT do. On Sep 16, 2015 1:29 AM, "Cory Benfield" <cory@lukasa.co.uk> wrote:

On 16 September 2015 at 17:43, David Mertz <mertz@gnosis.cx> wrote:
You're *describing the status quo*. This isn't a new concept, as it's the way our industry has worked since forever:

1. All the security features are off by default
2. The onus is on individual developers to "just know" when the work they're doing is security sensitive
3. Once they realise what they're doing is security sensitive (probably because a security engineer pointed it out), the onus is *still* on them to educate themselves as to what to do about it

Meanwhile, their manager is pointing at the project schedule demanding to know why the new feature hasn't shipped yet, and they're in turn pointing fingers at the information security team, blaming them for blocking the release until the security vulnerabilities have been addressed. And that's the *good* scenario, since the only people it upsets are the people working on the project. In the more typical cases where the security team doesn't exist, gets overruled, or simply has too many fires to try to put out, we get http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-bre... and http://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/

On the community project side, we take the manager, the product schedule and the information security team out of the picture, so folks never even get to find out that there are any problems with the approach they're taking - they just ship and deploy software, and are mostly protected by the lack of money involved (companies and governments are far more interesting as targets than online communities, so open source projects mainly need to worry about protecting the software distribution infrastructure that provides an indirect attack vector on more profitable targets).

Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Sep 16, 2015 at 03:59:04PM +1000, Nick Coghlan wrote: [...]
The answer to that question is *already* "sure you can use the random module". You just have to use it correctly. [Aside: do you think that, having given companies and people a "secure by default" solution that will hopefully prevent data breaches, they will be *more* or *less* open to the idea that they shouldn't be collecting this sensitive information?]

We've spent a long time talking about random() as regards security, but nobody exposes the output of random directly. They use it as a building block to generate tokens and passwords, and *that's* where the breach is occurring. We shouldn't care so much about the building blocks and care more about the high-level tools: the batteries included.

Look at the example given by Nathaniel: https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_... What was remarkable about this is how many individual factors were involved in the attacks. It wasn't merely an attack on Mersenne Twister, and it is quite possible that had any of the other factors been changed, the attacks would have failed. E.g. the applications used MD5 hashes. What if they had used SHA-1? They leaked sensitive information such as PIDs and exposed the time that the random numbers were generated. They allowed the attackers to make as many connections as they wanted.

Someone might argue that none of those other problems would matter if the PRNG were more secure. That's true, up to a point: you never know when somebody will come up with an attack on the CSPRNG. Previous generations of CSPRNGs, including RC4, have been "retired", and we must expect that the current generation will be too. It is a good habit to avoid leaking this sort of information (times, PIDs etc) even if you don't have a concrete attack in place, because you don't know when a concrete attack will be discovered. Today's CSPRNG is tomorrow's hopelessly insecure PRNG, but defence in depth is always useful.
I propose that instead of focusing on changing the building blocks that people will use by default, we provide them with ready-made batteries for the most common tasks, and provide a clear example of acceptable practices for making their own batteries. (As usual, the standard lib will provide batteries, and third-party frameworks or libraries can provide heavy-duty nuclear reactors.) I propose:

- The random module's API is left as-is, including the default PRNG. Backwards compatibility is important, code-churn is bad, and there are good use-cases for a non-CSPRNG.

- We add at least one CSPRNG. I leave it to the crypto-wonks to decide which.

- We add a new module, which I'm calling "secrets" (for lack of a better name) to hold best-practice security-related functions. To start with, it would have at least these three functions: one battery, and two building blocks:

  + secrets.token to create password recovery tokens or similar;

  + secrets.random calls the CSPRNG; it just returns a random number (integer?). There is no API for getting or setting the state, setting the seed, or returning values from non-uniform distributions;

  + secrets.choice similarly uses the CSPRNG.

Developers will still have to make a choice: "do I use secrets, or random?" If they're looking for a random token (or password?), the answer is obvious: use secrets, because the battery is already there. For reasons that I will go into below, I don't think that requiring this choice is a bad thing. I think it is a *good* thing.

secrets becomes the go-to module for things you want to keep secret. random remains the module you use for games and simulations. If there is interest in this proposed secrets module, I'll write up a proto-PEP over the weekend, and start a new thread for the benefit of those who have muted this one. You can stop reading now. The rest is motivational rather than part of the concrete proposal. Still here? Okay.
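The three proposed functions could plausibly be built on random.SystemRandom, which is already backed by os.urandom. A minimal sketch, assuming that implementation choice - the names token, random and choice come from the proposal above, but the token length and alphabet are illustrative guesses, not part of it:

```python
# Sketch of the proposed "secrets" API on top of random.SystemRandom.
# Token length and alphabet are illustrative assumptions, not proposal details.
import string
from random import SystemRandom

_sysrand = SystemRandom()

def token(length=32):
    """Return a hard-to-guess text token, e.g. for password-recovery URLs."""
    alphabet = string.ascii_letters + string.digits
    return ''.join(_sysrand.choice(alphabet) for _ in range(length))

def random():
    """Return a random float in [0.0, 1.0) from the system CSPRNG.

    Deliberately no seed/getstate/setstate, and no non-uniform distributions.
    """
    return _sysrand.random()

def choice(seq):
    """Return a randomly chosen element of seq, using the system CSPRNG."""
    return _sysrand.choice(seq)
```

Because everything delegates to SystemRandom, the module stays pure Python and inherits whatever CSPRNG the operating system provides.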
I think that it is a good thing to have developers explicitly make a choice between random and secrets. I think it is important that we continue to involve developers in security thinking. I don't believe that "secure by default" is possible at the application level, and that's what really matters. It doesn't matter if the developer uses a "secure by default" CSPRNG if the application leaks information some other way. We cannot possibly hope to solve application security from the bottom-up (although providing good secure tools is part of the solution).

I believe that computer security is to the IT world what occupational health and safety is to the farming, building and manufacturing industries (and others). The thing about security is that, like safety, it is not a product. There is no function you can call to turn security on, no secure=True setting. It is a process and a mind-set, and everyone involved needs to think about it, at least a little bit.

It took a long time for the blue collar industries to accept that OH&S was something that *everyone* has to be involved in, from the government setting standards to individual workers who have to keep their eyes open while on the job. Like the IT industry today, management's attitude was that safety was a cost that just ate into profits and made projects late (sound familiar?), and the workers' attitude was all too often "it won't happen to us". It takes experience and training and education to recognise dangerous situations on the job, and people die when they don't get that training. It is part of every person's job to think about what they are doing. I don't believe that it is possible to have "zero-thought security" any more than it is possible to have "zero-thought safety".
The security professionals can help by providing ready-to-use tools, but the individual developers still have to use those tools correctly, and cultivate a properly careful mindset: "If I wanted to break into this application, what information would I look for? How can I stop providing it? Am I using the right tool for this job? How can I check? Where's the security rep?" Until the IT industry treats security as the building industry treats OH&S, bottom-up "safe by default" functions will amount to bailing out the Titanic with a teacup, and will just encourage a false sense of security. -- Steve

On 17 September 2015 at 01:54, Steven D'Aprano <steve@pearwood.info> wrote:
Oh, *this* I like (minus the idea of introducing a CSPRNG - random.SystemRandom will be a good choice for this task). "Is it an important secret?" is a question anyone can answer, so simply changing the proposed name addresses all my concerns regarding having to ask people to learn how to answer a difficult question that isn't directly related to what they're trying to do. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On September 16, 2015 at 11:55:48 AM, Steven D'Aprano (steve@pearwood.info) wrote:
- We add at least one CSPRNG. I leave it to the crypto-wonks to decide which.
We already have a CSPRNG via os.urandom, and importantly we don't have to decide which implementation it is, because the OS provides it and is responsible for it. I am against adding a userspace CSPRNG as anything but a possible implementation detail of making a CSPRNG the default for random.py. If we're not going to change the default, then I think adding a userspace CSPRNG is just adding a different footgun. That's OK though, because os.urandom is a pretty great CSPRNG.
Forcing the user to make a choice isn’t a bad option from a security point of view. Most people will prefer to use the secure one by default even if they don't know better. The problem right now is that there is a "default", and that default is unsafe, so people aren't forced to make a choice up front; they are merely given the option of going and making one later. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
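The OS-provided CSPRNG Donald refers to is reachable today, with no new stdlib modules, through os.urandom and its random-module wrapper. A minimal illustration (the 16-byte size and the verification-code use are just examples):

```python
# The OS CSPRNG is already available in the stdlib today.
import os
from random import SystemRandom

raw = os.urandom(16)       # 16 bytes straight from the system CSPRNG
sr = SystemRandom()        # the familiar random.Random API, backed by os.urandom
n = sr.randrange(10**6)    # e.g. a 6-digit verification code, 0..999999
```

SystemRandom overrides the generator core but inherits the rest of the random.Random API, so choice(), randrange() and friends all draw from the OS entropy source (and seed()/getstate()/setstate() are unsupported, raising NotImplementedError).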

On 16 September 2015 at 16:54, Steven D'Aprano <steve@pearwood.info> wrote:
I love this idea. The name is perfect, and your motivational discussion fits exactly how I think we should be approaching security. Would it also be worth having secrets.password(alphabet, length) - generate a random password of length "length" from alphabet "alphabet". It's not going to cover every use case, but it immediately becomes the obvious answer to all those "how do I generate a password" SO questions people keep pointing at. Also, a backport version could be made available via PyPI. I don't see why the module couldn't use random.SystemRandom as its CSPRNG (and as a result be pure Python) but that can be an implementation detail the security specialists can argue over if they want. No need to expose it here (although if it's useful, republishing (some more of) its API without exposing the implementation, just like the proposed secrets.choice, would be fine). Paul.
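Since Paul notes the module could be pure Python on top of random.SystemRandom, his suggested helper might look roughly like this - "password" is his proposed name, not an actual stdlib function, and the default alphabet and length are assumptions:

```python
# Sketch of the suggested secrets.password(alphabet, length) helper.
# The function name comes from the suggestion above; defaults are assumptions.
import string
from random import SystemRandom

_sysrand = SystemRandom()

def password(alphabet=string.ascii_letters + string.digits, length=12):
    """Return a random password of *length* characters drawn from *alphabet*."""
    return ''.join(_sysrand.choice(alphabet) for _ in range(length))
```

Keeping SystemRandom as an internal detail, as Paul suggests, means the security folks can later swap in a different CSPRNG without changing the API.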

On 16.09.2015 17:54, Steven D'Aprano wrote:
+1 on the idea (not sure about the name, though :-)) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 16 2015)

[Steven D'Aprano <steve@pearwood.info>, on "secrets"] +1 on everything. Glad _that's_ finally over ;-) One tech point:
The OpenBSD arc4random() has a very sparse API, but gets this part exactly right:

    uint32_t arc4random_uniform(uint32_t upper_bound);

    arc4random_uniform() will return a single 32-bit value, uniformly
    distributed but less than upper_bound. This is recommended over
    constructions like "arc4random() % upper_bound" as it avoids "modulo
    bias" when the upper bound is not a power of two. In the worst case,
    this function may consume multiple iterations to ensure uniformity;
    see the source code to understand the problem and solution.

In Python, there's no point to the uint32_t restrictions, and the function is already implemented for arbitrary bigints via the current (but private) Random._randbelow() method, whose implementation could be simplified for this specific use. That in turn relies on the .getrandbits(number_of_bits) method, which SystemRandom overrides. So getrandbits() is the fundamental primitive, and SystemRandom already implements that based on .urandom() results. An OpenBSD-ish random_uniform(upper_bound) would be a "nice to have", but not essential.
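A pure-Python sketch of that arc4random_uniform-style helper, using the same rejection-sampling idea Tim describes for Random._randbelow() - draw just enough bits, and redraw when the result lands on or above the bound, so no value is favoured (the function name and structure here are illustrative, not the stdlib's private implementation):

```python
# Modulo-bias-free uniform integer below an arbitrary (bigint) bound,
# via rejection sampling on getrandbits() - the fundamental primitive.
from random import SystemRandom

_sysrand = SystemRandom()

def random_uniform(upper_bound):
    """Return a uniformly distributed int in [0, upper_bound)."""
    if upper_bound <= 0:
        raise ValueError("upper_bound must be positive")
    k = upper_bound.bit_length()     # enough bits to cover [0, upper_bound)
    r = _sysrand.getrandbits(k)
    while r >= upper_bound:          # reject out-of-range draws; no modulo bias
        r = _sysrand.getrandbits(k)
    return r
```

Since k bits cover at most 2 * upper_bound values, each draw succeeds with probability above 1/2, so the expected number of iterations is below two - the same worst-case behaviour the OpenBSD man page warns about.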
+ secrets.choice similarly uses the CSPRNG.
Apart from error checking, that's just:

    def choice(seq):
        return seq[self.random_uniform(len(seq))]

random.Random already does that (and SystemRandom inherits it), although spelled with _randbelow().

On Wed, Sep 16, 2015 at 12:13 PM, Tim Peters <tim.peters@gmail.com> wrote:
[Steven D'Aprano <steve@pearwood.info>, on "secrets"]
+1 on everything. Glad _that's_ finally over ;-)
Yes. Thanks all! I'm looking forward to the new PEP. -- --Guido van Rossum (python.org/~guido)
participants (15)
- Antoine Pitrou
- Brett Cannon
- Cory Benfield
- David Mertz
- Donald Stufft
- Guido van Rossum
- M.-A. Lemburg
- Nathaniel Smith
- Nick Coghlan
- Paul Moore
- Random832
- Stephen J. Turnbull
- Steven D'Aprano
- Sturla Molden
- Tim Peters