Python's Source of Randomness and the random.py module Redux

Ok, I reached out to Theo de Raadt to talk to him about what he was suggesting without Guido having to play messenger and forward fragments of the email conversation. I'm starting a new thread because this email is rather long, and I'm hoping to divorce it a bit from the back and forth in the other thread about a proposal that wasn't exactly what Theo was suggesting.

Essentially, there are three basic types of uses of random (the concept, not the module):

1. People/use cases who absolutely need deterministic output given a seed and for whom security properties don't matter.
2. People/use cases who absolutely need cryptographically random output and for whom deterministic output is a downside.
3. People/use cases that fall somewhere in between, where the code may or may not be security sensitive, or it may not be known whether it is.

The people in group #1 are currently, in the Python standard library, best served by the MT random source, as it provides exactly the kind of determinism they need. The people in group #2 are currently, in the Python standard library, best served by os.urandom (either directly or via random.SystemRandom).

The third case is the one that Theo's suggestion is attempting to solve. In the current landscape, the security-minded folks will tell these people to use os.urandom/random.SystemRandom, and the performance-minded or otherwise less security-minded folks will likely tell them to just use random.py, leaving them with a source of randomness that is not cryptographically safe.

The question then is: does it matter whether the people in #3 are using a cryptographically safe source of randomness? The answer is obviously that we don't know, and it's possible that the user doesn't know either. In these cases it's typically best to default to the more secure option and expect people to opt in to insecurity. In the case of randomness, a lot of languages (Python included) don't do that; instead they pick the more performant option first, often with the argument (as seen in the other thread) that if people need a cryptographically secure source of randomness they'll know how to look for it, and if they don't know how to look for it, then they likely have some other security problem anyway.

I think (and I believe Theo thinks) this sort of thinking is short sighted. Take a web application: it's going to need session identifiers to put into a cookie. You'll want these to be random, and it's not obvious on the tin to a non-expert that you can't just use the module-level functions in the random module for this. Other examples are generating API keys or passwords. Looking on Google, the first result for "python random password" is a StackOverflow answer which suggests:

    ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(N))

It was later edited to also include, after that:

    ''.join(random.SystemRandom().choice(string.ascii_uppercase + string.digits) for _ in range(N))

So it wasn't obvious to the person who answered that question that the random module's module-scoped functions were not appropriate for this use. It appears that the original answer lasted roughly 4 years before it was corrected, so who knows how many people used it in those 4 years.
The second result has someone asking if there is a better way to generate a random password in Python than:

    import os, random, string

    length = 13
    chars = string.ascii_letters + string.digits + '!@#$%^&*()'
    random.seed = (os.urandom(1024))
    print ''.join(random.choice(chars) for i in range(length))

This person obviously knew that os.urandom existed and that he should use it, but failed to identify that the random module's module-scoped functions were not what he wanted to use here. The third result has this code:

    import string
    import random

    def randompassword():
        chars = string.ascii_uppercase + string.ascii_lowercase + string.digits
        size = 8
        return ''.join(random.choice(chars) for x in range(size, 12))

I'm not going to keep pasting snippets, but going through the results it is clear that in the bulk of cases this search turns up code snippets suggesting there is likely to be a lot of code out there that is unknowingly using the random module in a very insecure way. I think this is a failing of the random.py module to provide an API that guides users to be safe, one that was papered over by adding a warning to the documentation; but as has been said before, you can't solve a UX problem with documentation.

Then we come to why we might not want to provide a safe random by default for the folks in group #3. As we've seen in the other thread, this basically boils down to the fact that a lot of users don't care about the security properties; they just want a fast random-esque value. This case is made stronger by the fact that there is a lot of code out there using Python's random module in a completely safe way that would regress in a meaningful way if the random module slowed down.

The fact that speed is the primary reason not to give the people in #3 a cryptographically secure source of randomness by default is where we come back to the meat of Theo's suggestion. His claim is that invoking os.urandom through any of the interfaces imposes a performance penalty because it has to round trip through the kernel crypto subsystem for every request. His suggestion is essentially that we provide an interface to a modern, good, userland cryptographically secure source of randomness that runs within the same process as Python itself. One such example is the arc4random function (which doesn't actually provide ARC4 on OpenBSD, it provides ChaCha; it's not tied to one specific algorithm), which comes from libc on many platforms. According to Theo, modern userland CSPRNGs can create random bytes faster than memcpy, which eliminates the speed argument against making a CSPRNG the "default" source of randomness.

Thus the proposal is essentially:

* Provide an API to access a modern userland CSPRNG.
* Provide an implementation of random.SomeKindOfRandom that utilizes this.
* Move the MT-based implementation of the random module to random.DeterministicRandom.
* Deprecate the module-scoped functions, instructing people to use the new random.SomeKindOfRandom unless they need deterministic randomness, in which case use random.DeterministicRandom.

This can of course be tweaked one way or the other, but that's the general idea translated into something actionable for Python.
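As a rough illustration, user code under a scheme like this might look something like the following (DeterministicRandom is only the name proposed above, not an existing stdlib class; SystemRandom already exists today):

    import random

    # Hypothetical usage under the proposal; DeterministicRandom is the proposed
    # rename of the MT-backed class, not something you can import today.
    simulation_rng = random.DeterministicRandom(1234)  # explicit opt-in to seeded, reproducible output
    token_rng = random.SystemRandom()                   # explicit opt-in to OS-backed randomness

    print(simulation_rng.random())  # reproducible given the seed 1234
    print(token_rng.random())       # not reproducible; suitable when security might matter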
I'm not sure exactly how I feel about it, but I certainly do think that the current situation is confusing to end users and leaving them in an insecure state, and that at a minimum we should move MT to something like random.DeterministicRandom and deprecate the module-scoped functions, because it seems obvious to me that the idea of a "default" random function that isn't safe is a footgun for users.

As an additional consideration, there are security experts who believe that userland CSPRNGs should not be used at all. One of those is Thomas Ptacek, who wrote a blog post [1] on the subject. In it, Thomas makes the case that a userland CSPRNG pretty much always depends on the cryptographic security of the system random, but may itself be broken, which means you're adding a second single point of failure where a mistake can cause you to get non-random data out of the system. I asked Theo about this, and he stated that he disagreed with Thomas about never using a userland CSPRNG; in his opinion that blog post was mostly warning people away from using something like MT in userland and away from /dev/random (which is often the reason people reach for MT, because /dev/random blocks, which makes programs even slower).

It seems to boil down to the following questions:

* Do we want to try to protect users by default, or at least make it more obvious in the API which one they want to use? (I think yes.)
* If so, is /dev/urandom "fast enough" for most people in group #3?
* If not, do we agree with Theo that a modern userland CSPRNG is safe enough to use, or do we agree with Thomas that it's not?
* If we think it is safe enough, do we use arc4random, and what do we do on systems that don't have a modern userland CSPRNG in their libc?

[1] http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/

----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Deprecating the module-level functions has one problem for backward compatibility: if you're using random across multiple modules, changing them all from this:

    import random
    ...

to this:

    from random import DeterministicRandom
    random = DeterministicRandom()
    ...

gives a separate MT for each module. You can work around that by, e.g., providing your own myrandom.py that does that and then using "from myrandom import random" everywhere, or by stashing a random_inst inside the random module or builtins or something and only creating it if it doesn't exist, etc., but all of these are things that people will rightly complain about.

One possible solution is to make DeterministicRandom a module instead of a class, and move all the module-level functions there, so people can just change their import to "from random import DeterministicRandom as random". (Or, alternatively, give it classmethods that create a singleton just like the module global.)

For people who decide they want to switch to SystemRandom, I don't think it's as much of a problem, as they probably won't care that they have a separate instance in each module. (And I don't think there's any security problem with using multiple instances, but I haven't thought it through...) So, the change is probably only needed in DeterministicRandom.

There are hopefully better solutions than that. But I think some solution is needed. People who have existing code (or textbooks, etc.) that do things the "wrong" way and get a DeprecationWarning should be able to easily figure out how to make their code correct.

Sent from my iPhone

Andrew Barnert via Python-ideas <python-ideas@python.org> writes:
Of course, this brings to mind the fact that there's *already* an instance stashed inside the random module. At that point, you might as well just keep the module-level functions, and rewrite them to be able to pick up on it if you replace _inst (perhaps suitably renamed as it would be a public variable) with an instance of a different class. Proof-of-concept implementation:

    class _method:
        def __init__(self, name):
            self.__name__ = name
        def __call__(self, *args, **kwargs):
            return getattr(_inst, self.__name__)(*args, **kwargs)
        def __repr__(self):
            return "<random method wrapper " + repr(self.__name__) + ">"

    _inst = Random()
    seed = _method('seed')
    random = _method('random')
    ...etc...

On Sep 9, 2015, at 18:25, Random832 <random832@fastmail.com> wrote:
The whole point is to make people using the top-level functions see a DeprecationWarning that leads them to make a choice between SystemRandom and DeterministicRandom. Just making _inst public (and dynamically switchable) doesn't do that, so it doesn't solve anything. However, it seems like there's a way to extend it to do that: First, rename Random to DeterministicRandom. Then, add a subclass called Random that raises a DeprecationWarning whenever its methods are called. Then preinitialize inst to Random(), just as we already do. Existing code will work, but with a warning. And the text of that warning, or the help it leads to, or the obvious google result or whatever, can just suggest "add random.inst = random.DeterministicRandom() or random.inst = random.SystemRandom() at the start of your program". That has most of the benefit of deprecating the top-level functions, without the cost of the solution being non-obvious (and the most obvious solution being wrong for some use cases). Of course it adds the cost of making the module slower, and also more complex. Maybe a better solution would be to add a random.set_default_instance function that replaced all of the top-level functions with bound methods of the instance (just like what's already done at startup in random.py)? That's simple, and doesn't slow down anything, and it seems like it makes it more clear what you're doing than setting random.inst.
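As a minimal sketch of what that could look like inside random.py (the set_default_instance name comes from the suggestion above; the exact set of re-bound methods is just illustrative):

    import sys

    def set_default_instance(inst):
        # Re-bind the module-level convenience functions to methods of the given
        # instance, mirroring what random.py already does once at import time.
        module = sys.modules[__name__]
        for name in ("seed", "random", "uniform", "randint", "randrange",
                     "choice", "shuffle", "sample", "getrandbits"):
            setattr(module, name, getattr(inst, name))

Porting old code would then be a one-liner at the top of the main module, e.g. random.set_default_instance(random.SystemRandom()).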

On Thu, Sep 10, 2015 at 11:50 AM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Of course it adds the cost of making the module slower, and also more complex. Maybe a better solution would be to add a random.set_default_instance function that replaced all of the top-level functions with bound methods of the instance (just like what's already done at startup in random.py)? That's simple, and doesn't slow down anything, and it seems like it makes it more clear what you're doing than setting random.inst.
+1. A single function call that replaces all the methods adds a minuscule constant to code size, run time, etc, and it's no less readable than assignment to a module attribute. (If anything, it makes it more clearly a supported operation - I've seen novices not realize that "module.xyz = foo" is valid, but nobody would misunderstand the validity of a function call.) ChrisA

On Sep 9, 2015, at 23:08, Chris Angelico <rosuav@gmail.com> wrote:
I was only half-serious about this, but now I think I like it: it provides exactly the fix people are hoping to get by deprecating the top-level functions, but with less risk, less user code churn, a smaller patch, and a much easier fix for novice users. (And it's much better than my earlier suggestion, too.) See https://gist.github.com/abarnert/e0fced7569e7d77f7464 for the patch, and a patched copy of random.py. The source comments in the patch should be enough to understand everything that's changed.

A couple of things:

I'm not sure the normal deprecation path makes sense here. For a couple of versions, everything continues to work (because most novices, the people we're trying to help, don't see DeprecationWarnings), and then suddenly their code breaks. Maybe making it a UserWarning makes more sense here?

I made Random a synonym for UnsafeRandom (the class that warns and then passes through to DeterministicRandom). But is that really necessary? Someone who's explicitly using an instance of class Random rather than the top-level functions probably isn't someone who needs this warning, right?

Also, if this is the way we'd want to go, the docs change would be a lot more substantial than the code change. I think the docs should be organized around choosing a random generator and using its methods, and only then mention set_default_instance as being useful for porting old code (and for making it easy for multiple modules to share a single generator, but that shouldn't be a common need for novices).

On Sep 10, 2015, at 01:32, Serhiy Storchaka <storchaka@gmail.com> wrote:
Well, the goal of the deprecation idea was to eventually get people to explicitly use instances, so the fact that it doesn't work out of the box is a good thing, not a problem. But for people just trying to retrofit existing code, all they have to do is call random.set_default_instance at the top of the main module, and all their other modules can just import what they need this way. Which is why it's better than straightforward deprecation.

On Thu, Sep 10, 2015 at 04:08:09PM +1000, Chris Angelico wrote:
Making monkey-patching the official, recommended way to choose a PRNG is a risky solution, to put it mildly. That means that at any time, some other module that is directly or indirectly imported might change the random number generators you are using without your knowledge. You want a crypto PRNG, but some module replaces it with MT. Or vice versa. Technically, it is true that (this being Python) they can do this now, just by assigning to the random module:

    random.random = lambda: 9

but that is clearly abusive, and if you write code to do that, you're asking for whatever trouble you get. There's no official API to screw over other callers of the random module behind their back. You're suggesting that we add one.
(If anything, it makes it more clearly a supported operation
Which is exactly why this is a terrible idea. You're making monkey- patching not only officially supported, but encouraged. That will not end well. -- Steve

On Sep 11, 2015, at 06:49, Steven D'Aprano <steve@pearwood.info> wrote:
But that's not the proposal. The proposal is to make explicitly passing around an instance the official, recommended way to choose a PRNG; monkey-patching is only the official, recommended way to quickly get legacy code working: once you see the warning about the potential problem and decide that the problem doesn't affect you, you write one standard line of code at the top of your main script instead of rewriting all of your modules and patching or updating every third-party module you use. As I said later, I think my later suggestion of just having a singleton DeterministicRandom instance (or even a submodule with the same interface) that you can explicitly import in place of random serves the same needs well enough, and is even simpler, and is more flexible (in particular, it can also be used for novices' "my first game" programs), so I'm no longer suggesting this. But that doesn't mean there's any benefit to mischaracterizing the suggestion (especially if Chris or anyone else still supports it even though I don't).

On September 9, 2015 at 8:01:17 PM, Donald Stufft (donald@stufft.io) wrote:
Ok, I've talked to an honest-to-god cryptographer as well as some other smart folks! Here's the general gist:

Using a userland CSPRNG like arc4random is not advisable for things that you absolutely need cryptographic security for (this is group #2 from my original email). These people should use os.urandom or random.SystemRandom, as they should be doing now. In addition, os.urandom or random.SystemRandom is probably fast enough for most uses of the random.py module, though it is true that using os.urandom/random.SystemRandom would be slower than MT.

It is reasonable to use a userland CSPRNG as a "default" source of randomness, or in cases where people care about speed but maybe not about security and don't need determinism. However, they've said that the primary benefit of using a userland CSPRNG for a faster cryptographically secure source of randomness is if we can make it the default source of randomness, as a "probably safe depending on your app" safety net for people who didn't read or understand the documentation. This would make most uses of random.random and friends secure, but not deterministic.

If we're unwilling to change the default, but we are willing to deprecate the module-scoped functions and force users to make a choice between random.SystemRandom and random.DeterministicRandom, then there is unlikely to be much benefit to also adding a userland CSPRNG into the mix, since there's no longer a class of people using an ambiguous "random" where we don't know whether they need it to be secure or deterministic/fast.

So I guess my suggestion would be: let's deprecate the module-scoped functions and rename random.Random to random.DeterministicRandom. This absolves us of needing to change the behavior of people's existing code (besides deprecating it), and we don't need to decide whether a userland CSPRNG is safe or not, while still moving us to a situation where users are far more likely to do the right thing.

----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Thu, Sep 10, 2015 at 3:30 AM, Donald Stufft <donald@stufft.io> wrote: [...]
There is one use case that would be hit by that: the kid writing their first rock-paper-scissors game. A beginner who just learned the `if` statement isn't ready for a discussion of cryptography vs. reproducible results, and random.SystemRandom().random() would just become a magic incantation to learn. It would feel like requiring sys.stdout.write() instead of print(). Functions like paretovariate(), getstate(), or seed(), which require some understanding of (pseudo)randomness, can be moved to a specific class, but I don't think deprecating random(), randint(), randrange(), choice(), and shuffle() would be a good idea. Switching them to a cryptographically safe RNG is OK from this perspective, though.

On Sep 10, 2015, at 00:35, Petr Viktorin <encukou@gmail.com> wrote:
Silently switching them could break a lot of code. I don't think there's any way around making them warn the user that they need to do something. I think the patch I just sent is a good way of doing that: the minimum thing they need to do is a one-liner, which is explained in the warning, and it also gives them enough information to check the docs or google the message and get some understanding of the choice if they're at all inclined to do so. (And if they aren't, well, either one works for the use case you're talking about, so let them flip a coin, or call random.choice.;))

Can I just ask what is the actual problem we are trying to solve here? Python has third-party cryptography modules that bring their own sources of randomness (or cryptography libraries that do the same). Python has a good random library for everything other than cryptography. Why in the heck are we trying to make the random module do something for which it is already documented as being a poor choice, when there are already third-party modules that do just this? Who needs cryptographic randomness in the standard library anyway (even though one line of code gives you access to it)? Have we identified even ONE person who does cryptography in Python who is kicking themselves that they can't use the random module as implemented? Is this just indulging a paranoid developer?

On September 10, 2015 at 5:21:29 AM, Alexander Walters (tritium-list@sdamon.com) wrote:
Because there are situations where you need securely generated randomness even when you are *NOT* "doing cryptography". Blaming people for the fact that the random module has a bad UX that naturally leads them to use it when it isn't appropriate is a shitty thing to do. What harm is there in making people explicitly choose between deterministic randomness and secure randomness? Is your use case so much better than theirs that you think you deserve to type a few characters less, to the detriment of people who don't know any better?

----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On 9/10/2015 07:40, Donald Stufft wrote:
API Breakage. This is not worth the break in backwards compatibility. My use case is using the API that has been available for... 20 years? And for what benefit? None, and it can be argued that it would do the opposite of what is intended (false sense of security and all).

On Wed, Sep 09, 2015 at 08:01:16PM -0400, Donald Stufft wrote: [...]
You're worried about attacks on the random number generator that produces the characters in the password? I think I'm going to have to see an attack before I believe that this is meaningful. Excluding PRNGs that are hopelessly biased ("nine, nine, nine, nine...") or predictable, how does knowing the PRNG help in an attack? Here's a password I just generated using your "corrected" version using SystemRandom: 06XW0X0X (Honest, that's exactly what I got on my first try.) Here's one I generated using the "bad" code snippet: V6CFKCF2 How can you tell them apart, or attack one but not the other based on the PRNG?
Shouldn't it be using a single instance of SystemRandom rather than a new instance for each call? [...]
According to Theo, modern userland CSPRNGs can create random bytes faster than memcpy
That is an astonishing claim, and I'd want to see evidence for it before accepting it. -- Steve

Steven D'Aprano <steve@pearwood.info> writes:
Isn't the only difference between generating a password and generating a key the length (and base) of the string? Where is the line?
That is an astonishing claim, and I'd want to see evidence for it before accepting it.
I assume it's comparing a CSPRNG, all of whose state is in cache (or in registers, if a large block of random bytes is requested from the CSPRNG in one go), with memcpy of data which must be retrieved from main memory.

On 10 September 2015 at 01:01, Donald Stufft <donald@stufft.io> wrote:
Wrong. There is a fourth basic type. People (like me!) whose code absolutely doesn't have any security issues, but want a simple, convenient, fast RNG. Determinism is not an absolute requirement, but is very useful (for writing tests, maybe, or for offering a deterministic rerun option to the program). Simulation-style games often provide a way to find the "map seed", which allows users to share interesting maps - this is non-essential but a big quality-of-life benefit in such games. IMO, the current module perfectly serves this fourth group. While I accept your point that far too many people are using insecure RNGs in "generate a random password" scripts, they are *not* the core target audience of the default module-level functions in the random module (did you find any examples of insecure use that *weren't* password generators?). We should educate people that this is bad practice, not change the module. Also, while it may be imperfect, it's still better than what many people *actually* do, which is to use "password" as a password on sensitive systems :-( Maybe what Python *actually* needs is a good-quality "random password generator" module in the stdlib? (Semi-serious suggestion...) Paul

On September 10, 2015 at 4:41:56 AM, Paul Moore (p.f.moore@gmail.com) wrote:
This group is the same as #3 except for the map seed thing which is group #1. In particular, it wouldn’t hurt you if the random you were using was cryptographically secure as long as it was fast and if you needed determinism, it would hurt you to say so. Which is the point that Theo was making.
IMO, the current module perfectly serves this fourth group.
Making the user pick between Deterministic and Secure random would serve this purpose too, especially in a language where "In the face of ambiguity, refuse the temptation to guess" is one of the core tenets. The largest downside would be typing a few extra characters, and Python is not a language that attempts to do things in the fewest number of characters.
You cannot document your way out of a UX problem. The problem isn't people doing this once on the command line to generate a password; the problem is people doing it in applications where they generate an API key, a session identifier, or a random password which they then give to their users. If you give an attacker a way to observe the output of the MT-based random enough times, it can be used to determine what every value it generated was and will be. Here's a game a friend of mine created where the purpose is essentially to unrandomize some random data, which is only possible because it's (purposely) using MT: https://github.com/reaperhulk/dsa-ctf. This is not an ivory-tower paranoia case; it's a real concern that will absolutely fix some insecure software out there, instead of telling them "welp, typing a little bit extra once per import is too much of a burden for me and really it's your own fault anyways".
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On 10 September 2015 at 12:26, Donald Stufft <donald@stufft.io> wrote:
I don't understand the phrase "if you needed determinism, it would hurt you to say so". Could you clarify?
And yet I know that I would routinely, and (this is the problem) without thinking, choose Deterministic, because I know that my use cases all get a (small) benefit from being able to capture the seed, but I also know I'm not doing security-related stuff. No amount of making me choose is going to help me spot security implications that I've missed. And also, calling the non-crypto choice "Deterministic" is unhelpful, because I *don't* want something deterministic, I want something random (I understand PRNGs aren't truly random, but "good enough for my purposes" is what I want, and "deterministic" reads to me as saying it's *not* good enough...)
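(Concretely, the capture-and-replay pattern I mean is nothing more elaborate than something like this:)

    import random

    seed = 12345                       # e.g. recorded and shown to the user, or taken from a bug report
    rng = random.Random(seed)
    layout = [rng.randint(1, 100) for _ in range(5)]
    # Re-running with the same seed reproduces exactly the same "random" layout.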
What I'm trying to say is that this is an education problem more than a UX problem. Personally, I think I know enough about security for my (not a security specialist) purposes. To that extent, if I'm working on something with security implications, I'm looking for things that say "Crypto" in the name. The rest of the time, I just use non-specialist stuff. It's a similar situation to that of the "statistics" module. If I'm doing "proper" maths, I'd go for numpy/scipy. If I just want some averages and I'm not bothered about numerical stability, rounding behaviour, etc, I'd go for the stdlib statistics package.
To me, that's crypto and I'd look to the cryptography module, or to something in the stdlib that explicitly said it was suitable for crypto. Saying people write bad code isn't enough - how does the current module *encourage* them to write bad code? How much API change must we allow to cater for people who won't read the statement in the docs (in a big red box) "Warning: The pseudo-random generators of this module should not be used for security purposes." (Specifically people writing security related code who won't read the docs).
I don't understand how that game (which is an interesting way of showing people how attacks on crypto work, sure, but that's just education, which you dismissed above) relates to the issue here. And I hope you don't really think that your quote is even remotely what I'm trying to say (I'm not that selfish) - my point is that not everything is security related. Not every application people write, and not every API in the stdlib. You're claiming that the random module is security related. I'm claiming it's not, it's documented as not being, and that's clear to the people who use it for its intended purpose. Telling those people that you want to make a module designed for their use harder to use because people for whom it's not intended can't read the documentation which explicitly states that it's not suitable for them, is doing a disservice to those people who are already using the module correctly for its stated purpose. By the same argument, we should remove the statistics module because it can be used by people with numerically unstable problems. (I doubt you'll find StackOverflow questions along these lines yet, but that's only because (a) the module's pretty new, and (b) it actually works pretty hard to handle the hard corner cases, but I bet they'll start turning up in due course, if only from the people who don't understand floating point...) Paul

On September 10, 2015 at 8:29:16 AM, Paul Moore (p.f.moore@gmail.com) wrote:
I transposed some words; fixed: "If you needed determinism, would it hurt you to say so?" Essentially, other than typing a little bit more, why is:

    import random
    print(random.choice(["a", "b", "c"]))

better than

    import random
    print(random.DeterministicRandom().choice(["a", "b", "c"]))

As far as I can tell, you've made your code, and what properties it has, much clearer to someone reading it, at the cost of 22 characters. If you're going to reuse the DeterministicRandom class you can assign it to a variable and actually end up saving characters, if the variable you save it to can be accessed in fewer than 6 characters.
You're allowed to pick DeterministicRandom; you're even allowed to do it without thinking. This isn't about making it impossible to ever insecurely use random numbers (that's obviously a boil-the-ocean level of problem), it's about trying to make it more likely that someone won't be hit by a fairly easy-to-hit footgun if it does matter for them, even if they don't know it. It's also about making code that is easier to understand on the surface. For example, without using the prior knowledge that it's using MT, tell me how you'd know whether this was safe or not:

    import random
    import string

    password = "".join(random.choice(string.ascii_letters) for _ in range(9))
    print("Your random password is", password)
But you *DO* want something deterministic, the *ONLY* way you can get this small benefit of capturing the seed is if you can put that seed back into the system and get a deterministic result. If the seed didn’t exactly determine the output of the randomness then you wouldn't be able to do that. If you don't need to be able to capture the seed and essentially "replay" the PRNG in a deterministic way then there is exactly zero downsides to using a CSPRNG other than speed, which is why Theo suggested using a very fast, modern CSPRNG to solve the speed issues. Can you point out one use case where cryptographically safe random numbers, assuming we could generate them as quickly as you asked for them, would hurt you unless you needed/wanted to be able to save the seed and thus require or want deterministic results?
Reminder that this warning does not show up (in any color, much less red) if you're using ``help(random)`` or ``dir(random)`` to explore the random module. It also does not show up in code review when you see someone calling random.random(). It encourages you to write bad code, because it has a baked-in assumption that there is a sane default for a random number generator, and it expects people to understand a fairly difficult concept: that not all "random" is equal. For instance, you've already made the mistake of saying you wanted "random", not deterministic, but the two are not mutually exclusive; deterministic is a property that a source of random can have, and one that you need for one of the features you say you like.
I'm claiming that the term random is ambiguously both security related and not security related and we should either get rid of the default and expect people to pick whether or not their use case is security related, or we should assume that it is unless otherwise instructed. I don't particularly care what the exact spelling of this looks like, random.(System|Secure)Random and random.DeterministicRandom is just one option. Another option is to look at something closer to what Go did and deprecate the "random" module and move the MT based thing to ``math.random`` and the CSPRNG can be moved to something like crypto.random.
No. By this argument, the statistics module shouldn't offer a single default "statistic" function, because there is no globally "right" answer for what that default should be. Should it be mean? mode? median? Why is *your* use case the "right" use case for the default option, particularly in a situation where picking the wrong option can be disastrous? ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On 10 September 2015 at 14:10, Donald Stufft <donald@stufft.io> wrote:
Thanks. In one sense, no it wouldn't. Nor would it matter to me if "the default random number generator" was fast and cryptographically secure. What matters is just that I get a load of random (enough) numbers. What hurts somewhat (not enormously, I'll admit) is having to think up front about whether I need to be able to capture a seed and replay it. That's nearly always something I'd think of way down the line, as a "wouldn't it be nice if I could get the user to send me a reproducible test case" or something like that. And of course it's just a matter of switching the underlying RNG at that point. None of this is hard. But once again, I'm currently using the module correctly, as documented.

I've omitted most of the rest of your response, largely because we're probably just going to have to agree to differ. I'm probably too worn out being annoyed at the way that everything ends up needing to be security related, and the needs of people who won't read the docs determine API design, to respond clearly and rationally :-( Paul

On Thu, Sep 10, 2015 at 8:44 AM, Paul Moore <p.f.moore@gmail.com> wrote:
No one in this thread is accusing everyone of using the module incorrectly. The fact that you do use it correctly is a testament to the fact that you read the docs carefully and have some level of experience with the module to know that you're using it correctly.
I think the people Theo, Donald, and others (including myself) are worried about are the people who have used some book or online tutorial to write games in Python and have seen random.random() or random.choice() used. Later on they start working on something else (including, but not limited to, the examples Donald has otherwise pointed out). They also have enough experience with the random module to know it produces randomness (what kind, they don't know... in fact they probably don't know there are different kinds yet) and they use what they know, because Python has batteries included and they're awesome and easy to use. The reality is that past experiences bias current decisions.

If that person went and read the docs, they probably won't know whether what they're doing warrants using a CSPRNG instead of the default Python one. If they're not willing to learn, or read enough (and I stress enough) about the topic before making a decision (or just really don't have the time because this is a side project), they'll say "Well the module level functions seemed random enough to me, so I'll just use those". That could end up being rather awful for them.

The reality is that your past experiences (and other people's past experiences, especially those who refuse to do some research and are demanding others prove that these are insecure with examples) are biasing this discussion, because they fail to empathize with new users whose past experiences are coloring their decisions. People choose Python for a variety of reasons, and one of those reasons is that in their past experience it was "fast enough" to be an acceptable choice. This is how most people behave. Being angry at people for not reading a two-sentence-long warning in the middle of the docs isn't helping anyone, nor is it an argument against the validity of this discussion.

On September 10, 2015 at 9:44:13 AM, Paul Moore (p.f.moore@gmail.com) wrote:
This is actually exactly why Theo suggested using a modern, userland CSPRNG: it can generate random numbers faster than /dev/urandom can, and, unless you need deterministic results, there's little downside to doing so.

There are really two possible ideas here, depending on what sort of balance we'd want to strike. We can make a default "I don't want to think about it" implementation of random that is both *generally* secure and fast, although it won't be deterministic and you won't be able to explicitly seed it. This would be a backwards-compatible change [1] for people who are simply calling these functions [2]:

    random.getrandbits
    random.randrange
    random.randint
    random.choice
    random.shuffle
    random.sample
    random.random
    random.uniform
    random.triangular
    random.betavariate
    random.expovariate
    random.gammavariate
    random.gauss
    random.lognormvariate
    random.normalvariate
    random.vonmisesvariate
    random.paretovariate
    random.weibullvariate

If these were all that the top-level functions in random.py provided, we could simply replace the default and people wouldn't notice; they'd just automatically get safer randomness whether that's actually useful for their use case or not. However, random.py also has these functions:

    random.seed
    random.getstate
    random.setstate
    random.jumpahead

and these functions are where the problem comes in. They only really make sense for deterministic sources of randomness, which are not "safe" for use in security-sensitive applications. So, pretending for a moment that we've already decided to do "something" about this, the question boils down to what we do about these 4 functions. Either we change the default to a secure CSPRNG and break these functions (and the people using them), which is easily fixed by changing ``import random`` to ``import random; random = random.DeterministicRandom()``, or we deprecate the top-level functions and try to guide people to choose up front what kind of random they need.

Either of these solutions will end up with people being safer, and, if we pretend we've agreed to do "something", it comes down to whether we'd prefer breaking compatibility for some people while keeping a default random generator that is probably good enough for most people, or whether we'd prefer not to break compatibility and to push people to always decide what kind of random they want.

Of course, we still haven't decided that we should do "something". I think that we should, because I think that secure by default (or at least, not insecure by default) is a good situation to be in. Over the history of computing it's been shown time and time again that trying to document or educate users is error prone and doesn't scale, whereas designing APIs so that the "right" thing is the obvious default, and specialist [3] cases which require some particular property have to be opted into, works much better.

[1] Assuming Theo's claim about the speed of the ChaCha-based arc4random function is accurate, which I haven't tested, but I assume he's smart enough to know what he's talking about WRT its speed.
[2] I believe anyway; I don't think any of these rely on the properties of MT or a deterministic source of randomness, just a source of randomness.
[3] In this case, there are two specialist use cases: those that require deterministic results, and those that require specific security properties that are not satisfied by a userland CSPRNG, because a userland CSPRNG is not as secure as /dev/urandom but is able to be much faster.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On 10 September 2015 at 15:21, Donald Stufft <donald@stufft.io> wrote:
Switching (somewhat hypocritically :-)) from an "I'm a naive user" stance, to talking about deeper issues as if I knew what I was talking about, this change results in each module getting a separate instance of the generator. That has implications on the risks of correlated results. It's unlikely to cause issues in real life, conceded. Paul

On Thu, 10 Sep 2015 at 07:22 Donald Stufft <donald@stufft.io> wrote:
+1 for deprecating module-level functions and putting everything into classes to force a choice

+0 for deprecating the seed-related functions, saying "the stdlib uses what it uses as an RNG and you have to live with it if you don't make your own choice", and switching to a crypto-secure RNG.

-0 for leaving it as-is

-Brett

On 11 September 2015 at 02:05, Brett Cannon <brett@python.org> wrote:
+1 for deprecating module-level functions and putting everything into classes to force a choice
-1000, as this would be a *huge* regression in Python's usability for educational use cases. (Think 7-8 year olds that are still learning to read, not teenagers or adults with more fully developed vocabularies) A reasonable "Hello world!" equivalent for introducing randomness to students is rolling a 6-sided die, as that relates to a real world object they'll often be familiar with. At the moment that reads as follows:
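    # (Illustrative reconstruction of the kind of beginner snippet being referred to here:)
    import random
    print("You rolled:", random.randint(1, 6))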
Another popular educational exercise is the "Guess a number" game, where the program chooses a random number from 1-100, and the person playing the game has to guess what it is. Again, randint() works fine here. Shuffling decks of cards, flipping coins, these are all things used to introduce learners to modelling random events in the real world in software, and we absolutely do *not* want to invalidate the extensive body of educational material that assumes the current module level API for the random module.
However, this I'm +1 on. People *do* use the module level APIs inappropriately, and we can get them to a much safer place, while nudging folks that genuinely need deterministic randomness towards an alternative API. The key for me is that folks that actually *need* deterministic randomness *will* be calling the stateful module level APIs. This means we can put the deprecation warnings on *those* methods, and leave them out for the others.

In terms of practical suggestions, rather than DeterministicRandom and NonDeterministicRandom, I'd actually go with the simpler terms SeededRandom and SeedlessRandom (there's a case to be made that those are misnomers, but I'll go into that more below):

    SeededRandom: Mersenne Twister
    SeedlessRandom: new CSPRNG
    SystemRandom: os.urandom()

Phase one of transition:

* add SeedlessRandom
* rename Random to SeededRandom
* Random becomes a subclass of SeededRandom that deprecates all methods not shared with SeedlessRandom
* this will also effectively deprecate the corresponding module level functions
* any SystemRandom methods that are no-ops (like seed()) are deprecated

Phase two of transition:

* Random becomes an alias for SeedlessRandom
* deprecated methods are removed from SystemRandom
* deprecated module level functions are removed

As far as the proposed Seeded/Seedless naming goes, that deliberately glosses over the fact that "seed" gets used to refer to two different things - seeding a PRNG with entropy, and seeding a deterministic PRNG with a particular seed value. The key is that "SeedlessRandom" won't have a "seed()" *method*, and that's the single most salient fact about it from a user experience perspective: you can't get the same output by providing the same seed value, because we wouldn't let you provide a seed value at all.

Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Sep 11, 2015 at 3:00 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Aside from sounding like varieties of grapes in a grocery, those names seem just fine. From the POV of someone with a bit of comprehension of crypto (as in, "use /dev/urandom rather than a PRNG", but not enough knowledge to actually build or verify these things), the distinction is precise: with SeededRandom, I can give it a seed and get back a predictable sequence of numbers, but with SeedlessRandom, I can't. I'm not sure what the difference is between "seeding a PRNG with entropy" and "seeding a deterministic PRNG with a particular seed value", though; aside from the fact that one of them uses a known value and the other doesn't, of course. Back in my BASIC programming days, we used to use "RANDOMIZE TIMER" to seed the RNG with time-of-day, or "RANDOMIZE 12345" (or other value) to seed with a particular value; they're the same operation, but one's considered random and the other's considered predictable. (Of course, bytes from /dev/urandom will be a lot more entropic than "number of centiseconds since midnight", but for a single-player game that wants to provide a different starting layout every time you play, the latter is sufficient.) ChrisA

On 11 September 2015 at 03:11, Chris Angelico <rosuav@gmail.com> wrote:
Actually, that was just a mistake on my part - they're really the same thing, and the only distinction is the one you mention: setting the seed to a known value. Thus the main seed-related difference between something like arc4random and other random APIs is the same one I'm proposing to make here: it's seedless at the API level because it takes care of collecting its own initial entropy from the operating system's random number API. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Chris Angelico wrote:
I think the only other difference is that the Linux kernel is continually re-seeding its generator whenever more unpredictable bits become available. It's not something you need to explicitly do yourself, as in your BASIC example. -- Greg

Nick Coghlan <ncoghlan@...> writes:
Fully agreed with Nick. That this is being seriously considered shows a massive disregard for usability. Python is not C++; it places convenience first. Besides, a deterministic RNG is a feature: you can reproduce exactly a random sequence by re-using the same seed, which helps fix rare input-dependent failures (we actually have a good example of that in CPython development with `regrtest -r`). Good luck debugging such issues when using an RNG which reseeds itself in a random (!) way. Lastly, the premise of this discussion is idealistic in the first place. If someone doesn't realize their code is security-sensitive, there are other mistakes they will make beyond simply choosing the wrong RNG. If you want to help people generate secure passwords, it would perhaps be best to write a password-generating (or more generally secret-generating, for different kinds of secrets: passwords, session ids, etc.) library. Regards Antoine.

On 14 September 2015 at 13:59, Antoine Pitrou <antoine@python.org> wrote:
Is your argument that there are lots of ways to get security wrong, and for that reason we shouldn't try to fix any of them? After all, I could have made this argument against PEP 466, or against the deprecation of SHA1 in TLS certificates, or against any security improvement ever made that simply changed defaults. The fact that there are secure options available is not a good excuse for leaving the insecure ones as the defaults.

And let's be clear, this is not a theoretical error that people don't hit in real life. Investigating your last comment, Antoine, I googled "python password generator". The results:

- The first one is a StackOverflow question which incorrectly uses random.choice (though seeded from os.urandom, which is an improvement). The answer to that says to just use os.urandom everywhere, but does not provide sample code. Only the third answer gets so far as to provide sample code, and it's way overkill.
- The second option, entitled "A Better Password Generator", incorrectly uses random.randrange. This code is *aimed at beginners*, and is kindly handing them a gun to point at their own foot.
- The third one uses urandom, which is fine.
- The fourth, an XKCD-based password generator, uses SystemRandom *if available* but then falls back to the MT approach, which is an unexpected decision, but there we go.
- The fifth, from "pythonforbeginners.com", incorrectly uses random.choice.
- The sixth goes into an intensive discussion about 'password strength', including a discussion about the 'bit strength' of the password, despite the fact that it uses random.randint, which means the analysis of bit strength is totally flawed.
- For the seventh we get a security.stackexchange question with the first answer saying not to use Random, though the questioner does use it and no sample code is provided.
- The eighth is a library that "generates randomized strings of characters". It attempts to use SystemRandom but falls back silently if it's unavailable.

At this point I gave up. Of that list of 8 responses, three are completely wrong, two provide sample code that is wrong with no correct sample code to be found on the page, two attempt to do the right thing but will fall into a silent failure mode if they can't, and only one is unambiguously correct.

Similarly, a quick search of GitHub for Python repositories that contain random.choice and the string 'password' returns 40,000 results.[0] Even if 95% of them are safe, that leaves 2000 people who wrote wrong code and uploaded it to GitHub.

It is disingenuous to say that only people who know enough write security-critical code. They don't. The reason for this is that most people don't know they don't know enough. And for those people, Python's default approach screws them over, and then they write blog posts which screw over more people. If the Python standard library would like to keep the insecure default of random.random that's totally fine, but we shouldn't pretend that the resulting security failures aren't our fault: they absolutely are.

[0]: https://github.com/search?l=python&q=random.choice+password&ref=searchresults&type=Code&utf8=%E2%9C%93

On 14 September 2015 at 14:29, Cory Benfield <cory@lukasa.co.uk> wrote:
Is your argument that there are lots of ways to get security wrong, and for that reason we shouldn't try to fix any of them?
This debate seems to repeatedly degenerate into this type of accusation. Why is backward compatibility not being taken into account here? To be clear, the proposed change *breaks backward compatibility* and while that's allowed in 3.6, just because it is allowed, doesn't mean we have free rein to break compatibility - any change needs a good justification. The arguments presented here are valid up to a point, but every time anyone tries to suggest a weak area in the argument, the "we should fix security issues" trump card gets pulled out. For example, as this is a compatibility break, it'll only be allowed into 3.6+ (I've not seen anyone suggest that this is sufficiently serious to warrant breaking compatibility on older versions). Almost all of those SO questions, and google hits, are probably going to be referenced by people who are using 2.7, or maybe some version of 3.x earlier than 3.6 (at what stage do we allow for the possibility of 3.x users who are *not* on the latest release?) So is a solution which won't impact most of the people making the mistake, worth it? I fully expect the response to this to be "just because it'll take time, doesn't mean we should do nothing". Or "even if it just fixes it for one or two people, it's still worth it". But *that's* the argument I don't find compelling - not that a fix won't help some situations, but that because it's security, (a) all the usual trade-off calculations are irrelevant, and (b) other proposed solutions (such as education, adding specialised modules like a "shared secret" library, etc) are off the table. Honestly, this type of debate doesn't do the security community much good - there's too little willingness to compromise, and as a result the more neutral participants (which, frankly, is pretty much anyone who doesn't have a security agenda to promote) end up pushed into a "reject everything" stance simply as a reaction to the black and white argument style. Paul

On September 14, 2015 at 11:01:36 AM, Paul Moore (p.f.moore@gmail.com) wrote:
How has it not been taken into account? The current proposal (best summed up by Nick in the other thread) will not break compatibility for anyone except those calling the functions that are specifically about setting a seed or getting/setting the current state. In looking around I don't see a lot of people using those particular functions, so most people likely won't notice the change at all, and for those who do, there is a very trivial change they can make to their code to cope with it.
We can't go back in time and fix those versions that is true. However, one of the biggest groups of people who are most likely to be helped by this change is new and inexperienced developers who don't fully grasp the security sensitive nature of whatever they are doing with random. That group of people are also more likely to be using Python 3.x than experienced programmers.
If I/we were not willing to compromise, I'd be pushing for it to use SystemRandom everywhere, because that removes all of the possibly problematic parts of using a user-space CSPRNG like the one being proposed. However, I/we are willing to compromise by sacrificing possible security in order not to regress things where we can; in particular, a user-space CSPRNG is being proposed over SystemRandom because it will provide you with random numbers almost as fast as MT will. However, when proposing this possible compromise, we are met with people refusing to meet us in the middle.

There are some folks who are trying to propose other middle grounds, and there will undoubtedly be some discussion around which ones are the best. We've gone from suggesting replacing the default random with SystemRandom (a lot slower than MT), to removing the default altogether, to deprecating the default and replacing it with a fast user-space CSPRNG. However, folks who don't want to see it change at all have thus far been unwilling to compromise at all. I'm confused how you're saying that the security-minded folks have been unwilling to compromise when we've done that repeatedly in this thread, whereas the backwards-compat-minded folks have consistently said "No, it would break compatibility" or "We don't need to change" or "They are probably insecure anyways".

Can you explain what compromise you're willing to accept here? If it doesn't involve breaking at least a little compatibility then it's not a compromise, it's you demanding that your opinion is the correct one (which isn't wrong; we're also asserting that our opinion is the correct one, we've just been willing to move the goal posts to try and limit the damage while still getting most of the benefit).

----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Donald Stufft <donald@...> writes:
That's a pretty big "except". Paul's and my concern is about compatibility breakage, saying "it doesn't break compatibility except..." sounds like a lot of empty rhetoric.
In looking around I don't see a lot of people using those particular functions
Given that when you "look around" you only end up looking around amongst the Web developer crowd, I may not be surprised. You know, when I "look around" I don't see a lot of people using the random module to generate passwords. Your anecdote would be more valuable than other people's?
Yes, because generating passwords is a common and reasonable task for new and inexperienced developers? Really? Again, why don't you propose a dedicated API for that? That's what we did for constant-time comparisons. That's what people did for password hashing. That's what other people did for cryptography. I haven't seen a reasonable rebuttal to this. Why would generating passwords be any different from all those use cases? After all, if you provide a convenient API people should flock to it, instead of cumbersomely reinventing the wheel... That's what libraries are for.
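
As a minimal sketch of the kind of dedicated API being suggested, assuming nothing beyond today's standard library (the helper name and module layout here are hypothetical):

    import string
    import random

    _sysrand = random.SystemRandom()  # draws from os.urandom under the hood

    def generate_password(length=16,
                          alphabet=string.ascii_letters + string.digits):
        # A purpose-built helper: callers never have to know which RNG is safe.
        return ''.join(_sysrand.choice(alphabet) for _ in range(length))

    print(generate_password())
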
Really, it's not so much a performance issue as a compatibility issue. The random module provides, by default, a *deterministic* stream of random numbers. That os.urandom() may be a tad slower isn't very important when you're generating one number at a time and processing it with a slow interpreter (besides, MT itself is hardly the fastest PRNG out there). That os.urandom() doesn't give you a way to seed it once and get predictable results is a big *regression* if made the default RNG in the random module. And the same can be said for a user-space CSPRNG, as far as I understand the explanations here.
However, when proposing this possible compromise, we are met with people refusing to meet us in the middle.
See, people are fed up with the incompatibilities arising "in the name of the public good" in each new feature release of Python. When the "middle" doesn't sound much more desirable than the "extreme", I don't see why I should call it a "compromise". Some people have to support code in 4 different Python versions and further gratuitous breakage in the stdlib doesn't help. Yes, they can change their code. Yes, they can use the "six" module, the "future" module or whatever new bandaid exists on PyPI. Still they must change their code in one way or another because it was deemed "necessary" to break compatibility to solve a concern that doesn't seem grounded in any reasonable analysis. Python 3 was there to break compatibility. Not Python 3.4. Not Python 3.5. Not Python 3.6. (in case you're wondering, trying to make all published code on the Internet secure by appropriately changing the interpreter's "behaviour" to match erroneous expectations - even *documented* as erroneous - is *not* reasonable - no matter how hard you try, there will always be occurrences of broken code that people copy and paste around)
Can you explain what compromise you're willing to accept here?
Let's rephrase this: are *you* willing to accept an admittedly "insecure by default" compromise? No you aren't, evidently. There's no evidence that you would agree to leave the top-level random functions intact, even if a new UserSpaceSecureRandom class was added to the module, right? So why would we accept a compatibility-breaking compromise? Because we are more "reasonable" than you? (which in this context really reads: more willing to quit the discussion because of boredom, exhaustion, lack of time or any other quite human reason; which, btw, sums up a significant part of what the dynamics of python-ideas have become: "victory of the most obstinate") Yeah, that's always what you are betting on, because it's not like *you* will ever be reasonable except if it's the last resort for getting something accepted. And that's why every discussion about security with security-minded (read: "obsessed") people is a massive annoyance, even if at the end it succeeds in reaching a "compromise", after 500+ excruciating backs and forths on a mailing-list. Regards Antoine.

On 14 September 2015 at 16:55, Antoine Pitrou <antoine@python.org> wrote:
Python 3 was there to break compatibility. Not Python 3.4. Not Python 3.5. Not Python 3.6.
To clarify: your position is that we cannot break backward compatibility in Python 3.6?

Cory Benfield <cory@...> writes:
It is. Not breaking backward compatibility in feature releases (except 3.0, which was a deliberate special case) is a very long-standing policy, and it is so because users have a much better time with such a policy, especially when people have to maintain code that's compatible across multiple versions (again, the 2->3 transition is a special case, which justifies the existence of tools such as "six", and has incidentally created a lot of turmoil in the community that has only recently begun to recede). Of course, fixing a bug is not necessarily breaking compatibility (although sometimes we may even refuse to fix a bug because the impact on working code would be too large). But changing or removing a documented behaviour that people rely on definitely is. We do break feature compatibility, from time to time, in exceptional and generally discussed-at-length cases, but there is a sad pressure recently to push for more compatibility breakage - and, strangely, always in the name of "security". (also note that some library modules such as asyncio are or were temporarily exempted from the compatibility requirements, because they are in very active development; the random module evidently isn't part of them) Regards Antoine.

On Mon, Sep 14, 2015 at 10:01 AM, Paul Moore <p.f.moore@gmail.com> wrote:
So people who are arguing that the defaults shouldn't be fixed on Python 2.7 are likely the same people who also argued that PEP 466 was a terrible, awful, end-of-the-world type change. Yes it broke things (like eventlet) but the net benefit for users who can get onto Python 2.7.9 (and later) is immense. Now I'm not arguing that we should do the same to the random module, but a backport (that is part of the stdlib) would probably be a good idea under the same idea of allowing users to opt into security early.
They're not irrelevant. I personally think they're of lower impact to the discussion, but the reality is that the people who are educating others are few and far between. If there are public domain works, free tutorials, etc. that all advocate using a module in the standard library and no one can update those, they still exist and are still recommendations. People prefer free over correct when possible, because there's nothing free out there to correct them (until they get hacked or worse). Do we have a team in the Python community that goes out to educate people for free on security-related best practices? I haven't seen them. The best we have is a few people on crufty mailing lists like this one trying to make an impact, because education is a much larger and harder-to-solve problem than making something secure by default. Perhaps instead of bickering like fools on a mailing list, we could all be spending our time better educating others. That said, I can't make that decision for you just like you can't make that for me.
Except you seem to have missed many of the compromises being discussed and conceded by the security-minded folks. Personally, names that describe the outputs of the algorithms make much more sense to me than "Seedless" and "Seeded", but no one has really bothered to shave that yak further, out of a desire to compromise and make things better as a whole. Much of the lack of gradation has come from the opponents to this change, who seem to think of security as a step function where a subjective measurement of "good enough for me" counts as secure.

On 14 September 2015 at 16:32, Ian Cordasco <graffatcolmingov@gmail.com> wrote:
You may well be right. Personally, I'm pretty sick of the way all of these debates degenerate into content-free reiteration of the same old points, and unwillingness to hear other people's views. Here's a point - it seems likely that the people arguing for this change are of the opinion that I'm not appreciating their position. (For the record, I'm not being deliberately obstructive in case anyone thought otherwise. In my view at least, I don't understand the security guys' position). Assuming that's the case, then I'm probably one of the people who needs educating. But I don't feel like anyone's trying to educate me, just that I'm being browbeaten until I give in. Education != indoctrination.
That said, I can't make that decision for you just like you can't make that for me.
Indeed. Personally, I spend quite a lot of time in my day job (closed source corporate environment) trying to educate people in sane security practices, usually ones I have learned from people in communities like this one. One of the biggest challenges I have is stopping people from viewing security as "an annoying set of rules that get in the way of what I'm trying to do". But you would not believe the sorts of things I see routinely - I'm not willing to give examples or even outlines on a public mailing list because I can't assess whether such information could be turned into an exploit. I can say, though, that crypto-safe RNGs are *not* a relevant factor :-) At its best, good security practice should *help* people write reliable, easy-to-use systems. Or at a minimum, not get in the way. But the PR message needs always to be "I understand the constraints you're dealing with", not "you must do this for your own good". Otherwise the "follow the rules until the auditors go away" attitude just gets reinforced. Hence my focus on seeing proof that breakages are justified *in the context of the target audience I am responsible for*. Conversely, you're right that I can't force anyone else to try to educate people in good security practices, however much better than me at it I might think they are. In actual fact, though, I think a lot of people do a lot of good work educating others - as I say, most of what I've learned has been from lists like these.
OK, you have a point - there have been changes to the proposals. But there are fundamental points that have (as far as I can see) never been acknowledged. As a result, the changes feel less like compromises based on understanding each other's viewpoints, and more like repeated attempts to push something through, even if it's not what was originally proposed. (I *know* this is an emotional position - please understand I'm fed up and not always managing to word things objectively). Specifically, I have been told that I can't argue my "convenience" over the weight of all the other people who could fall into security traps with the current API. Let's review that, shall we? * My argument is that breaking backward compatibility needs to be justified. People have different priorities. "Security risks should be fixed" isn't (IMO) a free pass. Why should it be? "Windows compatibility issues should be fixed" isn't a free pass. "PyPy/Jython compatibility issues should be fixed" isn't a free pass. Forcing me to adjust my priorities so that I care about security when I don't want (or IMO need) to isn't acceptable. * The security arguments seem to be largely in the context of web application development (cookies, passwords, shared secrets, ...) That's not the only context that matters. * As I said above, in my experience, a compatibility break "to make things more secure" is seen as equating security with inconvenience, and can actually harm attempts to educate users in better security practices. * In many environments, reproducibility of random streams is important. I'm not an expert on those fields, although I've hit some situations where seeding is a requirement. As far as I am aware, most of those situations have no security implications. So for them, the PEP is all cost, no benefit. Sure the cost is small, but it's non-zero. How come the web application development community is the only one whose voice gets heard? Is it because the fact that they *are* public-facing, and frequently open-source, means that data is available? So "back it up with facts or we won't believe you" becomes a debating stance? I'm not arguing that everyone should be allowed to climb up on their soapbox and rant - but I would like to think that bringing a different perspective to the table could be treated with respect and genuine attempts to understand. And "in my experience" is viewed as an offer of information, not as an attempt to bluff on a worthless hand. Just to be clear, I think the current proposal (Nick's pre-PEP) is relatively unobtrusive, and unlikely to cause serious compatibility issues. I'm uncomfortable with the fact that it feels like yet another "imposition in the name of security", and while I'm only one person I feel that I'm not alone. I'm concerned that the people pushing security seem unable to recognise that people becoming sick of such changes is a PR problem they need to address, but that's their issue not mine. So I'm unlikely to vote against the proposal, but I'll feel sad if it's accepted without a more balanced discussion than we've currently had. On the meta-issue of how debates like this are conducted, I think people probably need to listen more than they talk. I'm as guilty as anyone else here. But in particular, when multiple people all end up responding to rebut *every* counter-argument, essentially with the same response, maybe it's time to think "we're in the majority here, let's stop talking so much and see if we're missing anything from what the people with other views are saying". 
He who shouts loudest isn't always right. Not necessarily wrong, either, but sometimes it's bloody hard to tell one way or the other, if they won't shut up long enough to analyze the objections.
I'm frankly long past caring. I think we'll end up with whatever was on the table when people got too tired to argue any more.
Wait, what? It's *me* that's claiming that security is a yes/no thing??? When all I'm hearing is "education isn't sufficient", "dedicated libraries aren't sufficient", "keeping a deterministic RNG as default isn't an option"? And when I'm suggesting that fixing the PRNG use in code that misuses a PRNG may not be the only security issue with that code? I knew the two sides weren't communicating, but this statement staggers me. We have clearly misunderstood each other even more fundamentally than I had thought possible :-( Thinking hard about the implications of what you said there, I start to see why you might have misinterpreted my stance as the black and white one. But I have absolutely no idea how to explain to you that I find your stance equally (and before I took the time to think through what your statement implied, even more) so. There's little more I can say. I'm going to take my own advice now, and stop talking. I'll keep listening, in the hope that either this post or something else will somehow break the logjam, but right now I'm not sure I have much hope of that. Paul

On Mon, Sep 14, 2015, at 15:14, Paul Moore wrote:
* My argument is that breaking backward compatibility needs to be justified.
I don't think it does. I think that there needs to be a long roadmap of deprecation and provided workarounds for *almost any* backwards-compatibility-breaking change, but that special justification beyond "is this a good feature" is only needed for ignoring that roadmap, not for deprecating/replacing a feature in line with it. No-one, as far as I have seen in this thread to date, has actually put a timeline on this change. No-one's talking about getting rid of the global functions in 3.5.1, or in 3.6, or in 3.7. So with that in mind I can only conclude that the people against making the change are against *ever* making it *at all* - and certainly a lot of the arguments they're making have to do with nebulous educational use-cases (class instances are hard, let's use mutable global state) rather than backwards compatibility. Would you likewise have been against every single thing that Python 3 did?

On September 14, 2015 at 3:14:45 PM, Paul Moore (p.f.moore@gmail.com) wrote:
For the record, I'm not sure what part you don't understand. I'm happy to try and explain it, but I think I'm misunderstanding what you're not understanding or something, because I personally feel like I did explain what I think you're misunderstanding. Part of the problem (probably) here is that there isn't an exact person we're trying to protect here. The general gist is that if you use the deterministic APIs in a security-sensitive situation, then you may be vulnerable depending on exactly what you're doing. We think that in particular, the API of the random module will lead inexperienced or un(der)informed developers to use the API in situations where it's not appropriate and, from that, have an insecure piece of software they wrote. We're people who think that the defaults of the software should be "generally" secure (as much so as is reasonable) and that if you want to do something that isn't safe then you should explicitly opt in to that (the flipside is, things shouldn't be so locked down as to be unusable without having to turn off all of the security knobs, this is where the "generally" in generally secure comes into play). A particularly nasty side effect of this is that it's almost never the people who wrote this software who are harmed by it being broken, and it's almost always their users who didn't have anything to do with it. So essentially the goal is to try and make it harder for people to accidentally misuse the random module. If that doesn't answer your confusion, and if you can try to reword it to get it through my thick skull better, I'm happy to continue to try and answer it (on or off list).
Right, and this is actually trying to do that. By removing a possibly dangerous default and making the default safer. Defaults matter a lot in security (and sadly, a lot of software doesn't have safe defaults) because a lot of software will never use anything but the defaults.
I think part of this is that a lot of the folks proposing these changes are also sensitive to the backwards compatibility needs and have already baked that into their thoughts. We don't generally come into these with "scorched earth" suggestions of fixing some situation where security could be improved, but instead try and figure out a decent balance of security and not breaking things to try and cover most of the ground with as little cost as possible. My very first email in this particular thread (that started this thread) was the first one I had with a fully solid proposal in it. The last paragraph in that proposal asked the question "Do we want to protect users by default?" My next email presented two possible options depending on which we considered to be "less" breaking, either deprecating the module-scoped functions completely or changing their defaults to something secure, and mentioned that if we can't change the default, the user-land CSPRNG probably isn't a useful addition because its benefit is primarily in being able to make it the default option. I don't see anyone who is talking about making a change not also talking about what areas of backwards compatibility it would actually break. I think part of this too is that security is a bit weird, it's not a boolean property but there are particular bars you need to pass before it's an actual solution to the problem. So for a lot of us, we'll figure out that bar and draw a line in the sand and say "If this proposal crosses this line, then doing nothing is better than doing something", because it'd just be churn for churn's sake at that point. That's why you'll see particular points that we essentially won't give up, because if they are given up we might as well do nothing. In this particular instance, the point is that the API of the random module leads people to use it incorrectly, so unless we address that, we might as well just leave it alone.
I think I was the one who said that to you, and I'd like to explain why I said it (beyond the fact I was riled up). Essentially I had in my mind something like what Nick has proposed, which you've said later on you think is relatively unobtrusive and unlikely to cause serious compatibility issues, which I agree with. Then I saw you arguing against what I felt was a pretty mundane API break that was fairly trivial to work around, and it signaled to me that you were saying that having to type a few extra letters was a bridge too far. This reads to me like someone saying "Well I know how to use it correctly, it's their own fault if others don't". I'm not saying that's what you actually think, but that's how it read to me.
The justification is essentially that it will protect some people with minimal impact to others. The main impact will be people who actually needed a deterministic RNG will need to use something like ``random.seeded_random`` instead of just ``random`` and importantly, this will break in a fairly obvious manner instead of the silently wrong situation for people who are currently using the top level API incorrectly. As a bit of a divergence, the "silently wrong" part is why defaults tend to matter a lot in security. Unless you're well versed in it, most people don't think about it and since it "works" they don't inquire further. Something that is security sensitive that always "works" (as in, doesn't raise an error) is broken which is the inverse of how most people think about software. To put it another way, it's the job of security sensitive APIs to break things, ideally only in cases where it's important to break, but unless you're actually testing that it breaks in those attack scenarios, secure and insecure looks exactly the same.
You're right that it's not the only context that matters; however, it's often brought up for a few reasons: * Security largely doesn't matter for software that doesn't accept or send input from some untrusted source, which narrows security concerns down to mostly network-based applications. * The HTTP protocol is "eating the world" and we're seeing more and more things using it as their communication protocol (even for things that are not traditional browser-based applications). * Traditional Web Applications/Sites are a pretty large target audience for Python, and in particular a lot of the security folks come from that world because the web is a hostile place. But you can replace web application with anything that an untrusted user can interact with over any protocol and the argument is basically the same.
Sadly, I don't think this is fully resolvable :( It is the nature of security that its purpose is to take something that otherwise "works" and make it no longer work, because it doesn't satisfy the constraints of the security system.
Right, and I don't think anyone is saying this isn't an important use case, just that if you need a deterministic RNG and you don't get one, that is a fairly obvious problem but if you need a CSPRNG and you don't get one, that is not obvious. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Tue, Sep 15, 2015 at 6:23 AM, Donald Stufft <donald@stufft.io> wrote:
To add to that: Web application development is a *huge* area (every man and his dog wants a web site, and more than half of them want logins and users and so on), which means that the number of non-experts writing security-sensitive code is higher there than in a lot of places. The only other area I can think of that would be comparably popular would be mobile app development - and a lot of the security concerns there are going to be in a web context anyway. Is it fundamentally insecure to receive passwords over an encrypted HTTP connection and use those to verify user identities? I don't think so (although I'm no expert) - it's what you do with them afterward that matters (improperly hashing - or, worse, using a reversible transformation). Why are so many people advised not to do user authentication at all, but to tie in with one of the auth APIs like Google's or Facebook's? Because it's way easier to explain how to get that right than to explain how to get security/encryption right. How bad is it, really, to tell everyone "use random.SystemRandom for anything sensitive", and leave it at that? ChrisA

On 15 September 2015 at 18:58, Chris Angelico <rosuav@gmail.com> wrote:
How bad is it, really, to tell everyone "use random.SystemRandom for anything sensitive", and leave it at that?
That's the status quo, and has been for a long time. If it was ever going to work in terms of discouraging folks from using the module-level functions for security-sensitive tasks, it would have worked by now. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 15 September 2015 at 01:32, Ian Cordasco <graffatcolmingov@gmail.com> wrote:
They don't even have to get onto 2.7.9 per se - the RHEL 7.2 beta just shipped with Robert Kuska's backport of those changes (minus the eventlet breaking internal API change), so it will also filter out through the RHEL/CentOS ecosystem via 7.x and SCLs. (We also looked at a Python 2.6 backport, but decided it was too much work for not enough benefit - folks really need to just upgrade to RHEL/CentOS 7 already, or at least switch to using Software Collections for their Python runtime needs). Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 14 September 2015 at 16:01, Paul Moore <p.f.moore@gmail.com> wrote:
What makes you think that I didn't take it into account? I did: and then rejected it. On a personal level, I believe that defaulting to more secure is worth backward compatibility breaks. I believe that a major reason for the overwhelming prevalence of security vulnerabilities in modern software is that we are overly attached to making people's lives *easy* at the expense of making them *safe*. I believe that software communities in general are too concerned about keeping the stuff that people used around for far too long, and not concerned enough about pushing users to make good choices. The best example of this is OpenSSL. When compiled from source naively (e.g. ./config && make && make install), OpenSSL includes support for SSLv3, SSL Compression, and SSLv2, all of which are known-broken options. To clarify, SSLv2 has been deprecated for security reasons since 1996, but a version of OpenSSL 1.0.2d you build today will happily enable *and use* it. Hell, OpenSSL's own build instructions include this note[0]:
Why is it that users who do not read the wiki (most of them) get an insecure build? Backwards compatibility is why. This is necessarily a reductio ad absurdum type of argument, because I'm trying to make a rhetorical point: I believe that sacrificing security on the altar of backwards compatibility is a bad idea in the long term, and I want to discourage it as best I can. I appreciate your desire to maintain backward compatibility, Paul, I really do. And I think it is probably for the best that people like you work on projects like CPython, while people like me work outside the standard library. However, that won't stop me trying to drag the stdlib towards more secure defaults: it just might make it futile.

On September 14, 2015 at 11:08:39 AM, Cory Benfield (cory@lukasa.co.uk) wrote:
So I will counter this with what I am fully expecting to be the response: People use distributions that compile and configure OpenSSL for them, e.g., `apt-get install openssl` (perhaps not exactly the command that works, but you get the idea). That said, last year, Debian, Ubuntu, Fedora, and other distributions all started compiling openssl without SSLv3 as an available symbol, which broke backwards compatibility and TONS of python projects (eventlet, urllib3, requests, etc.). Why did it break backwards compatibility? Because they knew that they were responsible for the security of their users and expecting users to recompile OpenSSL themselves with the correct flags was unrealistic. Their users come from a wide range of people: - System administrators - Desktop users (if you believe anyone actually uses linux on the desktop ;)) - Researchers - Developers - etc.
That said, I’d also like to combat the idea that security experts won’t use random. Currently Helios, which is a piece of voting software (that anyone can deploy), uses the random module (https://github.com/benadida/helios-server/blob/b07c43dee5f51ce489b6fcb7b7194...) They use it to generate passwords: https://github.com/benadida/helios-server/blob/b07c43dee5f51ce489b6fcb7b7194... https://github.com/benadida/helios-server/blob/b07c43dee5f51ce489b6fcb7b7194... Ben Adida is a security professional who has written papers on creating secure voting systems, but even he uses the random module arguably incorrectly in what should be secure software. Arguing that anyone who knows they need secure random functions will use them is clearly invalidated. Not everyone who knows they should be generating securely random things is aware that the random module is insufficient for their needs. Perhaps that code was written before the big red box was added to the documentation and so it was ineffective. Perhaps Ben googled and found that everyone else was using random for passwords (as people have shown is easy to find in this discussion several times). That said, your arguments are easily reduced to “No language should protect its users from themselves”, which is equivalent to Python’s “we’re all consenting adults” philosophy. In that case, we’re absolutely safe from any blame for the horrible problems that users inflict on themselves. Anyone that used urllib2/httplib/etc. from the standard library to talk to a site over HTTPS (prior to PEP 466) is to blame because they didn’t read the source and know that their sensitive information was easily intercepted by anyone on their network. Clearly, that’s their fault. This makes core language development so much easier, doesn’t it? Place all the blame on the users for the sake of X (where in this discussion X is the holy grail of backwards compatibility).

On 14 September 2015 at 17:00, Cory Benfield <cory@lukasa.co.uk> wrote:
OK. In *my* experience, systems with appallingly bad security practices run for many years with no sign of an exploit. The vulnerabilities described in this thread pale into insignificance compared to many I have seen. On the other hand, I regularly see systems not being upgraded because the cost of confirming that there are no regressions (much less the cost of making fixes for deliberate incompatibilities) is deemed too high. I'm not trying to justify those things, nor am I trying to say that my experience is in any way "worth more" than yours. These aren't all Python systems. But the culture where such things occur is real, and I have no reason to believe that I'm the only person in this position. (But as it's in-house closed-source, it's essentially impossible to get any good view of how common it is). Paul

On September 14, 2015 at 3:27:22 PM, Paul Moore (p.f.moore@gmail.com) wrote:
What does "no sign of an exploit" mean? Does it mean that if there was an exploit that the attackers didn't put metaphorical giant signs up to say that "Zero Cool" was here? Or is there an active security team running IDS software to ensure that there wasn't a breach? I ask because in my experience, "no sign of an exploit" is often synonymous with "we've never really looked to see if we were exploited, but we haven't noticed anything". This is a dangerous way to look at it, because a lot of exploitation is being done by organized crime where they don't want you to notice that you were exploited because they want to make you part of a botnet or to silently steal data or whatever you have. For these, if they get detected that is a bad thing because they lose that node in their botnet (or whatever). It's a very rare exploit that gets publically exposed like the Ashley Madison hacks, they are jsut the ones that get the most attention because they are bombastic and public.
Absolutely! However, I think these systems largely don't upgrade *at all* and are still on whatever version of $LANG they originally wrote the software for. These systems tend to be so regression-averse that they don't even risk bug fixes, because that might cause a regression. For these people, it doesn't really matter what we do because they aren't going to upgrade anyways, and they keep Red Hat in business by paying them for Python 2.4 until the heat death of the universe. I think the more likely case for concern is people who do upgrade and are willing to tolerate some regression in order to stay somewhat current. These people will push back against *massive* breakage (as seen with the Python 3.x migration taking forever) but are often perfectly fine dealing with small breakages. As someone who does write software that supports a lot of versions (currently, 6-7 versions of CPython alone is my standard, depending on whether you count pre-releases or not) having to tweak import statements doesn't even really register on my "give a damn" meter, nor did it for the folks I know who are in similar situations (though this is admittedly a biased and small sample).
I think maybe a problem here is a difference in how we look at the data. It seems that you might focus on the probability of you personally (or the things you work on) getting attacked and thus benefiting from these changes, whereas I, and I suspect the others like me, think about the probability of *anyone* being attacked. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

(The rest of your emails, I'm going to read fully and digest before responding. Might take a day or so.) On 14 September 2015 at 21:36, Donald Stufft <donald@stufft.io> wrote:
This may be true, in some sense. But I'm not willing to accept that you are thinking about everyone, but I'm somehow selfishly only thinking of myself. If that's what you were implying, then frankly it's a pretty offensive way of disregarding my viewpoint. Knowing you, I'm sure that's *not* how you meant it - but do you see how easy it is for the way you word something to make it nearly impossible for me to see past your wording to get to the actual meaning of what you're trying to say? I didn't even consciously notice the implication myself, at first. I simply started writing a pretty argumentative rebuttal, because I felt that somehow I needed to correct what you said, but I couldn't quite say why. Looking at the reality of what I focus on, I'd say it's more like this. I mistrust arguments that work on the basis that "someone, somewhere, might do X bad thing, therefore we must all pay cost Y". The reasons are complex (and I don't know that I fully understand all of my thought processes here) but some aspects that immediately strike me are: * The probability of X isn't really quantified. I may win the lottery, but I don't quit my job - the probability is low. The probability of X matters. * My experience of the probability of X happening varies wildly from that of whoever's making the point. Who is right? Why must one of us "win" and be right? Can't it simply be that my data implies that over the full data set, the actual probability of X is lower than you thought? * The people paying cost Y are not the cause of, nor are they impacted by, X (except in an abstract "we all suffer if bad things happen" sense). I believe in the general principle of "you pay for what you use", so to me you're arguing for the wrong people to be made to pay. Hopefully, those are relatively objective measures. More subjectively, * It's way too easy to say "if X happens once, we have a problem". If you take the stance that we have to prevent X from *ever* happening, you allow yourself the freedom to argue with vague phrases like "might", while leaving the burden of absolute proofs on me. (In the context of RNG proposals, this is where arguments like "let's implement a secure secret library" get dismissed - they still leave open the possibility of *someone* using an inappropriate RNG, so "they don't solve the issue" - even if they reduce the chance of that happening by a certain amount - and neither you nor I can put a figure on how much, so let's not try). * There's little evidence that I can see of preventative security measures having improved things. Maybe this is because it's an "arms race" situation, and keeping up is all we can hope for. Maybe it's because it's hard to demonstrate a lack of evidence, so the demand for evidence is unreasonable. I don't know. * For many years I ran my PC with no anti-virus software. I never got a virus. Does that prove anything? Probably not. The anti-virus software on my work PC is the source of *far* more issues than I have ever seen caused by a virus. Does *that* prove anything? Again, probably not. But my experience with at least *that* class of pressure to implement security is that the cure is worse than the disease. Where does that leave the burden of proof? Again, I don't know, but my experience should at least be considered as relevant data. * Everyone I have ever encountered in a work context (as opposed to in open-source communities) seems to me to be in a similar situation to mine. 
I believe I'm speaking for them, but because it's a closed-source in house environment, I've got no public data to back my comments. And totally subjective, * I'm extremely tired of the relentless pressure of "we need to do X, because security". While the various examples of X may all have ended up being essentially of no disadvantage to me, feeling obliged to read, understand, and comment on the arguments presented every time, gets pretty wearing. * I can't think of a single occasion where we *don't* do X. That may well be confirmation bias, but again subjectively, it feels like nobody's listening to the objections. I get that the original proposals get modified, but if never once has the result been "you're right, the cost is too high, we'll not do X" then that puts security-related proposals in a pretty unique position. Finally, in relation to that last point, and one thing I think is a key difference in our thinking. I do *not* believe that security proposals (as opposed to security bug fixes) are different from any other type of proposal. I believe that they should be subject to all the same criteria for acceptance that anything else is. I suspect that you don't agree with that stance, and believe that security proposals should be held to different standards (e.g., a demonstrated *probability* of benefit is sufficient, rather than evidence of actual benefit being needed). But please speak for yourself on this - I'm not trying to put words into your mouth, it's just my impression. All of which is completely unrelated to either the default RNG for the Python stdlib, or whether I understand and/or accept the security arguments presented here (for clarity, I believe I understand them, I just don't accept them). Paul

On 15 September 2015 at 08:50, Emile van Sebille <emile@fenx.com> wrote:
Historically, yes, but relying solely on perimeter defence is becoming less and less viable as the workforce decentralises, and we see more people using personal devices and untrusted networks to connect to work systems (whether that's their home network or the local coffee shop), as well as relying on public web services rather than internal applications. Enterprise IT is simply *wrong* in the way we currently go about a lot of things, and the public web service sector is showing us all how to do it right. Facilitating that transition is a key part of my day job in Red Hat's Developer Experience team (it's getting a bit off topic, but for a high level company perspective on that: http://www.redhat-cloudstrategy.com/towards-a-frictionless-it-whether-you-li...). And for folks tempted to think "this is just about the web", for a non-web related example of what we as an industry have unleashed through our historical "security is optional" mindset: http://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/ That's an article on remotely hacking the UConnect system in a Jeep Cherokee to control all sorts of systems that had no business being connected to the internet in the first place. The number of SCADA industrial control systems accessible through the internet is frankly terrifying - one of the reasons we can comfortably assume most humans are either nice or lazy is because we *don't* see most of the vulnerabilities that are lying around being exploited. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On September 14, 2015 at 6:39:28 PM, Paul Moore (p.f.moore@gmail.com) wrote:
No, I don’t mean it in the way of you being selfish. I'm not quite sure of the right wording here; essentially it's the probability of an event happening to a particular individual vs the probability of an event occurring at all. To use your lottery example, I *think*, and perhaps I'm wrong, that you're looking at it in terms of: the chance of any particular person participating in the lottery winning the lottery is low, so why should each of these people, as an individual, make plans for how to get the money when they win the lottery, because as individuals they are unlikely to win. Whereas I flip it around and think that someone, somewhere is likely going to win the lottery, so the lottery system should make plans for how to get them the money when they win. I'm not sure of the right "name" for each type, and I don't want to continue to try and hamfist it, because I don't mean it in an offensive or an "I'm better than you" way and I fear putting my foot in my mouth again :(
Just to be clear, I don’t think that "If X happens once, it's a problem" is a reasonable belief, and I don't personally have that belief. It's a sliding scale where we need to figure out where the right solution for Python is for each particular problem. I certainly wouldn't want to use a language that took the approach that if X can ever happen, we need to prevent X. I have seen a number of users incorrectly use the random.py module, to the point where I think that the danger is "real". I also think that, if this were a brand new module, it would be a no-brainer (but perhaps I'm wrong) for the default, module-level API to be safe by default. Going off that assumption, I think the question is really just "Is it worth it?" not "does this make more sense than the current one?".
By preventive security measures, do you mean things like PEP 466? I don't quite know how to accurately state it, but I'm certain that PEP 466 directly improved the security of the entire internet (and continues to do so as it propagates).
Antivirus is a particularly bad example of security software :/ It's a massive failing of the security industry that they exist in the state they do. There's a certain bias here though, because it is the job of security-sensitive code to "break" things (as in, take otherwise valid input and make it not work). In an ideal world, security software just sits there doing "nothing" from the POV of someone who isn't a security engineer and then will, often through no fault of their own, pop up and make things go kabloom because it detected something insecure happening. This means that for most people, the only interaction they have with something designed to protect them is when it steps in to make things stop working. It is relevant data, but I think it goes back to the different way of looking at things (what is the individual chance of an event happening, vs the chance of an event happening across the entire population). This might also be why you'll see the backwards compat folks focus more on experience-driven data and security folks focus more on hypotheticals about what could happen.
I'm not sure what to do about this :( On one side, you're not obligated to read, understand, and comment on everything that's raised, but I totally understand why you do, because I do too. I'm not sure how to help this without saying that people who care about security just shouldn't bring it up, either?
Off the top of my head I remember the on-by-default hash randomization for Python 2.x (or the actual secure hash randomization, since 2.x still has the one where it is trivial to recover the original seed). I don't actually remember that many cases where python-dev chose to break backwards compatibility for security. The only ones I can think of are: * The hash randomization on Python 3.x (sort of? Only if you depended on dict ordering, which wasn't a guarantee anyways). * The HTTPS improvements where we switched Python to default to verifying certificates. * The backports of several security features to 2.7 (backport of 3.4's ssl module, hmac.compare_digest, os.urandom's persistent FD, hashlib.pbkdf2_hmac, hashlib.algorithms_guaranteed, hashlib.algorithms_available). There are probably things that I'm not thinking of, but the hash randomization only broke things if you were depending on dict/set having ordering, which isn't a promised property of dict/set. The backports of security features were done in a pretty minimally invasive way where they would (ideally) only break things if you relied on those names *not* existing on Python 2.7 (which was a nonzero but small set). The HTTPS verification is the main thing I can think of where python-dev actually broke backwards compatibility in an obvious way for people relying on something that was documented to work a particular way. Are there examples I'm not remembering (probably!)? Two sort-of backwards-incompatible changes and one clearly backwards-incompatible change in the lifetime of Python doesn't really feel like that much to me. Is there some crossover with distutils-sig maybe? I've focused a lot more on pushing security on that side of things both because it personally affects me more and because I think insecure defaults there are a lot worse than insecure defaults in any particular module in the Python standard library.
Well, I think that all proposals are based on what the probability is that it's going to help some particular percentage of people, and whether it's going to help enough people to be worth the cost. What I think is special about security is the cost of *not* doing something. Security "fails open" in that if someone does something insecure, it's not going to raise an exception or give different results or something like that. It's going to appear to "work" (in that you get the results you expect) while the user is silently insecure. Compare this to, well, let's pretend that there was never a deterministic RNG in the standard library. If a scientist or a game designer inappropriately used random.py, they'd pretty quickly learn that they couldn't give the RNG a seed, and even if it was a CSPRNG with an "add_seed" method that might confuse them, it'd be pretty obvious on the second execution of their program that it's giving them a different result. I think that the bar *should* be lower for something that just silently or subtly does the "wrong" thing vs something that obviously and loudly does the wrong thing. Particularly when the downside of doing the "wrong" thing is as potentially disastrous as it is with security. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
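
A minimal sketch of that loud-versus-silent failure point, using only what the random module already provides: a seeded Mersenne Twister is reproducible across runs, while SystemRandom quietly ignores seeding and has no state to replay, so someone who actually needed determinism finds out on the second execution:

    import random

    mt = random.Random(42)
    print([mt.random() for _ in range(3)])    # identical on every run

    sr = random.SystemRandom()
    sr.seed(42)                               # accepted but ignored
    print([sr.random() for _ in range(3)])    # different on every run

    try:
        sr.getstate()                         # no reproducible state to save
    except NotImplementedError:
        print("SystemRandom cannot be seeded or replayed")
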

On September 14, 2015 at 8:14:33 PM, Donald Stufft (donald@stufft.io) wrote:
This should read: Security "fails open" in that if someone uses an API that allows something insecure to happen (like not validating HTTPS) it's not going to raise an exception or give different results or something like that. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On 15 September 2015 at 08:39, Paul Moore <p.f.moore@gmail.com> wrote:
Most of the time, when the cost of change is clearly too high, we simply *don't ask*. hmac.compare_digest() is an example of that, where having a time-constant comparison operation readily available in the standard library is important from a security perspective, but having standard equality comparisons be as fast as possible is obviously more important from a language design perspective. Historically, it was taken for granted that backwards compatibility concerns would always take precedence over improving security defaults, but the never-ending cascade of data breaches involving personally identifiable information is proving that we, as a collective industry, are *doing something wrong*: http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-bre... A lot of the problems we need to address are operational ones as we upgrade the industry from a "perimeter defence" mindset to a "defence in depth" mindset, and hence we have things like continuous integration, continuous deployment, application and service sandboxing, containerisation, infrastructure-as-code, immutable infrastructure, etc, etc, etc. That side of things is mostly being driven by infrastructure software vendors (whether established ones or startups), where we have the fortunate situation that the security benefits are tied in together with a range of operational efficiency and capability benefits [1]. However, there's also increasing recognition that some of the problems are due to the default behaviours of the programming languages we use to *create* applications, and in particular the fact that many security issues involve silent failure modes. Sometimes the right answer to those is to turn the silent failure into a noisy failure (as with certificate verification in PEP 476), other times it is about turning the silent failure into a silent success (as is being proposed for the random module API), and yet other times it is simply about lowering the barriers to someone doing the right thing once they're alerted to the problem (as with the introduction of hmac.compare_digest() and ssl.create_default_context(), and their backports to the Python 2.7 series). At a lower level, languages like Go and Rust are challenging some of the assumptions in the still-dominant C-based memory management model for systems programming. Rust in particular is interesting in that it has a much richer compile-time-enforced concept of memory ownership than C does, while still aiming to keep the necessary runtime support very light. Regards, Nick. [1] For folks wanting more background on some of the factors behind this shift, I highly recommend Google's "BeyondCorp" research paper: http://research.google.com/pubs/pub43231.html -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
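
As a small illustration of the hmac.compare_digest() case (the key, message, and helper below are made up for the example): ordinary '==' stays as fast as possible for general-purpose equality, while the dedicated helper compares secrets in time independent of where they differ:

    import hashlib
    import hmac

    key = b'server-side-secret'
    message = b'user=alice&expires=1442300000'
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()

    def verify(signature_from_client):
        # '==' can leak timing information about how many leading characters
        # match; compare_digest takes the same time regardless of where they differ.
        return hmac.compare_digest(expected, signature_from_client)

    print(verify(expected))   # True
    print(verify('0' * 64))   # False
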

On 14 September 2015 at 23:39, Paul Moore <p.f.moore@gmail.com> wrote:
(The rest of your emails, I'm going to read fully and digest before responding. Might take a day or so.)
Point-by-point responses exhaust and frustrate me, and don't really serve much purpose other than to perpetuate the debate. So I'm going to make some final points, and then stop. This is based on having read the various emails responding to my earlier comments. If it looks like I haven't read something, please assume I have but either you didn't get your point across, or maybe I simply don't agree with you.

Why now?
--------
First of all, the big question for me is why now? The random module has been around in its current form for many, many years. Security issues are not new, maybe they are slowly increasing, but there's been no step change. The only thing that seems to have changed is that someone (Theo) has drawn attention to the random module. So I feel that the onus is on the people proposing change to address that. Show me the evidence that we've had an actual problem for many years, and demonstrate that it's a good job we spotted it at last, and now have a chance to fix it. Explain to me what has been going wrong all these years that I'd never even noticed. Arguments that people are misusing the module aren't sufficient in themselves - they've (presumably) been doing that for years. In all that time, who was hacked? Who lost data? As a result of random.random being a PRNG rather than being crypto-secure? I'm not asking for an unassailable argument, just acknowledgement that it's *your* job to address that question, and not mine to persuade you that "we've been alright so far" is a compelling reason to reject your proposal.

Incorrect code on SO etc
------------------------
As regards people picking up insecure code snippets from the internet and using them, there's no news there. I can look round and find hundreds of bits of incorrect code in any area you want. People copy/paste garbage code all the time. To my embarrassment, I've done it myself in the past :-( But I'm reminded of https://xkcd.com/386/ - "somebody is wrong on the internet!" This proposal, and in particular the suggestion that we need to retrospectively make the code snippets quoted here secure, strikes me as a huge exercise in trying to correct all the people who are wrong on the internet. There's certainly value in "safe by default" APIs, I don't disagree with that, but I honestly fail to see how quoting incorrect code off the internet is a compelling argument for anything.

Millions of users are affected
------------------------------
The numbers game is also a frustrating exercise here. We keep hearing that "millions of users are affected by bad code", that scans of Google almost immediately find sites with vulnerabilities. But I don't see anyone pointing at a single documented case of an actual exploit caused by Python's random module. There's no bug report. There's no security alert notification. How are those millions of users affected? Their level of risk is increased? Who can tell that? Are any of the sites identified holding personal data? Not all websites on the internet are *worth* hacking. And I feel that expressing that view is somehow frowned on. That "it doesn't matter" is an unacceptable view to hold. And so, the responses to my questions feel personal, they feel like criticisms of me personally, that I'm being unprofessional. I don't want to make this a big deal, but the code of conduct says "we're tactful when approaching differing views", and it really doesn't feel like that. I understand that the whole security thing is a numbers game. And that it's about assessing risk.
But what risk is enough to trigger a response? A 10% increased chance of any given website being hacked? 5%? 1%? Again, I'm not asking to use the information to veto a change. I'm asking to *understand your position*. To better assess your arguments, so that I can be open to persuasion, and to *agree* with you, if your arguments are sound. Furthermore, should we not take into account other languages and approaches at this point? Isn't PHP a well-known "soft target"? Isn't phishing and social engineering the best approach to hacking these days, rather than cracking RNGs? I don't know, and I look to security experts for advice here. So please explain for me, how are you assessing the risks, and why do you judge this specific risk high enough to warrant a response? The impression I get is that the security view is that *any* risk, no matter how small, once identified, warrants a response. "Do nothing" is never an option. If that's your position, then I'm sorry, but I simply don't agree with you. I don't want to live in a world that paranoid, and I'm unsure how to get past this point to have a meaningful dialog.

History, and security's "bad rep"
---------------------------------
Donald asked if I was experiencing some level of spill-over from distutils-sig, where there has *also* been a lot of security churn (far more than here). Yes, I am. No doubt about that. On distutils-sig, and pip in particular, it's clear to see a lot of frustration from users with the long-running series of security changes. The tone of bug reports is frustrated and annoyed. Users want a break from being forced to make changes. Outside of Python, and speaking purely from my own experience in the corporate world, security is pretty uniformly seen as an annoying overhead, and a block on actually getting the job done. You can dismiss that as misguided, but it's a fact. "We need to do this for security" is a direct challenge to people to dismiss it as unnecessary, and often to immediately start looking for ways to bypass the requirement "so that it doesn't get in the way". I try not to take that attitude in this sort of debate, but at the same time, I do try to *represent* that view and ask for help in addressing it. The level of change in core Python is far less than on distutils-sig, and has been relatively isolated from "non-web" areas. People understand (and are grateful for) increases in "secure by default" behaviour in code like urllib and ssl. They know that these are places where security is important, where getting it right is harder than you'd think, and where trusting experts to do the hard thinking for you is important. But things like hash randomisation and the random module are less obviously security related. The feedback from hash randomisation focused on "why did you break my code?". It wasn't a big deal, people were relying on undocumented behaviour and accepted that, but they did see it as a breakage from a security fix. I expect the same to be true with the random module, but with the added dimension that we're proposing changing documented behaviour this time. As a result of similar arguments applying to every security change, and those arguments never *really* seeming to satisfy people, there's a lot of reiterated debate. And that's driving interested but non-expert people away from contributing to the discussion. So we end up with a lack of checks and balances because people without a vested interest in tightening security "tune out" of the debates. I see that as a problem.
But ultimately, if we can't find a better way of running these discussions, I don't know how we fix it. I certainly can't continue being devil's advocate every time. Anyway, that's me done on this thread. I hope I've added more benefit than cost to the discussion. Thanks to everyone for responding to my questions - even if we all felt like we were both just repeating the same thing, it's a lot of effort doing so and I appreciate your time. Paul

Paul Moore <p.f.moore@...> writes: [snip well-reasoned paragraphs] I want to add that the dichotomy between "security-minded" and "non-security-minded" that has been used for rhetorical purposes has no basis in reality. Several "non-security-minded" devs (of the kind who have *actually* contributed a lot of code to CPython) have a pretty good grasp of cryptography and just don't like security theater. Stefan Krah

On 15 September 2015 at 13:08, Stefan Krah <skrah@bytereef.org> wrote:
Agreed, and every time I ended up looking for words for the two "sides", I ended up feeling uncomfortable. There are no "sides" here, just a variety of people with a variety of experiences, who want to feel assured that their voices are being heard. Paul

On September 15, 2015 at 7:04:52 AM, Paul Moore (p.f.moore@gmail.com) wrote:
The answer to "Why Now?"" is basically because someone brought it up. I realize that's a pretty arbitrary thing but I'm not sure what answer would even be acceptable here. When is an OK time to do it in your eye? Is it only after there is a public, known attack against the RNG? Is it only when the module is first being added? The sad state of affairs is that it's only been relatively recently that our industry as a whole has really taken security seriously so there is a lot of things out there that are not well designed from a security POV. We can't go back in time and change the original mistake, but we can repair it going into the future.
The argument is basically that security is an important part of API design, and that if you look at what people are doing in practice, it gives you an idea of how people think they should use the API. It's kind of like looking at a situation like this: https://i.imgur.com/0gnb7Us.jpg and concluding that maybe we should pave that worn down footpath, because people are going to use it anyways.
So a big part of this is certainly preventative. It's a relatively recent development that hacking went from individuals or small teams going after big targets to being a business in its own right. There are literally giant office complexes in places like Russia and China filled with employees in cubicles, but they aren't writing software like at a normal company, they are just trawling around the internet, looking for targets, trying to expand botnets, looking for anything and everything they can get their hands on. It's also true that there isn't going to be a big fanfare for *most* actual hacked computers/sites. Most of the time the people running the site simply won't ever know; they'll just be silently hosting malware or having their users' passwords fed into other sites. Very few exploits actually get noticed, and when noticed it's unlikely they get public attention. I'd also suggest that for changes like these, if someone was exploited by this they'd probably look at the documentation for random.py, see that they were accidentally using the module wrong, and then blame themselves and not ever bother to file a bug report. It is my opinion that it's not really their fault that the API led them to believe that what they were doing was right.
Actually, all sites on the internet *are* worth hacking, depending on what you call hacking. Malware is constantly being hosted on tiny sites that most wouldn't call "worth" hacking, but malware authors were able to break in somehow and upload their malware there. If there are user logins, it's likely that people reused usernames and passwords, so if you can get the passwords from one smaller site, it's possible you can use that as a door into a larger, more important site. Plus, there's also the desire for botnets to add more and more nodes into their swarm; they don't care what site you're hosting, they just want the machine. One key problem for the security of the internet as a whole is that there are a lot of small sites without dedicated security teams, or anyone who really knows security at all. These are easy targets, and most languages and libraries make it far too easy for people to do the wrong thing.
It's basically a gut feeling, since we can't get any hard data here. Things like being able to look online and find code in the wild that does this wrong within minutes give us an idea of how likely it is, as does reasoning about what people who don't know the difference between ``random.random()`` and ``random.SystemRandom().random()`` are likely to do, plus a little bit of guessing based on experience with similar situations. Another input into this equation is how likely it is that this change would break someone and, once broken, how easy it will be to fix things. I sadly can't give anything more specific than that here, because it's a bit of an art form crossed with personal biases :(
Doing nothing is absolutely an option, but most security focused folks don't take a scorched earth view of security, so we often don't bother to even mention a possible change unless we think that doing nothing is the wrong answer. An example, going back to PEP 476 where we enabled TLS verification by default on HTTPS: we limited it to *only* HTTPS, even though TLS is used by many other protocols, because it was our opinion that doing nothing for those protocols was the right call. Those protocols are still insecure by default, but doing something about that by default would break too much for us to be willing to even suggest it. On top of that, we tend to prioritize the things we do try to have happen, so we focus on things with the smallest fallout or the biggest upsides and we ignore other things until later. This is probably why there's some bias making it look like doing nothing is never an option: we already self-select what we choose to push forward, because we *do* care about backwards compatibility too.
I think a lot of these changes are paying down the technical debt of two decades of (industry standard) lack of focus on security. It sucks, but when we come out the other side (because hopefully, new APIs and modules will be better designed with security in mind given our new landscape) we should hopefully be in a much better situation. On the distutils-sig side, I think that PEP 470 was the last breaking change that I can think of that we'll need to do in the name of security; we've paid down that particular bit of technical debt, and once that lands we'll have a pretty decent story. We still have other kinds of technical debt to pay down though :(
Things don't really satisfy people because they often fundamentally don't care about security. That is perfectly reasonable, and I don't expect everyone to care about security, but they simply don't. However, in my opinion we have a moral obligation to try and do what we reasonably can to protect people. It's a bit like social safety nets: one person might ask why they are being asked to pay taxes when, after all, they never needed government assistance, but by asking every citizen to pay in, we can try to keep people from falling through the cracks. This isn't a social safety net, it's a security safety net.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On 15 September 2015 at 01:01, Paul Moore <p.f.moore@gmail.com> wrote:
This may be at the core of the disagreement, as we're not talking "one or two people", we're talking tens of millions. While wearing my "PSF Director" hat, I spend a lot of time talking to professional educators, and recently organised the first "Python in Education" miniconf at PyCon Australia. If you look at the inroads we're making across primary, secondary and tertiary education, as well as through workshops like Software Carpentry and DjangoGirls, a *lot* of people around the world are going to be introduced to text based programming over the coming decades by way of Python. That level of success brings with it a commensurate level of responsibility: if we're setting those students up for future security failures, that's *on us* as language designers, not on them for failing to learn to avoid traps we've accidentally laid for them (because *we* previously didn't know any better). Switching back to my "security wonk" hat, the historical approach to computer security has been "secure settings are opt in, so only qualified experts should be allowed to write security sensitive software". What we've learned as an industry (the hard way) is that this approach *doesn't work*. The main reason it doesn't work is the one that was part of the rationale for the HTTPS changes in PEP 476: when security failures are silent by default, you generally don't find out that you forgot to flip the "I need this to be secure" switch until *after* the system you're responsible for has been compromised (with whatever consequences that may have for your users). The law of large numbers then tells us that even if (for example) only 1 in 1000 people forget to flip the "be secure" switch when they needed it (or don't even know that the switch *exists*), it's a practical certainty that when you have millions of programmers using your language (and you don't climb to near the top of the IEEE rankings without that), you're going to be hitting that failure mode regularly as a collective group. We have the power to mitigate that harm permanently *just by changing the default behaviour of the random module*. However, that has a cost: it causes problems for some current users for the sake of better serving future users. That's what transition strategy design is about, and I'll take that up in the other thread. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On September 10, 2015 at 10:21:11 AM, Donald Stufft (donald@stufft.io) wrote:
I wanted to try and test this. These are not super scientific since I just ran them on a single computer once (but 10 million iterations each) but I think it can probably give us an indication of the differences? I put the code up at https://github.com/dstufft/randtest but it's a pretty simple module. I'm not sure if (double)arc4random() / UINT_MAX is a reasonable way to get a double out of arc4random (which returns a uint) that is between 0.0 and 1.0, but I assume it's fine at least for this test. Here's the results from running the test on my personal computer which is running the OSX El Capitan public Beta:

$ python test.py
Number of Calls: 10000000
+---------------+--------------------+
|    method     |   usecs per call   |
+---------------+--------------------+
| deterministic | 0.0586802460020408 |
| system        | 1.6681434757076203 |
| userland      | 0.1534261149005033 |
+---------------+--------------------+

I'll try it against OpenBSD later to see if their implementation of arc4random is faster than OSX.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
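[Editorial note: for readers who want a rough feel for this comparison without building the C extension in the linked repo, here is a minimal, stdlib-only timing sketch. It only covers the "deterministic" and "system" rows, since a userland CSPRNG like arc4random isn't exposed by the standard library, and the absolute numbers will of course vary by machine.]

import random
import timeit

N = 1000000  # fewer iterations than the 10 million above, for a quick run

det = random.Random()            # MT-based "deterministic" generator
sysrng = random.SystemRandom()   # os.urandom-backed "system" generator

det_usecs = timeit.timeit(det.random, number=N) / N * 1e6
sys_usecs = timeit.timeit(sysrng.random, number=N) / N * 1e6

print("deterministic: %.4f usecs per call" % det_usecs)
print("system:        %.4f usecs per call" % sys_usecs)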

[Donald Stufft <donald@stufft.io>, on arc4random speed]
arc4random() specifically returns uint32_t, which is 21 bits shy of what's needed to generate a reasonable random double. Our MT wrapping internally generates two 32-bit uint32_t thingies, and pastes them together like so (Python's C code here):

"""
/* random_random is the function named genrand_res53 in the original code;
 * generates a random number on [0,1) with 53-bit resolution; note that
 * 9007199254740992 == 2**53; I assume they're spelling "/2**53" as
 * multiply-by-reciprocal in the (likely vain) hope that the compiler will
 * optimize the division away at compile-time.  67108864 is 2**26.  In
 * effect, a contains 27 random bits shifted left 26, and b fills in the
 * lower 26 bits of the 53-bit numerator.
 * The original code credited Isaku Wada for this algorithm, 2002/01/09.
 */
static PyObject *
random_random(RandomObject *self)
{
    PY_UINT32_T a=genrand_int32(self)>>5, b=genrand_int32(self)>>6;
    return PyFloat_FromDouble((a*67108864.0+b)*(1.0/9007199254740992.0));
}
"""

So now you know how to make it more directly comparable. The high-order bit is that it requires 2 calls to the 32-bit uint integer primitive to get a double, and that can indeed be significant.
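[Editorial note: here is the same 53-bit construction rendered in Python for readers who don't want to parse the C, using getrandbits(32) as a stand-in for genrand_int32. It's an illustrative sketch, not CPython's actual implementation.]

import random

def mt_style_double(rng=random):
    # Same construction as genrand_res53 above: 27 high bits plus 26 low bits
    # pasted into a 53-bit numerator, then scaled by 1/2**53.
    a = rng.getrandbits(32) >> 5   # keep 27 bits
    b = rng.getrandbits(32) >> 6   # keep 26 bits
    return (a * 67108864.0 + b) * (1.0 / 9007199254740992.0)

print(mt_style_double())  # a float in [0.0, 1.0), built from two 32-bit draws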
Just noting that most people timing the OpenBSD version seem to comment out the "get stuff from the kernel periodically" part first, in order to time the algorithm instead of the kernel ;-) In real life, though, they both count, so I like what you're doing better.

On September 10, 2015 at 1:24:05 PM, Tim Peters (tim.peters@gmail.com) wrote:
It didn’t change the results really though:

My OSX El Capitan machine:

Number of Calls: 10000000
+---------------+---------------------+
|    method     |   usecs per call    |
+---------------+---------------------+
| deterministic | 0.05792283279588446 |
| system        | 1.7192466521984897  |
| userland      | 0.17901834140066059 |
+---------------+---------------------+

An OpenBSD 5.7 VM:

Number of Calls: 10000000
+---------------+---------------------+
|    method     |   usecs per call    |
+---------------+---------------------+
| deterministic | 0.06555143180000868 |
| system        | 0.8929547749999983  |
| userland      | 0.16291017429998647 |
+---------------+---------------------+

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Sep 10, 2015, at 07:21, Donald Stufft <donald@stufft.io> wrote:
But that isn't a fix, unless all your code is in a single module. If I call random.seed in game.py and then call random.choice in aiplayer.py, I'll get different results after your fix than I did before. What I'd need to do instead is create a separate myrandom.py that does this and then exports all of the bound methods of random as top-level functions, and then make game.py, aiplayer.py, etc. all import myrandom as random. Which is, while not exactly hard, certainly harder, and much less obvious, than the incorrect fix that you've suggested, and it may not be immediately obvious that it's wrong until someone files a bug three versions later claiming that when he reloads a game the AI cheats and you have to track through the problem. That's why I suggested the set_default_instance function, which makes this problem trivial to solve in a correct way instead of in an incorrect way.
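[Editorial note: to make the workaround Andrew describes concrete, here is a minimal sketch of what such a myrandom.py might look like; the module name and the exact set of re-exported functions are just illustrative. game.py and aiplayer.py would then both do "import myrandom as random", so a seed set in one module is visible to the other.]

# myrandom.py -- hypothetical shared-instance workaround (illustrative only)
import random as _random

_inst = _random.Random()  # one explicitly managed instance shared by the whole program

# Re-export the bound methods as module-level names so callers can keep
# writing "random.choice(...)" after "import myrandom as random".
seed = _inst.seed
random = _inst.random
randint = _inst.randint
choice = _inst.choice
shuffle = _inst.shuffle
getstate = _inst.getstate
setstate = _inst.setstate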

On Sep 10, 2015, at 15:46, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Actually, I just thought of an even simpler solution: Add a deterministic_singleton member to random (which is just initialized to DeterministicRandom() at startup). Now, the user fix is just to change "import random" to "from random import deterministic_singleton as random".

On 11 September 2015 at 08:54, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Actually, I just thought of an even simpler solution:
Add a deterministic_singleton member to random (which is just initialized to DeterministicRandom() at startup). Now, the user fix is just to change "import random" to "from random import deterministic_singleton as random".
Change the spelling to "import random.seeded_random as random" and the user fix is even shorter. I do agree with the idea of continuing to provide a process global instance of the current PRNG for ease of migration - changing a single import is a good way to be able to address a deprecation, and looking for the use of seeded_random in a security sensitive context would still be fairly straightforward. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sep 10, 2015, at 19:48, Nick Coghlan <ncoghlan@gmail.com> wrote:
OK, sure; I don't care much about the spelling. I think neither name will be unduly confusing to novices, and anyone who actually wants to understand what the choice means will use help or the docs or a Google search and find out in a few seconds.
Personally, I think we're done with that change. Deprecation of the names random.Random, random.random(), etc. is sufficient to prevent people from making mistakes without realizing it. Having a good workaround to prevent code churn for the thousands of affected apps means the cost doesn't outweigh the benefits. So, the problem Theo raised is solved.[1] Which means the more radical solution he offered is unnecessary. Unless we're seriously worried that some people who aren't sure if they need Seeded or System may incorrectly choose Seeded just because of performance, there's no need to add a ChaCha choice alongside them. Put it on PyPI, maybe with a link from the SystemRandom docs, and see how things go from there. [1] Well, it's not quite solved, because someone has to figure out how to organize things in the docs, which obviously need to change. Do we tell people how to choose between creating a SeededRandom or SystemRandom instance, then describe their interface, and then include a brief note "... but for porting old code, or when you explicitly need a globally shared Seeded instance, use seeded_random"? Or do we present all three as equally valid choices, and try to explain why you might want the singleton seeded_random vs. constructing and managing an instance or instances?

On 11 September 2015 at 13:18, Andrew Barnert <abarnert@yahoo.com> wrote:
Personally, I think we're done with that change. Deprecation of the names random.Random, random.random(), etc. is sufficient to prevent people from making mistakes without realizing it.
Implementing dice rolling or number guessing for a game as "from random import randint" is *not* a mistake, and I'm adamantly opposed to any proposal that makes it one - the cost imposed on educational use cases would be far too high. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan writes:
Implementing dice rolling or number guessing for a game as "from random import randint" is *not* a mistake,
Turning the number guessing game into a text CAPTCHA might be one, though. That randint may as well be crypto strong, modulo the problem that people who use an explicit seed get punished for knowing what they're doing. I suppose it would be too magic to have the seed method substitute the traditional PRNG for the default, while an implicitly seeded RNG defaults to a crypto strong algorithm? Steve

On Fri, Sep 11, 2015 at 2:44 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Ooh. Actually, I rather like that idea. If you don't seed the RNG, its output will be unpredictable; it doesn't matter whether it's a PRNG seeded by an unknown number, a PRNG seeded by /dev/urandom, a CSRNG, or just reading from /dev/urandom every time. Until you explicitly request determinism, you don't have it. If Python changes its RNG algorithm and you haven't been seeding it, would you even know? Could it ever matter to you? It would require a bit of an internals change; is it possible that code depends on random.seed and random.randint are bound methods of the same object? To implement what you describe, they'd probably have to not be. ChrisA

2015-09-11 6:54 GMT+02:00 Chris Angelico <rosuav@gmail.com>:
I have thought of this idea and was quite seduced by it. However, in this case, on a non-seeded generator, getstate/setstate would be meaningless. I also wonder what pickling generators does.

On Fri, Sep 11, 2015 at 6:54 AM, Chris Angelico <rosuav@gmail.com> wrote:
I've also thought about this idea. The problem with it is that seed() and friends affect a global instance of Random. If, after this change, there was a library that used random.random() for crypto, calling seed() in the main program (or any other library) would make it insecure. So we'd still be in a situation where nobody should use random() for crypto.

On Fri, Sep 11, 2015 at 6:08 PM, Petr Viktorin <encukou@gmail.com> wrote:
So library functions shouldn't use random.random() for anything they know needs security. If you write a function generate_password(), the responsibility is yours to ensure that it's entropic rather than deterministic. That's no different from the current situation (seeding the RNG makes it deterministic) except that the unseeded RNG is not just harder to predict, it's actually entropic. In some cases, having the 99% by default is a barrier to people who need the 100%. (Conflating UCS-2 with Unicode deceives people into thinking their program works just fine, and then it fails on astral characters.) But in this case, there's no perfect-by-default solution, so IMO the best two solutions are: Be great, but vulnerable to an external seed(), until someone chooses; or have no random number generation until someone chooses. We know that the latter is a terrible option for learning, so vulnerability to someone else calling random.seed() is a small price to pay. ChrisA

On Fri, Sep 11, 2015, at 00:54, Chris Angelico wrote:
That's a ridiculous thing to depend on.
To implement what you describe, they'd probably have to not be.
You could implement one class that calls either a SystemRandom instance or an instance of another class depending on which mode it is in.
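[Editorial note: a minimal sketch of that dispatching class, with an invented name; this illustrates the mode-switching idea discussed above rather than proposing an actual implementation.]

import random

class AutoRandom:
    """Crypto-strong until seed() is explicitly called, deterministic MT after."""

    def __init__(self):
        self._impl = random.SystemRandom()   # unseeded: unpredictable output

    def seed(self, a=None):
        self._impl = random.Random(a)        # explicit seed: reproducible MT stream

    def __getattr__(self, name):
        # Delegate random(), randint(), choice(), etc. to whichever mode is active.
        return getattr(self._impl, name)

rng = AutoRandom()
print(rng.random())   # unpredictable
rng.seed(42)
print(rng.random())   # reproducible given seed 42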

On 11 September 2015 at 05:44, Stephen J. Turnbull <stephen@xemacs.org> wrote:
One issue with that - often, programs simply use a RNG for their own purposes, but offer a means of getting the seed after the fact for reproducibility reasons (the "map seed" case, for example). Pseudo-code:

if <user supplied a "seed">:
    state = <user-supplied value>
    random.setstate(state)
else:
    state = random.getstate()

... do the program's main job, never calling seed/setstate ...

if <user requests the "seed">:
    print state

So getstate (and setstate) would also need to switch to a PRNG. There are actually very few cases I can think of where I'd need seed() (as opposed to setstate()). Maybe if I let the user *choose* a seed - some games do this. Paul

On Fri, Sep 11, 2015 at 1:02 AM, Paul Moore <p.f.moore@gmail.com> wrote:
You don't really want to use the full 4992 byte state for a "map seed" application anyway (type 'random.getstate()' in a REPL and watch your terminal scroll down multiple pages...). No game actually uses map seeds that look anything like that. I'm 99% sure that real applications in this category are actually using logic like:

if <user supplied a "seed">:
    seed = user_seed()
else:
    # use some RNG that was seeded with real entropy
    seed = random_short_printable_string()
r = random.Random(seed)
# now use 'r' to generate the map

-n

--
Nathaniel J. Smith -- http://vorpus.org
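[Editorial note: a runnable rendition of the pattern Nathaniel sketches; the helper name and the 8-character seed format are invented for illustration.]

import random
import string

def make_map_rng(user_seed=None):
    # If the player didn't supply a seed, mint a short shareable one from a
    # generator seeded with real entropy, then build the map RNG from it.
    if user_seed is None:
        sysrng = random.SystemRandom()
        alphabet = string.ascii_uppercase + string.digits
        user_seed = ''.join(sysrng.choice(alphabet) for _ in range(8))
    return user_seed, random.Random(user_seed)

seed, rng = make_map_rng()
print("map seed: %s" % seed)           # short and publishable, e.g. "7QX2B9KD"
print("first roll: %d" % rng.randint(0, 99))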

On 11 September 2015 at 10:52, Nathaniel Smith <njs@pobox.com> wrote:
Yeah, good point. As I say, I don't actually *use* this in the example program I'm thinking of, I just know it's a feature I need to add in due course. So when I do, I'll have to look into how to best implement it. (And I'll probably nick the approach you show above, thanks ;-)) Paul

On 11 September 2015 at 11:07, Andrew Barnert <abarnert@yahoo.com> wrote:
But games do store the entire map state with saved games if they want repeatable saves (e.g., to prevent players from defeating the RNG by save scumming).
So far off-topic it's not true, but a number of games I know of (e.g., Factorio, Minecraft) include a means to get a map seed (a simple text string) which you can publish, that allows other users to (in effect) play on the same map as you. That's different from saves. Paul

On Fri, Sep 11, 2015, at 06:10, Paul Moore wrote:
Of course, Minecraft doesn't actually use the seed in such a simple way as seeding a single-sequence random number generator. If it did, the map would depend on what order you visited regions in. (This is less of an issue for games with finite worlds)

On 10 September 2015 at 23:46, Andrew Barnert <abarnert@yahoo.com> wrote:
Note that this is another case of wanting "correct by default". Requiring the user to pass around a RNG object makes it easy to do the wrong thing - because (as above) people can too easily create multiple independent RNGs by mistake, which means your numbers don't necessarily satisfy the randomness criteria any more. "Secure by default" isn't (and shouldn't be) the only example of "correct by default" that matters here. Whether "secure" is more important than "gives the right results" is a matter of opinion, and application dependent. Password generators have more need to be secure than to be mathematically random, Monte Carlo simulations (and to a lesser extent games) the other way around. Many things care about neither. If we can't manage "correct and secure by default", someone (and it won't be me) has to decide which end of the scale gets preference. Paul.

On Fri, Sep 11, 2015 at 1:11 AM, Paul Moore <p.f.moore@gmail.com> wrote:
Accidentally creating multiple independent RNGs is not going to cause any problems with respect to randomness. It only creates a problem with respect to determinism/reproducibility. Beyond that I just find your message a bit baffling. I guess I believe you that you find passing around RNG objects to make it easy to do the wrong thing, but it's exactly the opposite of my experience: when writing code that cares about determinism/reproducibility, then for me, passing around RNG objects makes it way *easier* to get things right. It makes it much more obvious what kinds of refactoring will break reproducibility, and it enables all kinds of useful tricks. E.g., keeping to the example of games and "aiplayer.py", a common thing game designers want to do is to record playthroughs so they can be replayed again as demos or whatever. And a common way to do that is to (1) record the player's inputs, (2) make sure that the way the game state evolves through time is deterministic given the players inputs. (This isn't necessarily the *best* strategy, but it is a common one.) Now suppose we're writing a game like this, and we have a bunch of "enemies", each of whose behavior is partially random. So on each "tick" we have to iterate through each enemy and update its state. If we are using a single global RNG, then for correctness it becomes crucial that we always iterate over all enemies in exactly the same order. Which is a mess. A better strategy is, keep one global RNG for the level, but then when each new enemy is spawned, assign it its own RNG that will be used to determine its actions, and seed this RNG using a value sampled from the global RNG (!). Now the overall pattern of the game will be just as random, still be deterministic, and -- crucially -- it no longer matters what order we iterate over the enemies in. I particularly would not want to use the global RNG in any program that was complicated enough to involve multiple modules. Passing state between inter-module calls using a global variable is pretty much always a bad plan, and that's exactly what you're talking about here. Non-deterministic global RNGs are fine, b/c they're semantically stateless; it's exactly the cases where you care about the determinism of the RNG state that you want to *stop* using the global RNG. -n -- Nathaniel J. Smith -- http://vorpus.org
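[Editorial note: a toy sketch of the spawn-time seeding trick Nathaniel describes; the class and method names are invented for illustration.]

import random

class Enemy:
    def __init__(self, level_rng):
        # Give each enemy its own RNG, seeded from the level RNG at spawn time,
        # so behaviour stays deterministic no matter what order enemies update in.
        self.rng = random.Random(level_rng.getrandbits(64))

    def tick(self):
        return self.rng.choice(["attack", "wander", "idle"])

level_rng = random.Random("map-seed-1234")    # one seeded RNG per level
enemies = [Enemy(level_rng) for _ in range(3)]
# Iterating in reverse (or any order) gives the same per-enemy decisions for a given seed.
for enemy in reversed(enemies):
    print(enemy.tick())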

On Fri, Sep 11, 2015 at 8:26 PM, Nathaniel Smith <njs@pobox.com> wrote:
As long as the order you seed their RNGs is deterministic. And if you can do that, can't you iterate over them in a deterministic order too? ChrisA

On Thu, Sep 10, 2015 at 09:10:09AM -0400, Donald Stufft wrote:
Ironically, the spelling mistake in your example is a good example of how this is worse. Another reason why it's worse is that if you create a new instance every single time you need a random number, as you do above, performance is definitely going to suffer. By my timings, creating a new SystemRandom instance each time is around two times slower; creating a new DeterministicRandom (i.e. the current MT default) instance each time is over 100 times slower. Hypothetically, it may even hurt your randomness: it may be that some future (or current) (C)PRNG's quality will be "less random" (biased, predictable, or correlated) because you keep using a fresh instance rather than the same one. TL;DR: Yes, calling `random.choice` is *significantly better* than calling `random.SomethingRandom().choice`. It's better for beginners, it's even better for expert users whose random needs are small, and those whose needs are greater shouldn't be using the latter anyway.
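[Editorial note: a quick way to sanity-check the instance-creation overhead Steven mentions, using only the stdlib; the exact ratios will differ between machines and Python versions.]

import random
import timeit

N = 100000
shared = random.SystemRandom()

reuse   = timeit.timeit(lambda: shared.choice("ABCDEFGH"), number=N)
per_sys = timeit.timeit(lambda: random.SystemRandom().choice("ABCDEFGH"), number=N)
per_mt  = timeit.timeit(lambda: random.Random().choice("ABCDEFGH"), number=N)

print("reuse one SystemRandom:    %.3f s" % reuse)
print("new SystemRandom per call: %.3f s" % per_sys)
print("new Random (MT) per call:  %.3f s" % per_mt)  # fresh MT seeding dominates here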
Is this a trick question? In the absense of a keylogger and screen reader monitoring my system while I run that code snippet, of course it is safe. In the absence of any credible attack on the password based on how it was generated, of course it is safe.
Nobody is saying that To put that question another way: "If you exclude the case where crypto would
This might be acceptable, although I wouldn't necessarily deprecate the random module.

On 11 September 2015 at 14:36, Steven D'Aprano <steve@pearwood.info> wrote:
I feel like I must have misunderstood you Steven. Didn't you just exclude the attack vector that we're discussing here? What we are saying is that a deterministic PRNG definitionally allows attacks on the password based on how it was generated. The very nature of a deterministic PRNG is that it is possible to predict subsequent outputs based on previous ones, or at least to dramatically constrain the search space. This is not a hypothetical attack, and it's not even a very complicated one. Now, it's possible that the way the system is constructed precludes this attack, but let me tell you that vastly more engineers think that about their systems than are actually right about it. Generally, if the word 'password' appears anywhere near something, you want to keep a Mersenne Twister as far away from it as possible. The concern being highlighted in this thread is that users who don't know what I just said (the vast majority) are at risk of writing deeply insecure code. We think the default should be changed.

On Sat, Sep 12, 2015 at 12:28 AM, Cory Benfield <cory@lukasa.co.uk> wrote:
Only if an attacker can access many passwords generated from the same MT stream, right? If the entire program is as was posted (importing random and using random.choice(), then terminating), then an attack would have to be based on the seeding of the RNG, not on the RNG itself. There simply isn't enough content being generated for you to be able to learn the internal state, and even if you did, the next run of the program will be freshly seeded anyway. ChrisA

On 11 September 2015 at 15:33, Chris Angelico <rosuav@gmail.com> wrote:
Sure, if the entire program is as posted, but we should probably assume it isn't. Some programs definitely are, but I'm not worried about them: I'm worried about the ones that aren't.

On September 11, 2015 at 10:33:55 AM, Chris Angelico (rosuav@gmail.com) wrote:
This is silly: take that code, stick it in a web application and have it generate API keys or session identifiers instead of passwords, or hell, even passwords or random tokens to reset passwords or any other such thing. Suddenly you have a case where you have a persistent process, so there isn't a new seed, and the attacker can more or less request an unlimited number of outputs. This isn't some mind-bogglingly uncommon case. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Ah crap. Sorry folks, this post was *not supposed to go to the list* in this state. I'm having some trouble with my mail client (mutt) not saving drafts, so I intended to email it to myself for later editing, and didn't notice that the list was CCed. On Fri, Sep 11, 2015 at 11:36:13PM +1000, Steven D'Aprano wrote: [...] -- Steve

On Thu, Sep 10, 2015, at 08:29, Paul Moore wrote:
I don't understand why. What other word would you use to describe a generator that can be given a specific set of inputs to generate the same exact sequence of numbers every single time? If you want that feature, then you're not going to think "deterministic" means "not good enough". And if you don't want it, you, well, don't want it, so there's really no harm in the fact that you don't choose it. Personally, though, I don't see why we're not talking about calling it MersenneTwister.

On Thu, Sep 10, 2015 at 8:13 AM, <random832@fastmail.us> wrote:
Because while we want to reduce foot guns, we don't want to reduce usability. DeterministicRandom is fairly easy for anyone to understand. I would venture a guess that most people looking for that wouldn't know (or care) what the backing algorithm is. Further, if we stop using Mersenne Twister in the future, we would have to remove that class name. DeterministicRandom can be agnostic of the underlying algorithm and is friendlier to people who don't need to know or care about what algorithm is generating the numbers; they only need to understand the properties of that generator.

On Thu, Sep 10, 2015, at 09:44, Ian Cordasco wrote:
If we're serious about being deterministic, then we should keep that class under that name and provide a new class for the new algorithm. What's the point of having a deterministic algorithm if you can't reproduce your results in the new version because the algorithm was deleted?

On Thu, Sep 10, 2015 at 8:55 AM, <random832@fastmail.us> wrote:
This is totally off topic. That said as a counter-point: What's the point of carrying around code you don't want people to use if they're just going to use it anyway?

On Sep 10, 2015 5:29 AM, "Paul Moore" <p.f.moore@gmail.com> wrote: [...]
Regarding the "harder to use" point (which is obviously just one of many considerations in this while debate): I trained myself a few years ago to stop using the global random functions and instead always pass around an explicit RNG object, and my experience is that once I got into the habit it gave me a strict improvement in code quality. Suddenly many more of my functions are deterministic ... well ... functions ... of their inputs, and suddenly it's clearly marked in the source which ones have randomness in their semantics, and suddenly it's much easier to do things like refactor the code while preserving the output for a given seed. (This is tricky because just changing the order in which you do things can break your code. I wince in sympathy at people who have to maintain code like your map-generation-from-a-seed example and *aren't* using RNG objects explicitly.) The implicit global RNG is a piece of global state, like global variables, and causes similar unpleasantness. Now that I don't use it, I look back and it's like "huh, why did I always used to hit myself in the face like that? That wasn't very pleasant." So this is what I teach my collaborators and students now. Most of them just use the global state by default because they don't even know about the OO option. YMMV but that's my experience FWIW. -n

Donald Stufft <donald@stufft.io> writes: ...
"security minded folks" [1] recommend "always use os.urandom()" and advise against *random* module [2,3] despite being aware of random.SystemRandom() [4] i.e., if they are right then *random* module probably only need to care about group #1 and avoid creating the false sense of security in group #3. [1] https://github.com/pyca/cryptography/blob/92d8bd12609586bfa53cf8c7a691e37474... [2] https://cryptography.io/en/latest/random-numbers/ [3] https://github.com/pyca/cryptography/blob/92d8bd12609586bfa53cf8c7a691e37474... [4] https://github.com/pyca/cryptography/issues/2278

On September 10, 2015 at 2:08:46 PM, Akira Li (4kir4.1i@gmail.com) wrote:
Maybe you didn't notice you’re talking to the third name in the list of authors that you linked to, but that documentation is there primarily because the random module's API is problematic and it's easier to recommend people to not use it than to try and explain how to use it safely. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Thu, Sep 10, 2015 at 9:19 PM, Donald Stufft <donald@stufft.io> wrote:
Obviously, I've noticed it but I didn't want to call you out. but that documentation is there primarily because the
"it's easier to recommend people to not use it than to try and explain how to use it safely." that is exactly the point if random.SystemRandom() is not safe to use while being based on "secure" os.urandom() then providing the same API based on (possibly less secure) arc4random() won't be any safer.

On September 10, 2015 at 2:40:54 PM, Akira Li (4kir4.1i@gmail.com) wrote:
"If the mountain won't come to Muhammad then Muhammad must go to the mountain." In other words, we can write all the documentation in the world we want, and it doesn't change the simple fact that by choosing a default, there is going to be some people who will use it when it's inappropiate due to the fact that it is the default. The pratical effect of changing the default will be that some cases are broken, but in a way that is obvious and trivial to fix, some cases won't have any pratical effect at all, and finally, for some people it's going to take code that was previously completely insecure and make it either secure or harder to exploit for people who are incorrectly using the API. I wouldn't expect the documentation in pyca/cryptography to change, it'd still recommend people to use os.urandom directly and we'd still recommend that people should use SystemRandom/os.urandom in the random.py docs for things that need to be cryptographically secure, this is just a safety net for people who don't know or didn't listen. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Deprecating the module-level functions has one problem for backward compatibility: if you're using random across multiple modules, changing them all from this: import random ... to this: from random import DeterministicRandom random = DeterministicRandom() ... gives a separate MT for each module. You can work around that by, e.g., providing your own myrandom.py that does that and then using "from myrandom import random" everywhere, or by stashing a random_inst inside the random module or builtins or something and only creating it if it doesn't exist, etc., but all of these are things that people will rightly complain about. One possible solution is to make DeterministicRandom a module instead of a class, and move all the module-level functions there, so people can just change their import to "from random import DeterministicRandom as random". (Or, alternatively, give it classmethods that create a singleton just like the module global.) For people who decide they want to switch to SystemRandom, I don't think it's as much of a problem, as they probably won't care that they have a separate instance in each module. (And I don't think there's any security problem with using multiple instances, but I haven't thought it through...) So, the change is probably only needed in DeterministicRandom. There are hopefully better solutions than that. But I think some solution is needed. People who have existing code (or textbooks, etc.) that do things the "wrong" way and get a DeprecationWarning should be able to easily figure out how to make their code correct. Sent from my iPhone

Andrew Barnert via Python-ideas <python-ideas@python.org> writes:
Of course, this brings to mind the fact that there's *already* an instance stashed inside the random module. At that point, you might as well just keep the module-level functions, and rewrite them to be able to pick up on it if you replace _inst (perhaps suitably renamed as it would be a public variable) with an instance of a different class. Proof-of-concept implementation:

class _method:
    def __init__(self, name):
        self.__name__ = name
    def __call__(self, *args, **kwargs):
        return getattr(_inst, self.__name__)(*args, **kwargs)
    def __repr__(self):
        return "<random method wrapper " + repr(self.__name__) + ">"

_inst = Random()
seed = _method('seed')
random = _method('random')
...etc...

On Sep 9, 2015, at 18:25, Random832 <random832@fastmail.com> wrote:
The whole point is to make people using the top-level functions see a DeprecationWarning that leads them to make a choice between SystemRandom and DeterministicRandom. Just making inst public (and dynamically switchable) doesn't do that, so it doesn't solve anything. However, it seems like there's a way to extend it to do that: First, rename Random to DeterministicRandom. Then, add a subclass called Random that raises a DeprecationWarning whenever its methods are called. Then preinitialize inst to Random(), just as we already do. Existing code will work, but with a warning. And the text of that warning or the help it leads to or the obvious google result or whatever can just suggest "add random.inst = random.DeterministicRandom() or random.inst = random.SystemRandom() at the start of your program". That has most of the benefit of deprecating the top-level functions, without the cost of the solution being non-obvious (and the most obvious solution being wrong for some use cases). Of course it adds the cost of making the module slower, and also more complex. Maybe a better solution would be to add a random.set_default_instance function that replaced all of the top-level functions with bound methods of the instance (just like what's already done at startup in random.py)? That's simple, and doesn't slow down anything, and it seems like it makes it more clear what you're doing than setting random.inst.
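[Editorial note: a rough sketch of what the set_default_instance idea could look like, done from outside the module purely for illustration; the real patch linked later in the thread does this inside random.py, and the list of rebound function names here is just an example.]

import random

def set_default_instance(inst):
    # Rebind the module-level convenience functions to bound methods of the
    # chosen instance, mirroring what random.py already does at import time.
    for name in ("seed", "random", "uniform", "randint", "randrange",
                 "choice", "shuffle", "sample", "getrandbits",
                 "getstate", "setstate"):
        setattr(random, name, getattr(inst, name))

# Opting the whole program in to a crypto-strong default:
set_default_instance(random.SystemRandom())
print(random.choice("ABCDEF"))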

On Thu, Sep 10, 2015 at 11:50 AM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Of course it adds the cost of making the module slower, and also more complex. Maybe a better solution would be to add a random.set_default_instance function that replaced all of the top-level functions with bound methods of the instance (just like what's already done at startup in random.py)? That's simple, and doesn't slow down anything, and it seems like it makes it more clear what you're doing than setting random.inst.
+1. A single function call that replaces all the methods adds a minuscule constant to code size, run time, etc, and it's no less readable than assignment to a module attribute. (If anything, it makes it more clearly a supported operation - I've seen novices not realize that "module.xyz = foo" is valid, but nobody would misunderstand the validity of a function call.) ChrisA

On Sep 9, 2015, at 23:08, Chris Angelico <rosuav@gmail.com> wrote:
I was only half-serious about this, but now I think I like it: it provides exactly the fix people are hoping to achieve by deprecating the top-level functions, but with less risk, less user code churn, a smaller patch, and a much easier fix for novice users. (And it's much better than my earlier suggestion, too.) See https://gist.github.com/abarnert/e0fced7569e7d77f7464 for the patch, and a patched copy of random.py. The source comments in the patch should be enough to understand everything that's changed. A couple things: I'm not sure the normal deprecation path makes sense here. For a couple versions, everything continues to work (because most novices, the people we're trying to help, don't see DeprecationWarnings), and then suddenly their code breaks. Maybe making it a UserWarning makes more sense here? I made Random a synonym for UnsafeRandom (the class that warns and then passes through to DeterministicRandom). But is that really necessary? Someone who's explicitly using an instance of class Random rather than the top-level functions probably isn't someone who needs this warning, right? Also, if this is the way we'd want to go, the docs change would be a lot more substantial than the code change. I think the docs should be organized around choosing a random generator and using its methods, and only then mention set_default_instance as being useful for porting old code (and for making it easy for multiple modules to share a single generator, but that shouldn't be a common need for novices).

On Sep 10, 2015, at 01:32, Serhiy Storchaka <storchaka@gmail.com> wrote:
Well, the goal of the deprecation idea was to eventually get people to explicitly use instances, so the fact that doesn't work out of the box is a good thing, not a problem. But for people just trying to retrofit existing code, all they have to do is call random.set_default_instance at the top of the main module, and all their other modules can just import what they need this way. Which is why it's better than straightforward deprecation.

On Thu, Sep 10, 2015 at 04:08:09PM +1000, Chris Angelico wrote:
Making monkey-patching the official, recommended way to choose a PRNG is a risky solution, to put it mildly. That means that at any time, some other module that is directly or indirectly imported might change the random number generators you are using without your knowledge. You want a crypto PRNG, but some module replaces it with MT. Or vice versa. Technically, it is true that (this being Python) they can do this now, just by assigning to the random module: random.random = lambda: 9 but that is clearly abusive, and if you write code to do that, you're asking for whatever trouble you get. There's no official API to screw over other callers of the random module behind their back. You're suggesting that we add one.
(If anything, it makes it more clearly a supported operation
Which is exactly why this is a terrible idea. You're making monkey- patching not only officially supported, but encouraged. That will not end well. -- Steve

On Sep 11, 2015, at 06:49, Steven D'Aprano <steve@pearwood.info> wrote:
But that's not the proposal. The proposal is to make explicitly passing around an instance the official, recommended way to choose a PRNG; monkey-patching is only the official, recommended way to quickly get legacy code working: once you see the warning about the potential problem and decide that the problem doesn't affect you, you write one standard line of code at the top of your main script instead of rewriting all of your modules and patching or updating every third-party module you use. As I said later, I think my later suggestion of just having a singleton DeterministicRandom instance (or even a submodule with the same interface) that you can explicitly import in place of random serves the same needs well enough, and is even simpler, and is more flexible (in particular, it can also be used for novices' "my first game" programs), so I'm no longer suggesting this. But that doesn't mean there's any benefit to mischaracterizing the suggestion (especially if Chris or anyone else still supports it even though I don't).

On September 9, 2015 at 8:01:17 PM, Donald Stufft (donald@stufft.io) wrote:
Ok, I've talked to an honest to god cryptographer as well as some other smart folks! Here's the general gist: Using a userland CSPRNG like arc4random is not advisable for things that you absolutely need cryptographic security for (this is group #2 from my original email). These people should use os.urandom or random.SystemRandom as they should be doing now. In addition os.urandom or random.SystemRandom is probably fast enough for most use cases of the random.py module, however it is true that using os.urandom/random.SystemRandom would be slower than MT. It is reasonable to use a userland CSPRNG as a "default" source of randomness or in cases where people care about speed but maybe not about security and don't need determinism. However, they've said that the primary benefit in using a userland CSPRNG for a faster cryptographically secure source of randomness is if we can make it the default source of randomness for a "probably safe depending on your app" safety net for people who didn't read or understand the documentation. This would make most uses of random.random and friends secure but not deterministic. If we're unwilling to change the default, but we are willing to deprecate the module scoped functions and force users to make a choice between random.SystemRandom and random.DeterministicRandom then there is unlikely to be much benefit to also adding a userland CSPRNG into the mix since there's no class of people who are using an ambiguous "random" that we don't know if they need it to be secure or deterministic/fast. So I guess my suggestion would be, let's deprecate the module scope functions and rename random.Random to random.DeterministicRandom. This absolves us of needing to change the behavior of people's existing code (besides deprecating it) and we don't need to decide if a userland CSPRNG is safe or not while still moving us to a situation that is far more likely to have users doing the right thing. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Thu, Sep 10, 2015 at 3:30 AM, Donald Stufft <donald@stufft.io> wrote: [...]
There is one use case that would be hit by that: the kid writing their first rock-paper-scissors game. A beginner who just learned the `if` statement isn't ready for a discussion of cryptography vs. reproducible results, and random.SystemRandom().random() would just become a magic incantation to learn. It would feel like requiring sys.stdout.write() instead of print(). Functions like paretovariate(), getstate(), or seed(), which require some understanding of (pseudo)randomness, can be moved to a specific class, but I don't think deprecating random(), randint(), randrange(), choice(), and shuffle() would be a good idea. Switching them to a cryptographically safe RNG is OK from this perspective, though.

On Sep 10, 2015, at 00:35, Petr Viktorin <encukou@gmail.com> wrote:
Silently switching them could break a lot of code. I don't think there's any way around making them warn the user that they need to do something. I think the patch I just sent is a good way of doing that: the minimum thing they need to do is a one-liner, which is explained in the warning, and it also gives them enough information to check the docs or google the message and get some understanding of the choice if they're at all inclined to do so. (And if they aren't, well, either one works for the use case you're talking about, so let them flip a coin, or call random.choice.;))

Can I just ask what is the actual problem we are trying to solve here? Python has third-party cryptography modules that bring their own sources of randomness (or cryptography libraries that do the same). Python has a good random library for everything other than cryptography. Why in the heck are we trying to make the random module do something it is already documented as being a poor choice for, when there are already third-party modules that do just this? Who needs cryptographic randomness in the standard library anyway (even though one line of code gives you access to it)? Have we identified even ONE person who does cryptography in Python who is kicking themselves that they can't use the random module as implemented? Is this just indulging a paranoid developer?

On September 10, 2015 at 5:21:29 AM, Alexander Walters (tritium-list@sdamon.com) wrote:
Because there are situations where you need securely generated randomness where you are *NOT* "doing cryptography". Blaming people for the fact that the random module has a bad UX that naturally leads them to use it when it isn't appropriate is a shitty thing to do. What harm is there in making people explicitly choose between deterministic randomness and secure randomness? Is your use case so much better than theirs that you think you deserve to type a few characters less to the detriment of people who don't know any better? ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On 9/10/2015 07:40, Donald Stufft wrote:
API Breakage. This is not worth the break in backwards compatibility. My use case is using the API that has been available for... 20 years? And for what benefit? None, and it can be argued that it would do the opposite of what is intended (false sense of security and all).

On Wed, Sep 09, 2015 at 08:01:16PM -0400, Donald Stufft wrote: [...]
You're worried about attacks on the random number generator that produces the characters in the password? I think I'm going to have to see an attack before I believe that this is meaningful. Excluding PRNGs that are hopelessly biased ("nine, nine, nine, nine...") or predictable, how does knowing the PRNG help in an attack? Here's a password I just generated using your "corrected" version using SystemRandom: 06XW0X0X (Honest, that's exactly what I got on my first try.) Here's one I generated using the "bad" code snippet: V6CFKCF2 How can you tell them apart, or attack one but not the other based on the PRNG?
Shouldn't it be using a single instance of SystemRandom rather than a new instance for each call? [...]
According to Theo, modern userland CSPRNGs can create random bytes faster than memcpy
That is an astonishing claim, and I'd want to see evidence for it before accepting it. -- Steve

Steven D'Aprano <steve@pearwood.info> writes:
Isn't the only difference between generating a password and generating a key the length (and base) of the string? Where is the line?
That is an astonishing claim, and I'd want to see evidence for it before accepting it.
I assume it's comparing a CSPRNG all of whose state is in cache (or registers, if a large block of random bytes is requested from the CSPRNG in one go), with memcpy of data which must be retrieved from main memory.

On 10 September 2015 at 01:01, Donald Stufft <donald@stufft.io> wrote:
Wrong. There is a fourth basic type. People (like me!) whose code absolutely doesn't have any security issues, but want a simple, convenient, fast RNG. Determinism is not an absolute requirement, but is very useful (for writing tests, maybe, or for offering a deterministic rerun option to the program). Simulation-style games often provide a way to find the "map seed", which allows users to share interesting maps - this is non-essential but a big quality-of-life benefit in such games. IMO, the current module perfectly serves this fourth group. While I accept your point that far too many people are using insecure RNGs in "generate a random password" scripts, they are *not* the core target audience of the default module-level functions in the random module (did you find any examples of insecure use that *weren't* password generators?). We should educate people that this is bad practice, not change the module. Also, while it may be imperfect, it's still better than what many people *actually* do, which is to use "password" as a password on sensitive systems :-( Maybe what Python *actually* needs is a good-quality "random password generator" module in the stdlib? (Semi-serious suggestion...) Paul

On September 10, 2015 at 4:41:56 AM, Paul Moore (p.f.moore@gmail.com) wrote:
This group is the same as #3 except for the map seed thing which is group #1. In particular, it wouldn’t hurt you if the random you were using was cryptographically secure as long as it was fast and if you needed determinism, it would hurt you to say so. Which is the point that Theo was making.
IMO, the current module perfectly serves this fourth group.
Making the user pick between Deterministic and Secure random would serve this purpose too, especially in a language where "In the face of ambiguity, refuse the temptation to guess" is one of the core tenets. The largest downside would be typing a few extra characters, and Python is not a language that attempts to do things in the fewest number of characters.
You cannot document your way out of a UX problem. The problem isn’t people doing this once on the command line to generate a password, the problem is people doing it in applications where they generate an API key, a session identifier, or a random password which they then give to their users. If you give a way to get the output of the MT-based random enough times, it can be used to determine what every random value it generated was and will be. Here’s a game a friend of mine created where the purpose of the game is to essentially unrandomize some random data, which is only possible because it’s (purposely) using MT to make it possible: https://github.com/reaperhulk/dsa-ctf. This is not an ivory tower paranoia case, it’s a real concern that will absolutely fix some insecure software out there instead of telling them “welp typing a little bit extra once an import is too much of a burden for me and really it’s your own fault anyways”.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On 10 September 2015 at 12:26, Donald Stufft <donald@stufft.io> wrote:
I don't understand the phrase "if you needed determinism, it would hurt you to say so". Could you clarify?
And yet I know that I would routinely, and (this is the problem) without thinking, choose Deterministic, because I know that my use cases all get a (small) benefit from being able to capture the seed, but I also know I'm not doing security-related stuff. No amount of making me choose is going to help me spot security implications that I've missed. And also, calling the non-crypto choice "Deterministic" is unhelpful, because I *don't* want something deterministic, I want something random (I understand PRNGs aren't truly random, but "good enough for my purposes" is what I want, and "deterministic" reads to me as saying it's *not* good enough...)
What I'm trying to say is that this is an education problem more than a UX problem. Personally, I think I know enough about security for my (not a security specialist) purposes. To that extent, if I'm working on something with security implications, I'm looking for things that say "Crypto" in the name. The rest of the time, I just use non-specialist stuff. It's a similar situation to that of the "statistics" module. If I'm doing "proper" maths, I'd go for numpy/scipy. If I just want some averages and I'm not bothered about numerical stability, rounding behaviour, etc, I'd go for the stdlib statistics package.
To me, that's crypto and I'd look to the cryptography module, or to something in the stdlib that explicitly said it was suitable for crypto. Saying people write bad code isn't enough - how does the current module *encourage* them to write bad code? How much API change must we allow to cater for people who won't read the statement in the docs (in a big red box) "Warning: The pseudo-random generators of this module should not be used for security purposes." (Specifically people writing security related code who won't read the docs).
I don't understand how that game (which is an interesting way of showing people how attacks on crypto work, sure, but that's just education, which you dismissed above) relates to the issue here. And I hope you don't really think that your quote is even remotely what I'm trying to say (I'm not that selfish) - my point is that not everything is security related. Not every application people write, and not every API in the stdlib. You're claiming that the random module is security related. I'm claiming it's not, it's documented as not being, and that's clear to the people who use it for its intended purpose. Telling those people that you want to make a module designed for their use harder to use because people for whom it's not intended can't read the documentation which explicitly states that it's not suitable for them, is doing a disservice to those people who are already using the module correctly for its stated purpose. By the same argument, we should remove the statistics module because it can be used by people with numerically unstable problems. (I doubt you'll find StackOverflow questions along these lines yet, but that's only because (a) the module's pretty new, and (b) it actually works pretty hard to handle the hard corner cases, but I bet they'll start turning up in due course, if only from the people who don't understand floating point...) Paul

On September 10, 2015 at 8:29:16 AM, Paul Moore (p.f.moore@gmail.com) wrote:
I transposed some words, fixed: "If you needed determinism, would it hurt you to say so?" Essentially, other than typing a little bit more, why is: import random; print(random.choice(["a", "b", "c"])) better than import random; print(random.DeterministicRandom().choice(["a", "b", "c"]))? As far as I can tell, you've made your code and what properties it has much clearer to someone reading it at the cost of 22 characters. If you're going to reuse the DeterministicRandom class you can assign it to a variable and actually end up saving characters if the variable you save it to can be accessed in fewer than 6 characters.
You're allowed to pick DeterministicRandom, you're even allowed to do it without thinking. This isn't about making it impossible to ever insecurely use random numbers, that's obviously a boil-the-ocean level of problem; this is about trying to make it more likely that someone won't be hit by a fairly easy-to-hit footgun if it does matter for them, even if they don't know it. It's also about making code that is easier to understand on the surface. For example, without using the prior knowledge that it's using MT, tell me how you'd know if this was safe or not: import string; import random; password = "".join(random.choice(string.ascii_letters) for _ in range(9)); print("Your random password is", password)
But you *DO* want something deterministic: the *ONLY* way you can get this small benefit of capturing the seed is if you can put that seed back into the system and get a deterministic result. If the seed didn’t exactly determine the output of the randomness then you wouldn't be able to do that. If you don't need to be able to capture the seed and essentially "replay" the PRNG in a deterministic way then there are exactly zero downsides to using a CSPRNG other than speed, which is why Theo suggested using a very fast, modern CSPRNG to solve the speed issues. Can you point out one use case where cryptographically safe random numbers, assuming we could generate them as quickly as you asked for them, would hurt you unless you needed/wanted to be able to save the seed and thus require or want deterministic results?
Reminder that this warning does not show up (in any color, much less red) if you’re using ``help(random)`` or ``dir(random)`` to explore the random module. It also does not show up in code review when you see someone doing random.random(). It encourages you to write bad code, because it has a baked-in assumption that there is a sane default for a random number generator and expects people to understand a fairly difficult concept, which is that not all "random" is equal. For instance, you've already made the mistake of saying you wanted "random" not deterministic, but the two are not mutually exclusive: deterministic is a property that a source of random can have, and one that you need for one of the features you say you like.
I'm claiming that the term random is ambiguously both security related and not security related and we should either get rid of the default and expect people to pick whether or not their use case is security related, or we should assume that it is unless otherwise instructed. I don't particularly care what the exact spelling of this looks like, random.(System|Secure)Random and random.DeterministicRandom is just one option. Another option is to look at something closer to what Go did and deprecate the "random" module and move the MT based thing to ``math.random`` and the CSPRNG can be moved to something like crypto.random.
No, by this argument we shouldn't have a function called statistics in the statistics module because there is no globally "right" answer for what the default should be. Should it be mean? Median? Mode? Why is *your* use case the "right" use case for the default option, particularly in a situation where picking the wrong option can be disastrous?
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
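For contrast with the password snippet quoted a little earlier in this message, a minimal sketch of the same nine-character password written against random.SystemRandom, which exists in the stdlib today; here the security property is visible at the call site instead of hidden behind a module-level default:

    # Same password snippet, but the os.urandom-backed generator is explicit.
    import string
    import random

    secure = random.SystemRandom()
    password = "".join(secure.choice(string.ascii_letters) for _ in range(9))
    print("Your random password is", password)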

On 10 September 2015 at 14:10, Donald Stufft <donald@stufft.io> wrote:
Thanks. In one sense, no it wouldn't. Nor would it matter to me if "the default random number generator" was fast and cryptographically secure. What matters is just that I get a load of random (enough) numbers. What hurts somewhat (not enormously, I'll admit) is having to think up front about whether I need to be able to capture a seed and replay it. That's nearly always something I'd think of way down the line, as a "wouldn't it be nice if I could get the user to send me a reproducible test case" or something like that. And of course it's just a matter of switching the underlying RNG at that point. None of this is hard. But once again, I'm currently using the module correctly, as documented. I've omitted most of the rest of your response largely because we're probably just going to have to agree to differ. I'm probably too worn out being annoyed at the way that everything ends up needing to be security related, and the needs of people who won't read the docs determine API design, to respond clearly and rationally :-( Paul

On Thu, Sep 10, 2015 at 8:44 AM, Paul Moore <p.f.moore@gmail.com> wrote:
No one in this thread is accusing everyone of using the module incorrectly. The fact that you do use it correctly is a testament to the fact that you read the docs carefully and have some level of experience with the module to know that you're using it correctly.
I think the people Theo, Donald, and others (including myself) are worried about are the people who have used some book or online tutorial to write games in Python and have seen random.random() or random.choice() used. Later on they start working on something else (including but not limited to the examples Donald has otherwise pointed out). They also have enough experience with the random module to know it produces randomness (what kind, they don't know... in fact they probably don't know there are different kinds yet) and they use what they know, because Python has batteries included and they're awesome and easy to use. The reality is that past experiences bias current decisions. If that person went and read the docs, they probably won't know if what they're doing warrants using a CSPRNG instead of the default Python one. If they're not willing to learn, or read enough (and I stress enough) about the topic before making a decision (or just really don't have the time because this is a side project), they'll say "Well the module level functions seemed random enough to me, so I'll just use those". That could end up being rather awful for them. The reality is that your past experiences (and other people's past experiences, especially those who refuse to do some research and are demanding others prove that these are insecure with examples) are biasing this discussion, because they fail to empathize with new users whose past experiences are coloring their decisions. People choose Python for a variety of reasons, and one of those reasons is that in their past experience it was "fast enough" to be an acceptable choice. This is how most people behave. Being angry at people for not reading a two-sentence warning in the middle of the docs isn't helping anyone or arguing the validity of this discussion.

On September 10, 2015 at 9:44:13 AM, Paul Moore (p.f.moore@gmail.com) wrote:
This is actually exactly why Theo suggested using a modern, userland CSPRNG: it can generate random numbers faster than /dev/urandom can, and, unless you need deterministic results, there's little downside to doing so. There are really two possible ideas here, depending on what sort of balance we'd want to strike.

We can make a default "I don't want to think about it" implementation of random that is both *generally* secure and fast; however, it won't be deterministic and you won't be able to explicitly seed it. This would be a backwards compatible change [1] for people who are simply calling these functions [2]: random.getrandbits, random.randrange, random.randint, random.choice, random.shuffle, random.sample, random.random, random.uniform, random.triangular, random.betavariate, random.expovariate, random.gammavariate, random.gauss, random.lognormvariate, random.normalvariate, random.vonmisesvariate, random.paretovariate, random.weibullvariate. If this were all that the top level functions in random.py provided, we could simply replace the default and people wouldn't notice; they'd just automatically get safer randomness whether that's actually useful for their use case or not.

However, random.py also has these functions: random.seed, random.getstate, random.setstate, random.jumpahead, and these functions are where the problem comes in. They only really make sense for deterministic sources of random, which are not "safe" for use in security sensitive applications. So pretending for a moment that we've already decided to do "something" about this, the question boils down to what we do about these 4 functions. Either we can change the default to a secure CSPRNG and break these functions (and the people using them), which is however easily fixed by changing ``import random`` to ``import random; random = random.DeterministicRandom()``, or we can deprecate the top level functions and try to guide people to choose up front what kind of random they need. Either of these solutions will end up with people being safer and, if we pretend we've agreed to do "something", it comes down to whether we'd prefer breaking compatibility for some people while keeping a default random generator that is probably good enough for most people, or whether we'd prefer to not break compatibility and try to push people to always decide what kind of random they want.

Of course, we still haven't decided that we should do "something". I think that we should, because I think that secure by default (or at least, not insecure by default) is a good situation to be in. Over the history of computing it's been shown time and time again that trying to document or educate users is error prone and doesn't scale, but that designing APIs to make the "right" thing the obvious default, and requiring people to opt in to specialist [3] cases which need some particular property, does.

[1] Assuming Theo's claim about the speed of the ChaCha-based arc4random function is accurate, which I haven't tested, but I assume he's smart enough to know what he's talking about WRT its speed.
[2] I believe anyway; I don't think that any of these rely on the properties of MT or a deterministic source of random, just a source of random.
[3] In this case, there are two specialist use cases: those that require deterministic results, and those that require specific security properties that are not satisfied by a userland CSPRNG, because a userland CSPRNG is not as secure as /dev/urandom but is able to be much faster.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
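A sketch of the trivial migration being described for code that genuinely needs seeding and state capture, using only the random.Random class that exists today (random.DeterministicRandom is a name from the proposal, not a real class, so random.Random stands in for it here):

    # Code that needs reproducibility opts in to a deterministic generator
    # explicitly, instead of relying on the module-level functions.
    import random

    rng = random.Random(12345)         # explicit, seedable Mersenne Twister
    layout = [rng.randrange(10) for _ in range(5)]
    state = rng.getstate()             # capture state for later replay

    replay = random.Random()
    replay.setstate(state)             # resume from the captured state
    assert [replay.randrange(10) for _ in range(3)] == [rng.randrange(10) for _ in range(3)]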

On 10 September 2015 at 15:21, Donald Stufft <donald@stufft.io> wrote:
Switching (somewhat hypocritically :-)) from an "I'm a naive user" stance to talking about deeper issues as if I knew what I was talking about: this change results in each module getting a separate instance of the generator. That has implications for the risks of correlated results. It's unlikely to cause issues in real life, conceded. Paul

On Thu, 10 Sep 2015 at 07:22 Donald Stufft <donald@stufft.io> wrote:
+1 for deprecating module-level functions and putting everything into classes to force a choice
+0 for deprecating the seed-related functions, saying "the stdlib uses what it uses as an RNG and you have to live with it if you don't make your own choice", and switching to a crypto-secure RNG
-0 for leaving it as-is
-Brett

On 11 September 2015 at 02:05, Brett Cannon <brett@python.org> wrote:
+1 for deprecating module-level functions and putting everything into classes to force a choice
-1000, as this would be a *huge* regression in Python's usability for educational use cases. (Think 7-8 year olds that are still learning to read, not teenagers or adults with more fully developed vocabularies.) A reasonable "Hello world!" equivalent for introducing randomness to students is rolling a 6-sided die, as that relates to a real-world object they'll often be familiar with. At the moment that reads as follows:
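A minimal version of that dice-rolling introduction, using nothing beyond the module-level API that exists today:

    # Roll a six-sided die, the way a beginner tutorial would show it.
    import random

    print("You rolled a", random.randint(1, 6))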
Another popular educational exercise is the "Guess a number" game, where the program chooses a random number from 1-100, and the person playing the game has to guess what it is. Again, randint() works fine here. Shuffling decks of cards, flipping coins, these are all things used to introduce learners to modelling random events in the real world in software, and we absolutely do *not* want to invalidate the extensive body of educational material that assumes the current module level API for the random module.
However, this I'm +1 on. People *do* use the module level APIs inappropriately, and we can get them to a much safer place, while nudging folks that genuinely need deterministic randomness towards an alternative API. The key for me is that folks that actually *need* deterministic randomness *will* be calling the stateful module level APIs. This means we can put the deprecation warnings on *those* methods, and leave them out for the others.

In terms of practical suggestions, rather than DeterministicRandom and NonDeterministicRandom, I'd actually go with the simpler terms SeededRandom and SeedlessRandom (there's a case to be made that those are misnomers, but I'll go into that more below):

* SeededRandom: Mersenne Twister
* SeedlessRandom: new CSPRNG
* SystemRandom: os.urandom()

Phase one of transition:

* add SeedlessRandom
* rename Random to SeededRandom
* Random becomes a subclass of SeededRandom that deprecates all methods not shared with SeedlessRandom
* this will also effectively deprecate the corresponding module level functions
* any SystemRandom methods that are no-ops (like seed()) are deprecated

Phase two of transition:

* Random becomes an alias for SeedlessRandom
* deprecated methods are removed from SystemRandom
* deprecated module level functions are removed

As far as the proposed Seeded/Seedless naming goes, that deliberately glosses over the fact that "seed" gets used to refer to two different things - seeding a PRNG with entropy, and seeding a deterministic PRNG with a particular seed value. The key is that "SeedlessRandom" won't have a "seed()" *method*, and that's the single most salient fact about it from a user experience perspective: you can't get the same output by providing the same seed value, because we wouldn't let you provide a seed value at all. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
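A sketch of how the proposed naming might look to user code; SeededRandom and SeedlessRandom do not exist in the stdlib, so they are modelled here as thin stand-ins over the classes that do exist (Random and SystemRandom), purely to show the intended split:

    # Stand-ins for the proposed classes, built on what exists today.
    import random

    class SeededRandom(random.Random):
        """Proposed deterministic, seedable generator (Mersenne Twister today)."""

    class SeedlessRandom(random.SystemRandom):
        """Proposed seedless CSPRNG; modelled here on os.urandom, though the
        proposal is a faster userland generator."""

    map_rng = SeededRandom(12345)       # reproducible: share the seed, share the map
    print(map_rng.randint(1, 100))

    token_rng = SeedlessRandom()        # no usable seed() in the proposed API
    print(token_rng.getrandbits(128))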

On Fri, Sep 11, 2015 at 3:00 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Aside from sounding like varieties of grapes in a grocery, those names seem just fine. From the POV of someone with a bit of comprehension of crypto (as in, "use /dev/urandom rather than a PRNG", but not enough knowledge to actually build or verify these things), the distinction is precise: with SeededRandom, I can give it a seed and get back a predictable sequence of numbers, but with SeedlessRandom, I can't. I'm not sure what the difference is between "seeding a PRNG with entropy" and "seeding a deterministic PRNG with a particular seed value", though; aside from the fact that one of them uses a known value and the other doesn't, of course. Back in my BASIC programming days, we used to use "RANDOMIZE TIMER" to seed the RNG with time-of-day, or "RANDOMIZE 12345" (or other value) to seed with a particular value; they're the same operation, but one's considered random and the other's considered predictable. (Of course, bytes from /dev/urandom will be a lot more entropic than "number of centiseconds since midnight", but for a single-player game that wants to provide a different starting layout every time you play, the latter is sufficient.) ChrisA

On 11 September 2015 at 03:11, Chris Angelico <rosuav@gmail.com> wrote:
Actually, that was just a mistake on my part - they're really the same thing, and the only distinction is the one you mention: setting the seed to a known value. Thus the main seed-related difference between something like arc4random and other random APIs is the same one I'm proposing to make here: it's seedless at the API level because it takes care of collecting its own initial entropy from the operating system's random number API. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Chris Angelico wrote:
I think the only other difference is that the Linux kernel is continually re-seeding its generator whenever more unpredictable bits become available. It's not something you need to explicitly do yourself, as in your BASIC example. -- Greg

Nick Coghlan <ncoghlan@...> writes:
Fully agreed with Nick. That this is being seriously considered shows a massive disregard for usability. Python is not C++; it places convenience first. Besides, a deterministic RNG is a feature: you can reproduce exactly a random sequence by re-using the same seed, which helps fix rare input-dependent failures (we actually have a good example of that in CPython development with `regrtest -r`). Good luck debugging such issues when using an RNG which reseeds itself in a random (!) way. Lastly, the premise of this discussion is idealistic in the first place. If someone doesn't realize their code is security-sensitive, there are other mistakes they will make than simply choosing the wrong RNG. If you want to help people generate secure passwords, perhaps the best approach would be to write a password-generating (or more generally secret-generating, for different kinds of secrets: passwords, session ids, etc.) library. Regards Antoine.

On 14 September 2015 at 13:59, Antoine Pitrou <antoine@python.org> wrote:
Is your argument that there are lots of ways to get security wrong, and for that reason we shouldn't try to fix any of them? After all, I could have made this argument against PEP 466, or against the deprecation of SHA1 in TLS certificates, or against any security improvement ever made that simply changed defaults. The fact that there are secure options available is not a good excuse for leaving the insecure ones as the defaults. And let's be clear, this is not a theoretical error that people don't hit in real life. Investigating your last comment, Antoine, I googled "python password generator". The results:

- The first one is a StackOverflow question which incorrectly uses random.choice (though seeded from os.urandom, which is an improvement). The answer to that says to just use os.urandom everywhere, but does not provide sample code. Only the third answer gets so far as to provide sample code, and it's way overkill.
- The second option, entitled "A Better Password Generator", incorrectly uses random.randrange. This code is *aimed at beginners*, and is kindly handing them a gun to point at their own foot.
- The third one uses urandom, which is fine.
- The fourth, an XKCD-based password generator, uses SystemRandom *if available* but then falls back to the MT approach, which is an unexpected decision, but there we go.
- The fifth, from "pythonforbeginners.com", incorrectly uses random.choice.
- The sixth goes into an intensive discussion about 'password strength', including a discussion about the 'bit strength' of the password, despite the fact that they use random.randint, which means that the analysis about bit strength is totally flawed.
- For the seventh we get a security.stackexchange question with the first answer saying not to use Random, though the questioner does use it and no sample code is provided.
- The eighth is a library that "generates randomized strings of characters". It attempts to use SystemRandom but falls back silently if it's unavailable.

At this point I gave up. Of that list of 8 responses, three are completely wrong, two provide sample code that is wrong with no correct sample code to be found on the page, two attempt to do the right thing but will fall into a silent failure mode if they can't, and only one is unambiguously correct. Similarly, a quick search of GitHub for Python repositories that contain random.choice and the string 'password' returns 40,000 results.[0] Even if 95% of them are safe, that leaves 2000 people who wrote wrong code and uploaded it to GitHub. It is disingenuous to say that only people who know enough write security-critical code. They don't. The reason for this is that most people don't know they don't know enough. And for those people, Python's default approach screws them over, and then they write blog posts which screw over more people. If the Python standard library would like to keep the insecure default of random.random that's totally fine, but we shouldn't pretend that the resulting security failures aren't our fault: they absolutely are. [0]: https://github.com/search?l=python&q=random.choice+password&ref=searchresults&type=Code&utf8=%E2%9C%93

On 14 September 2015 at 14:29, Cory Benfield <cory@lukasa.co.uk> wrote:
Is your argument that there are lots of ways to get security wrong, and for that reason we shouldn't try to fix any of them?
This debate seems to repeatedly degenerate into this type of accusation. Why is backward compatibility not being taken into account here? To be clear, the proposed change *breaks backward compatibility* and while that's allowed in 3.6, just because it is allowed, doesn't mean we have free rein to break compatibility - any change needs a good justification. The arguments presented here are valid up to a point, but every time anyone tries to suggest a weak area in the argument, the "we should fix security issues" trump card gets pulled out. For example, as this is a compatibility break, it'll only be allowed into 3.6+ (I've not seen anyone suggest that this is sufficiently serious to warrant breaking compatibility on older versions). Almost all of those SO questions, and google hits, are probably going to be referenced by people who are using 2.7, or maybe some version of 3.x earlier than 3.6 (at what stage do we allow for the possibility of 3.x users who are *not* on the latest release?) So is a solution which won't impact most of the people making the mistake, worth it? I fully expect the response to this to be "just because it'll take time, doesn't mean we should do nothing". Or "even if it just fixes it for one or two people, it's still worth it". But *that's* the argument I don't find compelling - not that a fix won't help some situations, but that because it's security, (a) all the usual trade-off calculations are irrelevant, and (b) other proposed solutions (such as education, adding specialised modules like a "shared secret" library, etc) are off the table. Honestly, this type of debate doesn't do the security community much good - there's too little willingness to compromise, and as a result the more neutral participants (which, frankly, is pretty much anyone who doesn't have a security agenda to promote) end up pushed into a "reject everything" stance simply as a reaction to the black and white argument style. Paul

On September 14, 2015 at 11:01:36 AM, Paul Moore (p.f.moore@gmail.com) wrote:
How has it not been taken into account? The current proposal (best summed up by Nick in the other thread) will not break compatibility for anyone except those calling the functions that are specifically about setting a seed or getting/setting the current state. In looking around I don't see a lot of people using those particular functions, so most people likely won't notice the change at all, and for those who do, there is a very trivial change they can make to their code to cope with it.
We can't go back in time and fix those versions that is true. However, one of the biggest groups of people who are most likely to be helped by this change is new and inexperienced developers who don't fully grasp the security sensitive nature of whatever they are doing with random. That group of people are also more likely to be using Python 3.x than experienced programmers.
If I/we were not willing to compromise, I'd be pushing for it to use SystemRandom everywhere, because that removes all of the possibly problematic parts of using a user-space CSPRNG like the one being proposed. However, I/we are willing to compromise by sacrificing possible security in order to not regress things where we can; in particular, a user-space CSPRNG is being proposed over SystemRandom because it will provide you with random numbers almost as fast as MT will. However, when proposing this possible compromise, we are met with people refusing to meet us in the middle. There are some folks who are trying to propose other middle grounds, and there will undoubtedly be some discussion around which ones are the best. We've gone from suggesting replacing the default random with SystemRandom (a lot slower than MT), to removing the default altogether, to deprecating the default and replacing it with a fast user-space CSPRNG. However, folks who don't want to see it change at all have thus far been unwilling to compromise at all. I'm confused how you're saying that the security minded folks have been unwilling to compromise when we've done that repeatedly in this thread, whereas the backwards compat minded folks have consistently said "No, it would break compatibility" or "We don't need to change" or "They are probably insecure anyways". Can you explain what compromise you're willing to accept here? If it doesn't involve breaking at least a little compatibility then it's not a compromise, it's you demanding that your opinion is the correct one (which isn't wrong, we're also asserting that our opinion is the correct one, we've just been willing to move the goal posts to try and limit the damage while still getting most of the benefit).
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Donald Stufft <donald@...> writes:
That's a pretty big "except". Paul's and my concern is about compatibility breakage; saying "it doesn't break compatibility except..." sounds like a lot of empty rhetoric.
In looking around I don't see a lot of people using those particular functions
Given that when you "look around" you only end up looking around amongst the Web developer crowd, I may not be surprised. You know, when I "look around" I don't see a lot of people using the random module to generate passwords. Your anecdote would be more valuable than other people's?
Yes, because generating passwords is a common and reasonable task for new and inexperienced developers? Really? Again, why don't you propose a dedicated API for that? That's what we did for constant-time comparisons. That's what people did for password hashing. That's what other people did for cryptography. I haven't seen a reasonable rebuttal to this. Why would generating passwords be any different from all those use cases? After all, if you provide a convenient API people should flock to it, instead of cumbersomely reinventing the wheel... That's what libraries are for.
Really, it's not so much a performance issue as a compatibility issue. The random module provides, by default, a *deterministic* stream of random numbers. That os.urandom() may be a tad slower isn't very important when you're generating one number at a time and processing it with a slow interpreter (besides, MT itself is hardly the fastest PRNG out there). That os.urandom() doesn't give you a way to seed it once and get predictable results is a big *regression* if made the default RNG in the random module. And the same can be said for a user-space CSRNG, as far as I understand the explanations here.
However, when proposing this possible compromise, we are met with people refusing to meet us in the middle.
See, people are fed up with the incompatibilities arising "in the name of the public good" in each new feature release of Python. When the "middle" doesn't sound much more desirable than the "extreme", I don't see why I should call it a "compromise". Some people have to support code across 4 different Python versions, and further gratuitous breakage in the stdlib doesn't help. Yes, they can change their code. Yes, they can use the "six" module, the "future" module or whatever new bandaid exists on PyPI. Still they must change their code in one way or another because it was deemed "necessary" to break compatibility to solve a concern that doesn't seem grounded in any reasonable analysis. Python 3 was there to break compatibility. Not Python 3.4. Not Python 3.5. Not Python 3.6. (In case you're wondering, trying to make all published code on the Internet secure by appropriately changing the interpreter's "behaviour" to match erroneous expectations - even expectations *documented* as erroneous - is *not* reasonable - no matter how hard you try, there will always be occurrences of broken code that people copy and paste around.)
Can you explain what compromise you're willing to accept here?
Let's rephrase this: are *you* willing to accept an admittedly "insecure by default" compromise? No you aren't, evidently. There's no evidence that you would accept to leave the top-level random functions intact, even if a new UserSpaceSecureRandom class was added to the module, right? So why would we accept a compatibility-breaking compromise? Because we are more "reasonable" than you? (Which in this context really reads: more willing to quit the discussion because of boredom, exhaustion, lack of time or any other quite human reason; which, btw, sums up a significant part of what the dynamics of python-ideas have become: "victory of the most obstinate".) Yeah, that's always what you are betting on, because it's not like *you* will ever be reasonable except if it's the last resort for getting something accepted. And that's why every discussion about security with security-minded (read: "obsessed") people is a massive annoyance, even if at the end it succeeds in reaching a "compromise", after 500+ excruciating backs and forths on a mailing list. Regards Antoine.
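For what it's worth, the dedicated secret-generating API being pointed at here does not need much machinery. A minimal sketch built on random.SystemRandom, which exists today; the function name and defaults are illustrative only, not a proposed stdlib interface:

    # Minimal password/token helper on top of the os.urandom-backed generator.
    import string
    from random import SystemRandom

    _rng = SystemRandom()

    def generate_password(length=16, alphabet=string.ascii_letters + string.digits):
        """Return a secret string drawn from a cryptographically secure source."""
        return "".join(_rng.choice(alphabet) for _ in range(length))

    print(generate_password())
    print(generate_password(length=32))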

On 14 September 2015 at 16:55, Antoine Pitrou <antoine@python.org> wrote:
Python 3 was there to break compatibility. Not Python 3.4. Not Python 3.5. Not Python 3.6.
To clarify: your position is that we cannot break backward compatibility in Python 3.6?

Cory Benfield <cory@...> writes:
It is. Not breaking backward compatibility in feature releases (except 3.0, which was a deliberate special case) is a very long-standing policy, and it is so because users have a much better time with such a policy, especially when people have to maintain code that's compatible across multiple versions (again, the 2->3 transition is a special case, which justifies the existence of tools such as "six", and has incidentally created a lot of turmoil in the community that has only recently begun to recede). Of course, fixing a bug is not necessarily breaking compatibility (although sometimes we may even refuse to fix a bug because the impact on working code would be too large). But changing or removing a documented behaviour that people rely on definitely is. We do break feature compatibility, from time to time, in exceptional and generally discussed-at-length cases, but there is a sad pressure recently to push for more compatibility breakage - and, strangely, always in the name of "security". (Also note that some library modules such as asyncio are or were temporarily exempted from the compatibility requirements, because they are in very active development; the random module evidently isn't one of them.) Regards Antoine.

On Mon, Sep 14, 2015 at 10:01 AM, Paul Moore <p.f.moore@gmail.com> wrote:
So people who are arguing that the defaults shouldn't be fixed on Python 2.7 are likely the same people who also argued that PEP 466 was a terrible, awful, end-of-the-world type change. Yes it broke things (like eventlet) but the net benefit for users who can get onto Python 2.7.9 (and later) is immense. Now I'm not arguing that we should do the same to the random module, but a backport (that is part of the stdlib) would probably be a good idea under the same idea of allowing users to opt into security early.
They're not irrelevant. I personally think they're of a lower impact to the discussion, but the reality is that the people who are educating others are few and far between. If there are public domain works, free tutorials, etc. that all advocate using a module in the standard library and no one can update those, they still exist and are still recommendations. People prefer free to correct when possible because there's nothing free to correct them (until they get hacked or worse). Do we have a team in the Python community that goes out to educate for free people on security related best practices? I haven't seen them. The best we have is a few people on crufty mailing lists like this one trying to make an impact because education is a much larger and harder to solve problem than making something secure by default. Perhaps instead of bickering like fools on a mailing list, we could all be spending our time better educating others. That said, I can't make that decision for you just like you can't make that for me.
Except you seem to have missed many of the compromises being discussed and conceded by the security-minded folks. Personally, names that describe the outputs of the algorithms make much more sense to me than "Seedless" and "Seeded", but no one has really bothered to shave that yak further, out of a desire to compromise and make things better as a whole. Much of the lack of gradation has come from the opponents of this change, who seem to think of security as a step function where a subjective measurement of "good enough for me" counts as secure.

On 14 September 2015 at 16:32, Ian Cordasco <graffatcolmingov@gmail.com> wrote:
You may well be right. Personally, I'm pretty sick of the way all of these debates degenerate into content-free reiteration of the same old points, and unwillingness to hear other people's views. Here's a point - it seems likely that the people arguing for this change are of the opinion that I'm not appreciating their position. (For the record, I'm not being deliberately obstructive in case anyone thought otherwise. In my view at least, I don't understand the security guys' position). Assuming that's the case, then I'm probably one of the people who needs educating. But I don't feel like anyone's trying to educate me, just that I'm being browbeaten until I give in. Education != indoctrination.
That said, I can't make that decision for you just like you can't make that for me.
Indeed. Personally, I spend quite a lot of time in my day job (closed source corporate environment) trying to educate people in sane security practices, usually ones I have learned from people in communities like this one. One of the biggest challenges I have is stopping people from viewing security as "an annoying set of rules that get in the way of what I'm trying to do". But you would not believe the sorts of things I see routinely - I'm not willing to give examples or even outlines on a public mailing list because I can't assess whether such information could be turned into an exploit. I can say, though, that crypto-safe RNGs is *not* a relevant factor :-) At its best, good security practice should *help* people write reliable, easy to use systems. Or at a minimum, not get in the way. But the PR message needs always to be "I understand the constraints you're dealing with", not "you must do this for your own good". Otherwise the "follow the rules until the auditors go away" attitude just gets reinforced. Hence my focus on seeing proof that breakages are justified *in the context of the target audience I am responsible for*. Conversely, you're right that I can't force anyone else to try to educate people in good security practices, however much better than me at it I might think they are. In actual fact, though, I think a lot of people do a lot of good work educating others - as I say, most of what I've learned has been from lists like these.
OK, you have a point - there have been changes to the proposals. But there are fundamental points that have (as far as I can see) never been acknowledged. As a result, the changes feel less like compromises based on understanding each other's viewpoints, and more like repeated attempts to push something through, even if it's not what was originally proposed. (I *know* this is an emotional position - please understand I'm fed up and not always managing to word things objectively.) Specifically, I have been told that I can't argue my "convenience" over the weight of all the other people who could fall into security traps with the current API. Let's review that, shall we?

* My argument is that breaking backward compatibility needs to be justified. People have different priorities. "Security risks should be fixed" isn't (IMO) a free pass. Why should it be? "Windows compatibility issues should be fixed" isn't a free pass. "PyPy/Jython compatibility issues should be fixed" isn't a free pass. Forcing me to adjust my priorities so that I care about security when I don't want (or IMO need) to isn't acceptable.
* The security arguments seem to be largely in the context of web application development (cookies, passwords, shared secrets, ...). That's not the only context that matters.
* As I said above, in my experience, a compatibility break "to make things more secure" is seen as equating security with inconvenience, and can actually harm attempts to educate users in better security practices.
* In many environments, reproducibility of random streams is important. I'm not an expert on those fields, although I've hit some situations where seeding is a requirement. As far as I am aware, most of those situations have no security implications. So for them, the PEP is all cost, no benefit. Sure the cost is small, but it's non-zero.

How come the web application development community is the only one whose voice gets heard? Is it because the fact that they *are* public-facing, and frequently open-source, means that data is available? So "back it up with facts or we won't believe you" becomes a debating stance? I'm not arguing that everyone should be allowed to climb up on their soapbox and rant - but I would like to think that bringing a different perspective to the table could be treated with respect and genuine attempts to understand, and that "in my experience" could be viewed as an offer of information, not as an attempt to bluff on a worthless hand. Just to be clear, I think the current proposal (Nick's pre-PEP) is relatively unobtrusive, and unlikely to cause serious compatibility issues. I'm uncomfortable with the fact that it feels like yet another "imposition in the name of security", and while I'm only one person I feel that I'm not alone. I'm concerned that the people pushing security seem unable to recognise that people becoming sick of such changes is a PR problem they need to address, but that's their issue not mine. So I'm unlikely to vote against the proposal, but I'll feel sad if it's accepted without a more balanced discussion than we've currently had. On the meta-issue of how debates like this are conducted, I think people probably need to listen more than they talk. I'm as guilty as anyone else here. But in particular, when multiple people all end up responding to rebut *every* counter-argument, essentially with the same response, maybe it's time to think "we're in the majority here, let's stop talking so much and see if we're missing anything from what the people with other views are saying".
He who shouts loudest isn't always right. Not necessarily wrong, either, but sometimes it's bloody hard to tell one way or the other, if they won't shut up long enough to analyze the objections.
I'm frankly long past caring. I think we'll end up with whatever was on the table when people got too tired to argue any more.
Wait, what? It's *me* that's claiming that security is a yes/no thing??? When all I'm hearing is "education isn't sufficient", "dedicated libraries aren't sufficient", "keeping a deterministic RNG as default isn't an option"? And when I'm suggesting that fixing the PRNG use in code that misuses a PRNG may not be the only security issue with that code? I knew the two sides weren't communicating, but this statement staggers me. We have clearly misunderstood each other even more fundamentally than I had thought possible :-( Thinking hard about the implications of what you said there, I start to see why you might have misinterpreted my stance as the black and white one. But I have absolutely no idea how to explain to you that I find your stance equally (and before I took the time to think through what your statement implied, even more) so. There's little more I can say. I'm going to take my own advice now, and stop talking. I'll keep listening, in the hope that either this post or something else will somehow break the logjam, but right now I'm not sure I have much hope of that. Paul

On Mon, Sep 14, 2015, at 15:14, Paul Moore wrote:
* My argument is that breaking backward compatibility needs to be justified.
I don't think it does. I think that there needs to be a long roadmap of deprecation and provided workarounds for *almost any* backwards-compatibility-breaking change, but that special justification beyond "is this a good feature" is only needed for ignoring that roadmap, not for deprecating/replacing a feature in line with it. No-one, as far as I have seen in this thread to date, has actually put a timeline on this change. No-one's talking about getting rid of the global functions in 3.5.1, or in 3.6, or in 3.7. So with that in mind I can only conclude that the people against making the change are against *ever* making it *at all* - and certainly a lot of the arguments they're making have to do with nebulous educational use-cases (class instances are hard, let's use mutable global state) rather than backwards compatibility. Would you likewise have been against every single thing that Python 3 did?

On September 14, 2015 at 3:14:45 PM, Paul Moore (p.f.moore@gmail.com) wrote:
For the record, I'm not sure what part you don't understand. I'm happy to try and explain it, but I think I'm misunderstanding what you're not understanding or something, because I personally feel like I did explain what I think you're misunderstanding. Part of the problem (probably) here is that there isn't an exact person we're trying to protect. The general gist is that if you use the deterministic APIs in a security sensitive situation, then you may be vulnerable depending on exactly what you're doing. We think that, in particular, the API of the random module will lead inexperienced or un(der)informed developers to use the API in situations where it's not appropriate and, from that, end up with an insecure piece of software. We're people who think that the defaults of the software should be "generally" secure (as much so as is reasonable) and that if you want to do something that isn't safe then you should explicitly opt in to that (the flipside is, things shouldn't be so locked down as to be unusable without having to turn off all of the security knobs; this is where the "generally" in generally secure comes into play). A particularly nasty side effect of this is that it's almost never the people who wrote this software who are harmed by it being broken, and it's almost always their users who didn't have anything to do with it. So essentially the goal is to try and make it harder for people to accidentally misuse the random module. If that doesn't answer your confusion, and you can try to reword it to get it through my thick skull better, I'm happy to continue to try and answer it (on or off list).
Right, and this is actually trying to do that. By removing a possibly dangerous default and making the default safer. Defaults matter a lot in security (and sadly, a lot of software doesn't have safe defaults) because a lot of software will never use anything but the defaults.
I think part of this is that a lot of the folks proposing these changes are also sensitive to the backwards compatibility needs and have already baked that into their thoughts. We don't generally come into these with "scorched earth" suggestions for fixing some situation where security could be improved, but instead try and figure out a decent balance of security and not breaking things, to try and cover most of the ground with as little cost as possible. My very first email in this particular thread (that started this thread) was the first one I had with a fully solid proposal in it. The last paragraph in that proposal asked the question "Do we want to protect users by default?" My next email presented two possible options depending on which we considered to be "less" breaking: either deprecating the module scoped functions completely, or changing their defaults to something secure, and mentioned that if we can't change the default, the user-land CSPRNG probably isn't a useful addition because its benefit is primarily in being able to make it the default option. I don't see anyone who is talking about making a change not also talking about what areas of backwards compatibility it would actually break. I think part of this too is that security is a bit weird: it's not a boolean property, but there are particular bars you need to pass before it's an actual solution to the problem. So for a lot of us, we'll figure out that bar and draw a line in the sand and say "If this proposal crosses this line, then doing nothing is better than doing something", because it'd just be churn for churn's sake at that point. That's why you'll see particular points that we essentially won't give up, because if they are given up we might as well do nothing. In this particular instance, the point is that the API of the random module leads people to use it incorrectly, so unless we address that, we might as well just leave it alone.
I think I was the one who said that to you, and I'd like to explain why I said it (beyond the fact I was riled up). Essentially I had in my mind something like what Nick has proposed, which you've said later on you think is relatively unobtrusive and unlikely to cause serious compatibility issues, which I agree with. Then I saw you arguing against what I felt was a pretty mundane API break that was fairly trivial to work around, and it signaled to me that you were saying that having to type a few extra letters was a bridge too far. This reads to me like someone saying "Well I know how to use it correctly, it's their own fault if others don't". I'm not saying that's what you actually think, but that's how it read to me.
The justification is essentially that it will protect some people with minimal impact to others. The main impact will be that people who actually need a deterministic RNG will have to use something like ``random.seeded_random`` instead of just ``random``, and importantly, this will break in a fairly obvious manner, instead of the silently wrong situation for people who are currently using the top level API incorrectly. As a bit of a divergence, the "silently wrong" part is why defaults tend to matter a lot in security. Unless you're well versed in it, most people don't think about it, and since it "works" they don't inquire further. Something that is security sensitive and always "works" (as in, doesn't raise an error) is broken, which is the inverse of how most people think about software. To put it another way, it's the job of security sensitive APIs to break things, ideally only in cases where it's important to break, but unless you're actually testing that it breaks in those attack scenarios, secure and insecure look exactly the same.
You're right that it's not the only context that matters; however, it's often brought up for a few reasons:

* Security largely doesn't matter for software that doesn't accept or send input from some untrusted source, which narrows security down to mostly network based applications.
* The HTTP protocol is "eating the world" and we're seeing more and more things using it as their communication protocol (even for things that are not traditional browser based applications).
* Traditional Web Applications/Sites are a pretty large target audience for Python, and in particular a lot of the security folks come from that world because the web is a hostile place.

But you can replace web application with anything that an untrusted user can interact with over any protocol and the argument is basically the same.
Sadly, I don't think this is fully resolvable :( It is the nature of security that its purpose is to take something that otherwise "works" and make it no longer work, because it doesn't satisfy the constraints of the security system.
Right, and I don't think anyone is saying this isn't an important use case, just that if you need a deterministic RNG and you don't get one, that is a fairly obvious problem, but if you need a CSPRNG and you don't get one, that is not obvious.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Tue, Sep 15, 2015 at 6:23 AM, Donald Stufft <donald@stufft.io> wrote:
To add to that: Web application development is a *huge* area (every man and his dog wants a web site, and more than half of them want logins and users and so on), which means that the number of non-experts writing security-sensitive code is higher there than in a lot of places. The only other area I can think of that would be comparably popular would be mobile app development - and a lot of the security concerns there are going to be in a web context anyway. Is it fundamentally insecure to receive passwords over an encrypted HTTP connection and use those to verify user identities? I don't think so (although I'm no expert) - it's what you do with them afterward that matters (improperly hashing - or, worse, using a reversible transformation). Why are so many people advised not to do user authentication at all, but to tie in with one of the auth APIs like Google's or Facebook's? Because it's way easier to explain how to get that right than to explain how to get security/encryption right. How bad is it, really, to tell everyone "use random.SystemRandom for anything sensitive", and leave it at that? ChrisA

On 15 September 2015 at 18:58, Chris Angelico <rosuav@gmail.com> wrote:
How bad is it, really, to tell everyone "use random.SystemRandom for anything sensitive", and leave it at that?
That's the status quo, and has been for a long time. If it was ever going to work in terms of discouraging folks from using the module level functions for security sensitive tasks, it would have worked by now. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 15 September 2015 at 01:32, Ian Cordasco <graffatcolmingov@gmail.com> wrote:
They don't even have to get onto 2.7.9 per se - the RHEL 7.2 beta just shipped with Robert Kuska's backport of those changes (minus the eventlet breaking internal API change), so it will also filter out through the RHEL/CentOS ecosystem via 7.x and SCLs. (We also looked at a Python 2.6 backport, but decided it was too much work for not enough benefit - folks really need to just upgrade to RHEL/CentOS 7 already, or at least switch to using Software Collections for their Python runtime needs). Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 14 September 2015 at 16:01, Paul Moore <p.f.moore@gmail.com> wrote:
What makes you think that I didn't take it into account? I did: and then rejected it. On a personal level, I believe that defaulting to more secure is worth backward compatibility breaks. I believe that a major reason for the overwhelming prevalence of security vulnerabilities in modern software is that we are overly attached to making people's lives *easy* at the expense of making them *safe*. I believe that software communities in general are too concerned about keeping the stuff that people used around for far too long, and not concerned enough about pushing users to make good choices. The best example of this is OpenSSL. When compiled from source naively (e.g. ./config && make && make install), OpenSSL includes support for SSLv3, SSL Compression, and SSLv2, all of which are known-broken options. To clarify, SSLv2 has been deprecated for security reasons since 1996, but a version of OpenSSL 1.0.2d you build today will happily enable *and use* it. Hell, OpenSSL's own build instructions include this note[0]:
Why is it that users who do not read the wiki (most of them) get an insecure build? Backwards compatibility is why. This is necessarily a reductio ad absurdum type of argument, because I'm trying to make a rhetorical point: I believe that sacrificing security on the altar of backwards compatibility is a bad idea in the long term, and I want to discourage it as best I can. I appreciate your desire to maintain backward compatibility, Paul, I really do. And I think it is probably for the best that people like you work on projects like CPython, while people like me work outside the standard library. However, that won't stop me trying to drag the stdlib towards more secure defaults: it just might make it futile.

On September 14, 2015 at 11:08:39 AM, Cory Benfield (cory@lukasa.co.uk) wrote:
So I will counter this with what I am fully expecting to be the response: people use distributions that compile and configure OpenSSL for them, e.g., `apt-get install openssl` (not obviously the example that works, but you get the idea). That said, last year, Debian, Ubuntu, Fedora, and other distributions all started compiling OpenSSL without SSLv3 as an available symbol, which broke backwards compatibility and TONS of Python projects (eventlet, urllib3, requests, etc.). Why did it break backwards compatibility? Because they knew that they were responsible for the security of their users, and expecting users to recompile OpenSSL themselves with the correct flags was unrealistic. Their users come from a wide range of people:

- System administrators
- Desktop users (if you believe anyone actually uses linux on the desktop ;))
- Researchers
- Developers
- etc.
That said, I'd also like to combat the idea that security experts won't use random. Currently Helios, which is voting software (that anyone can deploy), uses the random module (https://github.com/benadida/helios-server/blob/b07c43dee5f51ce489b6fcb7b7194...) They use it to generate passwords: https://github.com/benadida/helios-server/blob/b07c43dee5f51ce489b6fcb7b7194... https://github.com/benadida/helios-server/blob/b07c43dee5f51ce489b6fcb7b7194... Ben Adida is a security professional who has written papers on creating secure voting systems, but even he uses the random module arguably incorrectly in what should be secure software. The argument that anyone who knows they need secure random functions will use them is clearly invalidated. Not everyone who knows they should be generating securely random things is aware that the random module is insufficient for their needs. Perhaps that code was written before the big red box was added to the documentation and so it was ineffective. Perhaps Ben googled and found that everyone else was using random for passwords (as people have shown is easy to find in this discussion several times). That said, your arguments are easily reduced to "No language should protect its users from themselves", which is equivalent to Python's "we're all consenting adults" philosophy. In that case, we're absolutely safe from any blame for the horrible problems that users inflict on themselves. Anyone who used urllib2/httplib/etc. from the standard library to talk to a site over HTTPS (prior to PEP 466) is to blame because they didn't read the source and know that their sensitive information was easily intercepted by anyone on their network. Clearly, that's their fault. This makes core language development so much easier, doesn't it? Place all the blame on the users for the sake of X (where in this discussion X is the holy grail of backwards compatibility).

On 14 September 2015 at 17:00, Cory Benfield <cory@lukasa.co.uk> wrote:
OK. In *my* experience, systems with appallingly bad security practices run for many years with no sign of an exploit. The vulnerabilities described in this thread pale into insignificance compared to many I have seen. On the other hand, I regularly see systems not being upgraded because the cost of confirming that there are no regressions (much less the cost of making fixes for deliberate incompatibilities) is deemed too high. I'm not trying to justify those things, nor am I trying to say that my experience is in any way "worth more" than yours. These aren't all Python systems. But the culture where such things occur is real, and I have no reason to believe that I'm the only person in this position. (But as it's in-house closed-source, it's essentially impossible to get any good view of how common it is). Paul

On September 14, 2015 at 3:27:22 PM, Paul Moore (p.f.moore@gmail.com) wrote:
What does "no sign of an exploit" mean? Does it mean that if there was an exploit, the attackers didn't put metaphorical giant signs up to say that "Zero Cool" was here? Or is there an active security team running IDS software to ensure that there wasn't a breach? I ask because in my experience, "no sign of an exploit" is often synonymous with "we've never really looked to see if we were exploited, but we haven't noticed anything". This is a dangerous way to look at it, because a lot of exploitation is being done by organized crime where they don't want you to notice that you were exploited because they want to make you part of a botnet or to silently steal data or whatever you have. For these, if they get detected that is a bad thing because they lose that node in their botnet (or whatever). It's a very rare exploit that gets publicly exposed like the Ashley Madison hacks; they are just the ones that get the most attention because they are bombastic and public.
Absolutely! However, I think these systems largely don't upgrade *at all* and are still on whatever version of $LANG they originally wrote the software for. These systems tend to be so regression averse that they don't even risk bug fixes because that might cause a regression. For these people, it doesn't really matter what we do because they aren't going to upgrade anyways, and they keep Red Hat in business by paying them for Python 2.4 until the heat death of the universe. I think the more likely case for concern is people who do upgrade and are willing to tolerate some regression in order to stay somewhat current. These people will push back against *massive* breakage (as seen with the Python 3.x migration taking forever) but are often perfectly fine dealing with small breakages. As someone who does write software that supports a lot of versions (currently, 6-7 versions of CPython alone is my standard depending on whether you count pre-releases or not), having to tweak import statements doesn't even really register on my "give a damn" meter, nor did it for the folks I know who are in similar situations (though this is admittedly a biased and small sample).
I think maybe a problem here is a difference in how we look at the data. It seems that you might focus on the probability of you personally (or the things you work on) getting attacked and thus benefiting from these changes, whereas I, and I suspect the others like me, think about the probability of *anyone* being attacked. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

(The rest of your emails, I'm going to read fully and digest before responding. Might take a day or so.) On 14 September 2015 at 21:36, Donald Stufft <donald@stufft.io> wrote:
This may be true, in some sense. But I'm not willing to accept that you are thinking about everyone, but I'm somehow selfishly only thinking of myself. If that's what you were implying, then frankly it's a pretty offensive way of disregarding my viewpoint. Knowing you, I'm sure that's *not* how you meant it - but do you see how easy it is for the way you word something to make it nearly impossible for me to see past your wording to get to the actual meaning of what you're trying to say? I didn't even consciously notice the implication myself, at first. I simply started writing a pretty argumentative rebuttal, because I felt that somehow I needed to correct what you said, but I couldn't quite say why. Looking at the reality of what I focus on, I'd say it's more like this. I mistrust arguments that work on the basis that "someone, somewhere, might do X bad thing, therefore we must all pay cost Y". The reasons are complex (and I don't know that I fully understand all of my thought processes here) but some aspects that immediately strike me are: * The probability of X isn't really quantified. I may win the lottery, but I don't quit my job - the probability is low. The probability of X matters. * My experience of the probability of X happening varies wildly from that of whoever's making the point. Who is right? Why must one of us "win" and be right? Can't it simply be that my data implies that over the full data set, the actual probability of X is lower than you thought? * The people paying cost Y are not the cause of, nor are they impacted by, X (except in an abstract "we all suffer if bad things happen" sense). I believe in the general principle of "you pay for what you use", so to me you're arguing for the wrong people to be made to pay. Hopefully, those are relatively objective measures. More subjectively, * It's way too easy to say "if X happens once, we have a problem". If you take the stance that we have to prevent X from *ever* happening, you allow yourself the freedom to argue with vague phrases like "might", while leaving the burden of absolute proofs on me. (In the context of RNG proposals, this is where arguments like "let's implement a secure secret library" get dismissed - they still leave open the possibility of *someone* using an inappropriate RNG, so "they don't solve the issue" - even if they reduce the chance of that happening by a certain amount - and neither you nor I can put a figure on how much, so let's not try). * There's little evidence that I can see of preventative security measures having improved things. Maybe this is because it's an "arms race" situation, and keeping up is all we can hope for. Maybe it's because it's hard to demonstrate a lack of evidence, so the demand for evidence is unreasonable. I don't know. * For many years I ran my PC with no anti-virus software. I never got a virus. Does that prove anything? Probably not. The anti-virus software on my work PC is the source of *far* more issues than I have ever seen caused by a virus. Does *that* prove anything? Again, probably not. But my experience with at least *that* class of pressure to implement security is that the cure is worse than the disease. Where does that leave the burden of proof? Again, I don't know, but my experience should at least be considered as relevant data. * Everyone I have ever encountered in a work context (as opposed to in open-source communities) seems to me to be in a similar situation to mine. 
I believe I'm speaking for them, but because it's a closed-source in house environment, I've got no public data to back my comments. And totally subjective, * I'm extremely tired of the relentless pressure of "we need to do X, because security". While the various examples of X may all have ended up being essentially of no disadvantage to me, feeling obliged to read, understand, and comment on the arguments presented every time, gets pretty wearing. * I can't think of a single occasion where we *don't* do X. That may well be confirmation bias, but again subjectively, it feels like nobody's listening to the objections. I get that the original proposals get modified, but if never once has the result been "you're right, the cost is too high, we'll not do X" then that puts security-related proposals in a pretty unique position. Finally, in relation to that last point, and one thing I think is a key difference in our thinking. I do *not* believe that security proposals (as opposed to security bug fixes) are different from any other type of proposal. I believe that they should be subject to all the same criteria for acceptance that anything else is. I suspect that you don't agree with that stance, and believe that security proposals should be held to different standards (e.g., a demonstrated *probability* of benefit is sufficient, rather than evidence of actual benefit being needed). But please speak for yourself on this - I'm not trying to put words into your mouth, it's just my impression. All of which is completely unrelated to either the default RNG for the Python stdlib, or whether I understand and/or accept the security arguments presented here (for clarity, I believe I understand them, I just don't accept them). Paul

On 15 September 2015 at 08:50, Emile van Sebille <emile@fenx.com> wrote:
Historically, yes, but relying solely on perimeter defence is becoming less and less viable as the workforce decentralises, and we see more people using personal devices and untrusted networks to connect to work systems (whether that's their home network or the local coffee shop), as well as relying on public web services rather than internal applications. Enterprise IT is simply *wrong* in the way we currently go about a lot of things, and the public web service sector is showing us all how to do it right. Facilitating that transition is a key part of my day job in Red Hat's Developer Experience team (it's getting a bit off topic, but for a high level company perspective on that: http://www.redhat-cloudstrategy.com/towards-a-frictionless-it-whether-you-li...). And for folks tempted to think "this is just about the web", for a non-web related example of what we as an industry have unleashed through our historical "security is optional" mindset: http://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/ That's an article on remotely hacking the UConnect system in a Jeep Cherokee to control all sorts of systems that had no business being connected to the internet in the first place. The number of SCADA industrial control systems accessible through the internet is frankly terrifying - one of the reasons we can comfortably assume most humans are either nice or lazy is because we *don't* see most of the vulnerabilities that are lying around being exploited. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On September 14, 2015 at 6:39:28 PM, Paul Moore (p.f.moore@gmail.com) wrote:
No, I don't mean it in the way of you being selfish. I'm not quite sure of the right wording here; essentially it's the probability of an event happening to a particular individual vs the probability of an event occurring at all. To use your lottery example, I *think*, and perhaps I'm wrong, that you're looking at it in terms of: the chance of any particular person participating in the lottery winning the lottery is low, so why should each of these people, as an individual, make plans for how to get the money when they win the lottery, because as individuals they are unlikely to win. Whereas I flip it around and think that someone, somewhere is likely going to win the lottery, so the lottery system should make plans for how to get them the money when they win. I'm not sure of the right "name" for each type, and I don't want to continue to try and ham-fist it, because I don't mean it in an offensive or an "I'm better than you" way and I fear putting my foot in my mouth again :(
Just to be clear, I don't think that "If X happens once, it's a problem" is a reasonable belief and I don't personally hold that belief. It's a sliding scale where we need to figure out what the right solution for Python is for each particular problem. I certainly wouldn't want to use a language that took the approach that if X can ever happen, we need to prevent X. I have seen enough users incorrectly use the random.py module that I think the danger is "real". I also think that, if this were a brand new module, it would be a no-brainer (but perhaps I'm wrong) for the default, module-level API to be safe by default. Going off that assumption, I think the question is really just "Is it worth it?" not "does this make more sense than the current?".
By preventive security measures, do you mean things like PEP 466? I don't quite know how to accurately state it, but I'm certain that PEP 466 directly improved the security of the entire internet (and continues to do so as it propagates).
Antivirus is a particularly bad example of security software :/ It's a massive failing of the security industry that they exist in the state they do. There's a certain bias here though, because it is the job of security sensitive code to "break" things (as in, take otherwise valid input and make it not work). In an ideal world, security software just sits there doing "nothing" from the POV of someone who isn't a security engineer and then will, often through no fault of their own, pop up and make things go kabloom because it detected something insecure happening. This means that for most people, the only interaction they have with something designed to protect them is when it steps in to make things stop working. It is relevant data, but I think it goes back to the different ways of looking at things (what is the individual chance of an event happening, vs the chance of an event happening across the entire population). This might also be why you'll see the backwards-compat folks focus more on experience-driven data and security folks focus more on hypotheticals about what could happen.
I'm not sure what to do about this :( On one side, you're not obligated to read, understand, and comment on every thing that's raised but I totally understand why you do, because I do too, but I'm not sure how to help this without saying that people who care about security shouldn't bring it up either?
Off the top of my head I remember the on-by-default hash randomization for Python 2.x (or the actually secure hash randomization, since 2.x still has the one where it is trivial to recover the original seed). I don't actually remember that many cases where python-dev chose to break backwards compatibility for security. The only ones I can think of are:

* The hash randomization on Python 3.x (sort of? Only if you depended on dict ordering, which wasn't a guarantee anyways).
* The HTTPS improvements where we switched Python to default to verifying certificates.
* The backports of several security features to 2.7 (backport of 3.4's ssl module, hmac.compare_digest, os.urandom's persistent FD, hashlib.pbkdf2_hmac, hashlib.algorithms_guaranteed, hashlib.algorithms_available); one of these is sketched below.

There are probably things that I'm not thinking of, but the hash randomization only broke things if you were depending on dict/set having ordering, which isn't a promised property of dict/set. The backports of security features were done in a pretty minimally invasive way where they would (ideally) only break things if you relied on those names *not* existing on Python 2.7 (which was a nonzero but small set). The HTTPS verification is the main thing I can think of where python-dev actually broke backwards compatibility in an obvious way for people relying on something that was documented to work a particular way. Are there examples I'm not remembering (probably!)? It doesn't feel like 2 sort-of backwards incompatible changes and 1 backwards incompatible change in the lifetime of Python is really that much to me? Is there some crossover between distutils-sig maybe? I've focused a lot more on pushing security on that side of things, both because it personally affects me more and because I think insecure defaults there are a lot worse than insecure defaults in any particular module in the Python standard library.
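As a concrete illustration of one of the 2.7 backports listed above, here is roughly what hashlib.pbkdf2_hmac looks like in use (the password, salt size, and iteration count here are made-up illustrative values, not recommendations):

    import hashlib
    import os

    salt = os.urandom(16)
    # Derive a key from a password using PBKDF2 with HMAC-SHA256.
    key = hashlib.pbkdf2_hmac("sha256", b"correct horse battery staple", salt, 100000)
    print(key.hex())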
Well, I think that all proposals are based on what the probability is that it's going to help some particular percentage of people, and whether it's going to help enough people to be worth the cost. What I think is special about security is the cost of *not* doing something. Security "fails open" in that if someone does something insecure, it's not going to raise an exception or give different results or something like that. It's going to appear to "work" (in that you get the results you expect) while the user is silently insecure. Compare this to, well, let's pretend that there was never a deterministic RNG in the standard library. If a scientist or a game designer inappropriately used random.py they'd pretty quickly learn that they couldn't give the RNG a seed, and even if it was a CSPRNG that had an "add_seed" method that might confuse them, it'd be pretty obvious on the second execution of their program that it's giving them a different result. I think that the bar *should* be lower for something that just silently or subtly does the "wrong" thing vs something that obviously and loudly does the wrong thing. Particularly when the downside of doing the "wrong" thing is as potentially disastrous as it is with security. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On September 14, 2015 at 8:14:33 PM, Donald Stufft (donald@stufft.io) wrote:
This should read: Security "fails open" in that if someone uses an API that allows something insecure to happen (like not validating HTTPS) it's not going to raise an exception or give different results or something like that. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On 15 September 2015 at 08:39, Paul Moore <p.f.moore@gmail.com> wrote:
Most of the time, when the cost of change is clearly too high, we simply *don't ask*. hmac.compare_digest() is an example of that, where having a time-constant comparison operation readily available in the standard library is important from a security perspective, but having standard equality comparisons be as fast as possible is obviously more important from a language design perspective. Historically, it was taken for granted that backwards compatibility concerns would always take precedence over improving security defaults, but the never-ending cascade of data breaches involving personally identifiable information is proving that we, as a collective industry, are *doing something wrong*: http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-bre... A lot of the problems we need to address are operational ones as we upgrade the industry from a "perimeter defence" mindset to a "defence in depth" mindset, and hence we have things like continuous integration, continuous deployment, application and service sandboxing, containerisation, infrastructure-as-code, immutable infrastructure, etc, etc, etc. That side of things is mostly being driven by infrastructure software vendors (whether established ones or startups), where we have the fortunate situation that the security benefits are tied in together with a range of operational efficiency and capability benefits [1]. However, there's also increasing recognition that some of the problems are due to the default behaviours of the programming languages we use to *create* applications, and in particular the fact that many security issues involve silent failure modes. Sometimes the right answer to those is to turn the silent failure into a noisy failure (as with certificate verification in PEP 476), other times it is about turning the silent failure into a silent success (as is being proposed for the random module API), and yet other times it is simply about lowering the barriers to someone doing the right thing once they're alerted to the problem (as with the introduction of hmac.compare_digest() and ssl.create_default_context(), and their backports to the Python 2.7 series; both are illustrated below). At a lower level, languages like Go and Rust are challenging some of the assumptions in the still dominant C-based memory management model for systems programming. Rust in particular is interesting in that it has a much richer compile-time enforced concept of memory ownership than C does, while still aiming to keep the necessary runtime support very light. Regards, Nick. [1] For folks wanting more background on some of the factors driving this shift, I highly recommend Google's "BeyondCorp" research paper: http://research.google.com/pubs/pub43231.html -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
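For readers unfamiliar with those two helpers, here is roughly what using them looks like (the token values below are made up for illustration):

    import hmac
    import ssl

    expected = b"token-from-the-server"
    supplied = b"token-from-the-client"
    # Constant-time comparison: unlike ==, the run time does not leak
    # where the first mismatching byte is.
    print(hmac.compare_digest(expected, supplied))   # False

    # A TLS context with certificate verification and hostname checking enabled.
    ctx = ssl.create_default_context()
    print(ctx.verify_mode == ssl.CERT_REQUIRED)      # True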

On 14 September 2015 at 23:39, Paul Moore <p.f.moore@gmail.com> wrote:
(The rest of your emails, I'm going to read fully and digest before responding. Might take a day or so.)
Point by point responses exhaust and frustrate me, and don't really serve much purpose other than to perpetuate the debate. So I'm going to make some final points, and then stop. This is based on having read the various emails responding to my earlier comments. If it looks like I haven't read something, please assume I have but either you didn't get your point across, or maybe I simply don't agree with you.

Why now?
--------
First of all, the big question for me is why now? The random module has been around in its current form for many, many years. Security issues are not new, maybe they are slowly increasing, but there's been no step change. The only thing that seems to have changed is that someone (Theo) has drawn attention to the random module. So I feel that the onus is on the people proposing change to address that. Show me the evidence that we've had an actual problem for many years, and demonstrate that it's a good job we spotted it at last, and now have a chance to fix it. Explain to me what has been going wrong all these years that I'd never even noticed. Arguments that people are misusing the module aren't sufficient in themselves - they've (presumably) been doing that for years. In all that time, who was hacked? Who lost data? As a result of random.random being a PRNG rather than being crypto-secure? I'm not asking for an unassailable argument, just acknowledgement that it's *your* job to address that question, and not mine to persuade you that "we've been alright so far" is a compelling reason to reject your proposal.

Incorrect code on SO etc
------------------------
As regards people picking up insecure code snippets from the internet and using them, there's no news there. I can look round and find hundreds of bits of incorrect code in any area you want. People copy/paste garbage code all the time. To my embarrassment, I've done it myself in the past :-( But I'm reminded of https://xkcd.com/386/ - "somebody is wrong on the internet!" This proposal, and in particular the suggestion that we need to retrospectively make the code snippets quoted here secure, strikes me as a huge exercise in trying to correct all the people who are wrong on the internet. There's certainly value in "safe by default" APIs, I don't disagree with that, but I honestly fail to see how quoting incorrect code off the internet is a compelling argument for anything.

Millions of users are affected
------------------------------
The numbers game is also a frustrating exercise here. We keep hearing that "millions of users are affected by bad code", that scans of Google almost immediately find sites with vulnerabilities. But I don't see anyone pointing at a single documented case of an actual exploit caused by Python's random module. There's no bug report. There's no security alert notification. How are those millions of users affected? Their level of risk is increased? Who can tell that? Are any of the sites identified holding personal data? Not all websites on the internet are *worth* hacking. And I feel that expressing that view is somehow frowned on. That "it doesn't matter" is an unacceptable view to hold. And so, the responses to my questions feel personal, they feel like criticisms of me personally, that I'm being unprofessional. I don't want to make this a big deal, but the code of conduct says "we're tactful when approaching differing views", and it really doesn't feel like that. I understand that the whole security thing is a numbers game. And that it's about assessing risk.
But what risk is enough to trigger a response? A 10% increased chance of any given website being hacked? 5%? 1%? Again, I'm not asking to use the information to veto a change. I'm asking to *understand your position*. To better assess your arguments, so that I can be open to persuasion, and to *agree* with you, if your arguments are sound. Furthermore, should we not take into account other languages and approaches at this point? Isn't PHP a well-known "soft target"? Isn't phishing and social engineering the best approach to hacking these days, rather than cracking RNGs? I don't know, and I look to security experts for advice here. So please explain for me, how are you assessing the risks, and why do you judge this specific risk high enough to warrant a response? The impression I get is that the security view is that *any* risk, no matter how small, once identified, warrants a response. "Do nothing" is never an option. If that's your position, then I'm sorry, but I simply don't agree with you. I don't want to live in a world that paranoid, and I'm unsure how to get past this point to have a meaningful dialog.

History, and security's "bad rep"
---------------------------------
Donald asked if I was experiencing some level of spill-over from distutils-sig, where there has *also* been a lot of security churn (far more than here). Yes, I am. No doubt about that. On distutils-sig, and pip in particular, it's clear to see a lot of frustration from users with the long-running series of security changes. The tone of bug reports is frustrated and annoyed. Users want a break from being forced to make changes. Outside of Python, and speaking purely from my own experience in the corporate world, security is pretty uniformly seen as an annoying overhead, and a block on actually getting the job done. You can dismiss that as misguided, but it's a fact. "We need to do this for security" is a direct challenge to people to dismiss it as unnecessary, and often to immediately start looking for ways to bypass the requirement "so that it doesn't get in the way". I try not to take that attitude in this sort of debate, but at the same time, I do try to *represent* that view and ask for help in addressing it. The level of change in core Python is far less than on distutils-sig, and has been relatively isolated from "non-web" areas. People understand (and are grateful for) increases in "secure by default" behaviour in code like urllib and ssl. They know that these are places where security is important, where getting it right is harder than you'd think, and where trusting experts to do the hard thinking for you is important. But things like hash randomisation and the random module are less obviously security related. The feedback from hash randomisation focused on "why did you break my code?". It wasn't a big deal, people were relying on undocumented behaviour and accepted that, but they did see it as a breakage from a security fix. I expect the same to be true with the random module, but with the added dimension that we're proposing changing documented behaviour this time. As a result of similar arguments applying to every security change, and those arguments never *really* seeming to satisfy people, there's a lot of reiterated debate. And that's driving interested but non-expert people away from contributing to the discussion. So we end up with a lack of checks and balances because people without a vested interest in tightening security "tune out" of the debates. I see that as a problem.
But ultimately, if we can't find a better way of running these discussions, I don't know how we fix it. I certainly can't continue being devil's advocate every time. Anyway, that's me done on this thread. I hope I've added more benefit than cost to the discussion. Thanks to everyone for responding to my questions - even if we all felt like we were both just repeating the same thing, it's a lot of effort doing so and I appreciate your time. Paul

Paul Moore <p.f.moore@...> writes: [snip well-reasoned paragraphs] I want to add that the dichotomy between "security-minded" and "non-security-minded" that has been used for rhetorical purposes has no basis in reality. Several "non-security-minded" devs (of the kind who have *actually* contributed a lot of code to CPython) have a pretty good grasp of cryptography and just don't like security theater. Stefan Krah

On 15 September 2015 at 13:08, Stefan Krah <skrah@bytereef.org> wrote:
Agreed, and every time I ended up looking for words for the two "sides", I ended up feeling uncomfortable. There are no "sides" here, just a variety of people with a variety of experiences, who want to feel assured that their voices are being heard. Paul

On September 15, 2015 at 7:04:52 AM, Paul Moore (p.f.moore@gmail.com) wrote:
The answer to "Why now?" is basically: because someone brought it up. I realize that's a pretty arbitrary thing, but I'm not sure what answer would even be acceptable here. When is an OK time to do it in your eyes? Is it only after there is a public, known attack against the RNG? Is it only when the module is first being added? The sad state of affairs is that it's only been relatively recently that our industry as a whole has really taken security seriously, so there are a lot of things out there that are not well designed from a security POV. We can't go back in time and change the original mistake, but we can repair it going into the future.
The argument is basically that security is an important part of API design, and that if you look at what people are doing in practice, it gives you an idea of how people think they should use the API. It's kind of like looking at a situation like this: https://i.imgur.com/0gnb7Us.jpg and concluding that maybe we should pave that worn down footpath, because people are going to use it anyways.
So a big part of this is certainly preventative. It's a relatively recent development that hacking went from individuals or small teams going after big targets to being a business in its own right. There are literally giant office complexes in places like Russia and China filled with employees in cubicles, but they aren't writing software like at a normal company, they are just trawling around the internet, looking for targets, trying to expand botnets, looking for anything and everything they can get their hands on. It's also true that there isn't going to be a big fanfare for *most* actual hacked computers/sites. Most of the time the people running the site simply won't ever know, they'll just be silently hosting malware or having their users' passwords fed into other sites. Very few exploits actually get noticed, and when noticed it's unlikely they get public attention. I'd also suggest that for changes like these, if someone was exploited by this they'd probably look at the documentation for random.py, see that they were accidentally using the module wrong, and then blame themselves and not ever bother to file a bug report. It is my opinion that it's not really their fault that the API led them to believe that what they were doing was right.
Actually, all sites on the internet *are* worth hacking, depending on what you call hacking. Malware is constantly being hosted on tiny sites that most wouldn't call "worth" hacking, but malware authors were able to hack in some way and then they uploaded their malware there. If there are user logins it's likely that people reused username and passwords, so if you can get the passwords from one smaller site, it's possible you can use that as a door into a larger, more important site. Plus, there's also the desire for botnets to add more and more nodes into their swarm, they don't care what site you're hosting, they just want the machine. One key problem to the security of the internet as a whole is that there are a lot of small sites without dedicated security teams, or anyone who really knows security at all. These are easy targets for people and most languages and libraries make it far too easy for people to do the wrong thing.
It's basically a gut feeling, since we can't get any hard data here. Things like being able to look online and find code in the wild that does this wrong within minutes give us an idea of how likely it is, as does reasoning about what people who don't know the difference between ``random.random()`` and ``random.SystemRandom().random()`` are likely to do, along with just a little bit of guessing based on experience with similar situations. Another input into this equation is how likely it is that this change would break someone and, once broken, how easy it will be to fix things. I sadly can't give anything more specific than that here, because it's a bit of an art form crossed with personal biases :(
Do nothing is absolutely an option, but most security focused folks don't take a scorched earth view of security, so we often don't bother to even mention a possible change unless we think that doing nothing is the wrong answer. An example: going back to PEP 476, where we enabled TLS verification by default for HTTPS, we limited it to *only* HTTPS even though TLS is used by many other protocols, because it was our opinion that doing nothing for those protocols was the right call. Those protocols are still insecure by default, but doing something about that by default would break too much for us to be willing to even suggest it. On top of that, we tend to prioritize the things we do try to have happen, so we focus on things with the smallest fallout or the biggest upsides and we ignore other things until later. This is probably why there's some bias that makes it look like doing nothing is never an option: we already self-select what we choose to push forward, because we *do* care about backwards compatibility too.
I think a lot of these changes are paying down the technical debt of two decades of (industry standard) lack of focus on security. It sucks, but when we come out the other side (because hopefully new APIs and modules will be better designed with security in mind given our new landscape) we should hopefully be in a much better situation. On the distutils-sig side, I think that PEP 470 is the last breaking change that I can think of that we'll need to make in the name of security; we've paid down that particular bit of technical debt, and once that lands we'll have a pretty decent story. We still have other kinds of technical debt to pay down though :(
Things don't really satisfy people because oftentimes they fundamentally don't care about security. That is perfectly reasonable, and I don't expect everyone to care about security, but they simply don't. However, in my opinion we have a moral obligation to try and do what we reasonably can to protect people. It's a bit like social safety nets: one person might ask why they are being asked to pay taxes, after all they never needed government assistance, but by asking every citizen to pay in, they can try to keep people from falling through the cracks. This isn't a social safety net, it's a security safety net.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On 15 September 2015 at 01:01, Paul Moore <p.f.moore@gmail.com> wrote:
This may be at the core of the disagreement, as we're not talking "one or two people", we're talking tens of millions. While wearing my "PSF Director" hat, I spend a lot of time talking to professional educators, and recently organised the first "Python in Education" miniconf at PyCon Australia. If you look at the inroads we're making across primary, secondary and tertiary education, as well as through workshops like Software Carpentry and DjangoGirls, a *lot* of people around the world are going to be introduced to text based programming over the coming decades by way of Python. That level of success brings with it a commensurate level of responsibility: if we're setting those students up for future security failures, that's *on us* as language designers, not on them for failing to learn to avoid traps we've accidentally laid for them (because *we* previously didn't know any better). Switching back to my "security wonk" hat, the historical approach to computer security has been "secure settings are opt in, so only qualified experts should be allowed to write security sensitive software". What we've learned as an industry (the hard way) is that this approach *doesn't work*. The main reason it doesn't work is the one that was part of the rationale for the HTTPS changes in PEP 476: when security failures are silent by default, you generally don't find out that you forgot to flip the "I need this to be secure" switch until *after* the system you're responsible for has been compromised (with whatever consequences that may have for your users). The law of large numbers then tells us that even if (for example) only 1 in 1000 people forget to flip the "be secure" switch when they needed it (or don't even know that the switch *exists*), it's a practical certainty that when you have millions of programmers using your language (and you don't climb to near the top of the IEEE rankings without that), you're going to be hitting that failure mode regularly as a collective group. We have the power to mitigate that harm permanently *just by changing the default behaviour of the random module*. However, that has a cost: it causes problems for some current users for the sake of better serving future users. That's what transition strategy design is about, and I'll take that up in the other thread. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
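A back-of-the-envelope version of that argument (both figures here are purely illustrative, not measurements):

    p_forget = 1.0 / 1000    # fraction of users who miss the "be secure" switch
    programmers = 1000000    # rough order of magnitude of the user base
    print(p_forget * programmers)   # 1000.0 -- at scale, the failure mode is a certainty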

On September 10, 2015 at 10:21:11 AM, Donald Stufft (donald@stufft.io) wrote:
I wanted to try and test this. These are not super scientific since I just ran them on a single computer once (but 10 million iterations each) but I think it can probably give us an indication of the differences? I put the code up at https://github.com/dstufft/randtest but it's a pretty simple module. I'm not sure if (double)arc4random() / UINT_MAX is a reasonable way to get a double out of arc4random (which returns a uint) that is between 0.0 and 1.0, but I assume it's fine at least for this test. Here's the results from running the test on my personal computer which is running the OSX El Capitan public Beta:

$ python test.py
Number of Calls: 10000000
+---------------+--------------------+
|    method     |   usecs per call   |
+---------------+--------------------+
| deterministic | 0.0586802460020408 |
| system        | 1.6681434757076203 |
| userland      | 0.1534261149005033 |
+---------------+--------------------+

I'll try it against OpenBSD later to see if their implementation of arc4random is faster than OSX. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

[Donald Stufft <donald@stufft.io>, on arc4random speed]
arc4random() specifically returns uint32_t, which is 21 bits shy of what's needed to generate a reasonable random double. Our MT wrapping internally generates two 32-bit uint32_t thingies, and pastes them together like so (Python's C code here): """ /* random_random is the function named genrand_res53 in the original code; * generates a random number on [0,1) with 53-bit resolution; note that * 9007199254740992 == 2**53; I assume they're spelling "/2**53" as * multiply-by-reciprocal in the (likely vain) hope that the compiler will * optimize the division away at compile-time. 67108864 is 2**26. In * effect, a contains 27 random bits shifted left 26, and b fills in the * lower 26 bits of the 53-bit numerator. * The orginal code credited Isaku Wada for this algorithm, 2002/01/09. */ static PyObject * random_random(RandomObject *self) { PY_UINT32_T a=genrand_int32(self)>>5, b=genrand_int32(self)>>6; return PyFloat_FromDouble((a*67108864.0+b)*(1.0/9007199254740992.0)); } """ So now you know how to make it more directly comparable. The high-order bit is that it requires 2 calls to the 32-bit uint integer primitive to get a double, and that can indeed be significant.
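For reference, here is a quick Python sketch of the "directly comparable" construction Tim describes, applied to any 32-bit source (rand32 below is a stand-in for something like a ctypes binding to arc4random; that binding is an assumption on my part, not part of the benchmark code):

    def double53_from_32bit(rand32):
        # Paste two 32-bit draws together the way genrand_res53 does:
        # 27 high bits shifted left 26, plus 26 low bits, all over 2**53.
        a = rand32() >> 5
        b = rand32() >> 6
        return (a * 67108864.0 + b) * (1.0 / 9007199254740992.0)

    # Example with a stand-in 32-bit source:
    import random
    print(double53_from_32bit(lambda: random.getrandbits(32)))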
Just noting that most people timing the OpenBSD version seem to comment out the "get stuff from the kernel periodically" part first, in order to time the algorithm instead of the kernel ;-) In real life, though, they both count, so I like what you're doing better.

On September 10, 2015 at 1:24:05 PM, Tim Peters (tim.peters@gmail.com) wrote:
It didn't change the results really though:

My OSX El Capitan machine:
Number of Calls: 10000000
+---------------+---------------------+
|    method     |   usecs per call    |
+---------------+---------------------+
| deterministic | 0.05792283279588446 |
| system        | 1.7192466521984897  |
| userland      | 0.17901834140066059 |
+---------------+---------------------+

An OpenBSD 5.7 VM:
Number of Calls: 10000000
+---------------+---------------------+
|    method     |   usecs per call    |
+---------------+---------------------+
| deterministic | 0.06555143180000868 |
| system        | 0.8929547749999983  |
| userland      | 0.16291017429998647 |
+---------------+---------------------+

----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Sep 10, 2015, at 07:21, Donald Stufft <donald@stufft.io> wrote:
But that isn't a fix, unless all your code is in a single module. If I call random.seed in game.py and then call random.choice in aiplayer.py, I'll get different results after your fix than I did before. What I'd need to do instead is create a separate myrandom.py that does this and then exports all of the bound methods of random as top-level functions, and then make game.py, aiplayer.py, etc. all import myrandom as random. Which is, while not exactly hard, certainly harder, and much less obvious, than the incorrect fix that you've suggested, and it may not be immediately obvious that it's wrong until someone files a bug three versions later claiming that when he reloads a game the AI cheats and you have to track through the problem. That's why I suggested the set_default_instance function, which makes this problem trivial to solve in a correct way instead of in an incorrect way.
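For concreteness, a rough sketch of the myrandom.py workaround being described (the file name and DeterministicRandom are illustrative; with today's stdlib the instance would just be random.Random()):

    # myrandom.py -- one shared, seedable instance re-exported as
    # module-level functions, so game.py and aiplayer.py share one stream.
    import random as _random

    _inst = _random.Random()   # hypothetically random.DeterministicRandom()

    seed = _inst.seed
    random = _inst.random
    choice = _inst.choice
    randint = _inst.randint
    shuffle = _inst.shuffle

game.py and aiplayer.py would then both do "import myrandom as random" instead of "import random".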

On Sep 10, 2015, at 15:46, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Actually, I just thought of an even simpler solution: Add a deterministic_singleton member to random (which is just initialized to DeterministicRandom() at startup). Now, the user fix is just to change "import random" to "from random import deterministic_singleton as random".

On 11 September 2015 at 08:54, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Actually, I just thought of an even simpler solution:
Add a deterministic_singleton member to random (which is just initialized to DeterministicRandom() at startup). Now, the user fix is just to change "import random" to "from random import deterministic_singleton as random".
Change the spelling to "import random.seeded_random as random" and the user fix is even shorter. I do agree with the idea of continuing to provide a process global instance of the current PRNG for ease of migration - changing a single import is a good way to be able to address a deprecation, and looking for the use of seeded_random in a security sensitive context would still be fairly straightforward. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sep 10, 2015, at 19:48, Nick Coghlan <ncoghlan@gmail.com> wrote:
OK, sure; I don't care much about the spelling. I think neither name will be unduly confusing to novices, and anyone who actually wants to understand what the choice means will use help or the docs or a Google search and find out in a few seconds.
Personally, I think we're done with that change. Deprecation of the names random.Random, random.random(), etc. is sufficient to prevent people from making mistakes without realizing it. Having a good workaround to prevent code churn for the thousands of affected apps means the cost doesn't outweigh the benefits. So, the problem Theo raised is solved.[1] Which means the more radical solution he offered is unnecessary. Unless we're seriously worried that some people who aren't sure if they need Seeded or System may incorrectly choose Seeded just because of performance, there's no need to add a Chacha choice alongside them. Put it on PyPI, maybe with a link from the SystemRandom docs, and see how things go from there. [1] Well, it's not quite solved, because someone has to figure out how to organize things in the docs, which obviously need to change. Do we tell people how to choose between creating a SeededRandom or SystemRandom instance, then describe their interface, and then include a brief note "... but for porting old code, or when you explicitly need a globally shared Seeded instance, use seeded_random"? Or do we present all three as equally valid choices, and try to explain why you might want the singleton seeded_random vs. constructing and managing an instance or instances?

On 11 September 2015 at 13:18, Andrew Barnert <abarnert@yahoo.com> wrote:
Personally, I think we're done with that change. Deprecation of the names random.Random, random.random(), etc. is sufficient to prevent people from making mistakes without realizing it.
Implementing dice rolling or number guessing for a game as "from random import randint" is *not* a mistake, and I'm adamantly opposed to any proposal that makes it one - the cost imposed on educational use cases would be far too high. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan writes:
Implementing dice rolling or number guessing for a game as "from random import randint" is *not* a mistake,
Turning the number guessing game into a text CAPTCHA might be one, though. That randint may as well be crypto strong, modulo the problem that people who use an explicit seed get punished for knowing what they're doing. I suppose it would be too magic to have the seed method substitute the traditional PRNG for the default, while an implicitly seeded RNG defaults to a crypto strong algorithm? Steve

On Fri, Sep 11, 2015 at 2:44 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Ooh. Actually, I rather like that idea. If you don't seed the RNG, its output will be unpredictable; it doesn't matter whether it's a PRNG seeded by an unknown number, a PRNG seeded by /dev/urandom, a CSRNG, or just reading from /dev/urandom every time. Until you explicitly request determinism, you don't have it. If Python changes its RNG algorithm and you haven't been seeding it, would you even know? Could it ever matter to you? It would require a bit of an internals change; is it possible that code depends on random.seed and random.randint being bound methods of the same object? To implement what you describe, they'd probably have to not be. ChrisA

2015-09-11 6:54 GMT+02:00 Chris Angelico <rosuav@gmail.com>:
I have thought of this idea and was quite seduced by it. However, in this case, on a non-seeded generator, getstate/setstate would be meaningless. I also wonder what pickling generators does.

On Fri, Sep 11, 2015 at 6:54 AM, Chris Angelico <rosuav@gmail.com> wrote:
I've also thought about this idea. The problem with it is that seed() and friends affect a global instance of Random. If, after this change, there was a library that used random.random() for crypto, calling seed() in the main program (or any other library) would make it insecure. So we'd still be in a situation where nobody should use random() for crypto.

On Fri, Sep 11, 2015 at 6:08 PM, Petr Viktorin <encukou@gmail.com> wrote:
So library functions shouldn't use random.random() for anything they know needs security. If you write a function generate_password(), the responsibility is yours to ensure that it's entropic rather than deterministic. That's no different from the current situation (seeding the RNG makes it deterministic) except that the unseeded RNG is not just harder to predict, it's actually entropic. In some cases, having the 99% by default is a barrier to people who need the 100%. (Conflating UCS-2 with Unicode deceives people into thinking their program works just fine, and then it fails on astral characters.) But in this case, there's no perfect-by-default solution, so IMO the best two solutions are: Be great, but vulnerable to an external seed(), until someone chooses; or have no random number generation until someone chooses. We know that the latter is a terrible option for learning, so vulnerability to someone else calling random.seed() is a small price to pay. ChrisA

On Fri, Sep 11, 2015, at 00:54, Chris Angelico wrote:
That's a ridiculous thing to depend on.
To implement what you describe, they'd probably have to not be.
You could implement one class that calls either a SystemRandom instance or an instance of another class depending on which mode it is in.
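A minimal sketch of that kind of mode-switching class (all names are illustrative, not a proposed API): unseeded calls are served by SystemRandom, and an explicit seed() swaps in the deterministic Mersenne Twister.

    import random

    class AutoSeededRandom:
        def __init__(self):
            self._impl = random.SystemRandom()

        def seed(self, a=None):
            # Explicit request for determinism: switch implementations.
            self._impl = random.Random(a)

        def __getattr__(self, name):
            # Delegate random(), choice(), randint(), and the rest to
            # whichever implementation is currently active.
            return getattr(self._impl, name)

    rng = AutoSeededRandom()
    print(rng.random())   # unpredictable, backed by os.urandom
    rng.seed(42)
    print(rng.random())   # reproducible from here on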

On 11 September 2015 at 05:44, Stephen J. Turnbull <stephen@xemacs.org> wrote:
One issue with that - often, programs simply use a RNG for their own purposes, but offer a means of getting the seed after the fact for reproducibility reasons (the "map seed" case, for example). Pseudo-code:

    if <user supplied a "seed">:
        state = <user-supplied value>
        random.setstate(state)
    else:
        state = random.getstate()
    ... do the program's main job, never calling seed/setstate
    if <user requests the "seed">:
        print state

So getstate (and setstate) would also need to switch to a PRNG. There are actually very few cases I can think of where I'd need seed() (as opposed to setstate()). Maybe if I let the user *choose* a seed? Some games do this. Paul

On Fri, Sep 11, 2015 at 1:02 AM, Paul Moore <p.f.moore@gmail.com> wrote:
You don't really want to use the full 4992 byte state for a "map seed" application anyway (type 'random.getstate()' in a REPL and watch your terminal scroll down multiple pages...). No game actually uses map seeds that look anything like that. I'm 99% sure that real applications in this category are actually using logic like:

    if <user supplied a "seed">:
        seed = user_seed()
    else:
        # use some RNG that was seeded with real entropy
        seed = random_short_printable_string()
    r = random.Random(seed)
    # now use 'r' to generate the map

-n -- Nathaniel J. Smith -- http://vorpus.org

On 11 September 2015 at 10:52, Nathaniel Smith <njs@pobox.com> wrote:
Yeah, good point. As I say, I don't actually *use* this in the example program I'm thinking of, I just know it's a feature I need to add in due course. So when I do, I'll have to look into how to best implement it. (And I'll probably nick the approach you show above, thanks ;-)) Paul

On 11 September 2015 at 11:07, Andrew Barnert <abarnert@yahoo.com> wrote:
But games do store the entire map state with saved games if they want repeatable saves (e.g., to prevent players from defeating the RNG by save scumming).
So far off-topic it's not funny, but a number of games I know of (e.g., Factorio, Minecraft) include a means to get a map seed (a simple text string) which you can publish, and which allows other users to (in effect) play on the same map as you. That's different from saves. Paul

On Fri, Sep 11, 2015, at 06:10, Paul Moore wrote:
Of course, Minecraft doesn't actually use the seed in such a simple way as seeding a single-sequence random number generator. If it did, the map would depend on what order you visited regions in. (This is less of an issue for games with finite worlds)

On 10 September 2015 at 23:46, Andrew Barnert <abarnert@yahoo.com> wrote:
Note that this is another case of wanting "correct by default". Requiring the user to pass around a RNG object makes it easy to do the wrong thing - because (as above) people can too easily create multiple independent RNGs by mistake, which means your numbers don't necessarily satisfy the randomness criteria any more. "Secure by default" isn't (and shouldn't be) the only example of "correct by default" that matters here. Whether "secure" is more important than "gives the right results" is a matter of opinion, and application dependent. Password generators have more need to be secure than to be mathematically random, Monte Carlo simulations (and to a lesser extent games) the other way around. Many things care about neither. If we can't manage "correct and secure by default", someone (and it won't be me) has to decide which end of the scale gets preference. Paul.

On Fri, Sep 11, 2015 at 1:11 AM, Paul Moore <p.f.moore@gmail.com> wrote:
Accidentally creating multiple independent RNGs is not going to cause any problems with respect to randomness. It only creates a problem with respect to determinism/reproducibility. Beyond that I just find your message a bit baffling. I guess I believe you that you find passing around RNG objects to make it easy to do the wrong thing, but it's exactly the opposite of my experience: when writing code that cares about determinism/reproducibility, then for me, passing around RNG objects makes it way *easier* to get things right. It makes it much more obvious what kinds of refactoring will break reproducibility, and it enables all kinds of useful tricks. E.g., keeping to the example of games and "aiplayer.py", a common thing game designers want to do is to record playthroughs so they can be replayed again as demos or whatever. And a common way to do that is to (1) record the player's inputs, (2) make sure that the way the game state evolves through time is deterministic given the players inputs. (This isn't necessarily the *best* strategy, but it is a common one.) Now suppose we're writing a game like this, and we have a bunch of "enemies", each of whose behavior is partially random. So on each "tick" we have to iterate through each enemy and update its state. If we are using a single global RNG, then for correctness it becomes crucial that we always iterate over all enemies in exactly the same order. Which is a mess. A better strategy is, keep one global RNG for the level, but then when each new enemy is spawned, assign it its own RNG that will be used to determine its actions, and seed this RNG using a value sampled from the global RNG (!). Now the overall pattern of the game will be just as random, still be deterministic, and -- crucially -- it no longer matters what order we iterate over the enemies in. I particularly would not want to use the global RNG in any program that was complicated enough to involve multiple modules. Passing state between inter-module calls using a global variable is pretty much always a bad plan, and that's exactly what you're talking about here. Non-deterministic global RNGs are fine, b/c they're semantically stateless; it's exactly the cases where you care about the determinism of the RNG state that you want to *stop* using the global RNG. -n -- Nathaniel J. Smith -- http://vorpus.org
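A minimal sketch of that per-enemy RNG pattern (the class, the actions, and the 64-bit seed size are just illustrative choices):

    import random

    class Enemy:
        def __init__(self, level_rng):
            # Each enemy gets its own stream, seeded from the level RNG, so
            # the iteration order over enemies no longer affects replays.
            self.rng = random.Random(level_rng.getrandbits(64))

        def act(self):
            return self.rng.choice(["attack", "wander", "wait"])

    level_rng = random.Random("player-visible map seed")
    enemies = [Enemy(level_rng) for _ in range(5)]
    print([e.act() for e in enemies])   # identical output for a given map seed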

On Fri, Sep 11, 2015 at 8:26 PM, Nathaniel Smith <njs@pobox.com> wrote:
As long as the order you seed their RNGs is deterministic. And if you can do that, can't you iterate over them in a deterministic order too? ChrisA

On Thu, Sep 10, 2015 at 09:10:09AM -0400, Donald Stufft wrote:
Ironically, the spelling mistake in your example is a good example of how this is worse. Another reason why it's worse is that if you create a new instance every single time you need a random number, as you do above, performance is definitely going to suffer. By my timings, creating a new SystemRandom instance each time is around two times slower; creating a new DeterministicRandom (i.e. the current MT default) instance each time is over 100 times slower. Hypothetically, it may even hurt your randomness: it may be that some future (or current) (C)PRNG's quality will be "less random" (biased, predictable, or correlated) because you keep using a fresh instance rather than the same one. TL;DR: Yes, calling `random.choice` is *significantly better* than calling `random.SomethingRandom().choice`. It's better for beginners, it's even better for expert users whose random needs are small, and those whose needs are greater shouldn't be using the latter anyway.
Is this a trick question? In the absence of a keylogger and screen reader monitoring my system while I run that code snippet, of course it is safe. In the absence of any credible attack on the password based on how it was generated, of course it is safe.
Nobody is saying that To put that question another way: "If you exclude the case where crypto would
This might be acceptable, although I wouldn't necessarily deprecate the random module.
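
For anyone who wants to reproduce the timing claims above, here is a rough sketch using the standard timeit module; the exact ratios will vary by platform and Python version, and the "two times" and "100 times" figures are the poster's own measurements:

    import timeit

    setup = "import random, string; chars = string.ascii_letters + string.digits"

    # Reusing the module-level (MT-backed) functions.
    print(timeit.timeit("random.choice(chars)", setup=setup, number=100000))

    # Reusing a single SystemRandom instance.
    print(timeit.timeit("sr.choice(chars)",
                        setup=setup + "; sr = random.SystemRandom()",
                        number=100000))

    # Creating a fresh instance for every value -- the pattern being criticised.
    print(timeit.timeit("random.SystemRandom().choice(chars)", setup=setup,
                        number=100000))
    print(timeit.timeit("random.Random().choice(chars)", setup=setup,
                        number=100000))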

On 11 September 2015 at 14:36, Steven D'Aprano <steve@pearwood.info> wrote:
I feel like I must have misunderstood you Steven. Didn't you just exclude the attack vector that we're discussing here? What we are saying is that a deterministic PRNG definitionally allows attacks on the password based on how it was generated. The very nature of a deterministic PRNG is that it is possible to predict subsequent outputs based on previous ones, or at least to dramatically constrain the search space. This is not a hypothetical attack, and it's not even a very complicated one. Now, it's possible that the way the system is constructed precludes this attack, but let me tell you that vastly more engineers think that about their systems than are actually right about it. Generally, if the word 'password' appears anywhere near something, you want to keep a Mersenne Twister as far away from it as possible. The concern being highlighted in this thread is that users who don't know what I just said (the vast majority) are at risk of writing deeply insecure code. We think the default should be changed.
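
For the curious, the prediction Cory describes is a standard textbook exercise against MT19937. Below is a minimal illustrative sketch, assuming the attacker observes 624 consecutive 32-bit outputs that line up with one regeneration block (e.g. from a freshly seeded generator); it is not code from this thread, and handling misaligned observations takes a little more bookkeeping:

    import random

    def _invert_right(y, shift):
        # Invert y = x ^ (x >> shift) by fixed-point iteration (top bits first).
        x = y
        for _ in range(32 // shift + 1):
            x = y ^ (x >> shift)
        return x

    def _invert_left(y, shift, mask):
        # Invert y = x ^ ((x << shift) & mask) by fixed-point iteration (low bits first).
        x = y
        for _ in range(32 // shift + 1):
            x = y ^ ((x << shift) & mask)
        return x & 0xFFFFFFFF

    def untemper(y):
        # Undo MT19937's output tempering to recover one word of internal state.
        y = _invert_right(y, 18)
        y = _invert_left(y, 15, 0xEFC60000)
        y = _invert_left(y, 7, 0x9D2C5680)
        y = _invert_right(y, 11)
        return y

    # "Victim": a freshly seeded generator, so the observed outputs line up with
    # one full 624-word state block.
    victim = random.Random(1234)
    observed = [victim.getrandbits(32) for _ in range(624)]

    # Rebuild the internal state from the observed outputs and load it into a clone.
    state = [untemper(word) for word in observed]
    clone = random.Random()
    clone.setstate((3, tuple(state) + (624,), None))

    # The clone now predicts the victim's future output exactly.
    assert [clone.getrandbits(32) for _ in range(10)] == \
           [victim.getrandbits(32) for _ in range(10)]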

On Sat, Sep 12, 2015 at 12:28 AM, Cory Benfield <cory@lukasa.co.uk> wrote:
Only if an attacker can access many passwords generated from the same MT stream, right? If the entire program is as was posted (importing random and using random.choice(), then terminating), then an attack would have to be based on the seeding of the RNG, not on the RNG itself. There simply isn't enough content being generated for you to be able to learn the internal state, and even if you did, the next run of the program will be freshly seeded anyway. ChrisA

On 11 September 2015 at 15:33, Chris Angelico <rosuav@gmail.com> wrote:
Sure, if the entire program is as posted, but we should probably assume it isn't. Some programs definitely are, but I'm not worried about them: I'm worried about the ones that aren't.

On September 11, 2015 at 10:33:55 AM, Chris Angelico (rosuav@gmail.com) wrote:
This is silly: take that code, stick it in a web application and have it generating API keys or session identifiers instead of passwords, or hell, even passwords or password-reset tokens or any other such thing. Suddenly you have a case where you have a persistent process, so there isn't a new seed, and the attacker can more or less request an unlimited number of outputs. This isn't some mind-bogglingly uncommon case. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
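
A minimal sketch of the kind of persistent-process code being described, with a hypothetical token generator alongside a SystemRandom-based counterpart (the names are illustrative only):

    import random
    import string

    ALPHABET = string.ascii_letters + string.digits

    # In a long-lived web process the module-level MT generator is seeded once at
    # startup and then feeds every request, handing an attacker exactly the kind
    # of long output stream that makes state recovery practical.
    def insecure_api_key(length=32):
        return ''.join(random.choice(ALPHABET) for _ in range(length))

    # Safer: draw from the operating system's CSPRNG instead.
    _sysrandom = random.SystemRandom()

    def api_key(length=32):
        return ''.join(_sysrandom.choice(ALPHABET) for _ in range(length))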

Ah crap. Sorry folks, this post was *not supposed to go to the list* in this state. I'm having some trouble with my mail client (mutt) not saving drafts, so I intended to email it to myself for later editing, and didn't notice that the list was CCed. On Fri, Sep 11, 2015 at 11:36:13PM +1000, Steven D'Aprano wrote: [...] -- Steve

On Thu, Sep 10, 2015, at 08:29, Paul Moore wrote:
I don't understand why. What other word would you use to describe a generator that can be given a specific set of inputs to generate the same exact sequence of numbers every single time? If you want that feature, then you're not going to think "deterministic" means "not good enough". And if you don't want it, you, well, don't want it, so there's really no harm in the fact that you don't choose it. Personally, though, I don't see why we're not talking about calling it MersenneTwister.

On Thu, Sep 10, 2015 at 8:13 AM, <random832@fastmail.us> wrote:
Because while we want to reduce foot guns, we don't want to reduce usability. DeterministicRandom is fairly easy for anyone to understand. I would venture a guess that most people looking for that wouldn't know (or care) what the backing algorithm is. Further, if we stop using the Mersenne Twister in the future, we would have to remove that class name. DeterministicRandom can be agnostic of the underlying algorithm and is friendlier to people who don't need to know or care about what algorithm is generating the numbers; they only need to understand the properties of that generator.

On Thu, Sep 10, 2015, at 09:44, Ian Cordasco wrote:
If we're serious about being deterministic, then we should keep that class under that name and provide a new class for the new algorithm. What's the point of having a deterministic algorithm if you can't reproduce your results in the new version because the algorithm was deleted?

On Thu, Sep 10, 2015 at 8:55 AM, <random832@fastmail.us> wrote:
This is totally off topic. That said, as a counter-point: what's the point of carrying around code you don't want people to use if they're just going to use it anyway?

On Sep 10, 2015 5:29 AM, "Paul Moore" <p.f.moore@gmail.com> wrote: [...]
Regarding the "harder to use" point (which is obviously just one of many considerations in this while debate): I trained myself a few years ago to stop using the global random functions and instead always pass around an explicit RNG object, and my experience is that once I got into the habit it gave me a strict improvement in code quality. Suddenly many more of my functions are deterministic ... well ... functions ... of their inputs, and suddenly it's clearly marked in the source which ones have randomness in their semantics, and suddenly it's much easier to do things like refactor the code while preserving the output for a given seed. (This is tricky because just changing the order in which you do things can break your code. I wince in sympathy at people who have to maintain code like your map-generation-from-a-seed example and *aren't* using RNG objects explicitly.) The implicit global RNG is a piece of global state, like global variables, and causes similar unpleasantness. Now that I don't use it, I look back and it's like "huh, why did I always used to hit myself in the face like that? That wasn't very pleasant." So this is what I teach my collaborators and students now. Most of them just use the global state by default because they don't even know about the OO option. YMMV but that's my experience FWIW. -n

Donald Stufft <donald@stufft.io> writes: ...
"security minded folks" [1] recommend "always use os.urandom()" and advise against *random* module [2,3] despite being aware of random.SystemRandom() [4] i.e., if they are right then *random* module probably only need to care about group #1 and avoid creating the false sense of security in group #3. [1] https://github.com/pyca/cryptography/blob/92d8bd12609586bfa53cf8c7a691e37474... [2] https://cryptography.io/en/latest/random-numbers/ [3] https://github.com/pyca/cryptography/blob/92d8bd12609586bfa53cf8c7a691e37474... [4] https://github.com/pyca/cryptography/issues/2278

On September 10, 2015 at 2:08:46 PM, Akira Li (4kir4.1i@gmail.com) wrote:
Maybe you didn't notice you’re talking to the third name in the list of authors that you linked to, but that documentation is there primarily because the random module's API is problematic and it's easier to recommend people to not use it than to try and explain how to use it safely. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Thu, Sep 10, 2015 at 9:19 PM, Donald Stufft <donald@stufft.io> wrote:
Obviously, I've noticed it but I didn't want to call you out.

> that documentation is there primarily because the random module's API is problematic and it's easier to recommend people to not use it than to try and explain how to use it safely.

That is exactly the point: if random.SystemRandom() is not safe to use while being based on "secure" os.urandom(), then providing the same API based on (possibly less secure) arc4random() won't be any safer.

On September 10, 2015 at 2:40:54 PM, Akira Li (4kir4.1i@gmail.com) wrote:
"If the mountain won't come to Muhammad then Muhammad must go to the mountain." In other words, we can write all the documentation in the world we want, and it doesn't change the simple fact that by choosing a default, there is going to be some people who will use it when it's inappropiate due to the fact that it is the default. The pratical effect of changing the default will be that some cases are broken, but in a way that is obvious and trivial to fix, some cases won't have any pratical effect at all, and finally, for some people it's going to take code that was previously completely insecure and make it either secure or harder to exploit for people who are incorrectly using the API. I wouldn't expect the documentation in pyca/cryptography to change, it'd still recommend people to use os.urandom directly and we'd still recommend that people should use SystemRandom/os.urandom in the random.py docs for things that need to be cryptographically secure, this is just a safety net for people who don't know or didn't listen. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
participants (24)
- Akira Li
- Alexander Walters
- Andrew Barnert
- Antoine Pitrou
- Brett Cannon
- Chris Angelico
- Cory Benfield
- Donald Stufft
- Emile van Sebille
- Georg Brandl
- Greg Ewing
- Ian Cordasco
- Nathaniel Smith
- Nick Coghlan
- Paul Moore
- Petr Viktorin
- Random832
- random832@fastmail.us
- Serhiy Storchaka
- Stefan Krah
- Stephen J. Turnbull
- Steven D'Aprano
- Tim Peters
- Xavier Combelle