[Python-ideas] DRAFT Re: Python's Source of Randomness and the random.py module Redux
Steven D'Aprano
steve at pearwood.info
Fri Sep 11 15:36:13 CEST 2015
On Thu, Sep 10, 2015 at 09:10:09AM -0400, Donald Stufft wrote:
> Essentially, other than typing a little bit more, why is:
>
> import random
> print(random.choice(["a", "b", "c"]))
>
> better than
>
> import random;
> print(random.DetereministicRandom().choice(["a", "b", "C"]))
Ironically, the spelling mistake in your example is a good example of
how this is worse.
Another reason why it's worse is that if you create a new instance every
single time you need a random number, as you do above, performance is
definitely going to suffer. By my timings, creating a new SystemRandom
instance each time is around two times slower; creating a new
DeterministicRandom (i.e. the current MT default) instance each time is
over 100 times slower.
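
A rough way to reproduce this kind of comparison is sketched below. It is
not the script behind the figures above: the helper name `time_choice` and
the iteration count are made up for illustration, and `random.Random`
stands in for the proposed DeterministicRandom because it is the current
MT-based class. The ratios will vary by machine and Python version.

```python
import random
import timeit

ITEMS = ["a", "b", "c"]


def time_choice(number=20000):
    """Return the seconds taken by `number` calls of each approach."""
    return {
        # Module-level function: reuses the hidden shared MT instance.
        "random.choice": timeit.timeit(
            lambda: random.choice(ITEMS), number=number),
        # Fresh Mersenne Twister instance (and fresh seeding) on every call.
        "Random().choice": timeit.timeit(
            lambda: random.Random().choice(ITEMS), number=number),
        # Fresh SystemRandom instance on every call; draws from os.urandom().
        "SystemRandom().choice": timeit.timeit(
            lambda: random.SystemRandom().choice(ITEMS), number=number),
    }


if __name__ == "__main__":
    timings = time_choice()
    base = timings["random.choice"]
    for name, seconds in timings.items():
        print("%-24s %.4fs  (%.1fx)" % (name, seconds, seconds / base))
```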
Hypothetically, it may even hurt your randomness: it may be that some
future (or current) (C)PRNG's quality will be "less random" (biased,
predictable, or correlated) because you keep using a fresh instance
rather than the same one.
TL;DR:
Yes, calling `random.choice` is *significantly better* than calling
`random.SomethingRandom().choice`. It's better for beginners, it's even
better for expert users whose random needs are small, and those whose
needs are greater shouldn't be using the latter anyway.
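
For code that does want its own generator, the usual pattern is to create
one instance and keep reusing it rather than building a new one per call.
A minimal sketch, where the names `rng` and `pick_many` are illustrative
only:

```python
import random

# One generator, created once and reused for every draw.
rng = random.Random()


def pick_many(items, count):
    """Draw `count` items (with replacement) from the single shared instance."""
    return [rng.choice(items) for _ in range(count)]


print(pick_many(["a", "b", "c"], 5))
```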
> You're allowed to pick DeterministicRandom, you're even allowed to do it
> without thinking. This isn't about making it impossible to ever insecurely use
> random numbers, that's obviously a boil the ocean level of problem, this is
> about trying to make it more likely that someone won't be hit by a fairly easy
> to hit footgun if it does matter for them, even if they don't know it. It's
> also about making code that is easier to understand on the surface, for example
> without using the prior knowledge that it's using MT, tell me how you'd know
> if this was safe or not:
>
> import random
> import string
> password = "".join(random.choice(string.ascii_letters) for _ in range(9))
> print("Your random password is", password)
Is this a trick question?
In the absence of a keylogger and screen reader monitoring my system
while I run that code snippet, of course it is safe.
In the absence of any credible attack on the password based on how it
was generated, of course it is safe.
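
For reference, the two variants of the quoted snippet look almost
identical in code; which one is needed depends on the threat model, which
is the point under dispute. The helper `make_password` below is a
hypothetical name used only for this sketch.

```python
import random
import string


def make_password(rng, length=9):
    """Build a password of ASCII letters using the given generator."""
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))


# The module itself exposes choice(), so it can be passed as the generator;
# this is the MT-backed default used in the quoted snippet.
print("MT-backed:", make_password(random))
# SystemRandom draws from os.urandom() instead of the Mersenne Twister.
print("OS-backed:", make_password(random.SystemRandom()))
```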
> Can you point out one use case where cryptographically safe random numbers,
> assuming we could generate them as quickly as you asked for them, would hurt
> you unless you needed/wanted to be able to save the seed and thus require or
> want deterministic results?
Nobody is saying that
To put that question another way: "If you exclude the case where crypto
would
> Reminder that this warning does not show up (in any color, much less red)
> if you’re using ``help(random)`` or ``dir(random)`` to explore the random
> module. It also does not show up in code review when you see someone doing
> random.random.
>
> It encourages you to write bad code, because it has a baked in assumption that
> there is a sane default for a random number generator and expects people to
> understand a fairly difficult concept, which is that not all "random" is equal.
>
> For instance, you've already made the mistake of saying you wanted "random" not
> deterministic, but the two are not mutually exclusive and deterministic is a
> property that a source of random can have, and one that you need for one of the
> features you say you like.
>
> >
> > > Here’s a game a friend of mine created where the purpose of the game is
> > > to essentially unrandomize some random data, which is only possible
> > > because it’s (purposely) using MT to make it possible
> > > https://github.com/reaperhulk/dsa-ctf. This is not an ivory tower paranoia
> > > case, it’s a real concern that will absolutely fix some insecure software
> > > out there instead of telling them “welp typing a little bit extra once
> > > an import is too much of a burden for me and really it’s your own fault
> > > anyways”.
> >
> > I don't understand how that game (which is an interesting way of
> > showing people how attacks on crypto work, sure, but that's just
> > education, which you dismissed above) relates to the issue here.
> >
> > And I hope you don't really think that your quote is even remotely
> > what I'm trying to say (I'm not that selfish) - my point is that not
> > everything is security related. Not every application people write,
> > and not every API in the stdlib. You're claiming that the random
> > module is security related. I'm claiming it's not, it's documented as
> > not being, and that's clear to the people who use it for its intended
> > purpose. Telling those people that you want to make a module designed
> > for their use harder to use because people for whom it's not intended
> > can't read the documentation which explicitly states that it's not
> > suitable for them, is doing a disservice to those people who are
> > already using the module correctly for its stated purpose.
>
> I'm claiming that the term random is ambiguously both security related and
> not security related, and that we should either expect
> people to pick whether or not their use case is security related, or we should
> assume that it is unless otherwise instructed. I don't particularly care what
> the exact spelling of this looks like, random.(System|Secure)Random and
> random.DeterministicRandom is just one option.
> Another option is to look at
> something closer to what Go did and deprecate the "random" module and move the
> MT based thing to ``math.random`` and the CSPRNG can be moved to something like
> crypto.random.
This might be acceptable, although I wouldn't necessarily deprecate the
random module.
>
> >
> > By the same argument, we should remove the statistics module because
> > it can be used by people with numerically unstable problems. (I doubt
> > you'll find StackOverflow questions along these lines yet, but that's
> > only because (a) the module's pretty new, and (b) it actually works
> > pretty hard to handle the hard corner cases, but I bet they'll start
> > turning up in due course, if only from the people who don't understand
> > floating point...)
> >
>
> No, by this argument we shouldn't have a function called statistics in the
> statistics module because there is no globally "right" answer for what the
> default should be. Should it be mean? mode? median? Why is *your* use case the
> "right" use case for the default option, particularly in a situation where
> picking the wrong option can be disastrous.
>
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA