[Python-ideas] Python's Source of Randomness and the random.py module Redux

Thu Sep 10 14:29:13 CEST 2015

On 10 September 2015 at 12:26, Donald Stufft <donald at stufft.io> wrote:
>> There is a fourth basic type. People (like me!) whose code absolutely
>> doesn't have any security issues, but want a simple, convenient, fast
>> RNG. Determinism is not an absolute requirement, but is very useful
>> (for writing tests, maybe, or for offering a deterministic rerun
>> option to the program). Simulation-style games often provide a way to
>> find the "map seed", which allows users to share interesting maps -
>> this is non-essential but a big quality-of-life benefit in such games.
>
> This group is the same as #3 except for the map seed thing which is
> group #1. In particular, it wouldn’t hurt you if the random you were
> using was cryptographically secure as long as it was fast and if you
> needed determinism, it would hurt you to say so. Which is the point
> that Theo was making.

I don't understand the phrase "if you needed determinism, it would
hurt you to say so". Could you clarify?

>>
>> IMO, the current module perfectly serves this fourth group.
>
> Making the user pick between Deterministic and Secure random would serve
> this purpose too, especially in a language where "In the face of ambiguity,
> refuse the temptation to guess" is one of the core tenets of the language. The
> largest downside would be typing a few extra characters, and Python is not
> a language that attempts to do things in the fewest number of characters.

And yet I know that I would routinely, and (this is the problem)
without thinking, choose Deterministic, because I know that my use
cases all get a (small) benefit from being able to capture the seed,
but I also know I'm not doing security-related stuff.
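As a minimal sketch of the "capture the seed" habit described above: the helper name make_rng is hypothetical (it is not part of the random module), and the 32-bit seed size is an arbitrary choice for illustration.

```python
import random


def make_rng(seed=None):
    """Return (rng, seed). make_rng is a hypothetical helper, not a stdlib API."""
    if seed is None:
        # Pick a fresh seed from the OS so it can be reported and reused later.
        seed = random.SystemRandom().getrandbits(32)
    return random.Random(seed), seed


# First run: no seed given, so one is chosen and handed back to the caller.
rng, seed = make_rng()
first_run = [rng.randint(1, 6) for _ in range(5)]

# "Deterministic rerun": feeding the captured seed back in repeats the sequence.
rerun_rng, _ = make_rng(seed)
rerun = [rerun_rng.randint(1, 6) for _ in range(5)]
```

A program would typically print or log the seed so that a test failure or an interesting map can be reproduced on request.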

No amount of making me choose is going to help me spot security
implications that I've missed.

And also, calling the non-crypto choice "Deterministic" is unhelpful,
because I *don't* want something deterministic, I want something
random (I understand PRNGs aren't truly random, but "good enough for
my purposes" is what I want, and "deterministic" reads to me as saying
it's *not* good enough...)

>> While I accept your point that far too many people are using insecure
>> RNGs in "generate a random password" scripts, they are *not* the core
>> target audience of the default module-level functions in the random
>> module (did you find any examples of insecure use that *weren't*
>> password generators?). We should educate people that this is bad
>> practice, not change the module. Also, while it may be imperfect, it's
>> still better than what many people *actually* do, which is to use
>> "password" as a password on sensitive systems :-(
>
> You cannot document your way out of a UX problem.

What I'm trying to say is that this is an education problem more than
a UX problem.

Personally, I think I know enough about security for my (not a
security specialist) purposes. To that extent, if I'm working on
something with security implications, I'm looking for things that say
"Crypto" in the name. The rest of the time, I just use non-specialist
stuff. It's a similar situation to that of the "statistics" module. If
I'm doing "proper" maths, I'd go for numpy/scipy. If I just want some
averages and I'm not bothered about numerical stability, rounding
behaviour, etc, I'd go for the stdlib statistics package.

> The problem isn’t people doing this once on the command line to generate
> a password, the problem is people doing it in applications where they
> generate an API key, a session identifier, a random password which they
> then give to their users. If you give someone a way to get the output of
> the MT-based random enough times, it can be used to determine every value
> it has generated and will generate.

To me, that's crypto and I'd look to the cryptography module, or to
something in the stdlib that explicitly said it was suitable for
crypto.
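For concreteness, here is a rough sketch of what "something in the stdlib that says it is suitable" can look like, using os.urandom and random.SystemRandom, both of which are documented as drawing on the operating system's randomness source. The function names, token size, and alphabet are assumptions made for illustration only.

```python
import binascii
import os
import random
import string

# SystemRandom draws from the OS source (os.urandom), not the Mersenne Twister.
_sysrand = random.SystemRandom()

# Illustrative alphabet; a real application would choose its own policy.
ALPHABET = string.ascii_letters + string.digits


def new_session_token(nbytes=32):
    """Hypothetical helper: hex-encode nbytes of OS-provided randomness."""
    return binascii.hexlify(os.urandom(nbytes)).decode("ascii")


def new_password(length=16):
    """Hypothetical helper: pick each character with SystemRandom.choice."""
    return "".join(_sysrand.choice(ALPHABET) for _ in range(length))


token = new_session_token()
password = new_password()
```

Neither generator can be seeded, which is exactly why they do not suit the "capture the seed and rerun" use case.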

Saying people write bad code isn't enough - how does the current
module *encourage* them to write bad code? How much API change must we
allow to cater for people who won't read the statement in the docs (in
a big red box) "Warning: The pseudo-random generators of this module
should not be used for security purposes"? (Specifically, people
writing security-related code who won't read the docs.)

> Here’s a game a friend of mine created where the purpose of the game is
> to essentially unrandomize some random data, which is only possible
> because it’s (purposely) using MT to make it possible
> https://github.com/reaperhulk/dsa-ctf. This is not an ivory tower paranoia
> case, it’s a real concern that will absolutely fix some insecure software
> out there instead of telling them “welp typing a little bit extra once
> an import is too much of a burden for me and really it’s your own fault
> anyways”.

I don't understand how that game (which is an interesting way of
showing people how attacks on crypto work, sure, but that's just
education, which you dismissed above) relates to the issue here.
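For readers unfamiliar with the property Donald describes (enough Mersenne Twister output determines everything the generator will produce), here is a minimal sketch. It assumes the observer sees 624 consecutive, complete 32-bit outputs from getrandbits(32); the helper names are made up for this example, and nothing here is taken from the linked game.

```python
import random


def undo_right_shift_xor(y, shift):
    """Invert y = x ^ (x >> shift) for a 32-bit x by fixed-point iteration."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x


def undo_left_shift_xor(y, shift, mask):
    """Invert y = x ^ ((x << shift) & mask) for a 32-bit x."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x


def untemper(y):
    """Reverse MT19937's output tempering, undoing its steps in reverse order."""
    y = undo_right_shift_xor(y, 18)
    y = undo_left_shift_xor(y, 15, 0xEFC60000)
    y = undo_left_shift_xor(y, 7, 0x9D2C5680)
    y = undo_right_shift_xor(y, 11)
    return y


# The "victim" generator; the seed value is arbitrary and never used by the observer.
victim = random.Random(2015)
observed = [victim.getrandbits(32) for _ in range(624)]

# Rebuild the 624 internal state words and load them into a fresh generator.
# The trailing 624 is the position index, meaning "regenerate on the next call".
state_words = tuple(untemper(y) for y in observed)
clone = random.Random()
clone.setstate((3, state_words + (624,), None))

# From here on the clone predicts the victim's output.
predicted = [clone.random() for _ in range(5)]
actual = [victim.random() for _ in range(5)]
```

Outputs that expose only part of each 32-bit word (random.random(), randint on a small range, and so on) make the reconstruction harder than this sketch suggests, but they do not make the generator suitable for secrets.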

And I hope you don't really think that your quote is even remotely
what I'm trying to say (I'm not that selfish) - my point is that not
everything is security related. Not every application people write,
and not every API in the stdlib. You're claiming that the random
module is security related. I'm claiming it's not, it's documented as
not being, and that's clear to the people who use it for its intended
purpose. Telling those people that you want to make a module designed
for their use harder to use because people for whom it's not intended
can't read the documentation which explicitly states that it's not
suitable for them, is doing a disservice to those people who are
already using the module correctly for its stated purpose.

By the same argument, we should remove the statistics module because
it can be used by people with numerically unstable problems. (I doubt
you'll find StackOverflow questions along these lines yet, but that's
only because (a) the module's pretty new, and (b) it actually works
pretty hard to handle the hard corner cases. I bet they'll start
turning up in due course, though, if only from the people who don't
understand floating point...)
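To illustrate the "hard corner cases" remark, here is a small sketch comparing a plain running float total with statistics.mean on data chosen to cancel badly. The input values are invented for the example.

```python
import statistics

# Two huge values that cancel, plus one small value that a running
# float total loses along the way.
data = [1e100, 1.0, -1e100]

# Naive approach: accumulate in floating point, then divide.
total = 0.0
for x in data:
    total += x  # the 1.0 is absorbed by 1e100 and then cancelled away
naive_mean = total / len(data)

# statistics.mean sums the inputs exactly before dividing.
careful_mean = statistics.mean(data)
```

The naive total ends up at zero, while statistics.mean returns a value close to one third.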

Paul