[Python-ideas] PEP 504: Using the system RNG by default

Thu Sep 17 20:07:28 CEST 2015

[Tim]
>> Just two things to note:
>>
>> 1. Whatever task-appropriate higher-level functions people want, as
>> you've shown "secure" implementations are easy to write for someone
>> who knows what's available to build on.  It will take 10000 times
>> longer for people to bikeshed what "secrets" should offer than to
>> implement it ;-)

[Nick Coghlan <ncoghlan at gmail.com>]
> Agreed, although the 4 I listed are fairly well-credentialed - the
> implementations of the first two (raw bytes and integers) are the
> patterns cryptography.io uses, the token generator is comparable to
> the Django one (with a couple of extra punctuation characters in the
> alphabet), and the hex digit generator is the Pyramid one.

I will immodestly claim that nobody needs to be a crypto-wonk to see
that these implementations are exactly as secure (or insecure) as the
platform urandom():  in each case, it's trivial to invert the output
to recover the exact bytes urandom() returned.  So if there's any
attack against the outputs, that's also an attack against what
urandom() returned.  The outputs just spell what urandom returned
using a different alphabet.

For the same reason, e.g., it would be fine to replace each 0 bit in
urandom's result with the string "egg", and each 1 bit with the string
"turtle".  An attack on the output of that is exactly as hard (or
easy) as an attack on the output of urandom.  Obvious, right?  It's
only a little harder to see that the same is true of even the fanciest
of your 4 functions.
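
A minimal sketch of the "egg"/"turtle" point, for anyone who wants to see the inversion spelled out. The helper names to_egg_turtle() and from_egg_turtle() are made up for illustration and are not part of any proposal in this thread:

```python
import os


def to_egg_turtle(data: bytes) -> str:
    """Spell each bit (most significant first) as "egg" for 0, "turtle" for 1."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join("turtle" if bit == "1" else "egg" for bit in bits)


def from_egg_turtle(text: str) -> bytes:
    """Invert to_egg_turtle(): recover the exact bytes that were encoded."""
    # "egg" and "turtle" share no letters at their boundaries, so plain
    # string replacement is enough to decode unambiguously.
    bits = text.replace("turtle", "1").replace("egg", "0")
    if not bits:
        return b""
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


raw = os.urandom(16)
spelled = to_egg_turtle(raw)
# The output is only a respelling: the original bytes come straight back.
assert from_egg_turtle(spelled) == raw
```

Since the decoder recovers the urandom() bytes exactly, anything an attacker can learn from the spelled-out form they could equally learn from the raw bytes.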

Where you _may_ get in trouble is creating a non-invertible output.  Like:

    def secure_int(nbytes):
        n = int.from_bytes(os.urandom(nbytes), "big")
        return n - n

That's not likely to be useful ;-)

> You can get more exotic with full arbitrary alphabet password and
> passphrase generators, but I think we're getting beyond stdlib level
> functionality at that point - it's getting into the realm of password
> managers and attack software.

I'll leave that for the discussion of Steven's PEP.  I think he was on
the right track to, e.g., suggest a secure choice() as one of his few
base building blocks.  It _does_ take some expertise to implement a
secure choice() correctly, but not so much from the crypto view as
from the free-from-statistical-bias view.  SystemRandom.choice()
already gets both right.
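
To illustrate the statistical-bias concern, here is a rough sketch of one way to write an unbiased choice() on top of os.urandom(), using rejection sampling. The name secure_choice() is hypothetical, and this is not a claim about how SystemRandom.choice() is implemented internally; it only shows why reducing a random integer with `% n` is the tempting mistake to avoid:

```python
import os
import random


def secure_choice(seq):
    """Pick one item of seq uniformly at random, drawing bits from os.urandom()."""
    n = len(seq)
    if n == 0:
        raise IndexError("cannot choose from an empty sequence")
    k = (n - 1).bit_length()      # bits needed to represent indices 0..n-1
    nbytes = (k + 7) // 8         # whole bytes covering those k bits
    while True:
        # Keep only the top k bits, so r is uniform on range(2**k).
        r = int.from_bytes(os.urandom(nbytes), "big") >> (nbytes * 8 - k)
        # Rejecting r >= n (instead of using r % n) is what avoids the bias:
        # every surviving index is equally likely.
        if r < n:
            return seq[r]


# The existing stdlib spelling of the same job:
item = random.SystemRandom().choice(["red", "green", "blue"])
```

Each pass through the loop succeeds with probability greater than 1/2, so the expected number of urandom() calls is below 2.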

>> 2. I'd personally be surprised if a function taking a "number of bits"
>> argument silently replaced argument `bits` with `bits - bits % 8`.  If
>> the app-level programmers at issue can't think in terms of bytes
>> instead (and use functions with a `bytes` argument), then, e.g.,
>> better to raise an exception if `bits % 8 != 0` to begin with.  Or to
>> round up, taking "bits" as meaning "a number of bytes covering _at
>> least_ the number of bits asked for".

> Yeah, I took a shortcut to keep them all as pretty one liners. A
> proper rand_bits with that API would look something like:
>
>     def rand_bits(bits):
>         num_bytes, add_byte = divmod(bits, 8)
>         if add_byte:
>             num_bytes += 1
>         return os.urandom(bits)

You should really be calling that with "num_bytes" now ;-)

> Compared to the os.urandom() call itself, the bits -> bytes
> calculation should disappear into the noise from a speed perspective
> (and a JIT compiled runtime like PyPy could likely optimise it away
> entirely).

Goodness - "premature optimization" already?! ;-)  Fastest in pure
Python is likely

    num_bytes = (bits + 7) >> 3
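
A quick sanity check that the shift form agrees with the divmod-and-round-up form, wrapped in a "round up to whole bytes" helper. The names bytes_needed() and rand_bytes_covering() are invented for this sketch:

```python
import os


def bytes_needed(bits):
    """Smallest number of whole bytes holding at least `bits` bits."""
    return (bits + 7) >> 3


def rand_bytes_covering(bits):
    """Return random bytes covering at least `bits` bits (rounding up)."""
    if bits < 0:
        raise ValueError("bits must be non-negative")
    return os.urandom(bytes_needed(bits))


# The shift trick matches the divmod version for every size tried here.
for bits in range(1025):
    num_bytes, add_byte = divmod(bits, 8)
    if add_byte:
        num_bytes += 1
    assert bytes_needed(bits) == num_bytes

assert len(rand_bytes_covering(9)) == 2   # 9 bits need 2 bytes
```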

But if I were bikeshedding I'd question why the function weren't:

    def rand_bytes(nbytes):
        return os.urandom(nbytes)

instead.  A rand_bits(nbits) that meant what it said would likely also
be useful: