[Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library

Mon Sep 21 00:07:43 CEST 2015

On Sep 20, 2015, at 05:26, Nick Coghlan <ncoghlan at gmail.com> wrote:
> 
>> On 20 September 2015 at 20:56, Paul Moore <p.f.moore at gmail.com> wrote:
>> Given where this started, I'd suggest renaming token_alpha as
>> "password". Beginners wouldn't necessarily associate the term "token"
>> with the problem "I want to generate a random password" [1]. Maybe add
>> a short recipe showing how to meet constraints like "at least 2
>> digits" by simply generating repeatedly until a valid password is
>> found.
>> 
>> For a bit of extra bikeshedding, I'd make alphabet the second,
>> optional, parameter and default it to
>> string.ascii_letters+string.digits+string.punctuation, as that's often
>> what password constraints require.
>> 
>> Or at the very least, document how to use the module functions for the
>> common tasks we see people getting wrong. But I thought the idea here
>> was to make doing things the right way obvious, for people who don't
>> read documentation, so I'd prefer to see the functions exposed by the
>> module named based on the problems they solve, not on the features
>> they provide. (Even if that involves a little duplication, and/or a
>> split between "high level" and "low level" APIs).
> 
> Right, I'd suggest the following breakdown.
> 
> * Arbitrary password generation (also covers passphrase generation
> from a word list):
> 
>    secrets.password(result_len: int,
>        alphabet: T = string.ascii_letters+string.digits+string.punctuation) -> T
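
For reference, the "generate repeatedly until a valid password is found" recipe Paul mentions could look roughly like the sketch below. The names password and password_with_digits are hypothetical, not part of any agreed API, and I'm assuming random.SystemRandom as the entropy source since it already wraps os.urandom.

```python
import random
import string

# SystemRandom draws from os.urandom rather than the Mersenne Twister.
_sysrand = random.SystemRandom()

DEFAULT_ALPHABET = string.ascii_letters + string.digits + string.punctuation


def password(result_len, alphabet=DEFAULT_ALPHABET):
    """Hypothetical helper: result_len characters chosen from a str alphabet."""
    return ''.join(_sysrand.choice(alphabet) for _ in range(result_len))


def password_with_digits(result_len, min_digits=2):
    """Hypothetical recipe: regenerate until the constraint is satisfied."""
    if min_digits > result_len:
        raise ValueError("min_digits cannot exceed result_len")
    while True:
        candidate = password(result_len)
        # Count the digits and accept the first candidate with enough of them.
        if sum(ch.isdigit() for ch in candidate) >= min_digits:
            return candidate
```

Rejecting whole candidates keeps every valid password equally likely, which patching digits into fixed positions would not.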

If T is a word list--that is, an Iterable of str or bytes--you want to return a str or a bytes, not a T.

Also, making it work that generically will make the code much more complicated, to the point where it no longer serves as useful sample code to rank novices. You have to extract the first element of T, then do your choosing off chain([first], T) instead of off T, then type(first).join; all of that is more complicated than the actual logic, and will obscure the important part we want novices to learn if they read the source.
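
To make that concrete, here is a rough sketch of what the fully generic version ends up looking like. The name generic_password is hypothetical, and random.SystemRandom is my assumption for the random source.

```python
import random
from itertools import chain

_sysrand = random.SystemRandom()


def generic_password(result_len, alphabet):
    """Hypothetical generic version: alphabet is a str or an iterable of str/bytes."""
    items = iter(alphabet)
    try:
        # Peek at the first element just to learn the result type.
        first = next(items)
    except StopIteration:
        raise ValueError("alphabet must not be empty") from None
    # Put the peeked element back in front of the rest before choosing.
    pool = list(chain([first], items))
    # '' for str elements, b'' for bytes elements.
    empty = type(first)()
    # Note: a bytes alphabet iterates as ints on Python 3, so it would
    # need yet another special case; this sketch does not handle it.
    return empty.join(_sysrand.choice(pool) for _ in range(result_len))
```

Only the last line is the actual logic; everything above it is type juggling.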

Also, for word lists, I think you'd want a way to choose between actual passphrases and the xkcd 936 idea of using passphrases as passwords even for sites that don't accept spaces, like "correcthorsebatterystaple". Maybe via a sep=' ' parameter? But that would be very confusing if it's ignored when T is string-like but used when T is a non-string-like iterable of string-likes.

I think it's better to require T to be string-like than to try to generalize it, and maybe add a separate passphrase function that takes (words: Sequence[T], sep: T) -> T. (Although I'm not sure how to default to ' ' vs b' ' based on the type of T... But maybe this doesn't need to handle bytes, so Sequence[str] is fine?)
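
A str-only version of that separate function could be as small as the sketch below. The name passphrase and the num_words parameter are hypothetical (the signature above doesn't say how the word count is passed), and random.SystemRandom is again my assumption for the random source.

```python
import random

_sysrand = random.SystemRandom()


def passphrase(num_words, words, sep=' '):
    """Hypothetical helper: join num_words randomly chosen str words with sep."""
    # Copy to a list so choice() can index it, whatever kind of iterable came in.
    words = list(words)
    if not words:
        raise ValueError("words must not be empty")
    return sep.join(_sysrand.choice(words) for _ in range(num_words))
```

With sep=' ' this gives an ordinary passphrase; with sep='' it gives the run-together "correcthorsebatterystaple" style for sites that reject spaces.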

> * Binary token generation ("num_random_bytes" is the arg to
> os.urandom, not the length of result):
> 
>    secrets.token(num_random_bytes: int) -> bytes
>    secrets.token_hex(num_random_bytes: int) -> bytes
>    secrets.token_urlsafe_base64(num_random_bytes: int) -> bytes
> 
> * Serial number generation ("num_random_bytes" is the arg to
> os.urandom, not the length of result):
> 
>    secrets.serial_number(num_random_bytes: int) -> int
> 
> * Constant time secret comparison (aka hmac.compare_digest):
> 
>    secrets.equal(a: T, b: T) -> bool
> 
> * Lower level building blocks:
> 
>    secrets.choice(container)
>    # Hold off on other SystemRandom methods?
> 
> (I don't have a strong opinion on that last point, as it's the higher
> level APIs that I think are the important aspect of this proposal)

I think randrange is definitely worth having. Even the OpenSSL and arc4random APIs provide something equivalent. If you're a novice, and following a blog post that says to use your language's equivalent of randbelow(1000000), are you going to think of choice(range(1000000))? And, if you do, are you going to convince yourself that this is reasonable and not going to create a slew of million-element lists?
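
As a sketch of the comparison, assuming random.SystemRandom as the backing source; the module-level randbelow wrapper is a hypothetical name, not something the proposal has settled on:

```python
import random

_sysrand = random.SystemRandom()


def randbelow(n):
    """Hypothetical helper: a random int in the half-open range [0, n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return _sysrand.randrange(n)


# The obvious spelling for someone following that blog post:
code = randbelow(1000000)

# The spelling a novice would have to invent if only choice() existed.
# On Python 3, range() is a lazy sequence, so this does not build a list;
# on Python 2, range() returns a list and this allocates a million elements.
code_via_choice = _sysrand.choice(range(1000000))
```

Both produce a value in the same range, but only the first is something a novice can write without knowing how range() is implemented.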