Pre-PEP Adding A Secrets Module To The Standard Library

Following on to the discussions about changing the default random number generator, I would like to propose an alternative: adding a secrets module to the standard library. Attached is a draft PEP. Feedback is requested. (I'm going to only be intermittently at the keyboard for the next day or so, so my responses may be rather slow.) -- Steve

Thanks! I'd accept this (and I'd reject 504 at the same time). I like the secrets name. I wonder though, should the PEP propose a specific set of functions? (With the understanding that we might add more later.) Hopefully someone on the peps team can commit your PEP in the repo. It's probably going to be PEP 506. On Sat, Sep 19, 2015 at 11:16 AM, Steven D'Aprano <steve@pearwood.info> wrote:
-- --Guido van Rossum (python.org/~guido)

[Guido]
The bikeshedding on that will be far more tedious than the implementation. I'll get it started :-) No attempt to be minimal here. More-than-less "obvious" is more important:

Bound methods of a SystemRandom instance
    .randrange()
    .randint()
    .randbits()  renamed from .getrandbits()
    .randbelow(exclusive_upper_bound)  renamed from private ._randbelow()
    .choice()

Token functions
    .token_bytes(nbytes)  another name for os.urandom()
    .token_hex(nbytes)  same, but return string of ASCII hex digits
    .token_url(nbytes)  same, but return URL-safe base64-encoded ASCII
    .token_alpha(alphabet, nchars)  string of `nchars` characters drawn uniformly from `alphabet`

On Sun, Sep 20, 2015 at 9:40 AM, Tim Peters <tim.peters@gmail.com> wrote:
token_bytes "obviously" should return a bytes, and token_alpha equally obviously should be returning a str. (Or maybe it should return the same type as alphabet, which could be either?) What about the other two? Also, if you ask for 4 bytes from token_hex, do you get 4 hex digits or 8 (four bytes of entropy)? ChrisA

[Tim Peters]
[Chris Angelico <rosuav@gmail.com>]
token_bytes "obviously" should return a bytes,
Which os.urandom() does in Python 3. I'm not writing docs, just suggesting the functions.
and token_alpha equally obviously should be returning a str.
Which part of "string" doesn't suggest "str"?
Which part of "ASCII" is ambiguous?
Also, if you ask for 4 bytes from token_hex, do you get 4 hex digits or 8 (four bytes of entropy)?
And which part of "same"? ;-) Bikeshed away; I'm outta this now ;-)

On Sun, Sep 20, 2015 at 10:19 AM, Tim Peters <tim.peters@gmail.com> wrote:
Heh :) My personal preference for shed colour: token_bytes returns a bytestring, its length being the number provided. All the others return Unicode strings, their lengths again being the number provided. So they're all text bar the one that explicitly says it's in bytes. But I'm aware others may disagree, and while "ASCII" might not be ambiguous, Py3 does still distinguish between b"asdf" and u"asdf" :) ChrisA

Chris Angelico writes:
I think that token_url may need a bytes mode, for the same reasons that bytes needs __mod__: such tokens will often be created and parsed by programs that never leave the "ASCII-compatible bytes" world.

On Sun, Sep 20, 2015 at 01:45:36PM +0900, Stephen J. Turnbull wrote:
I expect that token_url would return a string (Unicode), but since it's pure ASCII (being base64 encoded), if you want bytes, you can just call token_url().encode('ascii'). Or maybe it should return bytes, and if you want a string, you just say token_url().decode('ascii'). Out of the two, I'm leaning very slightly towards the first (Unicode by default, encode to ASCII if you want bytes) rather than the second. I'm very much not in favour of a "return_bytes=True" argument. -- Steve

On 20.09.2015 02:27, Chris Angelico wrote:
My personal preference would be for the number of bytes to reflect the entropy in the result. This would be safer when migrating from e.g. token_url to token_alpha with the base32 alphabet [1], for example because you want more readable tokens. Speaking of which, a token_base32 would probably make sense, too. regards, jwi [1]: https://philzimmermann.com/docs/human-oriented-base-32-encoding.txt

On 22 September 2015 at 18:26, Jonas Wielicki <j.wielicki@sotecware.net> wrote:
This isn't something to decide by personal preference, it's something to be decided by considering the consequences of someone misunderstanding the API and not noticing that the result isn't what they expected.

Scenario 1: API specifies bytes of entropy
Consequence of misunderstanding: result is twice as long as expected, with more entropy than expected

Scenario 2: API specifies length of result
Consequence of misunderstanding: result is half as long as expected, with less entropy than expected

Scenario 1 fails safe, scenario 2 doesn't, so for the APIs that are just reversible data transforms around os.urandom, it makes the most sense to specify the number of bytes of entropy you want. Building a password from an alphabet is different, as that involves repeated applications of secrets.choice() to the given alphabet, so you need to specify the result length directly. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Sep 22, 2015, at 08:03, Nick Coghlan wrote:
Well, in principle, the length could be calculated from the number of bytes of entropy desired by using ceil(nbytes*log(256)/log(len(alphabet))), if all that matters is to "fail safe" [i.e. longer] rather than to not be surprising. Being calculated by repeated application of choice rather than some other algorithm is an implementation detail.
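
As a rough sketch of the calculation above (the helper name `chars_needed` is hypothetical, not something proposed in the thread), the formula can be written out directly. It returns the smallest number of uniformly drawn characters that carries at least `nbytes` bytes of entropy:

```python
import math
import string

def chars_needed(nbytes, alphabet):
    # Hypothetical helper: each character drawn uniformly from `alphabet`
    # contributes log(len(alphabet)) of entropy, and each random byte
    # contributes log(256), so round the ratio up to "fail safe".
    return math.ceil(nbytes * math.log(256) / math.log(len(alphabet)))

letters_digits = string.ascii_letters + string.digits  # 62 characters

# 16 bytes (128 bits) of entropy needs 22 characters from a 62-character
# alphabet, since 21 characters would fall just short.
print(chars_needed(16, letters_digits))
```

Rounding up means the result is never weaker than requested, at the cost of a length that is not obvious from the argument.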

On Tue, Sep 22, 2015 at 10:26:13AM +0200, Jonas Wielicki wrote:
I think the answer there has to be 8. I interpret Tim's reference to "same" as that the intent of token_hex is to call os.urandom(nbytes), then convert it to a hex string. So the implementation might be as simple as:

    import binascii
    import os

    def token_hex(nbytes):
        return binascii.hexlify(os.urandom(nbytes))

modulo a call to .decode('ascii') if we want it to return a string. One obvious question is, how many bytes is enough? Perhaps we should set a default value for nbytes, with the understanding that the default value will increase in the future.
Oh oh, scope creep already! And so it begins... *wink* What you are referring to isn't the standard base32, which already exists in the stdlib (in base64.py, together with base16). It is referred to by its creators as z-base-32, and the reasoning they give seems sound. It's not intended as a replacement for RFC 3548 base32, but an alternative. If the std lib already included a z-base-32 implementation, I would be happy to include token_zbase32 in the same spirit as token_base64. But it doesn't. So first you would have to convince somebody to add zbase32 to the standard library.
-- Steve

Also, if you ask for 4 bytes from token_hex, do you get 4 hex digits or 8 (four bytes of entropy)?
[Steven D'Aprano]
Absolutely. If we're trying to "fail safe", it's the number of unpredictable source bytes that's important, not the length of the string produced. And, e.g., in the case of a URL-safe base64 encoding, passing "number of characters in the string" would be plain idiotic ;-)
Nick Coghlan already posted implementations of these things, before this thread started. They're all easy, _provided that_ you know which obscure functions to call; e.g.,

    import base64
    import os

    def token_url(nbytes):
        return base64.urlsafe_b64encode(os.urandom(nbytes)).decode("ascii")

On 20.09.15 02:40, Tim Peters wrote:
randbelow() is just an alias for randrange() with a single argument. randint(a, b) == randrange(a, b+1). These functions are redundant and they have a non-zero cost. Wouldn't renaming getrandbits be confusing?
token_hex(nbytes) == token_alpha('0123456789abcdef', nchars) ?
token_url(nbytes) == token_alpha('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_', nchars) ?

On Sun, Sep 20, 2015 at 09:00:08AM +0300, Serhiy Storchaka wrote:
But they already exist in the random module, so adding them to secrets doesn't cost anything extra. It's just a reference to the bound method of the private SystemRandom() instance:

    # suggested implementation
    import random
    _systemrandom = random.SystemRandom()
    randint = _systemrandom.randint
    randrange = _systemrandom.randrange

etc.
They may be reasonable implementations for the functions, but simple as they are, I think we still want to provide them as named functions rather than expect the user to write things like the above. If they're doing it more than once, they'll want to write a helper function, we might as well provide that for them. -- Steve

On 2015-09-21 17:22, Steven D'Aprano wrote:
On Sun, Sep 20, 2015 at 09:00:08AM +0300, Serhiy Storchaka wrote:
On 20.09.15 02:40, Tim Peters wrote:
Actually, I don't think those are the semantics that Tim intended. Rather, token_hex(nbytes) would return a string twice as long as nbytes. The idea is that you want to get nbytes-worth of random bits, just encoded in a common "safe" format. Similarly, token_url(nbytes) would get nbytes of random bits then base64-encode it, not just pick nbytes characters from a URL-safe list of characters. This makes it easier to reason about how much entropy you are actually using. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
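
To make the distinction above concrete, here is a minimal sketch that puts the two readings side by side. The function names follow Tim's proposal, but the bodies are assumptions for illustration, not the thread's agreed implementation:

```python
import binascii
import os
import random

_sysrand = random.SystemRandom()

def token_hex(nbytes):
    # Encode nbytes of OS randomness as hex: the result has 2 * nbytes characters.
    return binascii.hexlify(os.urandom(nbytes)).decode("ascii")

def token_alpha(alphabet, nchars):
    # Draw nchars characters independently and uniformly from alphabet.
    return "".join(_sysrand.choice(alphabet) for _ in range(nchars))

a = token_hex(8)                        # 8 random bytes: 64 bits of entropy
b = token_alpha("0123456789abcdef", 8)  # 8 hex digits: only 32 bits of entropy
print(len(a), len(b))                   # prints: 16 8
```

Passing the same number to both gives results with very different strength, which is why the meaning of the argument matters.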

On 21.09.15 19:22, Steven D'Aprano wrote:
The main cost is learning and memorising cost. The fewer words you need to learn and keep in memory the better.
But why are these particular alphabets special? I expect that every application will use the alphabet that matches its needs. One needs decimal digits ('0123456789'), another needs English letters ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'), or letters and digits and underscore, or letters, digits and punctuation, or all safe ASCII characters, or all graphically well-distinguished characters. Why token_hex and token_url, but not token_digits, token_letters, token_identifier, token_base32, token_base85, token_html_safe, etc?

On Mon, Sep 21, 2015, at 16:12, Serhiy Storchaka wrote:
Well, for one thing, they're trivial encodings of random bits, which is why passing in nbytes (number of random bytes) makes sense. Someone else pointed out that this makes it easier to reason about the amount of entropy involved. Token_base64 could actually, in principle, return a string with padding at the end according to base64 rules, if you ask for a number of bytes that is not a multiple of three. Base85 could likewise, for that matter, but base85 is a less common encoding.
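
A small sketch of the padding behaviour mentioned above. Base64 maps each group of three input bytes to four output characters, so '=' padding shows up whenever the byte count is not a multiple of three. The helper name `padding_length` is hypothetical:

```python
import base64

def padding_length(nbytes):
    # Encode nbytes zero bytes (fixed input so the result is reproducible)
    # and count the trailing '=' characters.
    encoded = base64.urlsafe_b64encode(bytes(nbytes))
    return len(encoded) - len(encoded.rstrip(b"="))

for nbytes in (3, 4, 5, 6):
    print(nbytes, padding_length(nbytes))
```

This is why a URL token function may want to strip the padding, or document that it can appear.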

Sorry to jump in with replying to a random message, but I can't find the message where this originally showed up:
While we're bikeshedding, can we pick better names than randXXX? How about random_range(), etc.? I'd rather have clarity than save a few chars. I think it's more approachable for new users to a new module. Eric.

On 9/21/2015 12:22 PM, Steven D'Aprano wrote:
On Sun, Sep 20, 2015 at 09:00:08AM +0300, Serhiy Storchaka wrote:
I think the redundancy in random is a mistake. The cost is confusion and extra memory load, and the need to refer to the manual more often, for essentially zero gain. When I read two names, I expect them to do two different things. The question is whether to propagate the mistake to a new module. -- Terry Jan Reedy

On Mon, Sep 21, 2015 at 05:32:44PM -0400, Terry Reedy wrote:
Sorry, I don't understand what you mean. Do you mean that it is a mistake for the random module to have randint and randrange? Or that it is a mistake for the secrets module to include functions that the random module includes?
If you are referring to randint versus randrange, they do do different things. Look at their signatures.

randint(a, b) follows the ubiquitous API of "generate a random integer from the closed range a through b inclusive".

randrange([start,] end [, step]) follows the Python practice of specifying a half-open interval, and has a more complex signature.

Even though randrange is more Pythonic, I've never actually used it. randint is always what I've wanted. E.g.

    def die():
        # Roll a die.
        return randint(1, 6)

is far more natural than randrange(1, 7), Pythonic half-open intervals or not. But I'm satisfied that others may think differently, and by Tim's argument that excluding one or the other will be more confusing than including them both. -- Steve
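
For comparison, a minimal sketch of the two spellings of the die roll, using a SystemRandom instance as the thread suggests for the secrets module. The names `die_closed` and `die_half_open` are made up for this example:

```python
import random

_sysrand = random.SystemRandom()

def die_closed():
    # Closed range: both 1 and 6 are possible results.
    return _sysrand.randint(1, 6)

def die_half_open():
    # Half-open range: 7 is excluded, so the possible results are again 1..6.
    return _sysrand.randrange(1, 7)

rolls = [die_closed() for _ in range(1000)] + [die_half_open() for _ in range(1000)]
print(all(1 <= r <= 6 for r in rolls))
```

Both functions produce the same distribution; the difference is only in how the upper bound is written.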

On 20 September 2015 at 00:40, Tim Peters <tim.peters@gmail.com> wrote:
Given where this started, I'd suggest renaming token_alpha as "password". Beginners wouldn't necessarily associate the term "token" with the problem "I want to generate a random password" [1]. Maybe add a short recipe showing how to meet constraints like "at least 2 digits" by simply generating repeatedly until a valid password is found. For a bit of extra bikeshedding, I'd make alphabet the second, optional, parameter and default it to string.ascii_letters+string.digits+string.punctuation, as that's often what password constraints require. Or at the very least, document how to use the module functions for the common tasks we see people getting wrong. But I thought the idea here was to make doing things the right way obvious, for people who don't read documentation, so I'd prefer to see the functions exposed by the module named based on the problems they solve, not on the features they provide. (Even if that involves a little duplication, and/or a split between "high level" and "low level" APIs). Paul. [1] I'd written a spec for password() before I spotted that it was the same as token_alpha :-(
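
The "generate repeatedly until valid" recipe suggested above might look like the following sketch. The default alphabet is the one Paul proposes; the name `password_with_digits` and the `min_digits` parameter are assumptions made for illustration:

```python
import random
import string

_sysrand = random.SystemRandom()

DEFAULT_ALPHABET = string.ascii_letters + string.digits + string.punctuation

def password(nchars, alphabet=DEFAULT_ALPHABET):
    # Draw nchars characters independently and uniformly from alphabet.
    return "".join(_sysrand.choice(alphabet) for _ in range(nchars))

def password_with_digits(nchars, min_digits=2):
    # Hypothetical recipe: keep generating until the constraint is met.
    if min_digits > nchars:
        raise ValueError("min_digits cannot exceed nchars")
    while True:
        candidate = password(nchars)
        if sum(c in string.digits for c in candidate) >= min_digits:
            return candidate

print(password_with_digits(12))
```

Rejecting whole candidates keeps every accepted password equally likely among those that satisfy the rule, which patching in digits afterwards would not.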

On 20 September 2015 at 20:56, Paul Moore <p.f.moore@gmail.com> wrote:
Right, I'd suggest the following breakdown.

* Arbitrary password generation (also covers passphrase generation from a word list):

    secrets.password(result_len: int, alphabet=string.ascii_letters+string.digits+string.punctuation: T) -> T

* Binary token generation ("num_random_bytes" is the arg to os.urandom, not the length of result):

    secrets.token(num_random_bytes: int) -> bytes
    secrets.token_hex(num_random_bytes: int) -> bytes
    secrets.token_urlsafe_base64(num_random_bytes: int) -> bytes

* Serial number generation ("num_random_bytes" is the arg to os.urandom, not the length of result):

    secrets.serial_number(num_random_bytes: int) -> int

* Constant time secret comparison (aka hmac.compare_digest):

    secrets.equal(a: T, b: T) -> bool

* Lower level building blocks:

    secrets.choice(container)
    # Hold off on other SystemRandom methods?

(I don't have a strong opinion on that last point, as it's the higher level APIs that I think are the important aspect of this proposal) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
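
The constant-time comparison in the breakdown above is described as an alias for hmac.compare_digest. A minimal sketch, with the name `equal` taken from the proposal and the sample token values made up for illustration:

```python
import hmac

def equal(a, b):
    # Thin wrapper over hmac.compare_digest, which accepts either two
    # bytes-like objects or two ASCII-only str values.
    return hmac.compare_digest(a, b)

expected = b"3f9a1c"  # made-up token, for illustration only
print(equal(expected, b"3f9a1c"))
print(equal(expected, b"000000"))
```

Using this instead of `==` avoids leaking, through timing, how many leading characters of a guess were correct.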

On Sep 20, 2015, at 05:26, Nick Coghlan <ncoghlan@gmail.com> wrote:
If T is a word list--that is, an Iterable of str or bytes--you want to return a str or a bytes, not a T. Also, making it work that generically will make the code much more complicated, to the point where it no longer serves as useful sample code to rank novices. You have to extract the first element of T, then do your choosing off chain([first], T) instead of off T, then type(first).join; all of that is more complicated than the actual logic, and will obscure the important part we want novices to learn if they read the source. Also, for word lists, I think you'd want a way to specify actual passphrases vs. the xkcd 936 idea of using passphrases as passwords even for sites that don't accept spaces, like "correcthorsebatterystaple". Maybe via a sep=' ' parameter? That would be very confusing if it's ignored when T is string-like but used when T is a non-string-like iterable of string-likes. I think it's better to require T to be string-like than to try to generalize it, and maybe add a separate passphrase function that takes (words: Sequence[T], sep: T) -> T. (Although I'm not sure how to default to ' ' vs b' ' based on the type of T... But maybe this doesn't need to handle bytes, so Sequence[str] is fine?)
I think randrange is definitely worth having. Even the OpenSSL and arc4random APIs provide something equivalent. If you're a novice, and following a blog post that says to use your language's equivalent of randbelow(1000000), are you going to think of choice(range(1000000))? And, if you do, are you going to convince yourself that this is reasonable and not going to create a slew of million-element lists?

On 21 September 2015 at 08:07, Andrew Barnert <abarnert@yahoo.com> wrote:
Simpler is better here, so I'll revise the text-based suggestions to:

    secrets.password(result_len: int, alphabet=string.ascii_letters+string.digits+string.punctuation: str) -> str
    secrets.passphrase(result_len: int, words: Sequence[str], sep=' ') -> str
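
A minimal sketch of what the passphrase signature above could do, assuming it simply joins independently chosen words. The body and the toy word list are assumptions, since the thread gives only the signature:

```python
import random

_sysrand = random.SystemRandom()

def passphrase(result_len, words, sep=" "):
    # Pick result_len words independently and uniformly, then join them.
    return sep.join(_sysrand.choice(words) for _ in range(result_len))

# Toy word list for illustration only; a real list needs thousands of words.
words = ["correct", "horse", "battery", "staple"]

spaced = passphrase(4, words)           # words separated by spaces
packed = passphrase(4, words, sep="")   # xkcd 936 style, no separator
print(spaced)
print(packed)
```

With `sep=""` the same function covers sites that reject spaces, which is the case Andrew raises.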
Sure, that makes sense, while still keeping the secrets module focused on integers. getrandbits() is an interesting one, as it opens up the option of "secrets.getrandbits(128).to_bytes()" as a pointlessly slower alternative to "secrets.token(128 // 8)", while "secrets.getrandbits(128)" itself would be directly equivalent to the proposed "secrets.serial_number(128 // 8)". So perhaps it makes sense to just drop the serial_number() idea and have getrandbits() instead. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
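
The relationship between the two routes described above can be sketched with existing stdlib calls. The secrets functions did not exist yet, so SystemRandom.getrandbits and os.urandom stand in for them here, and the explicit length and byte order passed to to_bytes are spelled out:

```python
import os
import random

_sysrand = random.SystemRandom()

as_int = _sysrand.getrandbits(128)           # integer in range(2**128)
as_bytes = as_int.to_bytes(128 // 8, "big")  # the slower route to 16 bytes
direct = os.urandom(128 // 8)                # 16 bytes straight from the OS

print(len(as_bytes), len(direct))            # prints: 16 16
```

Either route yields 128 random bits; the integer form is what a serial number function would return.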

On Sun, Sep 20, 2015 at 11:56:06AM +0100, Paul Moore wrote:
On 20 September 2015 at 00:40, Tim Peters <tim.peters@gmail.com> wrote:
I'm not entirely sure about including password generators, since there are so many password schemes around: http://thedailywtf.com/articles/Security-by-PostIt
If we're going to offer a simple, no-brainer password generator, my vote goes for:

    def password(nchars=10, alphabet=string.ascii_letters+string.digits):

I wouldn't include punctuation by default, as too many places still prohibit some, or all, punctuation characters. If both my understanding and calculations are correct, using ascii_letters+digits+punctuation gives us log(94, 2) = 6.6 bits of (Shannon) entropy per character, while just using letters+digits gives us log(62, 2) = 6.0 bits per character. For short-ish passwords, up to 10 characters, the extra entropy from including punctuation is less than the extra from adding an extra character:

    password length of 8, without punctuation: 47.6 bits
    password length of 8, including punctuation: 52.4 bits
    password length of 9, without punctuation: 53.6 bits
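
The figures above can be reproduced with a short calculation. This is only a sketch, and the helper name `entropy_bits` is hypothetical; it assumes each character is drawn independently and uniformly:

```python
import math
import string

def entropy_bits(nchars, alphabet):
    # Entropy of nchars independent, uniform draws from alphabet, in bits.
    return nchars * math.log2(len(alphabet))

letters_digits = string.ascii_letters + string.digits  # 62 characters
with_punct = letters_digits + string.punctuation       # 94 characters

for nchars, alphabet in [(8, letters_digits), (8, with_punct), (9, letters_digits)]:
    print(nchars, len(alphabet), round(entropy_bits(nchars, alphabet), 1))
# prints:
# 8 62 47.6
# 8 94 52.4
# 9 62 53.6
```

One extra character from the smaller alphabet adds about 6.0 bits, more than the 4.8 bits gained by switching an 8-character password to the larger alphabet.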
I agree that secrets should be providing ready-to-use functions, even if they don't solve all use-cases, not just primitive building blocks. -- Steve

Steven D'Aprano writes:
Do you really expect users to choose their own random passwords using this function? I would expect that this function would be used for initial system-generated passwords (or system-enforced random passwords), and the system would have control over the admissible set. But users who have to conform to somebody else's rules much prefer obfuscated passwords that pass strength tests to random passwords in my experience. BTW, the last time I had to set a password that didn't allow the full set of 94 printable ASCII characters, uppercase letters were forbidden (silently -- it was documented in the help but not on the password change form, I had no idea why my first three suggestions were rejected). Go figure.

On Tue, Sep 22, 2015 at 08:56:24AM +0900, Stephen J. Turnbull wrote:
I don't know. Perhaps they will. I'm not entirely sure what the use-case of this password generator is, since I'm pretty sure that "real" password generators have to deal with far more complicated rules.
Perhaps so. But then how does the application get the password to the user? Via unencrypted email, like mailman does? I expect that the only use-case for an application generating a password for the user would be "low security" applications where the password has low value. But maybe others disagree. I don't really have a strong opinion one way or another. -- Steve

Steven D'Aprano writes:
On Tue, Sep 22, 2015 at 08:56:24AM +0900, Stephen J. Turnbull wrote:
Actually, I think they'll do what randrange does: take a seed from urandom() and values from a (CS)PRNG based on that seed, and throw away an out-of-range subset. I.e., they'll just generate passwords based on a simple rule about the alphabet and keep trying until they get one that passes the strength tester.
Well, I hand them out to my students in class on business cards. But an HTTPS connection could also work.
That could very well be true.

On 22 September 2015 at 09:56, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Right, the primary use case here is "web developer creating a default password for an automatically created admin account" (for example), not "end user creating a password for an arbitrary service". We don't want to overgeneralise the canned recipes - keep them dirt simple, and if folks want something slightly different, we can go the itertools path and have recipes in the documentation. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, 22 Sep 2015 at 04:56 Nick Coghlan <ncoghlan@gmail.com> wrote:
Out of this whole proposal, this password function is the one I'm most worried about. As someone who has a project whose entire job is to generate consistent passwords, I can tell you it's a messy business that will just lead to never-ending complaints about "why didn't you include this as part of password alphabet" or "why did you choose that length". It just isn't worth the hassle when it isn't going to impact a majority of Python users. This can be something that web frameworks and other folks worry about.

On 22.09.2015 18:01, Brett Cannon wrote:
Agreed. There are too many policies and regulations for passwords out there. The stdlib is not the right place for this. But the general purpose functionality of having a function which returns a string of given length and characters from a given set is useful for building routines which implement such policies. Just don't call it a password function :-) How about: randstr(length, alphabet) -- Marc-Andre Lemburg

On Tue, Sep 22, 2015 at 04:01:51PM +0000, Brett Cannon wrote:
I too feel a quiet unease about password(), although I don't have anything concrete to pin it on. I'm happy to be guided by people with more experience in this realm. What if we called it simple_password() and made it clear that it wasn't intended as an all-singing, all-dancing password generator? -- Steve

On 23 September 2015 at 03:41, Tim Peters <tim.peters@gmail.com> wrote:
I think I may have been the one to suggest it originally, since one of the things we're trying to address is the plethora of bad advice found when Googling for "python password generator", but I'm OK with dropping it from the initial version of the module, just on the general principle that adding things later is relatively easy, while taking them away is hard.
Yeah, addressing the default password generation problem should work just as well as a recipe in the secrets module documentation - I see the core goal here as being to help guide folks towards using the right random number generator for security sensitive tasks, and "use the RNG in the secrets module for random secrets, and the RNG in the random module for modelling and simulation" is a much easier story to tell than explaining the technical differences between random.Random and random.SystemRandom. Raymond Hettinger's philosophy with itertools is likely a good guiding principle here: provide a small set of useful primitives, and otherwise favour recipes in the documentation. If we end up with a "more-secrets" module on PyPI akin to "more-itertools", I think that's fine (and also provides an easy way of backporting future secrets module additions to earlier Python versions) Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Sep 19, 2015 at 06:40:32PM -0500, Tim Peters wrote:
While we're bike-shedding, I don't know that I like the name randbits, since that always makes me expect a sequence of 0, 1 bits. But that's a minor point. When would somebody use randbelow(n) rather than randrange(n)? Apart from the possible redundancy between rand[below|range], all the above seem reasonable to me. Are there use-cases for a strong random float between 0 and 1? If so, is it sufficient to say secrets.randbelow(sys.maxsize)/sys.maxsize, or should we offer secrets.random() and/or secrets.uniform(a, b)?
I suggest adding a default length, say nbytes=32, with a note that the default length is expected to increase in the future. Otherwise, how will the naive user know what counts as a good, hard-to-attack length? All of the above look good to me.
What is the intention for this function? To use as passwords? Other than that, it's not obvious to me what that would be used for. -- Steve

On Tue, Sep 22, 2015 at 2:10 AM, Steven D'Aprano <steve@pearwood.info> wrote:
I would be leery of such a function, because it'd be hard to define it perfectly. Tell me, crypto wonks: If I have a function randfloat() that returns 0.0 <= x < 1.0, is it safe to use it like this:

    # Generate an integer 0 <= x < 12345, uniformly distributed
    uniform = int(randfloat() * 12345)
    # Ditto but on a logarithmic distribution
    log = math.exp(randfloat() * math.log(12345))
    # Double-logarithmic
    loglog = math.exp(math.exp(randfloat() * math.log(math.log(12345))))

If it's producing a random *real number* 0 <= x < 1, then these should be valid. But given the differences between floats and reals, I would be worried that this kind of usage would introduce an unexpected bias. Obviously the first example is much better spelled randbelow or randrange, but for more complicated examples, grabbing a random float would look like the best way to do it. Will it? Always? Not being a crypto wonk myself, I can't know what's safe and what isn't. If Python is going to offer a new module with the (implicit or explicit) recommendation "use this for all your cryptographic entropy", it needs to be 100% reliable. ChrisA

On 22 September 2015 at 02:50, Chris Angelico <rosuav@gmail.com> wrote:
Floating point numbers and crypto don't go together - crypto is all about integers, bits, bytes, and text. Folks dealing with floating point numbers are presumably handling modelling and simulation tasks, and will want the random module, not secrets. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Sep 22, 2015 at 02:50:56AM +1000, Chris Angelico wrote:
I'm satisfied by Nick's response to you, which also implies an answer to my question: there is no good use-case for a strong random float and no need for secrets.random(). The main reason I asked is because Ruby's SecureRandom.random_number() optionally returns a float between 0 and 1. -- Steve

[Tim]
[Steven D'Aprano <steve@pearwood.info>]
While we're bike-shedding,
I refuse to bikeshed on this. I posted a concrete proposal just to enrage others into it ;-) So I'll just sketch my thinking:
Had in mind multiple audiences, including those who know a lot about Python, and those who know little. The _lack_ of randbits() would surprise the former.
When would somebody use randbelow(n) rather than randrange(n)?
For the same reason they'd use randbits(n) instead of randrange(1 << n) ;-) That is, familiarity and obviousness. randrange() has a complicated signature, with 1 to 3 arguments, and endlessly surprises newbies who _expect_, e.g., randrange(3) to return 3 at times. That's why randint() was created. "randbelow(n)" has a dirt-simple signature, and its name makes it hard to mistakenly believe `n` is a possible return value. It's exactly what's needed most often to avoid _statistical_ bias (as opposed to security weaknesses) in higher-level functions - that's why _randbelow() is a fundamental primitive in Random. So, yes, it's redundant, but I don't care. randrange(n) itself is just a needlessly expensive way to call _randbelow(n) today.
Apart from the possible redundancy between rand[below|range], all the above seem reasonable to me.
If people want minimal, just expose os.urandom() under a friendlier name, and call it done ;-)
I don't know of any "security use" for random floats. But if you want to add a recipe to the docs, point them to SystemRandom.random instead. That gets it right. `sys.maxsize` doesn't really have anything to do with floats, and the snippet you gave would produce poor-quality floats on a 32-bit box (wouldn't get anywhere near randomizing all 53 bits of float precision). On a 64-bit box, it could, e.g., return 1.0 (which random() should never return).
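
The 64-bit failure mode described above can be checked directly. In this sketch the constant `MAXSIZE_64` is an assumption standing in for sys.maxsize on a typical 64-bit build, so the result does not depend on the machine running it:

```python
import random

MAXSIZE_64 = 2**63 - 1  # assumed value of sys.maxsize on a 64-bit build

# Largest value randbelow(MAXSIZE_64) could return, divided as in the snippet.
# The quotient is closer to 1.0 than to any smaller float, so it rounds to 1.0.
worst_case = (MAXSIZE_64 - 1) / MAXSIZE_64
print(worst_case == 1.0)  # prints: True

# SystemRandom.random() is documented to return a float in [0.0, 1.0).
x = random.SystemRandom().random()
print(0.0 <= x < 1.0)     # prints: True
```

The division has far more integer bits than a float can hold, which is why values near the top of the range collapse onto 1.0.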
Fine by me!
I just noted that several of the examples in the PHP paper appeared to want to use their own alphabet. But, since that paper was about exposing security holes in PHP apps, perhaps that wasn't such a good idea to begin with ;-) Fine by me if it's dropped.

On Sep 21, 2015, at 10:51, Tim Peters <tim.peters@gmail.com> wrote:
Anyone who gets confused by randrange(3) also gets confused by range(3), and they have to learn pretty quickly. Also, randint wasn't created to allow people to put off learning that fact. It was created before randrange, because Adrian Baddeley didn't realize that Python consistently used half-open ranges, and Guido didn't notice. After 1.5 was out and someone complained that choice(range(...)) was inefficient, Guido added randrange. See the commit comment (61464037da53) which says "This addresses the problem that randint() was accidentally defined as taking an inclusive range (how unpythonic)". Also, some guy named Tim Peters convinced Guido that randint(0, 2.5) was surprisingly broken, so if he wasn't going to remove it he should reimplement it as randrange(a, b+1), which would give a clear error message. Later still (3.0), there was another discussion on removing randint, but the decision was to keep it as a "legacy alias", and change the docs to reflect that. I suppose randbelow could be implemented as an alias to randrange(a), or it could copy and paste the same type checks as randrange, but honestly, I don't think anyone needs it.

[Steven]
When would somebody use randbelow(n) rather than randrange(n)?
[Tim]
[Andrew Barnert <abarnert@yahoo.com>]
Anyone who gets confused by randrange(3) also gets confused by range(3),
True!
and they have to learn pretty quickly.
And they do. And then, in a rush, they slip up.
Goodness - you seem to believe there's virtue in remembering things in the order they actually happened. Hmm. I'll try that sometime, but I'm dubious ;-)
randbelow() is already implemented, in current Pythons, although as a class-private method (Random._randbelow()). It's randrange() that's implemented by calling ._randbelow() now. To expose it on its own, it should grow a check that its argument is an integer > 0 (as a private method, it currently assumes it won't be called with an insane argument).
but honestly, I don't think anyone needs it.
Of the four {randbelow, randint, randrange, randbits}, any can be implemented via any of the other three. You chopped what I considered to be "the real" point:
"randbelow(n)" has a dirt-simple signature, and its name makes it hard to mistakenly believe `n` is a possible return value.
That's what gives it value. Indeed, if minimality crusaders are determined to root out redundancy, randbelow is the only one of the four I'd keep.
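
A sketch of a public randbelow with the argument check described above. It delegates to the public randrange() rather than the private ._randbelow() method, which is a choice made for this example and not what the thread proposes for the real implementation:

```python
import random

_sysrand = random.SystemRandom()

def randbelow(exclusive_upper_bound):
    # Return a random int n with 0 <= n < exclusive_upper_bound.
    if not isinstance(exclusive_upper_bound, int):
        raise TypeError("upper bound must be an integer")
    if exclusive_upper_bound <= 0:
        raise ValueError("upper bound must be positive")
    return _sysrand.randrange(exclusive_upper_bound)

samples = [randbelow(3) for _ in range(300)]
print(sorted(set(samples)))  # the bound 3 itself can never appear
```

The single required argument and the name together make it hard to read the bound as a possible result.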

On Sun, Sep 20, 2015 at 09:02:26AM +1000, Chris Angelico <rosuav@gmail.com> wrote:
Or, BTW, I always wanted "import this or that"!
ChrisA
Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

I have discovered that there is already a "secrets" module on PyPI: https://pypi.python.org/pypi/secrets (Thanks to Robert Collins who has brought this to my attention.) Personally, I don't think we should necessarily rule out re-using the name in the standard library. Does anyone have strong feelings either way? -- Steve

I'm looking for guidance and/or consensus on two issues regarding token* functions in secrets: output type, and default values. The idea is that the module will include a few functions for generating tokens, suitable for (say) password recovery, with the following signatures:
    def token_bytes(nbytes:int) -> bytes:
        """Return nbytes random bytes."""
    def token_hex(nbytes:int) -> ???? :
        """Return nbytes random bytes, encoded to hex"""
    def token_url(nbytes:int) -> ???? :
        """Return nbytes random bytes, URL-safe base64 encoded."""
Question one: token_bytes obviously should return bytes. What should the others return, bytes or str?
Question two: Many people will have no idea how many bytes should be used to be confident that it will be hard for an attacker to guess. Earlier, I suggested that the three functions include default values for nbytes, and there were no objections. Do we have consensus on this, and if so, what default value should we use?
Question three: If we have default values, do we need some sort of documented exception to the general backwards-compatibility requirement? E.g. suppose we release the module in 3.6.0 with defaults of 32 bytes, and in 3.6.2 we discover that's too small and we should have used 64 bytes. Can we change the default in 3.6.3 without notice?
-- Steve

On 26.09.15 16:07, Steven D'Aprano wrote:
Why not leave the conversion to the user? You can provide simple recipes in the documentation:
    def token_hex(nbytes):
        return token_bytes(nbytes).hex()
    def token_url(nbytes):
        return base64.urlsafe_b64encode(token_bytes(nbytes)).rstrip(b'=')
We don't know what functions are needed by users. After the secrets module is widely used, we could gather statistics on the most popular patterns and add some of them to the stdlib.
I would make the nbytes argument mandatory, and expose recommended values in examples.
secrets.token_bytes(32) b'\xf8\x80Ejh\x1ck\xfbL\xc3l\xd3ev\x1bT\xbe\x983\x072\xbbP\xe2\xee\xf8\xdc\xaf\xe4\xddJ#'

On 26 September 2015 at 23:56, Serhiy Storchaka <storchaka@gmail.com> wrote:
We already have those patterns based on what web frameworks use - the hex token generator pattern is taken from Pyramid's token generator, while the base64 one is inspired by Django's (the latter actually uses the "choosing from an alphabet" implementation style, but the proposed base64 approach makes the same general trade-off of encoding more bits of entropy per character to make the overall output shorter). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Sep 26, 2015 at 11:07 PM, Steven D'Aprano <steve@pearwood.info> wrote:
str. The point of encoding them is to turn the entropy into some form of text, so IMO it makes sense to treat this as text.
So as I understand you, there are three options: 1) No default. Whenever you want entropy, you say how much. Simple. 2) Fixed default, covered by backward guarantee promises. 3) Variable default with an implication that using the default entropy is "secure enough" for most purposes. Can you adequately define "secure enough" across all purposes? If so, I would support that. The precise number would never be documented specifically (if you want to know what your version does, try it interactively), and then it can indeed be changed in 3.6.3 - or even without a version number bump at all (in ten years' time, Red Hat might choose to continue shipping CPython 3.6.1, but change the default entropy value). Otherwise, I would be inclined toward not having a default at all. Having one that can be changed only in 3.7 seems like the worst of both worlds - programs can't depend on the value being constant, but a security enhancement can't be done on an already-released version. ChrisA

On 27 September 2015 at 00:04, Chris Angelico <rosuav@gmail.com> wrote:
We backported PEP 466 with its "the default SSL context settings may change in maintenance releases" behaviour to the Python 2.7.5 based system Python in RHEL 7.2, so I expect we'd be OK with backporting changes to default entropy settings in the secrets module. The default settings in the system provided OpenSSL have also long been subject to change (that's one of the reasons CPython defaults to dynamically linking to OpenSSL on *nix systems). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sep 26, 2015, at 06:07, Steven D'Aprano <steve@pearwood.info> wrote:
Why not just use a default value of None, and document that None picks an appropriate value? Then, if it changes to a different appropriate value in 3.7 or 3.6.3 or some custom build of CPython, it hasn't broken backward compatibility.

On 26 September 2015 at 23:07, Steven D'Aprano <steve@pearwood.info> wrote:
token_hex and token_url are inspired by Pyramid's and Django's token generators (albeit with a different implementation technique in the latter case), so I'd look at what type those return. The Django token generator is django.utils.crypto.get_random_string, and returns text. The Pyramid CSRF token generator in sessions.BaseCookieSessionFactory.CookieSession.new_csrf_token also returns text However, I'm starting to think we should just pick one of the two algorithms and call it "token_str" (with the shorter output from the URL-safe base64 with any trailing "=" removed being my preference). For folks that want or need to use a different token generation algorithm, we can offer the Pyramid and Django generation algorithms as recipes in the documentation.
32 bytes (256 bits of entropy) seems like a reasonable default to me.
I like Andrew's suggestion of making the default None, and saying that passing None means we'll choose an appropriate length, which will be 32 bytes for now, but may change in maintenance releases to increase the length if we decide 256 bits of entropy isn't enough. Changes in the default length could be indicated through "versionchanged" notes in the "token_bytes" documentation. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
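
The "default of None" idea could be sketched roughly as follows. The constant name DEFAULT_ENTROPY is hypothetical, and 32 is simply the value floated in this thread; the point is only that the signature never bakes in a number, so a maintenance release could change it.

```python
import os

# Hypothetical module-level constant; the thread suggests 32 bytes "for now".
DEFAULT_ENTROPY = 32


def token_bytes(nbytes=None):
    """Return nbytes random bytes, or a module-chosen amount if nbytes is None."""
    if nbytes is None:
        nbytes = DEFAULT_ENTROPY
    return os.urandom(nbytes)
```

Callers who need a fixed length pass it explicitly; callers who just want "enough" call token_bytes() and pick up whatever the current default is.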

Hi all, An updated version of PEP 506 is now available: https://www.python.org/dev/peps/pep-0506/ If there are no major objections, I intend to take it to python-dev in a day or two for discussion and a ruling. Thank you to everyone who contributed to the discussion. -- Steve

This is already looking good. An additional advantage to having a new module (as opposed to changing random) is that it could easily be backported as a PyPI package, all the way back to Python 2.7. I do still think that having a concrete proposal for what should (initially) go into secrets.py would make for a more compelling PEP. On Tue, Oct 6, 2015 at 9:00 AM, Steven D'Aprano <steve@pearwood.info> wrote:
-- --Guido van Rossum (python.org/~guido)

On Tue, Oct 06, 2015 at 11:26:11AM -0700, Guido van Rossum wrote:
Thanks Guido. I'm not sure how much more concrete a proposal you are looking for. The PEP now lists a sample implementation. I've described it as "pseudo-code" only to indicate that it may be incomplete (e.g. missing some imports to make it work, lacking in error checking). It also lacks docstrings and tests, but otherwise I think it is most of the module. It's actually not very large, because most of the implementation is elsewhere (e.g. the random module). Is there something else the PEP should include? Any other requested functions? There have been a few functions suggested that were requested, e.g. password generation. https://www.python.org/dev/peps/pep-0506/ -- Steve

On Tue, Oct 6, 2015 at 7:07 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Hm... I totally did not find that when I read the PEP and even just now I almost missed it! What I had expected was something that could be directly committed into the stdlib, similar to the statistics.py module you contributed. But apparently it's so small it could be inlined and overlooked! :-) Maybe you could clean it up, write some tests, and publish it somewhere? (Not sure if you do GitHub. :-) One bikeshed: maybe we should keep only randrange() and drop the confusing randint()?
-- --Guido van Rossum (python.org/~guido)

[Tim Peters]
[Chris Angelico <rosuav@gmail.com>]
token_bytes "obviously" should return a bytes,
Which os.urandom() does in Python 3. I'm not writing docs, just suggesting the functions.
and token_alpha equally obviously should be returning a str.
Which part of "string" doesn't suggest "str"?
Which part of "ASCII" is ambiguous?
Also, if you ask for 4 bytes from token_hex, do you get 4 hex digits or 8 (four bytes of entropy)?
And which part of "same"? ;-) Bikeshed away; I'm outta this now ;-)

On Sun, Sep 20, 2015 at 10:19 AM, Tim Peters <tim.peters@gmail.com> wrote:
Heh :) My personal preference for shed colour: token_bytes returns a bytestring, its length being the number provided. All the others return Unicode strings, their lengths again being the number provided. So they're all text bar the one that explicitly says it's in bytes. But I'm aware others may disagree, and while "ASCII" might not be ambiguous, Py3 does still distinguish between b"asdf" and u"asdf" :) ChrisA

Chris Angelico writes:
I think that token_url may need a bytes mode, for the same reasons that bytes needs __mod__: such tokens will often be created and parsed by programs that never leave the "ASCII-compatible bytes" world.

On Sun, Sep 20, 2015 at 01:45:36PM +0900, Stephen J. Turnbull wrote:
I expect that token_url would return a string (Unicode), but since it's pure ASCII (being base64 encoded), if you want bytes, you can just call token_url().encode('ascii'). Or maybe it should return bytes, and if you want a string, you just say token_url().decode('ascii'). Out of the two, I'm very slightly leaning towards the first (Unicode by default, encode to ASCII if you want bytes) than the second. I'm very much not in favour of a "return_bytes=True" argument. -- Steve

On 20.09.2015 02:27, Chris Angelico wrote:
My personal preference would be for the number of bytes to rather reflect the entropy in the result. This would be a safer use when migrating from using e.g. token_url to token_alpha with the base32 alphabet [1], for example because you want to have better readable tokens. Speaking of which, a token_base32 would probably make sense, too. regards, jwi [1]: https://philzimmermann.com/docs/human-oriented-base-32-encoding.txt

On 22 September 2015 at 18:26, Jonas Wielicki <j.wielicki@sotecware.net> wrote:
This isn't something to decide by personal preference, it's something to be decided by considering the consequences of someone misunderstanding the API and not noticing that the result isn't what they expected.
Scenario 1: API specifies bytes of entropy. Consequence of misunderstanding: result is twice as long as expected, with more entropy than expected.
Scenario 2: API specifies length of result. Consequence of misunderstanding: result is half as long as expected, with less entropy than expected.
Scenario 1 fails safe, scenario 2 doesn't, so for the APIs that are just reversible data transforms around os.urandom, it makes the most sense to specify the number of bytes of entropy you want. Building a password from an alphabet is different, as that involves repeated applications of secrets.choice() to the given alphabet, so you need to specify the result length directly. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Sep 22, 2015, at 08:03, Nick Coghlan wrote:
Well, in principle, the length could be calculated from the number of bytes of entropy desired by using ceil(nbytes*log(256)/log(len(alphabet))), if all that matters is to "fail safe" [i.e. longer] rather than to not be surprising. Being calculated by repeated application of choice rather than some other algorithm is an implementation detail.
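
As a rough illustration of that length calculation, the sketch below finds the smallest number of characters whose alphabet can cover nbytes of entropy. It deliberately uses exact integer arithmetic instead of the floating-point ceil(nbytes*log(256)/log(len(alphabet))) expression, so the result cannot be thrown off by rounding; the function name chars_needed is made up for this example.

```python
def chars_needed(nbytes, alphabet_size):
    """Smallest n such that alphabet_size ** n >= 256 ** nbytes."""
    if alphabet_size < 2:
        raise ValueError("alphabet must contain at least two symbols")
    target = 256 ** nbytes  # number of distinct nbytes-long byte strings
    n = 0
    span = 1  # number of distinct n-character strings so far
    while span < target:
        span *= alphabet_size
        n += 1
    return n
```

For example, 16 bytes of entropy need 32 hex digits or 22 characters from a 64-symbol alphabet, which matches the hex and unpadded base64 lengths.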

On Tue, Sep 22, 2015 at 10:26:13AM +0200, Jonas Wielicki wrote:
I think the answer there has to be 8. I interpret Tim's reference to "same" as that the intent of token_hex is to call os.urandom(nbytes), then convert it to a hex string. So the implementation might be as simple as: def token_hex(nbytes): return binascii.hexlify(os.urandom(nbytes)) modulo a call to .decode('ascii') if we want it to return a string. One obvious question is, how many bytes is enough? Perhaps we should set a default value for nbytes, with the understanding that the default value will increase in the future.
Oh oh, scope creep already! And so it begins... *wink* What you are referring to isn't the standard base32, which already exists in the stdlib (in base64.py, together with base16). It is referred to by its creators as z-base-32, and the reasoning they give seems sound. It's not intended as a replacement for RFC 3548 base32, but an alternative. If the std lib already included a z-base-32 implementation, I would be happy to include token_zbase32 in the same spirit as token_base64. But it doesn't. So first you would have to convince somebody to add zbase32 to the standard library.
[1]: https://philzimmermann.com/docs/human-oriented-base-32-encoding.txt
-- Steve

Also, if you ask for 4 bytes from token_hex, do you get 4 hex digits or 8 (four bytes of entropy)?
[Steven D'Aprano]
Absolutely. If we're trying to "fail safe", it's the number of unpredictable source bytes that's important, not the length of the string produced. And, e.g., in the case of a URL-safe base64 encoding, passing "number of characters in the string" would be plain idiotic ;-)
Nick Coghlan already posted implementation of these things, before this thread started. They're all easy, _provided that_ you know which obscure functions to call; e.g., def token_url(nbytes): return base64.urlsafe_b64encode(os.urandom(nbytes)).decode("ascii")
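
Putting the pieces from this sub-thread together, a sketch of both helpers returning str might look like this. Whether to strip the trailing "=" padding and whether to return str or bytes were still open questions at this point, so those choices here are assumptions.

```python
import base64
import binascii
import os


def token_hex(nbytes):
    """Return nbytes of OS entropy as a str of 2 * nbytes hex digits."""
    return binascii.hexlify(os.urandom(nbytes)).decode("ascii")


def token_url(nbytes):
    """Return nbytes of OS entropy as URL-safe base64 text, padding removed."""
    encoded = base64.urlsafe_b64encode(os.urandom(nbytes))
    return encoded.rstrip(b"=").decode("ascii")
```

Asking token_hex for 4 bytes therefore yields 8 hex digits: the argument counts unpredictable source bytes, not output characters.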

On 20.09.15 02:40, Tim Peters wrote:
randbelow() is just an alias for randrange() with a single argument. randint(a, b) == randrange(a, b+1). These functions are redundant and they have non-zero cost. Wouldn't renaming getrandbits be confusing?
token_hex(nbytes) == token_alpha('0123456789abcdef', nchars) ? token_url(nbytes) == token_alpha( 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_', nchars) ?

On Sun, Sep 20, 2015 at 09:00:08AM +0300, Serhiy Storchaka wrote:
But they already exist in the random module, so adding them to secrets doesn't cost anything extra. It's just a reference to the bound method of the private SystemRandom() instance:
    # suggested implementation
    import random
    _systemrandom = random.SystemRandom()
    randint = _systemrandom.randint
    randrange = _systemrandom.randrange
    # etc.
They may be reasonable implementations for the functions, but simple as they are, I think we still want to provide them as named functions rather than expect the user to write things like the above. If they're doing it more than once, they'll want to write a helper function, we might as well provide that for them. -- Steve

On 2015-09-21 17:22, Steven D'Aprano wrote:
On Sun, Sep 20, 2015 at 09:00:08AM +0300, Serhiy Storchaka wrote:
On 20.09.15 02:40, Tim Peters wrote:
Actually, I don't think those are the semantics that Tim intended. Rather, token_hex(nbytes) would return a string twice as long as nbytes. The idea is that you want to get nbytes-worth of random bits, just encoded in a common "safe" format. Similarly, token_url(nbytes) would get nbytes of random bits then base64-encode it, not just pick nbytes characters from a URL-safe list of characters. This makes it easier to reason about how much entropy you are actually using. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

On 21.09.15 19:22, Steven D'Aprano wrote:
The main cost is learning and memorising cost. The fewer words you need to learn and keep in memory the better.
But why are these particular alphabets special? I expect that every application will use the alphabet that matches its needs. One needs decimal digits ('0123456789'), another needs English letters ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'), or letters and digits and underscore, or letters, digits and punctuation, or all safe ASCII characters, or all visually well-distinguished characters. Why token_hex and token_url, but not token_digits, token_letters, token_identifier, token_base32, token_base85, token_html_safe, etc?

On Mon, Sep 21, 2015, at 16:12, Serhiy Storchaka wrote:
Well, for one thing, they're trivial encodings of random bits, which is why passing in nbytes (number of random bytes) makes sense. Someone else pointed out that this makes it easier to reason about the amount of entropy involved. Token_base64 could actually, in principle, return a string with padding at the end according to base64 rules, if you ask for a number of bytes that is not a multiple of three. Base85 could likewise, for that matter, but base85 is a less common encoding.

Sorry to jump in with replying to a random message, but I can't find the message where this originally showed up:
While we're bikeshedding, can we pick better names than randXXX? How about random_range(), etc.? I'd rather have clarity than save a few chars. I think it's more approachable for new users to a new module. Eric.

On 9/21/2015 12:22 PM, Steven D'Aprano wrote:
On Sun, Sep 20, 2015 at 09:00:08AM +0300, Serhiy Storchaka wrote:
I think the redundancy in random is a mistake. The cost is confusion and extra memory load, and the need to refer to the manual more often, for essentially zero gain. When I read two names, I expect them to do two different things. The question is whether to propagate the mistake to a new module. -- Terry Jan Reedy

On Mon, Sep 21, 2015 at 05:32:44PM -0400, Terry Reedy wrote:
Sorry, I don't understand what you mean. Do you mean that it is a mistake for the random module to have randint and randrange? Or that it is a mistake for the secrets module to include functions that the random module includes?
If you are referring to randint versus randrange, they do do different things. Look at their signatures. randint(a, b) follows the ubiquitous API of "generate a random integer from the closed range a through b inclusive". randrange([start,] end [, step]) follows the Python practice of specifying a half-open interval, and has a more complex signature. Even though randrange is more Pythonic, I've never actually used it. randint is always what I've wanted. E.g. def die(): # Roll a die. return randint(1, 6) is far more natural than randrange(1, 7), Pythonic half-open intervals or not. But I'm satisfied that others may think differently, and by Tim's argument that excluding one or the other will be more confusing than including them both. -- Steve

On 20 September 2015 at 00:40, Tim Peters <tim.peters@gmail.com> wrote:
Given where this started, I'd suggest renaming token_alpha as "password". Beginners wouldn't necessarily associate the term "token" with the problem "I want to generate a random password" [1]. Maybe add a short recipe showing how to meet constraints like "at least 2 digits" by simply generating repeatedly until a valid password is found. For a bit of extra bikeshedding, I'd make alphabet the second, optional, parameter and default it to string.ascii_letters+string.digits+string.punctuation, as that's often what password constraints require. Or at the very least, document how to use the module functions for the common tasks we see people getting wrong. But I thought the idea here was to make doing things the right way obvious, for people who don't read documentation, so I'd prefer to see the functions exposed by the module named based on the problems they solve, not on the features they provide. (Even if that involves a little duplication, and/or a split between "high level" and "low level" APIs). Paul. [1] I'd written a spec for password() before I spotted that it was the same as token_alpha :-(
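
The "generate repeatedly until a valid password is found" recipe could look something like the sketch below. The function names and the particular constraint (a minimum number of digits) are illustrative assumptions; alphabet is the second, optional parameter with the default suggested above.

```python
import random
import string

_sysrand = random.SystemRandom()

_DEFAULT_ALPHABET = string.ascii_letters + string.digits + string.punctuation


def password(nchars, alphabet=_DEFAULT_ALPHABET):
    """Return nchars characters drawn uniformly from alphabet."""
    return "".join(_sysrand.choice(alphabet) for _ in range(nchars))


def password_with_digits(nchars, min_digits=2):
    """Hypothetical recipe: retry until at least min_digits digits appear."""
    if min_digits > nchars:
        raise ValueError("min_digits cannot exceed nchars")
    while True:
        candidate = password(nchars)
        if sum(c in string.digits for c in candidate) >= min_digits:
            return candidate
```

Rejecting and regenerating whole candidates keeps every accepted password uniformly distributed over the set of passwords that satisfy the rule.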

On 20 September 2015 at 20:56, Paul Moore <p.f.moore@gmail.com> wrote:
Right, I'd suggest the following breakdown.
* Arbitrary password generation (also covers passphrase generation from a word list):
    secrets.password(result_len: int, alphabet=string.ascii_letters+string.digits+string.punctuation: T) -> T
* Binary token generation ("num_random_bytes" is the arg to os.urandom, not the length of result):
    secrets.token(num_random_bytes: int) -> bytes
    secrets.token_hex(num_random_bytes: int) -> bytes
    secrets.token_urlsafe_base64(num_random_bytes: int) -> bytes
* Serial number generation ("num_random_bytes" is the arg to os.urandom, not the length of result):
    secrets.serial_number(num_random_bytes: int) -> int
* Constant time secret comparison (aka hmac.compare_digest):
    secrets.equal(a: T, b: T) -> bool
* Lower level building blocks:
    secrets.choice(container)
    # Hold off on other SystemRandom methods?
(I don't have a strong opinion on that last point, as it's the higher level APIs that I think are the important aspect of this proposal) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sep 20, 2015, at 05:26, Nick Coghlan <ncoghlan@gmail.com> wrote:
If T is a word list--that is, an Iterable of str or bytes--you want to return a str or a bytes, not a T. Also, making it work that generically will make the code much more complicated, to the point where it no longer serves as useful sample code to rank novices. You have to extract the first element of T, then do your choosing off chain([first], T) instead of off T, then type(first).join; all of that is more complicated than the actual logic, and will obscure the important part we want novices to learn if they read the source. Also, for word lists, I think you'd want a way to specify actual passphrases vs. the xkcd 936 idea of using passphrases as passwords even for sites that don't accept spaces, like "correcthorsebatterystaple". Maybe via a sep=' ' parameter? That would be very confusing if it's ignored when T is string-like but used when T is a non-string-like iterable of string-likes. I think it's better to require T to be string-like than to try to generalize it, and maybe add a separate passphrase function that takes (words: Sequence[T], sep: T) -> T. (Although I'm not sure how to default to ' ' vs b' ' based on the type of T... But maybe this does need to handle bytes, so Sequence[str] is fine?)
I think randrange is definitely worth having. Even the OpenSSL and arc4random APIs provide something equivalent. If you're a novice, and following a blog post that says to use your language's equivalent of randbelow(1000000), are you going to think of choice(range(1000000))? And, if you do, are you going to convince yourself that this is reasonable and not going to create a slew of million-element lists?

On 21 September 2015 at 08:07, Andrew Barnert <abarnert@yahoo.com> wrote:
Simpler is better here, so I'll revise the text based suggestions to: secrets.password(result_len: int, alphabet=string.ascii_letters+string.digits+string.punctuation: str) -> str secrets.passphrase(result_len: int, words: Sequence[str], sep=' ') -> str
Sure, that makes sense, while still keeping the secrets module focused on integers. getrandbits() is an interesting one, as it opens up the option of "secrets.getrandbits(128).to_bytes()" as a pointlessly slower alternative to "secrets.token(128 // 8)", while "secrets.getrandbits(128)" itself would be directly equivalent to the proposed "secrets.serial_number(128 // 8)" So perhaps it makes sense to just drop the serial_number() idea and have getrandbits() instead. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
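
To make the equivalence concrete, here is a small sketch of the two routes to a 128-bit secret. Note that int.to_bytes() needs an explicit length (and, in older Python versions, an explicit byte order) for a value this size, so the spelling is slightly longer than the shorthand above; serial_number is the hypothetical helper from the earlier breakdown, not an existing function.

```python
import os
import random

_sysrand = random.SystemRandom()


def serial_number(num_random_bytes):
    """Hypothetical helper: a non-negative int built from OS entropy."""
    return int.from_bytes(os.urandom(num_random_bytes), "big")


# Route 1: random integer first, converted to bytes afterwards.
n = _sysrand.getrandbits(128)
as_bytes = n.to_bytes(128 // 8, "big")

# Route 2: random bytes first, converted to an integer afterwards.
as_int = serial_number(128 // 8)
```

Both routes produce the same kind of value, which is why offering getrandbits() alone would make a separate serial_number() redundant.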

On Sun, Sep 20, 2015 at 11:56:06AM +0100, Paul Moore wrote:
On 20 September 2015 at 00:40, Tim Peters <tim.peters@gmail.com> wrote:
I'm not entirely sure about including password generators, since there are so many password schemes around: http://thedailywtf.com/articles/Security-by-PostIt
If we're going to offer a simple, no-brainer password generator, my vote goes for:
    def password(nchars=10, alphabet=string.ascii_letters+string.digits):
I wouldn't include punctuation by default, as too many places still prohibit some, or all, punctuation characters. If both my understanding and calculations are correct, using ascii_letters+digits+punctuation gives us log(94, 2) = 6.6 bits of (Shannon) entropy per character, while just using letters+digits gives us log(62, 2) = 6.0 bits per character. For short-ish passwords, up to 10 characters, the extra entropy from including punctuation is less than the extra from adding an extra character:
    password length of 8, without punctuation: 47.6 bits
    password length of 8, including punctuation: 52.4 bits
    password length of 9, without punctuation: 53.6 bits
I agree that secrets should be providing ready-to-use functions, even if they don't solve all use-cases, not just primitive building blocks. -- Steve
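
The entropy figures quoted above can be reproduced with a few lines. This sketch assumes each character is drawn uniformly and independently from the alphabet, which is the model behind the per-character numbers; entropy_bits is a name invented for the example.

```python
import math
import string


def entropy_bits(length, alphabet):
    """Entropy in bits of `length` characters drawn uniformly from alphabet."""
    return length * math.log2(len(alphabet))


plain = string.ascii_letters + string.digits  # 62 symbols
full = plain + string.punctuation             # 94 symbols

print(round(entropy_bits(8, plain), 1))  # 47.6
print(round(entropy_bits(8, full), 1))   # 52.4
print(round(entropy_bits(9, plain), 1))  # 53.6
```

One extra alphanumeric character adds about 6.0 bits, more than the roughly 4.8 bits gained by allowing punctuation across all eight characters.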

Steven D'Aprano writes:
Do you really expect users to choose their own random passwords using this function? I would expect that this function would be used for initial system-generated passwords (or system-enforced random passwords), and the system would have control over the admissible set. But users who have to conform to somebody else's rules much prefer obfuscated passwords that pass strength tests to random passwords in my experience. BTW, the last time I had to set a password that didn't allow the full set of 94 printable ASCII characters, uppercase letters were forbidden (silently -- it was documented in the help but not on the password change form, I had no idea why my first three suggestions were rejected). Go figure.

On Tue, Sep 22, 2015 at 08:56:24AM +0900, Stephen J. Turnbull wrote:
I don't know. Perhaps they will. I'm not entirely sure what the use-case of this password generator is, since I'm pretty sure that "real" password generators have to deal with far more complicated rules.
Perhaps so. But then how does the application get the password to the user? Via unencrypted email, like mailman does? I expect that the only use-case for an application generating a password for the user would be "low security" applications where the password has low value. But maybe others disagree. I don't really have a strong opinion one way or another. -- Steve

Steven D'Aprano writes:
On Tue, Sep 22, 2015 at 08:56:24AM +0900, Stephen J. Turnbull wrote:
Actually, I think they'll do what randrange does: take a seed from urandom() and values from a (CS)PRNG based on that seed, and throw away an out-of-range subset. Ie, they'll just generate passwords based on a simple rule about the alphabet and keep trying until they get one that passes the strength tester.
Well, I hand them out to my students in class on business cards. But an HTTPS connection could also work.
That could very well be true.

On 22 September 2015 at 09:56, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Right, the primary use case here is "web developer creating a default password for an automatically created admin account" (for example), not "end user creating a password for an arbitrary service". We don't want to overgeneralise the canned recipes - keep them dirt simple, and if folks want something slightly different, we can go the itertools path and have recipes in the documentation. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, 22 Sep 2015 at 04:56 Nick Coghlan <ncoghlan@gmail.com> wrote:
Out of this whole proposal, this password function is the one I'm most worried about. As someone who has a project whose entire job is to generate consistent passwords, I can tell you it's a messy business that will just lead to never-ending complaints about "why didn't you include this as part of password alphabet" or "why did you choose that length". It just isn't worth the hassle when it isn't going to impact a majority of Python users. This can be something that web frameworks and other folks worry about.

On 22.09.2015 18:01, Brett Cannon wrote:
Agreed. There are too many policies and regulations for passwords out there. The stdlib is not the right place for this. But the general purpose functionality of having a function which returns a string of given length and characters from a given set is useful for building routines which implement such policies. Just don't call it a password function :-) How about: randstr(length, alphabet) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 22 2015)

On Tue, Sep 22, 2015 at 04:01:51PM +0000, Brett Cannon wrote:
I too feel a quiet unease about password(), although I don't have anything concrete to pin it on. I'm happy to be guided by people with more experience in this realm. What if we called it simple_password() and made it clear that it wasn't intended as an all-singing, all-dancing password generator? -- Steve

On 23 September 2015 at 03:41, Tim Peters <tim.peters@gmail.com> wrote:
I think I may have been the one to suggest it originally, since one of the things we're trying to address is the plethora of bad advice found when Googling for "python password generator", but I'm OK with dropping it from the initial version of the module, just on the general principle that adding things later is relatively easy, while taking them away is hard.
Yeah, addressing the default password generation problem should work just as well as a recipe in the secrets module documentation - I see the core goal here as being to help guide folks towards using the right random number generator for security sensitive tasks, and "use the RNG in the secrets module for random secrets, and the RNG in the random module for modelling and simulation" is a much easier story to tell than explaining the technical differences between random.Random and random.SystemRandom. Raymond Hettinger's philosophy with itertools is likely a good guiding principle here: provide a small set of useful primitives, and otherwise favour recipes in the documentation. If we end up with a "more-secrets" module on PyPI akin to "more-itertools", I think that's fine (and also provides an easy way of backporting future secrets module additions to earlier Python versions) Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Sep 19, 2015 at 06:40:32PM -0500, Tim Peters wrote:
While we're bike-shedding, I don't know that I like the name randbits, since that always makes me expect a sequence of 0, 1 bits. But that's a minor point. When would somebody use randbelow(n) rather than randrange(n)? Apart from the possible redundancy between rand[below|range], all the above seem reasonable to me. Are there use-cases for a strong random float between 0 and 1? If so, is it sufficient to say secrets.randbelow(sys.maxsize)/sys.maxsize, or should we offer secrets.random() and/or secrets.uniform(a, b)?
I suggest adding a default length, say nbytes=32, with a note that the default length is expected to increase in the future. Otherwise, how will the naive user know what counts as a good, hard-to-attack length? All of the above look good to me.
What is the intention for this function? To use as passwords? Other than that, it's not obvious to me what that would be used for. -- Steve
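The default-length suggestion above could be sketched roughly as follows. This is only an illustration: the function names come from Tim's list, the default of 32 is the value floated in this message, and the choice that token_hex(nbytes) returns 2 * nbytes hex digits is an assumption of the sketch, not something the thread had decided.

```python
import binascii
import os

DEFAULT_NBYTES = 32  # assumed default; the thread expects it could grow later

def token_bytes(nbytes=DEFAULT_NBYTES):
    """Return `nbytes` random bytes from the OS entropy source."""
    return os.urandom(nbytes)

def token_hex(nbytes=DEFAULT_NBYTES):
    """Return `nbytes` of randomness as a str of 2 * nbytes hex digits."""
    return binascii.hexlify(token_bytes(nbytes)).decode("ascii")

print(token_hex())   # 64 hex digits with the assumed default
print(token_hex(4))  # 8 hex digits, i.e. four bytes of entropy
```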

On Tue, Sep 22, 2015 at 2:10 AM, Steven D'Aprano <steve@pearwood.info> wrote:
I would be leery of such a function, because it'd be hard to define it perfectly. Tell me, crypto wonks: If I have a function randfloat() that returns 0.0 <= x < 1.0, is it safe to use it like this:
    # Generate an integer 0 <= x < 12345, uniformly distributed
    uniform = int(randfloat() * 12345)
    # Ditto but on a logarithmic distribution
    log = math.exp(randfloat() * math.log(12345))
    # Double-logarithmic
    loglog = math.exp(math.exp(randfloat() * math.log(math.log(12345))))
If it's producing a random *real number* 0 <= x < 1, then these should be valid. But given the differences between floats and reals, I would be worried that this kind of usage would introduce an unexpected bias. Obviously the first example is much better spelled randbelow or randrange, but for more complicated examples, grabbing a random float would look like the best way to do it. Will it? Always? Not being a crypto wonk myself, I can't know what's safe and what isn't. If Python is going to offer a new module with the (implicit or explicit) recommendation "use this for all your cryptographic entropy", it needs to be 100% reliable. ChrisA

On 22 September 2015 at 02:50, Chris Angelico <rosuav@gmail.com> wrote:
Floating point numbers and crypto don't go together - crypto is all about integers, bits, bytes, and text. Folks dealing with floating point numbers are presumably handling modelling and simulation tasks, and will want the random module, not secrets. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Sep 22, 2015 at 02:50:56AM +1000, Chris Angelico wrote:
I'm satisfied by Nick's response to you, which also implies an answer to my question: there is no good use-case for a strong random float and no need for secrets.random(). The main reason I asked is because Ruby's SecureRandom.random_number() optionally returns a float between 0 and 1. -- Steve

[Tim]
[Steven D'Aprano <steve@pearwood.info>]
While we're bike-shedding,
I refuse to bikeshed on this. I posted a concrete proposal just to enrage others into it ;-) So I'll just sketch my thinking:
Had in mind multiple audiences, including those who know a lot about Python, and those who know little. The _lack_ of randbits() would surprise the former.
When would somebody use randbelow(n) rather than randrange(n)?
For the same reason they'd use randbits(n) instead of randrange(1 << n) ;-) That is, familiarity and obviousness. randrange() has a complicated signature, with 1 to 3 arguments, and endlessly surprises newbies who _expect_, e.g., randrange(3) to return 3 at times. That's why randint() was created. "randbelow(n)" has a dirt-simple signature, and its name makes it hard to mistakenly believe `n` is a possible return value. It's exactly what's needed most often to avoid _statistical_ bias (as opposed to security weaknesses) in higher-level functions - that's why _randbelow() is a fundamental primitive in Random. So, yes, it's redundant, but I don't care. randrange(n) itself is just a needlessly expensive way to call _randbelow(n) today.
Apart from the possible redundancy between rand[below|range], all the above seem reasonable to me.
If people want minimal, just expose os.urandom() under a friendlier name, and call it done ;-)
I don't know of any "security use" for random floats. But if you want to add a recipe to the docs, point them to SystemRandom.random instead. That gets it right. `sys.maxsize` doesn't really have anything to do with floats, and the snippet you gave would produce poor-quality floats on a 32-bit box (wouldn't get anywhere near randomizing all 53 bits of float precision). On a 64-bit box, it could, e.g., return 1.0 (which random() should never return).
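The 64-bit failure mode can be checked directly. The sketch below hard-codes n = 2**63 - 1, which is the value of sys.maxsize on a typical 64-bit build (an assumption, so that the example behaves the same everywhere), and looks at the largest value randbelow(n) could return.

```python
from random import SystemRandom

n = 2**63 - 1    # sys.maxsize on a typical 64-bit build (assumed here)
largest = n - 1  # largest value a randbelow(n) call could return

# The true quotient is closer to 1.0 than to the nearest float below 1.0,
# so the division rounds to exactly 1.0.
print(largest / n == 1.0)  # True

# SystemRandom.random() stays in the half-open interval [0.0, 1.0).
x = SystemRandom().random()
print(0.0 <= x < 1.0)
```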
Fine by me!
I just noted that several of the examples in the PHP paper appeared to want to use their own alphabet. But, since that paper was about exposing security holes in PHP apps, perhaps that wasn't such a good idea to begin with ;-) Fine by me if it's dropped.

On Sep 21, 2015, at 10:51, Tim Peters <tim.peters@gmail.com> wrote:
Anyone who gets confused by randrange(3) also gets confused by range(3), and they have to learn pretty quickly. Also, randint wasn't created to allow people to put off learning that fact. It was created before randrange, because Adrian Baddeley didn't realize that Python consistently used half-open ranges, and Guido didn't notice. After 1.5 was out and someone complained that choice(range(...)) was inefficient, Guido added randrange. See the commit comment (61464037da53) which says "This addresses the problem that randint() was accidentally defined as taking an inclusive range (how unpythonic)". Also, some guy named Tim Peters convinced Guido that randint(0, 2.5) was surprisingly broken, so if he wasn't going to remove it he should reimplement it as randrange(a, b+1), which would give a clear error message. Later still (3.0), there was another discussion on removing randint, but the decision was to keep it as a "legacy alias", and change the docs to reflect that. I suppose randbelow could be implemented as an alias to randrange(a), or it could copy and paste the same type checks as randrange, but honestly, I don't think anyone needs it.

[Steven]
When would somebody use randbelow(n) rather than randrange(n)?
[Tim]
[Andrew Barnert <abarnert@yahoo.com>]
Anyone who gets confused by randrange(3) also gets confused by range(3),
True!
and they have to learn pretty quickly.
And they do. And then, in a rush, they slip up.
Goodness - you seem to believe there's virtue in remembering things in the order they actually happened. Hmm. I'll try that sometime, but I'm dubious ;-)
randbelow() is already implemented, in current Pythons, although as a class-private method (Random._randbelow()). It's randrange() that's implemented by calling ._randbelow() now. To expose it on its own, it should grow a check that its argument is an integer > 0 (as a private method, it currently assumes it won't be called with an insane argument).
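A public wrapper with the argument check described above might look roughly like this. It is a sketch only: it goes through the public SystemRandom.randrange() rather than the private ._randbelow(), and the exact exceptions raised are an assumption of the example.

```python
import operator
from random import SystemRandom

_sysrand = SystemRandom()

def randbelow(exclusive_upper_bound):
    """Return a random int in range(0, exclusive_upper_bound)."""
    # operator.index() raises TypeError for floats, strings and other non-integers.
    n = operator.index(exclusive_upper_bound)
    if n <= 0:
        raise ValueError("upper bound must be a positive integer")
    return _sysrand.randrange(n)

print(randbelow(3))  # one of 0, 1, 2; never 3
```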
but honestly, I don't think anyone needs it.
Of the four {randbelow, randint, randrange, randbits}, any can be implemented via any of the other three. You chopped what I considered to be "the real" point:
"randbelow(n)" has a dirt-simple signature, and its name makes it hard to mistakenly believe `n` is a possible return value.
That's what gives it value. Indeed, if minimality crusaders are determined to root out redundancy, randbelow is the only one of the four I'd keep.
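To make the interchangeability concrete, here is one possible layering, shown purely as a sketch: randbits() is built on os.urandom(), randbelow() on randbits() by rejection sampling, and randint() and a two-argument randrange() on randbelow(). The two-argument randrange signature is a simplification for this example and not the real three-argument one.

```python
import os

def randbits(k):
    """Return a non-negative int made of k random bits."""
    if k <= 0:
        raise ValueError("number of bits must be positive")
    nbytes = (k + 7) // 8
    # Drop the surplus high-order bits so exactly k bits remain.
    return int.from_bytes(os.urandom(nbytes), "big") >> (nbytes * 8 - k)

def randbelow(n):
    """Return an int in range(0, n), rejecting out-of-range draws to avoid bias."""
    if n <= 0:
        raise ValueError("upper bound must be positive")
    k = n.bit_length()
    r = randbits(k)
    while r >= n:
        r = randbits(k)
    return r

def randint(a, b):
    """Return an int in the closed interval [a, b]."""
    return a + randbelow(b - a + 1)

def randrange(start, stop):
    """Return an int in the half-open interval [start, stop)."""
    return start + randbelow(stop - start)

print(randbelow(3), randint(1, 6), randrange(10, 13), randbits(8))
```

Rejection sampling is what keeps the result free of statistical bias: reducing a k-bit value modulo n would favour the smaller residues whenever n is not a power of two.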
participants (19)
- Andrew Barnert
- Brett Cannon
- Chris Angelico
- Eric V. Smith
- Guido van Rossum
- John Wong
- Jonas Wielicki
- M.-A. Lemburg
- Matthias Bussonnier
- Nick Coghlan
- Oleg Broytman
- Paul Moore
- Random832
- Robert Kern
- Serhiy Storchaka
- Stephen J. Turnbull
- Steven D'Aprano
- Terry Reedy
- Tim Peters