Globally configurable random number generation

This is an expansion of the random module enhancement idea I previously posted to Donald's thread: https://mail.python.org/pipermail/python-ideas/2015-September/035969.html

I'll write it up as a full PEP later, but I think it's just as useful in this form for now.

= Defining the problem =

We're moving into an era where the easiest way to publish software is as a web application, with "deployment" to client systems done at runtime via a web browser. It's regularly the case that "learn to program" classes (especially those aimed at adults picking up programming for the first time) will introduce folks to both a web development framework and how to deploy web applications on a developer-focused service with a free hosting tier, like Heroku or OpenShift.

It's also the case that we live in an era where there's a lot of well-intentioned-but-actually-bad advice on the internet when it comes to generating security sensitive tokens, and the folks receiving that advice through forums like Stack Overflow aren't necessarily ever going to see the "don't do that" guidance in the standard library's random module documentation, the docs for the cryptography library, or the docs for a web framework like Flask, Django or Pyramid.

One of the ways we know many of the folks doing web development often don't take admonitions in documentation seriously is that one of the most popular web servers for Python on these kinds of services is Django's "runserver", even though Django's docs specifically say to use it only for local development.

It isn't OK to say "the developers deserve the consequences that come to them", as in many cases it isn't the developers that suffer the consequences, but the users of their applications.

One reason we know weak RNGs can be a problem in practice is that the same kind of concern exists in PHP web applications, and https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_...
shows how the relative predictability of password reset tokens can be used to compromise administrator accounts.

Rather than playing whack-a-mole with individual web applications (many of which will be written by inexperienced developers), or attempting to demonstrate that a deterministic PRNG is "secure enough" for these use cases (when the research on PHP and deterministic PRNGs in general indicates that it isn't), it is proposed to migrate Python to a default random implementation that *is* known to be secure enough for these kinds of use cases.

At the same time, deterministic random number generation is still desirable in many situations, and we also don't want to require that folks learning Python in the future take a crash course in web application security theory first. Thus, it is also proposed that the abstraction used to present these differences to end users minimise references to the underlying security concepts.

A key outcome of this proposal is that it will retroactively upgrade a lot of existing instructions on the internet for generating default passwords and other sensitive tokens in Python from "actively harmful" to "not necessarily ideal, but at least not wrong if you're using Python 3.6+".

This *is* a compatibility break, made for the sake of correcting default behaviours that are fine when developing applications for local use but problematic from a network service security perspective, just as happened with the introduction of hash randomisation. Unlike the hash randomisation change, this one is readily addressed in old versions on a case-by-case basis, so it is only proposed to make the change in a future feature release of Python, not in any current maintenance releases.
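To make the predictability concern concrete, here is a minimal sketch showing that two Mersenne Twister instances given the same seed emit identical "reset tokens", while random.SystemRandom draws from the operating system. This is not the attack from the Black Hat paper (which recovers generator state from observed outputs); the make_token() helper and its alphabet are illustrative assumptions.

```python
import random
import string

# Hypothetical token alphabet, for illustration only
ALPHABET = string.ascii_letters + string.digits


def make_token(rng, length=16):
    """Build a token by drawing characters from the given RNG instance."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


# Deterministic MT: anyone who knows (or recovers) the seed can regenerate the token
issued = make_token(random.Random(12345))
guessed = make_token(random.Random(12345))
print(issued == guessed)  # True

# SystemRandom reads from os.urandom, so there is no reproducible sequence
system_rng = random.SystemRandom()
print(make_token(system_rng) == make_token(system_rng))  # False, barring a 1-in-62**16 collision
```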
= Core abstraction =

The core concept of this proposal involves classifying random number generators in Python as follows:

* seedable
* seedless
* system

These terms are chosen to make sense to folks that have *no idea* about the way different kinds of random number generators work and how that affects their security properties, but do know whether or not they need to be able to pass in a particular fixed seed in order to regenerate the same series of outputs.

The guidance to Python users is then:

* we use the seedless RNG by default as it provides the best balance of speed and security
* if you need to be able to exactly reproduce output sequences, use the seedable RNG
* if you know you're doing security sensitive work, use the system RNG directly to eliminate Python's seedless RNG as a potential source of vulnerabilities

Importantly, there are relatively simple answers to the following two questions (which could be added to the Design FAQ):

Q: Why isn't the seedable RNG the default random implementation (any more)?

A: The same properties that make it possible to provide an explicit seed to the seedable RNG and get a predictable series of outputs make it inappropriate for tasks like generating session IDs and password reset tokens in web applications. Since folks continued to use the default RNG for those cases, even after years of the core development team, web framework developers and security engineers saying "Don't do that, use the system RNG instead", we eventually changed the default behaviour to just make those cases OK.

Q: Why isn't the system RNG the default implementation?

A: Due to the way operating systems work, calling into the kernel to get a random number is always going to be slower than generating one within the Python runtime. The default seedless generator provides most of the same benefits as using the system RNG directly, but is an order of magnitude faster as it doesn't need to call into the kernel as often.
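The seedable and system categories already map onto existing classes in today's random module, so the distinction can be sketched directly. The proposed seedless class has no stdlib counterpart yet and is omitted; supports_state() is a hypothetical helper used only here.

```python
import random


def supports_state(rng):
    """Return True if the generator's state can be captured for later replay."""
    try:
        rng.getstate()
    except NotImplementedError:
        return False
    return True


# Seedable: the same seed regenerates the same series of outputs
first = random.Random(2015)
second = random.Random(2015)
same_series = [first.random() for _ in range(3)] == [second.random() for _ in range(3)]
print(same_series)  # True

# System: seed() is accepted but ignored, and there is no state to save
system_rng = random.SystemRandom()
system_rng.seed(2015)
print(supports_state(random.Random()), supports_state(system_rng))  # True False
```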
= Proposed change for Python 3.6 =

* add a random.SeedlessRandom API that omits the seed(), getstate() and setstate() methods and uses a cryptographically secure PRNG internally (such as the ChaCha20 algorithm implemented by OpenBSD)
* rename random.Random to random.SeedableRandom
* make random.Random a subclass of SeedableRandom that deprecates seed(), getstate() and setstate()
* deprecate the seed(), getstate() and setstate() methods on SystemRandom
* expose the global SeedableRandom instance as random.seedable_random
* expose a global SeedlessRandom instance as random.seedless_random
* expose a global SystemRandom instance as random.system_random
* provide a random.set_default_instance() API that makes it possible to specify the instance used by the module level methods
* the module level seed(), getstate(), and setstate() functions will raise RuntimeError if the corresponding method is missing from the default instance

In 3.6, "random.set_default_instance(random.seedless_random)" will opt in to the CSPRNG when using the module level functions process wide, while "from random import seedless_random as random" will do so on a module by module basis. "from random import system_random as random" also becomes available as a simple upgrade path for security sensitive modules.

Appropriate helpers would be added to the six and future projects to allow single source Python 2/3 projects to easily cope with the change in behaviour when using the seedable RNG for its intended purposes. For many projects, compatibility code will consist of the following lines in a compatibility module:

    try:
        from random import seedable_random as random
    except ImportError:
        import random

It would also be desirable for the seedless random number generator to be made available as a PyPI package for use on older Python versions.
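A rough sketch of how the module level dispatch in the 3.6 bullets could behave, written as a standalone module rather than a patch to random. SeedlessRandom here is a hypothetical stand-in that simply wraps SystemRandom (the proposal itself suggests a ChaCha20-based userspace CSPRNG), and only two sampling functions are shown.

```python
import random as _random


class SeedlessRandom:
    """Hypothetical stand-in: no seed(), getstate() or setstate().
    Delegates to SystemRandom instead of a userspace CSPRNG."""

    def __init__(self):
        self._sys = _random.SystemRandom()

    def random(self):
        return self._sys.random()

    def randint(self, a, b):
        return self._sys.randint(a, b)


seedable_random = _random.Random()
seedless_random = SeedlessRandom()
system_random = _random.SystemRandom()

_inst = seedable_random  # the 3.6 default under the proposal


def set_default_instance(inst):
    """Choose the instance used by the module level functions."""
    global _inst
    _inst = inst


def _require(name):
    # Module level seed/getstate/setstate fail loudly if the instance lacks them
    method = getattr(_inst, name, None)
    if method is None:
        raise RuntimeError(f"default instance has no {name}() method")
    return method


def seed(*args):
    return _require("seed")(*args)


def getstate():
    return _require("getstate")()


def setstate(state):
    return _require("setstate")(state)


def random():
    return _inst.random()


def randint(a, b):
    return _inst.randint(a, b)
```

With set_default_instance(seedable_random), seed(42) followed by random() replays the same value; after set_default_instance(seedless_random), seed() raises RuntimeError because that instance has no such method.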
= Proposed change for Python 3.7 =

* random.Random becomes an alias for random.SeedlessRandom
* the default instance changes to be random.seedless_random

In 3.7, "random.set_default_instance(random.seedable_random)" will opt back in to the deterministic PRNG when using the module level functions process wide, while "from random import seedable_random as random" will do so on a module by module basis.

= Seedable random number generation =

This is what we have today. The Mersenne Twister (MT) implementation supports explicit seeding, state retrieval, and state restoration. It doesn't automatically mix in additional system entropy as it operates. This is the right choice for use cases like computer games, map generation, and randomising the order of test execution, as in these situations it's desirable to be able to reproduce a past sequence exactly.

= Seedless random number generators =

This is the key proposed new addition: a cryptographically secure, non-deterministic, userspace PRNG. It's faster than the system RNG as it avoids the need to make a system API call.

The "seedless" name comes from the fact that the inability to feed in a fixed seed is the most obvious API difference relative to deterministic RNGs, and hence provides a mental hook for people to remember which is which, without needing to know the relevant background security theory (which is arcane enough to be opaque even to developers with decades of experience and hence isn't something we want to be inflicting on folks in the process of learning to program).

= System random number generator =

The only proposed change here is providing a default instance to enable the "from random import system_random as random" pattern.

Regards, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
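To illustrate what a "seedless" generator looks like from the caller's side, here is a sketch that subclasses random.Random, overrides random() and getrandbits() so all the other sampling methods keep working, and refuses seed(), getstate() and setstate(). As a loudly flagged assumption, it uses SHA-256 in counter mode (hashlib) keyed from os.urandom in place of ChaCha20, which the standard library does not provide. The class name and rekey interval are hypothetical, and this has not been reviewed as a CSPRNG.

```python
import hashlib
import os
import random


class HashSeedlessRandom(random.Random):
    """Hypothetical 'seedless' generator: SHA-256 in counter mode, keyed
    from os.urandom and periodically rekeyed. Illustration only, standing
    in for a ChaCha20-based userspace CSPRNG."""

    REKEY_BLOCKS = 1024  # arbitrary number of output blocks between rekeys

    def __init__(self):
        # Skip random.Random.__init__, which would call seed().
        self.gauss_next = None  # attribute used by Random.gauss()
        self._rekey()

    def _rekey(self):
        self._key = os.urandom(32)
        self._counter = 0

    def _next_block(self):
        if self._counter >= self.REKEY_BLOCKS:
            self._rekey()
        block = hashlib.sha256(self._key + self._counter.to_bytes(8, "big")).digest()
        self._counter += 1
        return block

    def getrandbits(self, k):
        if k < 0:
            raise ValueError("number of bits must be non-negative")
        if k == 0:
            return 0
        nbytes = (k + 7) // 8
        buf = b""
        while len(buf) < nbytes:
            buf += self._next_block()
        # Drop surplus low-order bits so exactly k bits remain
        return int.from_bytes(buf[:nbytes], "big") >> (nbytes * 8 - k)

    def random(self):
        # 53 random bits scaled into [0.0, 1.0), the same resolution Random uses
        return self.getrandbits(53) * 2.0 ** -53

    def seed(self, *args, **kwargs):
        raise NotImplementedError("seedless generators cannot be seeded")

    def getstate(self):
        raise NotImplementedError("seedless generators do not expose state")

    def setstate(self, state):
        raise NotImplementedError("seedless generators do not expose state")
```

Because random.Random routes choice(), randint(), shuffle() and friends through getrandbits() when a subclass overrides it, those methods work on an instance of this class without further changes.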

Nick Coghlan <ncoghlan@...> writes:
Sorry, -1 on this.

Theo proposed a simple API like:

    arc4random()
    arc4random_uniform()

Go has:

https://golang.org/pkg/math/rand/
https://golang.org/pkg/crypto/rand/

These are sane, unambiguously named APIs. I wish Python had more of those. If people must have their CSPRNG, please let's leave the random module alone and introduce a crypto module like Go.

Stefan Krah

On 14/09/15 15:43, Stefan Krah wrote:
In a perfect world, every programmer would know the difference between PRNGs for numerical simulation and entropy sources for cryptography.

Those who do will still use os.urandom, or just read from /dev/urandom or /dev/random, for cryptography. Those who do also know the need for mathematical precision when simulating samples from a given distribution. And those who do know the need for a fixed seed, because a Monte Carlo simulation should be exactly reproducible in a scientific context.

The problem is users who have no idea that the Mersenne Twister is constructed for producing random deviates that are great for numerical simulation, and that the Mersenne Twister is very weak for cryptography.

Using os.urandom as the default entropy source has the opposite effect. It is not constructed for being mathematically precise, it is slow, and it does not allow for a fixed seed and exact reproducibility. Whatever we do, someone is going to shoot their leg off.

A crypto module would perhaps be great, but it does not solve anything. Someone who uses random.random instead of os.urandom is likely to use random.random instead of a PRNG in a crypto module as well.

Mostly this is about propagating knowledge of random number generators to new developers and science students.

Sturla

Sturla Molden <sturla.molden@...> writes:
The sentiments in the original thread (which has now been renamed twice) seem to have been lost:

Theo:
=====
"chacha arc4random is really fast. if you were to create such an API in python, maybe this is how it will go: say it becomes arc4random in the back end. i am unsure what advice to give you regarding a python API name. in swift, they chose to use the same prefix "arc4random" (id = arc4random(), id = arc4random_uniform(1..n)); it is a little bit different than the C API. google has tended to choose other prefixes. we admit the name is a bit strange, but we can't touch the previous attempts like drand48.... I do suggest you have the _uniform and _buf versions. Maybe apple chose to stick to arc4random as a name simply because search engines tend to give above average advice for this search string?"

Theo:
=====
"that opens /dev/urandom or uses the getrandom system call depending on system. it also has support for the windows entropy API. it pulls data into a large buffer, a cache. then each subsequent call, it consumes some, until it runs out, and has to do a fresh read. it appears to not clean the buffer behind itself, probably for performance reasons, so the memory is left active. (forward secrecy violated) i don't think they are doing the best they can... i think they should get forward secrecy and higher performance by having an in-process chacha. but you can sense the trend."

So the original thread is about:
================================
- Implementing a possibly faster (and allegedly more secure) chacha20-random.
- Possibly using the naming scheme of Swift.
- Being careful with os.urandom(), as there are some pitfalls that the OpenBSD libcrypto (allegedly) solves.

I see nothing about magically repurposing random.random() functions.

Stefan Krah
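For comparison with the _uniform and _buf functions Theo recommends, here is a sketch of Python analogues. The function names mirror the OpenBSD C API but are hypothetical here, and they are backed by os.urandom rather than an in-process ChaCha20 generator. The rejection step is the usual technique for avoiding modulo bias when reducing a 32-bit value to a smaller range.

```python
import os

_RANGE = 2 ** 32  # arc4random() returns a 32-bit unsigned value


def arc4random_buf(nbytes):
    """Return nbytes of random data (hypothetical analogue, via os.urandom)."""
    return os.urandom(nbytes)


def arc4random():
    """Return a uniformly distributed integer in [0, 2**32)."""
    return int.from_bytes(os.urandom(4), "big")


def arc4random_uniform(upper_bound):
    """Return a uniform integer in [0, upper_bound) without modulo bias."""
    if not 0 <= upper_bound <= _RANGE:
        raise ValueError("upper_bound must be between 0 and 2**32")
    if upper_bound < 2:
        return 0
    # Values below `threshold` are rejected so that the accepted range
    # [threshold, 2**32) holds an exact multiple of upper_bound values.
    threshold = _RANGE % upper_bound
    while True:
        r = arc4random()
        if r >= threshold:
            return r % upper_bound
```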

On September 14, 2015 at 10:50:46 AM, Stefan Krah (skrah@bytereef.org) wrote:
I've actually talked to Theo, and I believe he's read my summary of his proposal; he didn't mention anything amiss. He did mention that he wasn't aware of the number of APIs that we had in random.py that built on top of the RNG. As far as I can tell from talking to him, he focused on that particular thing because he became aware of the issue via the recent problem with getentropy on Solaris, and I believe he assumed that our APIs were similar to C in what we provided.

----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On September 14, 2015 at 11:16:48 AM, Stefan Krah (skrah@bytereef.org) wrote:
Well, he's expressed that he's unlikely to participate in this discussion because he doesn't use Python and thus doesn't have any skin in the game. He just saw an opportunity to try and improve the "ambient" security of applications written in Python and thought he'd reach out to see if there was any interest in it on our end. I'd ask him personally, but given that I'm "biased" you'll have to manage to ask him on your own.

----------------- Donald Stufft

On September 14, 2015 at 9:33:27 AM, Nick Coghlan (ncoghlan@gmail.com) wrote:
I don't love the "seedable" and "seedless" names here, but I don't have a better suggestion for the userspace CSPRNG one because its security properties are a bit nuanced. People doing security sensitive things like generating keys for cryptography should still use something based on os.urandom, so it's mostly about providing a safety net that will "probably" [1] be safe. Probably something like random.ProbablySecureRandom is a bad name :)
* provide a random.set_default_instance() API that makes it possible to specify the instance used by the module level methods
I think this particular bit is a bad idea: it creates an official API that makes it really hard for an auditor to come into a code base and determine whether the use of random is correct or not. Given that going back to the MT-based algorithm is fairly trivial (and could even be mechanical), what's the long-term benefit here?

[1] The safety of userspace CSPRNGs is a topic debated by security experts; however, I think any of them would be hard pressed to argue it's a bad idea to have a userspace CSPRNG as a safety net for folks who, for whatever reason, didn't know to use os.urandom/random.SystemRandom, making them more likely to be safe, or at the very least harder to attack.

----------------- Donald Stufft

On Mon, Sep 14, 2015, at 09:51, Donald Stufft wrote:
It's no worse than what OpenBSD itself has done with the C api for rand/random/rand48. At some point you've got to balance it with the realities of making backwards compatibility easy to achieve for the applications that really do need it with either a few lines change or none at all. And anyway, the auditor would *know* that if they see a module-level function called they need to do the extra work to find out what mode the module-level RNG is in (i.e. yes/no is there anywhere at all in the codebase that changes it from the secure default?) It's not an "official API", it's an escape hatch for allowing a minimal change to existing code that needs the old behavior.
Given that going back to the MT-based algorithm is fairly trivial (and could even be mechanical), what's the long-term benefit here?
I don't see how it's trivial/mechanical, *without* the exact feature being discussed.

Random832 <random832@...> writes:
It's no worse than what OpenBSD itself has done with the C api for rand/random/rand48.
These functions aren't used widely in scientific computing.
It's not an "official API", it's an escape hatch for allowing a minimal change to existing code that needs the old behavior.
It's yet another case split to keep in the back of one's mind. Stefan Krah

On 14/09/15 16:45, Random832 wrote:
It is not just a matter of security versus determinism. It is also a matter of numerical accuracy. The distribution of the output sequence must be proven and be as close as possible to the distribution of interest. MT19937 is loved by scientists because it emulates sampling from the uniform distribution so well. Faster alternatives exist, and more secure alternatives too. But when we simulate a stochastic process we also care about numerical accuracy, and MT19937 is considered state of the art for this purpose.

It does not seem that the issue of numerical accuracy is appreciated in this debate. Cryptographers just want random bits that cannot be predicted; numerical accuracy is not their primary concern. If you replace MT19937 with "something more secure" you likely also lose its usefulness for scientific computing.

Sturla

On September 14, 2015 at 11:40:53 AM, Sturla Molden (sturla.molden@gmail.com) wrote:
Nobody is suggesting to remove MT, just make it so you have to explicitly opt-in to it.

----------------- Donald Stufft

On 2015-09-14 16:39, Sturla Molden wrote:
Actually, it's well behind the state of the art as it fails BigCrush. The proposed alternative does better in this regard. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

On 14/09/15 17:50, Robert Kern wrote:
Actually, it's well behind the state of the art as it fails BigCrush. The proposed alternative does better in this regard.
Is that one of the PCGs? Or Arc4Random, ChaCha20 or XorShift64/32? The latter three fail on k-dimensional equidistribution; MT does not. Some of the PCGs do too, but some should be as good as MT. I'm not sure if that is worse or better than failing some parts of BigCrush. Which PCG would you recommend, by the way? Sturla

On 2015-09-14 17:56, Sturla Molden wrote:
The alternative proposed in this thread is ChaCha20.
There is a reason that exact k-dimensional equidistribution for such a large k is not tested even in BigCrush. It's a nifty feature useful in a few applications, but not for simulations. It is important that the PRNG is *well*-distributed, but exact equidistribution is mostly neither here nor there. It can be trivially implemented by statistically bad PRNGs, like a simple counter. Obtaining it requires implementing an astronomically long period (and consequent growth in the state size) that adds significant costs without any realizable improvement to the statistics. If I'm drawing millions of numbers, k=623 is not much better than k=1, provided that the generator is otherwise good.
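The point that exact equidistribution "can be trivially implemented by statistically bad PRNGs, like a simple counter" can be checked in a few lines. The counter_prng() generator below is a deliberately terrible, hypothetical 8-bit "PRNG": over one full period every value appears exactly once (perfect 1-dimensional equidistribution), yet consecutive outputs are completely predictable.

```python
from collections import Counter


def counter_prng(bits=8):
    """A deliberately bad 'PRNG' that just counts modulo 2**bits."""
    mask = (1 << bits) - 1
    state = 0
    while True:
        yield state
        state = (state + 1) & mask


gen = counter_prng()
period = [next(gen) for _ in range(256)]

# Every 8-bit value occurs exactly once per period: exact 1-d equidistribution
counts = Counter(period)
equidistributed = all(counts[v] == 1 for v in range(256))

# ...but each output is always the previous one plus 1 (mod 256)
lag1_steps = {(b - a) % 256 for a, b in zip(period, period[1:])}
print(equidistributed, lag1_steps)  # True {1}
```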
Which PCG would you recommend, by the way?
Probably pcg64 (128-bit state, 64-bit output). Having the 64-bit output is nice so you only have to draw one value to make a uniform(0,1) double, and a period of 2**128 is nice and roomy without being excessively large. -- Robert Kern
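The remark about 64-bit output can be illustrated by comparing how a double in [0, 1) is built from 32-bit and 64-bit words. The first function mirrors the combination used by CPython's C implementation of random.Random.random() (27 bits from one Mersenne Twister output, 26 from the next); the second assumes a generic 64-bit generator such as pcg64 and keeps the top 53 bits of a single draw. Both function names are hypothetical, and the cross-check assumes CPython.

```python
import random

TWO_53 = 9007199254740992.0  # 2**53


def double_from_two_32(a32, b32):
    """Combine two 32-bit outputs into a double with 53 random bits."""
    a = a32 >> 5  # keep 27 bits
    b = b32 >> 6  # keep 26 bits
    return (a * 67108864.0 + b) / TWO_53  # 67108864 == 2**26


def double_from_one_64(x64):
    """With a 64-bit output, a single draw supplies all 53 bits."""
    return (x64 >> 11) / TWO_53


# Cross-check against the MT-based random.Random (CPython)
rng_a = random.Random(99)
rng_b = random.Random(99)
expected = rng_a.random()
rebuilt = double_from_two_32(rng_b.getrandbits(32), rng_b.getrandbits(32))
print(expected == rebuilt)
```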

On September 14, 2015 at 10:16:49 AM, Random832 (random832@fastmail.com) wrote:
Easily, you change your:

    import random

to:

    from random import seeded_random as random

And then all of your code that used random.foo works without any further modification. If you were importing the individual functions, you can either change your code to use random.foo or you can do:

    from random import seeded_random as _random
    random = _random.random
    randint = _random.randint

If you want to do this in code that has to run on multiple Python versions, then you can combine this with a try/except block like:

    try:
        from random import seeded_random as random
    except ImportError:
        import random

Either way, trivial and mechanical. It doesn't require much thought, it just requires some pretty simple changes.

----------------- Donald Stufft

On Mon, Sep 14, 2015 at 10:16:00AM -0400, Random832 wrote:
Of course it is an official API. It's a documented public function (or rather, it will be if Nick's suggestion is accepted) in the standard library. That makes it an official API. The *whole purpose of it* is to give a standard API for what Python can already do: monkey-patch the random module. E.g. we can do this now:

    import random
    random.random = lambda: 9
    random.uniform = lambda a, b: 9

but if you do that, you know you're on thin ice.

I don't entirely agree with everything Donald has said, but I agree that providing this API would be harmful. It would mean that any arbitrary module you import (directly or indirectly) could swap out the secure CSPRNG you're relying on for an insecure PRNG, and you would never know. (Yes, they could do that now, this is Python. But they won't, because there's no official API for swapping out the default PRNG.)

-- Steve

Probably something like random.ProbablySecureRandom is a bad name :)
Yes, but unsecureRandom for the unsecure one (which obviously is insecure) is not unreasonable (unsafe would be shorter). -- M

Also, seedless does not mean secure: https://xkcd.com/221/ :-)

On 14.09.15 16:32, Nick Coghlan wrote:
* make random.Random a subclass of SeedableRandom that deprecates seed(), getstate() and setstate()
I would make seed() and setstate() switch to the seedable algorithm. If you don't use seed() or setstate(), it is not important that the algorithm has changed. If you use seed() or setstate(), you expect reproducible behavior.
* random.Random becomes an alias for random.SeedlessRandom
This breaks compatibility with data pickled by older Python versions.
What to do with "from random import random" deep in third-party module? It caches random.random in the module dictionary.
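The concern about "from random import random" can be demonstrated with today's module: the from-import binds whatever function object the module attribute refers to at import time, so swapping the attribute later does not reach modules that already imported the name. The replacement lambda below is purely illustrative and is undone at the end.

```python
import random
from random import random as cached_random  # binds the function object now

original = random.random
random.random = lambda: 0.5  # a later module-level swap, e.g. by another module

print(random.random())            # 0.5: attribute lookups see the swap
print(cached_random is original)  # True: the from-import kept the old object

random.random = original  # undo the monkey-patch
```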

On 9/14/2015 11:04 AM, Serhiy Storchaka wrote:
An alternate proposal is to initialize the module so that random uses something more 'secure' than MT. Then...
I would make seed() and setstate() switch to the seedable algorithm.
In particular, to MT. Also switch on a getstate() call.
There is more than one possible internal implementation. But for any of them, the change should be invisible to callers. (Representations and introspection results would be a different matter.) I understand that the docs currently say that random uses MT. But I wonder if any version of the above could be used in current versions, so as to immediately "upgrade a lot of existing instructions on the internet" and code that follows such instructions. -- Terry Jan Reedy

On 14 September 2015 at 14:32, Nick Coghlan <ncoghlan@gmail.com> wrote:
I'll write it up as a full PEP later, but I think it's just as useful in this form for now.
Please provide costs and benefits. At the moment, the proposal takes an implied stance that fixing security issues warrants disruption to users (and in particular to users with *no* security requirements). I appreciate that there's the usual 2-release long deprecation process, and that the only visible disruption is to the state/seed APIs. But I'd like to see that expanded on a little more, precisely to convince those people who *aren't* automatically convinced by "there's a security issue" arguments that the trade-offs have been properly analyzed.

For example, in terms of costs:

1. The module API is more complex and harder to teach.
2. The new API deliberately introduces a global state setting API.
3. People using "from random import choice" can't use the "simple upgrade" recommendation "from random import system_random as random".

The benefits seem to be solely:

1. Users of code written based on bad advice will be protected from the consequences (as long as the code runs on a sufficiently new version of Python).

(I'm serious - that's how the benefit statement reads to me. Although I agree it'd be nice if I worded it a bit more unemotionally, I genuinely don't know how to without either overstating it or making it a paragraph long...)

I'm not trying to say that the cost/benefit analysis doesn't justify the change (I'm currently unconvinced, and trying to remain open in spite of the over-abundance of security rhetoric in the thread), just that it's a key point of the debate here, and it's not captured in your summary/pre-PEP.

Paul

On Mon, Sep 14, 2015 at 8:32 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
One problem that people (I can't remember who) have pointed out about random.set_default_instance() is that any imported module in the same process can change the random from secure to insecure at a distance. One way to solve this is to ensure that set_default_instance() can be called only once; if it is called more than once, a RuntimeError could be raised. I think the logging module does something like this for setting the logging level?

I think the only way that this really would make sense would be to make set_default_instance() be called before any of the module level functions. The first time a module level function is called, you could default to selecting the CSPRNG. If you call one of the seeded API functions (getstate, setstate, seed) before the other module-level functions, the instance could default to the deterministic RNG, but that might be confusing to debug. I could imagine people getting really confused if this program worked:

    import random
    random.seed(1234)
    random.random()

but this program failed:

    import random
    random.random()
    random.seed(1234)  # would raise a RuntimeError
    random.random()    # would not be reached

I'm not crazy about the idea of changing the default instance based on the first module level function called; that might be a terrible idea. But I _do_ think it's a good idea not to let the default instance change throughout the life of the program.
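A minimal sketch of the "call it at most once, and only before first use" rule, written as a hypothetical standalone class rather than a change to the random module. It raises RuntimeError both on a second call and on a call made after the default instance has already been used; the CSPRNG fallback on first use is approximated with SystemRandom.

```python
import random


class DefaultRNG:
    """Hypothetical holder for a default instance that can be chosen once."""

    def __init__(self):
        self._inst = None
        self._locked = False

    def set_default_instance(self, inst):
        if self._locked:
            raise RuntimeError("default RNG instance is already fixed")
        self._inst = inst
        self._locked = True

    def _get(self):
        if self._inst is None:
            # First use without an explicit choice: default to the secure RNG
            self._inst = random.SystemRandom()
        self._locked = True
        return self._inst

    def random(self):
        return self._get().random()


rng = DefaultRNG()
rng.set_default_instance(random.Random(1234))
value = rng.random()
```

Locking on first use, not only on the first explicit call, is what prevents a later import from silently swapping the generator.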

On Sep 14, 2015, at 06:32, Nick Coghlan <ncoghlan@gmail.com> wrote:
Since I suggested the set_default_instance and the singleton instances that can be imported in place of the module, I'm obviously happy with those parts. However, I think you still haven't solved the problem with my proposal that you set out to solve.

The main difference is that I wanted to deprecate using the top-level functions without calling set_default_instance (and eventually make it an error), while you want to allow them and gradually shift the semantics from using the seeded to the seedless PRNG.

As I understand it, the reason for this is that you want to make it possible for someone to write "from random import choice", and not get a warning or error telling them they need to call set_default_instance or import one of the singletons instead. But then you're encouraging people to write code that's broken in 3.6 and earlier--and that's also potentially broken in 3.7 if used together with any code that calls set_default_instance (because that can't retroactively fix anything from-imported before the call). So, it takes 18 more months to provide any benefit, and it adds an extra cost.

Maybe the suggestion of not allowing set_default_instance to be called more than once and/or after any other functions is sufficient, but I'm not sure that it is.

What about this change: replace the three singleton instances with three modules, so we can tell people (and 2to3 and similar mechanical tools) to replace "from random import choice" with "from random.seedless_random import choice"? Would that be acceptable for novices? (And, if so, would that mean we no longer need set_default_instance and can just flat-out deprecate the top-level functions in random?)

If that's not sufficient because the name is too long/too nested, could we just flatten the names out, so it's "from seedless_random import choice", and then the deprecation process is just making random an alias for seeded_random and then switching it to seedless_random later?
(I don't think there's any official cross-platform way to alias modules like that, and having to do some sys.modules munging to force them to be the same instance, or using a special module finder just for this case, etc. is obviously ugly, but it may be worth doing anyway.) One nice advantage of this is that it's dead easy to backport; if I need seeded_random, I can write code that works for 2.6+/3.3+ by just depending on seeded_random from PyPI...
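The flattened-module idea can be approximated today by building module objects that expose an instance's bound methods (the same pattern the random module itself uses for its hidden default instance) and registering them in sys.modules. The module names and the make_rng_module() helper are hypothetical, and registering modules this way is exactly the kind of sys.modules munging described above; this is a sketch, not a recommended packaging approach.

```python
import random
import sys
import types

EXPORTED = ("seed", "random", "randint", "choice", "shuffle", "uniform")


def make_rng_module(name, inst):
    """Create a module whose functions are bound methods of `inst`."""
    module = types.ModuleType(name)
    for attr in EXPORTED:
        setattr(module, attr, getattr(inst, attr))
    sys.modules[name] = module  # lets "from <name> import ..." find it
    return module


make_rng_module("seeded_random", random.Random())
make_rng_module("system_random", random.SystemRandom())

from seeded_random import randint, seed  # resolved from sys.modules

seed(7)
first = randint(1, 100)
seed(7)
second = randint(1, 100)
print(first == second)  # True
```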

On 15 September 2015 at 09:10, Andrew Barnert <abarnert@yahoo.com> wrote:
This entire problem is one that I put in the "fix it eventually" category, rather than "fix it urgently" - folks really are better off learning to use things like cryptography.io for security sensitive software, so this change is just about harm mitigation given that it's inevitable that a non-trivial proportion of the millions of current and future Python developers won't do that.

Since there's really only one transition I want to enable (seedable -> seedless as the default RNG), I now think the "switch implicitly as needed" approach is a better idea than a permanent support API for switching the default instance - I'd just add a deprecation warning to that behaviour, with the intent of removing it some time after 2.7 goes EOL.

I also realised based on Paul's comments that we really do want "random.seedable" and "random.seedless" submodules, since that allows proper interaction with the import system in constructs like "from random.seedable import randint".

That would make the proposed change for Python 3.6:

* add a random.SeedlessRandom API that omits the seed(), getstate() and setstate() methods and uses a cryptographically secure PRNG internally (such as the ChaCha20 algorithm implemented by OpenBSD)
* deprecate the seed(), getstate() and setstate() methods on SystemRandom
* convert random to a pseudo-module with "seedless", "seedable" and "system" submodules (keeping most code in __init__ for easy pickle compatibility)
* these would each work like the current top-level random module - a default instance, with bound methods as module level callables
* random._inst would be an alias for random.seedless._inst by default
* the top level random functions would change to be functions lazily looking up methods on random._inst, rather than bound methods
* if you call the module level seed(), getstate(), or setstate() methods, and random._inst is set to random.seedless._inst, it will issue a deprecation warning recommending the direct use of "random.seedable" and switch random._inst to refer to random.seedable._inst instead

Compared to my original proposal, the seedable MT RNG retains the random.Random name, so any code already using explicit instances is entirely unaffected by the proposed change. This means the only code that will receive a deprecation warning is code calling the module level seed(), getstate() and setstate() functions, and that warning will just recommend importing "random.seedable" rather than importing "random".

The API used to replace the default instance at runtime for backwards compatibility purposes becomes private rather than public, so we only need to support our specific reasons for doing that, rather than supporting it as a general feature.

Future security audits would focus on the use of the module "seed()", "getstate()" and "setstate()" functions (since they'd trigger the deterministic RNG process wide), and it would also still be encouraged to use random.SystemRandom() or os.urandom() for security sensitive use cases (since that's both version independent, and immune to other modules modifying the active default RNG).

Regards, Nick.

On 15 September 2015 at 12:30, Random832 <random832@fastmail.com> wrote:
Yes, with the revised proposal, only the module level functions would change their behaviour to use a CSPRNG by default. If you trawl the various cryptographically unsound password generation recipes, they're almost all using the module level functions, so changing the meaning of random.Random itself would add a lot of additional pain for next to no gain. Regards, Nick.

On Sep 14, 2015, at 20:22, Nick Coghlan <ncoghlan@gmail.com> wrote:
That definitely makes things simpler. The only part of my set_default_instance patch that was at all tricky was how to make sure Random instances worked the same as top-level functions, while still providing a way to explicitly select one (hence renaming the base class to DeterministicRandom, making a new subclass UnsafeRandom that subclasses it and adds the warning, and making both Random and the top-level functions point at that). If we don't need that, then your simpler solution makes more sense.
Also, while I'm not 100% sold on the auto-switching and the delegate-at-call-time wrappers, I'll play with them and see, and if they do work, then you're definitely right that your second version does solve your problem with my proposal, so it doesn't matter whether your first version did anymore.
First, on delegating top-level functions: have you tested the performance? Is MT so slow that an extra lookup and function call don't matter?
One quick thought on auto-switching vs. explicitly setting the instance before any functions have been called: if I get you to install a plugin that calls random.seed(), I've now changed your app to use seeded random numbers. And it might even still pass security tests, because it doesn't switch until someone hits some API that activates the plugin. Is that a realistic danger for any realistic apps? If so, doesn't that potentially make 3.6 more dangerous than 3.5?
For another: I still think we should be getting people to explicitly use seeded_random or system_random (or seedless_random, if they need speed as well as "probably secure") or explicit class instances (which are a bigger change, but more backward compatible once you've made it) as often as possible, even if random does eventually turn into seedless_random. The fact that many apps will never actually issue a deprecation warning or any other signal that anything is changing may be leaning over too far toward convenience. I realize the benefit of having books and course materials written for 3.4 continue to work in 3.8, but really, if those books are giving people bad ideas, removing any incentive for anyone to change the next edition may not be a good idea.
And finally: it _seems like_ people who want MT for simulation/game/science stuff will have a pretty easy time finding the migration path, but I'm having a really hard time coming up with a convincing argument. Does anyone have a handful of science guys they can hack up a system for and test them empirically? Because if you can establish that fact, I think the naysayers have very little reason left to say nay, and a consensus would surely be better than having that horribly contentious thread end with "too bad, overruled, the PEP has been accepted".

On 15 September 2015 at 14:03, Andrew Barnert <abarnert@yahoo.com> wrote:
Also, while I'm not 100% sold on the auto-switching and the delegate-at-call-time wrappers, I'll play with them and see, and if they do work, then you're definitely right that your second version does solve your problem with my proposal, so it doesn't matter whether your first version did anymore.
First, on delegating top-level function: have you tested the performance? Is MT so slow that an extra lookup and function call don't matter?
If folks are in a situation where the performance impact of the additional layer of indirection is a problem, they can switch to using random.Random explicitly, or import from random.seedable rather than the top level random module.
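Andrew's performance question can be checked with a quick micro-benchmark. This sketch compares calling a bound method (how the module-level functions work today) with a Python-level wrapper that looks the method up on a module global at each call (roughly what the revised proposal describes). The names are illustrative, and the timings depend entirely on the machine and interpreter, so no figures are claimed here.

```python
import random
import timeit

_inst = random.Random(12345)

# Current style: the module-level name is a bound method of one instance.
bound_random = _inst.random


def delegating_random():
    """Proposed style (sketch): look the method up on _inst on every call."""
    return _inst.random()


def measure(func, number=100_000):
    """Return the best of three timings, in seconds, for `number` calls."""
    return min(timeit.repeat(func, number=number, repeat=3))


bound_time = measure(bound_random)
delegating_time = measure(delegating_random)
print(f"bound method: {bound_time:.4f}s  delegating wrapper: {delegating_time:.4f}s")
```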
One quick thought on auto-switching vs. explicitly setting the instance before any functions have been called: if I get you to install a plugin that calls random.seed(), I've now changed your app to use seeded random numbers. And it might even still pass security tests, because it doesn't switch until someone hits some API that activates the plugin. Is that a realistic danger for any realistic apps? If so, doesn't that potentially make 3.6 more dangerous than 3.5?
This isn't an applicable concern, as we already provide zero runtime protections against hostile monkeypatching of other modules (by design choice). You can subvert even os.urandom in a hostile plugin:
    import os
    def not_random(num_bytes):
        return b'A' * num_bytes
    os.urandom = not_random
Once "potentially hostile code running in the current process" is part of your threat model, CPython is out of the running, and even PyPy's sandboxing capabilities rely on running the potentially hostile code in a separate process. IronPython and Jython can rely on CLR/JVM sandboxing, but that's still a case of delegating the problem to the host platform, rather than trying to solve it at the Python level.
For another: I still think we should be getting people to explicitly use seeded_random or system_random (or seedless_random, if they need speed as well as "probably secure") or explicit class instances (which are a bigger change, but more backward compatible once you've made it) as often as possible, even if random does eventually turn into seedless_random. The fact that many apps will never actually issue a deprecation warning or any other signal that anything is changing may be leaning over too far toward convenience. I realize the benefit of having books and course materials written for 3.4 continue to work in 3.8, but really, if those books are giving people bad ideas, removing any incentive for anyone to change the next edition may not be a good idea.
Forcing people to make choices they're ill-equipped to make just because we think they "should" know enough to make a wise decision is one of the leading causes of user hostile software (consider the respective user experiences of a HTTP site and a HTTPS site with a self-signed certificate). People are busy, and life is full of decisions that need to be made where there's no good default, so when we're able to deliver a good default that fails *noisily* when it's the wrong answer, that's what we should be aiming for.
And finally: it _seems like_ people who want MT for simulation/game/science stuff will have a pretty easy time finding the migration path, but I'm having a really hard time coming up with a convincing argument. Does anyone have a handful of science guys they can hack up a system for and test them empirically? Because if you can establish that fact, I think the naysayers have very little reason left to say nay, and a consensus would surely be better than having that horribly contentious thread end with "too bad, overruled, the PEP has been accepted".
Given the general lack of investment in sustaining engineering for scientific software, I think the naysayers are right on that front, which is why I switched my proposal to give them a transparent upgrade path - I was originally thinking primarily of the educational and gaming use cases, and hadn't considered randomised simulations in the scientific realm. Regards, Nick.

On 15 September 2015 at 05:53, Nick Coghlan <ncoghlan@gmail.com> wrote:
The same problem can occur the other way round. Suppose that I want my whole app to be seedable but I have many modules that use "from random import choice" etc. Then in my top-level script I call random.seed and get an error under Python 3.6. So I switch that to use random.seedable but potentially end up with a mix of modules using random.seedable.choice and random.choice. It may seem under certain conditions that my app is properly seeded while not under others depending on which particular functions get called. The docs explicitly state that I will always be able to globally seed the module so that my entire non-threaded application is reproducible when using the top-level functions (even across different Python versions for random.random). So it's entirely reasonable to expect that people are using this behaviour and will want a way to revert to it which in the general case would need something like set_default_instance so that every module (including those I don't write myself) uses the same generator.
It might not be a case of "hostile monkeypatching". Someone might just be trying to fix their code that was broken by the backwards-incompatible change proposed in this discussion.
For another: I still think we should be getting people to explicitly use seeded_random or system_random (or seedless_random, if they need speed as well as "probably secure") or explicit class instances (which are a bigger change, but more backward compatible once you've made it) as often as possible, even if random does eventually turn into seedless_random.
That's fine but seeded_random won't exist in earlier Python versions so it creates another cross-version compatibility problem. Also switching to using your own random instance can be a non-trivial change if more than one module/project is involved. The random module has deliberately provided a convenient place to store that global state which would need to be replaced somehow.
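One way to replace the module's global state with an explicit instance, sketched with a hypothetical `AppRandom` holder. It only reaches code that goes through the holder: third-party modules that call `random.choice` directly would still use the random module's own default instance, which is the difficulty described above.

```python
import random


class AppRandom:
    """Hypothetical application-wide holder for an explicit generator,
    standing in for the global state the random module keeps today."""
    rng = random.Random()

    @classmethod
    def reseed(cls, seed):
        cls.rng.seed(seed)


def roll_die():
    return AppRandom.rng.randint(1, 6)


def pick_colour():
    return AppRandom.rng.choice(["red", "green", "blue"])
```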
TBH when I need to burn thousands of CPU-hours on RNG heavy code I would rather use numpy's random module. It also uses Mersenne Twister but it's a lot faster if you need loads of random numbers. -- Oscar

Nick Coghlan <ncoghlan@...> writes:
Sorry, -1 on this. Theo proposed a simple API like:
    arc4random()
    arc4random_uniform()
Go has:
    https://golang.org/pkg/math/rand/
    https://golang.org/pkg/crypto/rand/
These are sane, unambiguously named APIs. I wish Python had more of those. If people must have their CSPRNG, please let's leave the random module alone and introduce a crypto module like Go. Stefan Krah
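For comparison, arc4random()-style functions can be approximated in Python on top of os.urandom. The names `urandom_u32` and `urandom_uniform` are hypothetical, and this is not how OpenBSD implements arc4random (which uses an in-process ChaCha20 generator); it only shows the unbiased-range idea behind arc4random_uniform(), here done by rejection sampling.

```python
import os


def urandom_u32() -> int:
    """Hypothetical analogue of arc4random(): 32 random bits from os.urandom."""
    return int.from_bytes(os.urandom(4), "big")


def urandom_uniform(upper_bound: int) -> int:
    """Hypothetical analogue of arc4random_uniform(): an unbiased integer
    in [0, upper_bound), drawn from os.urandom by rejection sampling."""
    if upper_bound < 1:
        raise ValueError("upper_bound must be at least 1")
    bits = (upper_bound - 1).bit_length()  # bits needed for the largest value
    nbytes = (bits + 7) // 8
    while True:
        # Read whole bytes, then keep only the top `bits` bits.
        candidate = int.from_bytes(os.urandom(nbytes), "big") >> (nbytes * 8 - bits)
        # Reject out-of-range values instead of using `% upper_bound`,
        # which would favour small results.
        if candidate < upper_bound:
            return candidate
```

Since 2**bits is less than twice upper_bound, each loop iteration accepts with probability above one half, so the expected number of os.urandom calls stays below two.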

On 14/09/15 15:43, Stefan Krah wrote:
In a perfect world, every programmer would know the difference between PRNGs for numerical simulation and entropy sources for cryptography. Those that do will still use os.urandom or just read from /dev/urandom or /dev/random for cryptography. Those that do know the need for mathematical precision when simulating samples from a given distribution. Those that do know the need for a fixed seed because a Monte Carlo simulation should be exactly reproducible in a scientific context. The problem is users who have no idea that the Mersenne Twister is constructed for producing random deviates that are great for numerical simulation -- and that the Mersenne Twister is very weak for cryptography. Using os.urandom as default entropy source has the opposite effect. It is not constructed for being mathematically precise, it is slow, and it does not allow for a fixed seed and exact reproducibility. Whatever we do, there is going to be someone who shoots their leg off. A crypto module would perhaps be great, but it does not solve anything. Someone who uses random.random instead of os.urandom is likely to use random.random instead of a PRNG in a crypto module as well. Mostly this is about propagating knowledge of random number generators to new developers and science students. Sturla
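A small illustration of the reproducibility point, using only the existing random module: a seeded random.Random replays a Monte Carlo run exactly, whereas random.SystemRandom ignores seed() and raises NotImplementedError from getstate()/setstate(). `estimate_pi` is just an example simulation.

```python
import random


def estimate_pi(rng, samples=10_000):
    """Monte Carlo estimate of pi from points drawn in the unit square."""
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


# A fixed seed makes the whole simulation exactly reproducible.
run_a = estimate_pi(random.Random(2015))
run_b = estimate_pi(random.Random(2015))

# SystemRandom draws from os.urandom: seed() is ignored and the
# state methods are not implemented, so a run cannot be replayed.
system_rng = random.SystemRandom()
try:
    system_rng.getstate()
    has_state = True
except NotImplementedError:
    has_state = False
```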

Sturla Molden <sturla.molden@...> writes:
The sentiments in the original thread (which has now been renamed two times) seem to have been lost:
Theo:
=====
"chacha arc4random is really fast. if you were to create such an API in python, maybe this is how it will go: say it becomes arc4random in the back end. i am unsure what advice to give you regarding a python API name. in swift, they chose to use the same prefix "arc4random" (id = arc4random(), id = arc4random_uniform(1..n)"; it is a little bit different than the C API. google has tended to choose other prefixes. we admit the name is a bit strange, but we can't touch the previous attempts like drand48.... I do suggest you have the _uniform and _buf versions. Maybe apple chose to stick to arc4random as a name simply because search engines tend to give above average advice for this search string?"
Theo:
=====
"that opens /dev/urandom or uses the getrandom system call depending on system. it also has support for the windows entropy API. it pulls data into a large buffer, a cache. then each subsequent call, it consumes some, until it runs out, and has to do a fresh read. it appears to not clean the buffer behind itself, probably for performance reasons, so the memory is left active. (forward secrecy violated) i don't think they are doing the best they can... i think they should get forward secrecy and higher performance by having an in-process chacha. but you can sense the trend."
So the original thread is about:
================================
- Implementing a possibly faster (and allegedly more secure) chacha20-random.
- Possibly using the naming scheme of Swift.
- Being careful with os.urandom(), as there are some pitfalls that the OpenBSD libcrypto (allegedly) solves.
I see nothing about magically repurposing random.random() functions. Stefan Krah
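A rough sketch of the buffered design Theo describes (one large os.urandom read served out in pieces and refilled when exhausted), with consumed bytes overwritten, which is the cleanup he says the reviewed implementation skips. `BufferedEntropy` is a hypothetical name, and this does not implement his preferred alternative of an in-process ChaCha generator; in CPython, zeroing a bytearray also cannot guarantee that no other copy of the bytes remains in memory.

```python
import os


class BufferedEntropy:
    """Hypothetical sketch: read a large block from os.urandom into a cache,
    serve requests from it, and refill when it runs out."""

    def __init__(self, cache_size=4096):
        self._cache_size = cache_size
        self._buf = bytearray()
        self._pos = 0
        self.refills = 0  # how many times os.urandom was called

    def read(self, n):
        out = bytearray()
        while len(out) < n:
            if self._pos >= len(self._buf):
                self._buf = bytearray(os.urandom(self._cache_size))
                self._pos = 0
                self.refills += 1
            end = min(self._pos + (n - len(out)), len(self._buf))
            out += self._buf[self._pos:end]
            # Zero the bytes just handed out so they do not linger in the
            # cache (best effort only; see the caveat above).
            self._buf[self._pos:end] = bytes(end - self._pos)
            self._pos = end
        return bytes(out)
```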

On September 14, 2015 at 10:50:46 AM, Stefan Krah (skrah@bytereef.org) wrote:
I've actually talked to Theo and I believe he's read my summary of his proposal and he didn't mention anything amiss. He did mention that he wasn't aware of the number of APIs that we had in random.py that built on top of the RNG. As far as I can tell from talking to him, he focused on that particular thing because he became aware of the issue via the recent issue with getentropy on Solaris, and I believe he assumed that our APIs were similar to C in what we provided. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On September 14, 2015 at 11:16:48 AM, Stefan Krah (skrah@bytereef.org) wrote:
Well, he's expressed that he's unlikely to participate in this discussion because he doesn't use Python and thus doesn't have any skin in the game. He just saw an opportunity to try and improve the "ambient" security of applications written in Python and thought he'd reach out to see if there was any interest in it on our end. I'd ask him personally, but given that I'm "biased" you'll have to manage to ask him on your own. Donald Stufft

On September 14, 2015 at 9:33:27 AM, Nick Coghlan (ncoghlan@gmail.com) wrote:
I don't love the "seedable" and "seedless" names here, but I don't have a better suggestion for the userspace CSPRNG one because its security properties are a bit nuanced. People doing security sensitive things like generating keys for cryptography should still use something based on os.urandom, so it's mostly about providing a safety net that will "probably" [1] be safe. Probably something like random.ProbablySecureRandom is a bad name :)
* provide a random.set_default_instance() API that makes it possible to specify the instance used by the module level methods
I think this particular bit is a bad idea: it makes an official API that makes it really hard for an auditor to come into a code base and determine if the use of random is correct or not. Given that going back to the MT based algorithm is fairly trivial (and could even be mechanical) what's the long term benefit here? [1] The safety of userspace CSPRNGs is debated among security experts; however, I think any of them would be hard pressed to think it's a bad idea to have a userspace CSPRNG as a safety net for folks who, for whatever reason, didn't know to use os.urandom/random.SystemRandom, to make them more likely to be safe, or at the very least, harder to attack. Donald Stufft

On Mon, Sep 14, 2015, at 09:51, Donald Stufft wrote:
It's no worse than what OpenBSD itself has done with the C api for rand/random/rand48. At some point you've got to balance it with the realities of making backwards compatibility easy to achieve for the applications that really do need it with either a few lines change or none at all. And anyway, the auditor would *know* that if they see a module-level function called they need to do the extra work to find out what mode the module-level RNG is in (i.e. yes/no is there anywhere at all in the codebase that changes it from the secure default?) It's not an "official API", it's an escape hatch for allowing a minimal change to existing code that needs the old behavior.
Given that going back to the MT based algorithm is fairly trivial (and could even be mechanical) what's the long term benefit here?
I don't see how it's trivial/mechanical, *without* the exact feature being discussed.

Random832 <random832@...> writes:
It's no worse than what OpenBSD itself has done with the C api for rand/random/rand48.
These functions aren't used widely in scientific computing.
It's not an "official API", it's an escape hatch for allowing a minimal change to existing code that needs the old behavior.
It's yet another case split to keep in the back of one's mind. Stefan Krah

On 14/09/15 16:45, Random832 wrote:
It is not just a matter of security versus determinism. It is also a matter of numerical accuracy. The distribution of the output sequence must be proven and be as close as possible to the distribution of interest. MT19937 is loved by scientists because it emulates sampling from the uniform distribution so well. Faster alternatives exist, more secure alternatives too. But when we simulate a stochastic process we also care about numerical accuracy. MT19937 is considered state of the art for this purpose. It does not seem that the issue of numerical accuracy is appreciated in this debate. Cryptographers just want random bits that cannot be predicted. Numerical accuracy is not their primary concern. If you replace MT19937 with "something more secure" you likely also lose its usefulness for scientific computing. Sturla

On September 14, 2015 at 11:40:53 AM, Sturla Molden (sturla.molden@gmail.com) wrote:
Nobody is suggesting removing MT, just making it so you have to explicitly opt in to it. Donald Stufft

On 2015-09-14 16:39, Sturla Molden wrote:
Actually, it's well behind the state of the art as it fails BigCrush. The proposed alternative does better in this regard. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

On 14/09/15 17:50, Robert Kern wrote:
Actually, it's well behind the state of the art as it fails BigCrush. The proposed alternative does better in this regard.
Is that one of the PCGs? Or Arc4Random, ChaCha20 or XorShift64/32? The latter three fail on k-dimensional equidistribution; MT does not. Some of the PCGs do too, but some should be as good as MT. Not sure if that is worse or better than failing some parts of BigCrush. Which PCG would you recommend, by the way? Sturla

On 2015-09-14 17:56, Sturla Molden wrote:
The alternative proposed in this thread is ChaCha20.
There is a reason that exact k-dimensional equidistribution for such a large k is not tested even in BigCrush. It's a nifty feature useful in a few applications, but not for simulations. It is important that the PRNG is *well*-distributed, but exact equidistribution is mostly neither here nor there. It can be trivially implemented by statistically bad PRNGs, like a simple counter. Obtaining it requires implementing an astronomically long period (and consequent growth in the state size) that adds significant costs without any realizable improvement to the statistics. If I'm drawing millions of numbers, k=623 is not much better than k=1, provided that the generator is otherwise good.
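The "simple counter" remark can be shown in one dimension: a counter modulo m produces a perfectly flat histogram over whole periods, yet each output is the previous one plus one. This only illustrates why exact equidistribution alone says little about statistical quality; `counter_prng` is a deliberately bad toy generator, not anything proposed in the thread.

```python
from collections import Counter


def counter_prng(modulus):
    """A deliberately terrible 'generator': 0, 1, 2, ..., modulus-1, 0, 1, ..."""
    value = 0
    while True:
        yield value
        value = (value + 1) % modulus


m = 8
gen = counter_prng(m)
draws = [next(gen) for _ in range(m * 1000)]

# Over whole periods every output value appears exactly equally often.
histogram = Counter(draws)

# But successive outputs are completely predictable: each is the previous + 1.
successor_steps = Counter((b - a) % m for a, b in zip(draws, draws[1:]))
```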
Which PCG would you recommend, by the way?
Probably pcg64 (128-bit state, 64-bit output). Having the 64-bit output is nice so you only have to draw one value to make a uniform(0,1) double, and a period of 2**128 is nice and roomy without being excessively large. Robert Kern
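The point about 64-bit output can be made concrete. A double carries 53 bits of precision, so one 64-bit word is enough, while a 32-bit generator like MT19937 needs two words; the second function mirrors the 27 + 26 bit combination used by CPython's random.random(). Here `getrandbits()` merely stands in for raw generator output, and both function names are illustrative.

```python
import random


def double_from_u64(word):
    """Map one 64-bit output to a double in [0, 1) using its top 53 bits."""
    return (word >> 11) * (1.0 / 2**53)


def double_from_two_u32(hi, lo):
    """Combine two 32-bit outputs into 53 bits, as CPython's MT-based
    random() does: 27 bits from the first word and 26 from the second."""
    a = hi >> 5
    b = lo >> 6
    return (a * 67108864.0 + b) * (1.0 / 9007199254740992.0)


# getrandbits() stands in for the raw generator outputs here.
rng = random.Random(0)
x64 = double_from_u64(rng.getrandbits(64))
x32 = double_from_two_u32(rng.getrandbits(32), rng.getrandbits(32))
```

Both functions map all-zero input to 0.0 and all-ones input to (2**53 - 1) / 2**53, the largest double below 1.0.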

On September 14, 2015 at 10:16:49 AM, Random832 (random832@fastmail.com) wrote:
Easily, you change your:
    import random
to
    from random import seeded_random as random
And then all of your code that used random.foo works without any further modification. If you were importing the individual functions, you can either change your code to use random.foo or you can do:
    from random import seeded_random as _random
    random = _random.random
    randint = _random.randint
If you want to do this in cross-version code, then you can combine this with a try/except block like:
    try:
        from random import seeded_random as random
    except ImportError:
        import random
Either way, trivial and mechanical. It doesn't require much thought, it just requires some pretty simple changes. Donald Stufft

On Mon, Sep 14, 2015 at 10:16:00AM -0400, Random832 wrote:
Of course it is an official API. It's a documented public function (or rather, it will be if Nick's suggestion is accepted) in the standard library. That makes it an official API. The *whole purpose of it* is to give a standard API for what Python can already do: monkey-patch the random module. E.g. we can do this now:
    import random
    random.random = lambda: 9
    random.uniform = lambda a, b: 9
but if you do that, you know you're on thin ice. I don't entirely agree with everything Donald has said, but I agree that providing this API would be harmful. It would mean that any arbitrary module you import (directly or indirectly) could swap out the secure CSPRNG you're relying on for an insecure PRNG, and you would never know. (Yes, they could do that now, this is Python. But they won't, because there's no official API for swapping out the default PRNG.) -- Steve

Probably something like random.ProbablySecureRandom is a bad name :)
Yes, but unsecureRandom for the unsecure one (which obviously is insecure) is not unreasonable (unsafe can be shorter). Also, seedless does not mean secure: https://xkcd.com/221/ :-) -- M

On 14.09.15 16:32, Nick Coghlan wrote:
* make random.Random a subclass of SeedableRandom that deprecates seed(), getstate() and setstate()
I would make seed() and setstate() switch to the seedable algorithm. If you don't use seed() or setstate(), it is not important that the algorithm is changed. If you use seed() or setstate(), you expect reproducible behavior.
* random.Random becomes an alias for random.SeedlessRandom
This breaks compatibility with data pickled by older Python versions.
What to do with "from random import random" deep in a third-party module? It caches random.random in the module dictionary.
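The caching problem can be reproduced with a throwaway stand-in module: a name bound by `from ... import` keeps pointing at the original object even after the module attribute is rebound. `fake_random` is a hypothetical module created only for this demonstration.

```python
import random
import sys
import types

# Throwaway stand-in for the random module (hypothetical name), whose
# top-level `choice` is a bound method of one instance, as today.
fake_random = types.ModuleType("fake_random")
fake_random.choice = random.Random(1).choice
sys.modules["fake_random"] = fake_random

# A "third-party module" doing `from fake_random import choice` at import time.
from fake_random import choice as cached_choice

# Later, something rebinds the module attribute to a different instance.
fake_random.choice = random.SystemRandom().choice

# The cached name still points at the original bound method.
still_old = cached_choice.__self__
now_new = fake_random.choice.__self__
```

Nick's revised proposal addresses this by making the top-level functions look up the current instance at call time instead of being bound methods.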

On 9/14/2015 11:04 AM, Serhiy Storchaka wrote:
An alternate proposal is to initialize the module so that random uses something more 'secure' than MT. Then...
I would make seed() and setstate() switch to the seedable algorithm.
In particular, to MT. Also switch on a getstate() call.
There is more than one possible internal implementation. But for any of them, the change should be invisible to callers. (Representations and introspection results would be a different matter.) I understand that the docs currently say that random uses MT. But I wonder if any version of the above could be used in current versions, so as to immediately "upgrade a lot of existing instructions on the internet" and code that follows such instructions. -- Terry Jan Reedy

On 14 September 2015 at 14:32, Nick Coghlan <ncoghlan@gmail.com> wrote:
I'll write it up as a full PEP later, but I think it's just as useful in this form for now.
Please provide costs and benefits. At the moment, the proposal takes an implied stance that fixing security issues warrants disruption to users (and in particular to users with *no* security requirements). I appreciate that there's the usual 2-release long deprecation process, and that the only visible disruption is to the state/seed APIs. But I'd like to see that expanded on a little more, precisely to convince those people who *aren't* automatically convinced by "there's a security issue" arguments, that the trade-offs have been properly analyzed. For example, in terms of costs:
1. The module API is more complex and harder to teach.
2. The new API deliberately introduces a global state setting API.
3. People using "from random import choice" can't use the "simple upgrade" recommendation "from random import system_random as random".
The benefits seem to be solely:
1. Users of code written based on bad advice will be protected from the consequences (as long as the code runs on a sufficiently new version of Python).
(I'm serious - that's how the benefit statement reads to me. Although I agree it'd be nice if I worded it a bit more unemotionally, I genuinely don't know how to without either overstating it or making it a paragraph long...) I'm not trying to say that the cost/benefit analysis doesn't justify the change (I'm currently unconvinced, and trying to remain open in spite of the over-abundance of security rhetoric in the thread), just that it's a key point of the debate here, and it's not captured in your summary/pre-PEP. Paul

On Mon, Sep 14, 2015 at 8:32 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
One problem that people (I can't remember who) have pointed out about random.set_default_instance() is that any imported module in the same process can switch the default RNG from secure to insecure at a distance. One way to solve this is to ensure that set_default_instance() can be called only once; if it is called more than once, a RuntimeError could be raised. I think the logging module does something like this for setting the logging level? I think the only way that this really would make sense would be to make set_default_instance() be called before any of the module level functions. The first time a module level function is called, you could default to selecting the CSPRNG. If you call one of the seeded API functions (getstate, setstate, seed) before the other module-level functions the instance could default to the deterministic RNG, but that might be confusing to debug. I could imagine people getting really confused if this program worked:
    import random
    random.seed(1234)
    random.random()
but this program failed:
    import random
    random.random()
    random.seed(1234)  # would raise a RuntimeError
    random.random()  # would not be reached
I'm not crazy about the idea of changing the default instance based on the first module level function called; that might be a terrible idea. But I _do_ think it's a good idea not to let the default instance change throughout the life of the program.
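A minimal sketch of the "choose once, before first use" rule suggested above, assuming a hypothetical `DefaultRNG` holder with a `set_default_instance()` method; neither exists in the random module. The first module-level call without an explicit choice locks in SystemRandom, standing in here for the CSPRNG.

```python
import random


class DefaultRNG:
    """Hypothetical holder for the module-level default instance.

    The instance may be chosen at most once, and only before the first
    module-level call; after that it is frozen."""

    def __init__(self, factory):
        self._factory = factory  # used if nobody chooses explicitly
        self._instance = None

    def set_default_instance(self, instance):
        if self._instance is not None:
            raise RuntimeError("default instance already chosen or in use")
        self._instance = instance

    def get(self):
        if self._instance is None:
            # First use without an explicit choice: lock in the default.
            self._instance = self._factory()
        return self._instance


_default = DefaultRNG(random.SystemRandom)


def randint(a, b):
    """Module-level function routed through the frozen default instance."""
    return _default.get().randint(a, b)
```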

On Sep 14, 2015, at 06:32, Nick Coghlan <ncoghlan@gmail.com> wrote:
Since I suggested the set_default_instance and the singleton instances that can be imported in place of the module, I'm obviously happy with those parts. However, I think you still haven't solved the problem with my proposal that you set out to solve. The main difference is that I wanted to deprecate using the top-level functions without calling set_default_instance (and eventually make it an error), while you want to allow them and gradually shift the semantics from using the seeded to the seedless PRNG. As I understand it, the reason for this is that you want to make it possible for someone to write "from random import choice", and not get a warning or error telling them they need to call set_default_instance or import one of the singletons instead. But then you're encouraging people to write code that's broken in 3.6 and earlier, and that's also potentially broken in 3.7 if used together with any code that calls set_default_instance (because that can't retroactively fix anything from-imported before the call). So, it takes 18 more months to provide any benefit, and it adds an extra cost. Maybe the suggestion of not allowing set_default_instance to be called more than once and/or after any other functions is sufficient, but I'm not sure that it is. What about this change: replace the three singleton instances with three modules, so we can tell people (and 2to3 and similar mechanical tools) to replace "from random import choice" with "from random.seedless_random import choice"? Would that be acceptable for novices? (And, if so, would that mean we no longer need the set_default_instance and can just flat-out deprecate the top-level functions in random?) If that's not sufficient because the name is too long/too nested, could we just flatten the names out, so it's "from seedless_random import choice", and then the deprecation process is just making random an alias for seeded_random and then switching it to seedless_random later?
(I don't think there's any official cross-platform way to alias modules like that, and having to do some ugly sys.modules munging to force them to be the same instance, or using a special module finder just for this case, etc. is obviously ugly, but it may be worth doing anyway.) One nice advantage of this is that it's dead-easy to backport; if I need seeded_random, I can write code that works for 2.6+/3.3+ by just depending on seeded_random from PyPI...
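The "sys.modules munging" mentioned above can be sketched as follows: build module objects whose top-level callables are bound methods of one instance, as the random module does today, and register them so the import system finds them. `make_rng_module`, `demo_seeded_random` and `demo_system_random` are hypothetical names for illustration, and this is not necessarily how the thread's random.seedable/random.seedless proposal would be implemented.

```python
import random
import sys
import types


def make_rng_module(name, instance):
    """Hypothetical helper: build a module whose top-level callables are the
    bound methods of `instance`, like today's random module, and register
    it so that `import name` and `from name import ...` find it."""
    module = types.ModuleType(name)
    for attr in ("random", "randint", "choice", "shuffle", "uniform", "seed"):
        setattr(module, attr, getattr(instance, attr))
    module._inst = instance
    sys.modules[name] = module
    return module


# Illustrative module names only; nothing like this ships with Python.
make_rng_module("demo_seeded_random", random.Random())
make_rng_module("demo_system_random", random.SystemRandom())

from demo_seeded_random import choice, seed
```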

On 15 September 2015 at 09:10, Andrew Barnert <abarnert@yahoo.com> wrote:
This entire problem is one that I put in the "fix it eventually" category, rather than "fix it urgently" - folks really are better off learning to use things like cryptography.io for security sensitive software, so this change is just about harm mitigation given that it's inevitable that a non-trivial proportion of the millions of current and future Python developers won't do that. Since there's really only one transition I want to enable (seedable -> seedless as the default RNG), I now think the "switch implicitly as needed" is a better idea than a permanent support API for switching the default instance - I'd just add a deprecation warning to that behaviour, with the intent of removing it some time after 2.7 goes EOL. I also realised based on Paul's comments that we really do want "random.seedable" and "random.seedless" submodules, since that allows proper interaction with the import system in constructs like "from random.seedable import randint" That would make the proposed change for Python 3.6: * add a random.SeedlessRandom API that omits the seed(), getstate() and setstate() methods and uses a cryptographically secure PRNG internally (such as the ChaCha20 algorithm implemented by OpenBSD) * deprecate the seed(), getstate() and setstate() methods on SystemRandom * convert random to a pseudo-module with "seedless", "seedable" and "system" submodules (keeping most code in __init__ for easy pickle compatibility) * these would each work like the current top-level random module - a default instance, with bound methods as module level callables * random._inst would be an alias for random.seedless._inst by default * the top level random functions would change to be functions lazily looking up methods on random._inst, rather than bound methods * if you call the module level seed(), getstate(), or setstate() methods, and random._inst is set to random.seedless._inst, it will issue a deprecation warning recommending the direct use of "random.seedable" and switch 
random._inst to refer to random.seedable._inst instead Compared to my original proposal, the seedable MT RNG retains the random.Random name, so any code already using explicit instances is entirely unaffected by the proposed change. This means the only code that will receive a deprecation warning is code calling the module level seed(), getstate() and setstate() functions, and that warning will just recommend importing "random.seedable" rather than importing "random". The API used to replace the default instance at runtime for backwards compatibility purposes becomes private rather than public, so we only need to support our specific reasons for doing that, rather than supporting it as a general feature. Future security audits would focus on the use of the module "seed()", "getstate()" and "setstate()" functions (since they'd trigger the deterministic RNG process wide), and it would also still be encouraged to use random.SystemRandom() or os.urandom() for security sensitive use cases (since that's both version independent, and immune to other modules modifying the active default RNG). Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 15 September 2015 at 12:30, Random832 <random832@fastmail.com> wrote:
Yes, with the revised proposal, only the module level functions would change their behaviour to use a CSPRNG by default. If you trawl the various cryptographically unsound password generation recipes, they're almost all using the module level functions, so changing the meaning of random.Random itself would add a lot of additional pain for next to no gain. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
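For illustration, here is roughly what such a recipe looks like next to the explicit SystemRandom alternative mentioned earlier in the thread. The function names and the 12-character default are hypothetical; under the revised proposal only the first function's behaviour would change, and it would do so without any edit to the code.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits


def recipe_password(length=12):
    # Typical online recipe: module-level functions. Today these use the
    # seedable Mersenne Twister; under the revised proposal they would
    # default to a CSPRNG without any change to this code.
    return "".join(random.choice(ALPHABET) for _ in range(length))


def system_password(length=12):
    # Version-independent alternative recommended in the thread: an explicit
    # SystemRandom instance, which draws its randomness from os.urandom().
    rng = random.SystemRandom()
    return "".join(rng.choice(ALPHABET) for _ in range(length))


print(recipe_password(), system_password(16))
```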

On Sep 14, 2015, at 20:22, Nick Coghlan <ncoghlan@gmail.com> wrote:
That definitely makes things simpler. The only part of my set_default_instance patch that was at all tricky was how to make sure Random instances worked the same as top-level functions, but still providing a way to explicitly select one (hence renaming the base class to DeterministicRandom, making a new subclass UnsafeRandom that subclasses it and added the warning, and making both Random and the top-level functions point at that). If we don't need that, then your simpler solution makes more sense.

Also, while I'm not 100% sold on the auto-switching and the delegate-at-call-time wrappers, I'll play with them and see, and if they do work, then you're definitely right that your second version does solve your problem with my proposal, so it doesn't matter whether your first version did anymore.

First, on delegating top-level function: have you tested the performance? Is MT so slow that an extra lookup and function call don't matter?

One quick thought on auto-switching vs. explicitly setting the instance before any functions have been called: if I get you to install a plugin that calls random.seed(), I've now changed your app to use seeded random numbers. And it might even still pass security tests, because it doesn't switch until someone hits some API that activates the plugin. Is that a realistic danger for any realistic apps? If so, doesn't that potentially make 3.6 more dangerous than 3.5?

For another: I still think we should be getting people to explicitly use seeded_random or system_random (or seedless_random, if they need speed as well as "probably secure") or explicit class instances (which are a bigger change, but more backward compatible once you've made it) as often as possible, even if random does eventually turn into seedless_random. The fact that many apps will never actually issue a deprecation warning or any other signal that anything is changing may be leaning over too far toward convenience. I realize the benefit of having books and course materials written for 3.4 continue to work in 3.8, but really, if those books are giving people bad ideas, removing any incentive for anyone to change the next edition may not be a good idea.

And finally: it _seems like_ people who want MT for simulation/game/science stuff will have a pretty easy time finding the migration path, but I'm having a really hard time coming up with a convincing argument. Does anyone have a handful of science guys they can hack up a system for and test them empirically? Because if you can establish that fact, I think the naysayers have very little reason left to say nay, and a consensus would surely be better than having that horribly contentious thread end with "too bad, overruled, the PEP has been accepted".

On 15 September 2015 at 14:03, Andrew Barnert <abarnert@yahoo.com> wrote:
Also, while I'm not 100% sold on the auto-switching and the delegate-at-call-time wrappers, I'll play with them and see, and if they do work, then you're definitely right that your second version does solve your problem with my proposal, so it doesn't matter whether your first version did anymore.
First, on delegating top-level function: have you tested the performance? Is MT so slow that an extra lookup and function call don't matter?
If folks are in a situation where the performance impact of the additional layer of indirection is a problem, they can switch to using random.Random explicitly, or import from random.seedable rather than the top level random module.
One quick thought on auto-switching vs. explicitly setting the instance before any functions have been called: if I get you to install a plugin that calls random.seed(), I've now changed your app to use seeded random numbers. And it might even still pass security tests, because it doesn't switch until someone hits some API that activates the plugin. Is that a realistic danger for any realistic apps? If so, doesn't that potentially make 3.6 more dangerous than 3.5?
This isn't an applicable concern, as we already provide zero runtime protections against hostile monkeypatching of other modules (by design choice). You can subvert even os.urandom in a hostile plugin:

    def not_random(num_bytes):
        return b'A' * num_bytes

    import os
    os.urandom = not_random

Once "potentially hostile code running in the current process" is part of your threat model, CPython is out of the running, and even PyPy's sandboxing capabilities rely on running the potentially hostile code in a separate process. IronPython and Jython can rely on CLR/JVM sandboxing, but that's still a case of delegating the problem to the host platform, rather than trying to solve it at the Python level.
For another: I still think we should be getting people to explicitly use seeded_random or system_random (or seedless_random, if they need speed as well as "probably secure") or explicit class instances (which are a bigger change, but more backward compatible once you've made it) as often as possible, even if random does eventually turn into seedless_random. The fact that many apps will never actually issue a deprecation warning or any other signal that anything is changing may be leaning over too far toward convenience. I realize the benefit of having books and course materials written for 3.4 continue to work in 3.8, but really, if those books are giving people bad ideas, removing any incentive for anyone to change the next edition may not be a good idea.
Forcing people to make choices they're ill-equipped to make just because we think they "should" know enough to make a wise decision is one of the leading causes of user hostile software (consider the respective user experiences of a HTTP site and a HTTPS site with a self-signed certificate). People are busy, and life is full of decisions that need to be made where there's no good default, so when we're able to deliver a good default that fails *noisily* when it's the wrong answer, that's what we should be aiming for.
And finally: it _seems like_ people who want MT for simulation/game/science stuff will have a pretty easy time finding the migration path, but I'm having a really hard time coming up with a convincing argument. Does anyone have a handful of science guys they can hack up a system for and test them empirically? Because if you can establish that fact, I think the naysayers have very little reason left to say nay, and a consensus would surely be better than having that horribly contentious thread end with "too bad, overruled, the PEP has been accepted".
Given the general lack of investment in sustaining engineering for scientific software, I think the naysayers are right on that front, which is why I switched my proposal to give them a transparent upgrade path - I was originally thinking primarily of the educational and gaming use cases, and hadn't considered randomised simulations in the scientific realm. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 15 September 2015 at 05:53, Nick Coghlan <ncoghlan@gmail.com> wrote:
The same problem can occur the other way round. Suppose that I want my whole app to be seedable but I have many modules that use "from random import choice" etc. Then in my top-level script I call random.seed and get an error under Python 3.6. So I switch that to use random.seedable but potentially end up with a mix of modules using random.seedable.choice and random.choice. It may seem under certain conditions that my app is properly seeded while not under others depending on which particular functions get called. The docs explicitly state that I will always be able to globally seed the module so that my entire non-threaded application is reproducible when using the top-level functions (even across different Python versions for random.random). So it's entirely reasonable to expect that people are using this behaviour and will want a way to revert to it which in the general case would need something like set_default_instance so that every module (including those I don't write myself) uses the same generator.
It might not be a case of "hostile monkeypatching". Someone might just be trying to fix their code that was broken by the backwards-incompatible change proposed in this discussion.
For another: I still think we should be getting people to explicitly use seeded_random or system_random (or seedless_random, if they need speed as well as "probably secure") or explicit class instances (which are a bigger change, but more backward compatible once you've made it) as often as possible, even if random does eventually turn into seedless_random.
That's fine but seeded_random won't exist in earlier Python versions so it creates another cross-version compatibility problem. Also switching to using your own random instance can be a non-trivial change if more than one module/project is involved. The random module has deliberately provided a convenient place to store that global state which would need to be replaced somehow.
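One way to paper over the cross-version gap is a guarded import that falls back to the top-level module. This assumes the proposed random.seedable submodule, which is hypothetical and absent from released Python versions; roll_dice() is an illustrative helper. It addresses only the import-compatibility part of the concern, not the problem of getting several modules or projects to share one seeded instance.

```python
import random

try:
    # Hypothetical submodule from the proposal in this thread; released
    # Python versions do not provide it, so this raises ImportError there.
    from random import seedable as seeded_rng
except ImportError:
    # Older versions: the top-level functions already share one seedable
    # Mersenne Twister instance.
    seeded_rng = random


def roll_dice(n, seed):
    """Reproducible rolls from whichever seedable module was found."""
    seeded_rng.seed(seed)
    return [seeded_rng.randint(1, 6) for _ in range(n)]


print(roll_dice(5, seed=7) == roll_dice(5, seed=7))  # True
```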
TBH when I need to burn thousands of CPU-hours on RNG-heavy code I would rather use numpy's random module. It also uses Mersenne Twister but it's a lot faster if you need loads of random numbers. -- Oscar
participants (15): Andrew Barnert, Cody Piersall, Donald Stufft, Matthias Bussonnier, Nick Coghlan, Oscar Benjamin, Paul Moore, Random832, Robert Kern, Serhiy Storchaka, Stefan Krah, Stephen J. Turnbull, Steven D'Aprano, Sturla Molden, Terry Reedy