[Python-ideas] Globally configurable random number generation

Tue Sep 15 19:20:18 CEST 2015

On 15 September 2015 at 05:53, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 15 September 2015 at 14:03, Andrew Barnert <abarnert at yahoo.com> wrote:
>> Also, while I'm not 100% sold on the auto-switching and the delegate-at-call-time wrappers, I'll play with them and see, and if they do work, then you're definitely right that your second version does solve your problem with my proposal, so it doesn't matter whether your first version did anymore.
>>
>> First, on delegating top-level functions: have you tested the performance? Is MT so slow that an extra lookup and function call don't matter?
>
> If folks are in a situation where the performance impact of the
> additional layer of indirection is a problem, they can switch to using
> random.Random explicitly, or import from random.seedable rather than
> the top level random module.
>
>> One quick thought on auto-switching vs. explicitly setting the instance before any functions have been called: if I get you to install a plugin that calls random.seed(), I've now changed your app to use seeded random numbers. And it might even still pass security tests, because it doesn't switch until someone hits some API that activates the plugin. Is that a realistic danger for any realistic apps? If so, doesn't that potentially make 3.6 more dangerous than 3.5?

The same problem can occur the other way round. Suppose that I want my
whole app to be seedable but I have many modules that use "from random
import choice" etc. Then in my top-level script I call random.seed and
get an error under Python 3.6. So I switch that to use random.seedable,
but I potentially end up with a mix of modules, some using
random.seedable.choice and others using random.choice. Depending on
which particular functions get called, my app may appear to be
properly seeded under some conditions but not under others.
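
A rough sketch of that mixed state, using today's classes as stand-ins.
The assumptions here are mine: random.Random plays the role of the
proposed random.seedable namespace, random.SystemRandom plays the role
of an unseedable top-level default, and module_a_pick/module_b_pick are
made-up names for two modules in the same app.

```python
import random

# Stand-ins only (assumptions, not the proposed implementation):
seedable = random.Random()          # plays the role of random.seedable
unseedable = random.SystemRandom()  # plays the role of a seedless default

ITEMS = range(10**18)  # large enough that accidental repeats are negligible


def module_a_pick():
    # Hypothetical module that was migrated to the seedable API.
    return seedable.choice(ITEMS)


def module_b_pick():
    # Hypothetical module that still uses the unseeded function.
    return unseedable.choice(ITEMS)


def run(seed):
    # The top-level script seeds what it believes is "the" generator.
    seedable.seed(seed)
    return module_a_pick(), module_b_pick()


first = run(12345)
second = run(12345)

# Only the migrated module repeats its result between the two runs.
a_reproducible = first[0] == second[0]
b_reproducible = first[1] == second[1]
```

Whether the app looks reproducible then depends on whether a given run
happens to go through module_a_pick or module_b_pick.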

The docs explicitly state that I will always be able to globally seed
the module so that my entire non-threaded application is reproducible
when using the top-level functions (even across different Python
versions for random.random). So it's entirely reasonable to expect
that people are relying on this behaviour and will want a way to
revert to it. In the general case that would need something like
set_default_instance, so that every module (including those I don't
write myself) uses the same generator.
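
To make that concrete, here is a minimal sketch of what a
set_default_instance hook combined with delegate-at-call-time wrappers
might look like. Everything in it is hypothetical: the names _default,
set_default_instance and simulate are illustrative only and are not
taken from any actual patch or PEP.

```python
import random

# Hypothetical module-level default; unseedable until someone swaps it.
_default = random.SystemRandom()


def set_default_instance(rng):
    """Hypothetical hook: make every delegating function use `rng`."""
    global _default
    _default = rng


def choice(seq):
    # The lookup of _default happens at call time, so a module that did
    # the equivalent of "from random import choice" still sees the swap.
    return _default.choice(seq)


def random_float():
    # Stand-in for the top-level random.random() wrapper.
    return _default.random()


# Simulates a third-party module that imported the function early on.
imported_choice = choice


def simulate(seed):
    # One call in the top-level script restores global reproducibility.
    set_default_instance(random.Random(seed))
    picks = [imported_choice("abcdef") for _ in range(5)]
    return picks, random_float()
```

With this shape, simulate(2015) returns the same result every time it
is called, including the values drawn through imported_choice, which is
the property the current global random.seed() provides.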

> This isn't an applicable concern, as we already provide zero runtime
> protections against hostile monkeypatching of other modules (by design
> choice). You can subvert even os.urandom in a hostile plugin:
>
>     def not_random(num_bytes):
>         return b'A' * num_bytes
>     import os
>     os.urandom = not_random

It might not be a case of "hostile monkeypatching". Someone might just
be trying to fix their code that was broken by the
backwards-incompatible change proposed in this discussion.

>> For another: I still think we should be getting people to explicitly use seeded_random or system_random (or seedless_random, if they need speed as well as "probably secure") or explicit class instances (which are a bigger change, but more backward compatible once you've made it) as often as possible, even if random does eventually turn into seedless_random.

That's fine, but seeded_random won't exist in earlier Python versions,
so it creates another cross-version compatibility problem. Also,
switching to your own random instance can be a non-trivial change if
more than one module/project is involved. The random module has
deliberately provided a convenient place to store that global state,
which would need to be replaced somehow.
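
For illustration, this is roughly what the explicit-instance style
looks like. The function names (shuffle_deck, deal, play) are made up
for the example; the point is only that every function that draws
random numbers has to accept the generator and pass it on.

```python
import random


def shuffle_deck(deck, rng):
    # Every helper now needs an extra parameter for the generator.
    deck = list(deck)
    rng.shuffle(deck)
    return deck


def deal(deck, n, rng):
    # The same instance has to be threaded through each call.
    return rng.sample(deck, n)


def play(seed):
    # The top-level script owns the generator instead of the random module.
    rng = random.Random(seed)
    deck = shuffle_deck(range(52), rng)
    return deal(deck, 5, rng)
```

random.Random is available in existing Python versions, so this works
without any new API, but each signature that changes is a change other
modules and projects have to follow.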

>> And finally: it _seems like_ people who want MT for simulation/game/science stuff will have a pretty easy time finding the migration path, but I'm having a really hard time coming up with a convincing argument. Does anyone have a handful of science guys they can hack up a system for and test them empirically? Because if you can establish that fact, I think the naysayers have very little reason left to say nay, and a consensus would surely be better than having that horribly contentious thread end with "too bad, overruled, the PEP has been accepted".
>
> Given the general lack of investment in sustaining engineering for
> scientific software, I think the naysayers are right on that front,
> which is why I switched my proposal to give them a transparent upgrade
> path - I was originally thinking primarily of the educational and
> gaming use cases, and hadn't considered randomised simulations in the
> scientific realm.

TBH, when I need to burn thousands of CPU-hours on RNG-heavy code I
would rather use numpy's random module. It also uses the Mersenne
Twister, but it's a lot faster if you need loads of random numbers.

--
Oscar