[issue9025] Non-uniformity in randrange for large arguments.

Raymond Hettinger report at bugs.python.org
Wed Jun 23 20:01:22 CEST 2010


Raymond Hettinger <rhettinger at users.sourceforge.net> added the comment:

Thanks guys, I've got it from here.

Some considerations for the PRNG are:
* equidistribution (for quality)
* repeatability from the same seed (even in multithreaded environments)
* quality and simplicity of API (for usability)
* speed (it matters whether a Monte Carlo simulation takes 5 minutes or 30 minutes).

I'm looking at several ideas:
* updating the new randrange() to use rnd2() algorithm shown above (for equidistribution).
* possibly providing a C version of rnd2() and using it in randrange() for speed and for thread-safety.
* possibly updating shuffle() and choice() to use rnd2().
* moving the existing randrange() to randrange_quick() -- uses int(n * random) for speed and for reproducibility of previously created sequences.  Alternatively, adding a recipe to the docs for recreating old sequences and not cluttering the code with backwards compatibility cruft.

Am raising the priority to normal because I think some effort needs to be made to address equidistribution in 3.2.  Will take some time to come-up with a good patch that balances quality, simplicity, speed, thread-safety, and reproducibility.   

May also consult with Tim Peters who has previously voiced concerns about stripping bits off of multiple calls to random() because the MT proofs make no guarantees about quality in those cases.  I don't think this is an issue in practice, but in theory when we start tossing out some of the calls to random(), we're also throwing away the guarantees of a long periodic, 623 dimensions, uniformity, and equidistribution.

----------
priority: low -> normal

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9025>
_______________________________________


More information about the Python-bugs-list mailing list