Unsorting(randomizing) a sequence

Tim Peters tim_one at email.msn.com
Sat Aug 21 04:09:23 EDT 1999


[Tim, on the difficulty of generating random permutations, because
 n! quickly grows larger than a given random-# generator's period]

[Aahz Maruch]
> What happens if you re-seed?

By what means?  If you have a good source of random bits to do the reseeding
with, save yourself the expense of calling random() and just use them
directly <0.1 wink>.  If your seed generation isn't random, at best you can
boost the period of the combined generation scheme; at worst, damage the
pseudo-randomness of both streams.  "It depends", and on everything.

> I'm curious because I just wrote some code that re-seeds every
> hundred calls to random().  In my case, though, it's necessary
> because I actually don't care that much about randomness per se;
> I'm using it to generate unique numbers (with duplicate detection),
> and I need to ensure that if I ever do happen on the same random
> number sequence as a previous run of the process, it gets aborted
> reasonably quickly.

Probably better to skip the reseeding, saving random()'s state at the end of
one run and restoring it at the start of the next -- random.random()'s
period is in the neighborhood of a trillion; it's not going to repeat before
your current employer goes out of business <wink>.

But, so far, you haven't said anything that explains why you can't just
repeatedly add 1 to a global counter!  Guaranteed unique without bothering
to check for duplicates.  Something else must be driving this.

> (This is a temporary hack to deal with an unfortunate choice in
> hashing functions until we can switch to some code that uses SQL
> Server GUIDs.  And, yes, I did find out about Python's hash()
> function *after* I wrote this....  <sigh>)

Check out the md5 module for a much stronger (& much slower -- win-win
<wink>)128-bit hash; e.g.,

>>> m = md5.new(); m.update('a'); m.digest()
'\014\301u\271\300\361\266\2501\303\231\342iw&a'
>>> m = md5.new(); m.update('b'); m.digest()
'\222\353_\376\346\256/\354:\327\034wu1W\217'
>>>

they-should-have-named-guids-guidos-ly y'rs  - tim






More information about the Python-list mailing list