
Dear crunchers, Please see my personal thoughts on the past discussion about initial seeds some paragraphs down below, where I'd like to list concrete code and manual enhancements aimed at providing users with a clear understanding of it's usage (and pitfalls e.g. w/r to cryptographic applications)... ==> Suggestions for code and manual changes w/r to initial seeding (down below) But first a response to Paul's earlier message: On Thursday 25 July 2002 08:08, Paul F Dubois wrote:
I'm not going to change the default seed on RNG. Existing users have the right to stability, and not to have things change because someone thinks a certain choice among several reasonable ones is better than the one previously made.
Well, I wasn't aware of the fact that things were completely set in stone for Numarray solely for backward compatibilty. It was my impression that numarray and it's accompanying xx2 packages were also open for redesign. I agree stability is important, but numarray already breaks with Numeric in other aspects so why should RNG (RNG2 in numarray?) or other packages not be? It's more a matter of well documenting changes I think. Users switching to numarray will already have to take into account some changes and verify their code. It's not that I "think a certain choice among several reasonable ones is better" [although my favorite is still a fixed seed, as in RNG, for reasons of reproducibility in later re-runs of Monte Carlo's that are not possible now, because the naive user, using a clock seed, may not have saved the initial seed with get_seed], but that the different packages, i.c. RNG (RNG2 to be?) and RandomArray2, should be orthogonal in this respect. I.e. the same, so 'default always an automagical (clock whatever) random initial seed _or_ a fixed one'. Orthogonality is a very common and accepted design principle in computing science and for good reasons (usability). Users changing from one PRNG to another (and using the default seed) would otherwise be unwelcomely surprised by a sudden change in behavior of their program. I try to give logical arguments and real code examples in this discussion and fail to see in Paul's reaction where I'm wrong. By the way: in Python 2.1 alpha 2 seeding changed, too: """ - random.py's seed() function is new. For bit-for-bit compatibility with prior releases, use the whseed function instead. The new seed function addresses two problems: (1) The old function couldn't produce more than about 2**24 distinct internal states; the new one about 2**45 (the best that can be done in the Wichmann-Hill generator). (2) The old function sometimes produced identical internal states when passed distinct integers, and there was no simple way to predict when that would happen; the new one guarantees to produce distinct internal states for all arguments in [0, 27814431486576L). """
There is the further issue here of RNG being advertised as similar to Cray's ranf() and that similarity extends to this default. Not to mention that for many purposes the current default is quite useful.
Perhaps I'm mistaken here, but RNG/Lib/__init__.py does (-1 -> uses fixed internal seed): standard_generator = CreateGenerator(-1) and: def ranf(): "ranf() = a random number from the standard generator." return standard_generator.ranf() And indeed Mixranf in RNG/Src/ranf.c does set them to 0: ... if(*s < 0){ /* Set default initial value */ s48[0] = s48[1] = 0; Setranf(s48); Getranf(s48); And this code, or I'm missing the point, uses a standard generator from RNG, which demonstrates the same sequence of initial seeds in re-runs (note that it does not suffer from the "1-second problem" as RandomArray2 does, see the Appendix below for a demonstration of that, because RNG uses milliseconds). Note that 'ranf()' is listed in chapter 18 in Module RNG as one of the 'Generator objects': $ python Python 2.2.1 (#1, Jun 25 2002, 20:45:02) ... >>> from numarray import * >>> from RNG import * >>> for i in range(3): ... standard_generator.ranf() ... 0.58011364857958725 0.95051273498076583 0.78637142533060356 >>> $ python Python 2.2.1 (#1, Jun 25 2002, 20:45:02) ... >>> from numarray import * >>> from RNG import * >>> for i in range(3): ... standard_generator.ranf() ... 0.58011364857958725 0.95051273498076583 0.78637142533060356 >>> Ok, now then my own (and possibly biased) personal summary of the past discussions and concrete code and manual recommendations: ==> Suggestions for code and manual changes w/r to initial seeding Conclusions: 1. Default initial seeding should be random (and not fixed). This is the general consensus and while it may not win the beauty contest in purist software engineering circles, it also is the default behavior in Python's own Random/WHRandom modules. URL: http://web.pydoc.org/2.2/random.html => Recommendations: - Like Python's random/whrandom module, default arguments to seed() should not be 0, but None, and this triggers the default behavior which is to use a random initial seed (ideally 'truly' random from e.g. /dev/random or otherwise clock or whatever based), because: o better usability: users changing from Python's own random to numarray's random facilities will find familiar seed() usage semantics o often 0 itself can be a legal seed (although the MersenneTwister does not recommend it) - Like RNG provide support for using a built-in fixed seed by supplying negative seeds to seed(), rationale: o support for reproducible re-runs of Monte Carlo's without having to specify ones own initial seed o usability: naive users may not know a 'good' seed is, like: can it be 0 or must it be >0, what is the maximum, etc. - See my suggested code fix for RandomArray2.seed() in the Appendix below. - Likewise, in RNG: o CreateGenerator (s, ...) should be changed to CreateGenerator (s=None) Also note Python's own: def create_generators(num, delta, firstseed=None) from random (random.py), url: http://web.pydoc.org/2.2/random.html o RNG's code should be changed from testing on 0 to testing on None first (which results in using the clock), then on < 0 (use built-in seed), and then using the user provided seed (which is thus >= 0, and hence can also be 0) o 'standard_generator = CreateGenerator(-1)' should be changed to 'standard_generator = CreateGenerator() and results in using the clock - Put some explicit warnings in the numarray manual, that the seeding of numarray's packages should _not_ be used in those parts of software where unpredictability of seeds is important, such as for example, cryptographical software for creating session keys, TCP sequence numbers etc. Attacks on crypto software usually center around these issues. Ideally, a /dev/random should be used, but with the current system clock based implementation, the seeds are not random, because the clock does not have deci-nanosecond precision (10**10 ~= 2**32) yet ;-) Appendix -------- ** 1. "1-second problem" with RandomArray2: $ python Python 2.2.1 (#1, Jun 25 2002, 20:45:02) ...
from numarray import * from RandomArray2 import * import time import sys sys.version '2.2.1 (#1, Jun 25 2002, 20:45:02) \n[GCC 2.95.3 20010315 (SuSE)]' numarray.__version__ '0.3.5' for i in range(3): ... time.time() ... RandomArray2.seed() ... RandomArray2.get_seed() ... time.sleep(1) ... print ... 1027591910.9043469 (102759, 1911)
1027591911.901091 (102759, 1912) 1027591912.901088 (102759, 1913)
for i in range(3): ... time.time() ... RandomArray2.seed() ... RandomArray2.get_seed() ... time.sleep(0.3) ... print ... 1027591966.260392 (102759, 1967)
1027591966.5510809 (102759, 1967) 1027591966.851079 (102759, 1967) Note that Python (at least 2.2.1) own random() suffers much less from this (on my 450 MHz machine, every 10-th millisecond or so the seed will be different): $ python Python 2.2.1 (#1, Jun 25 2002, 20:45:02) ...
from random import * import time
for i in range(3): ... print long(time.time() * 256) ... 263065231349 263065231349 263065231349 for i in range(3): ... print long(time.time() * 256) ... time.sleep(.00001) ... 263065240314 263065240315 263065240317
By the way, Python's own random.seed() also suffers from this, but on a 10th-millisecond level (on my 450 Mhz i586 at least). For the implementation of seed() see Lib/random.py, basically a 'long(time.time()' is used: $ python Python 2.2.1 (#1, Jun 25 2002, 20:45:02) ...
from random import * import time for i in range(3): ... print long(time.time() * 256) ... 263065231349 263065231349 263065231349 for i in range(3): ... print long(time.time() * 256) ... time.sleep(.00001) ... 263065240314 263065240315 263065240317
2. Proposed re-implementation of RandomArray2.seed(): def seed(x=None,y=None): """seed(x, y), set the seed using the integers x, y: x or y is None (or not specified): A random seed is used which in the current implementation may be based on the system's clock. Warning: do not this seed in software where the initial seed may not be predictable, such as for example, in cryptographical software for creating session keys. x < 0 or y < 0: Use the module's fixed built-in seed which is the tuple (1234567890L, 123456789L) (or whatever) x >= 0 and y >= 0 Use the seeds specified by the user. (Note: some random number generators do not recommend using 0) Note: based on Python 2.2.1's random.seed(a=None). ADAPTED for _2_ seeds as required by ranlib.set_seeds(x,y) """ if (x != None and type (x) != IntType) or (y != None and type (y) != IntType) : raise ArgumentError, "seed requires integer arguments (or None)." if x == None or y == None: try: # This would be the best, but is problematic under Windows/Mac. # To my knowledge there isn't a portable lib_randdevice yet. # As GPG, OpenSSH and OpenSSL's code show, getting entropy # under Windows is problematic. # However, Python 2.2.1's socketmodule does wrap the ssl code. import dev_random_device # uses /dev/random or equivalent x = dev_random_device.nextvalue() # egd.sf.net is a user space y = dev_random_device.nextvalue() # alternative except: # Use Mixranf() from RNG/Src/ranf.c here or, perhaps better, # use Python 2.2.1's code? At least it looks simpler and does not # have the platform dependency's and has possibly met wider testing # (and why not re-use code? ;-) # For Python 2.2.1's random.seed(a=None), see url: # http://web.pydoc.org/2.2/random.html # and file Lib/random.py. # Do note, however, that on my 450 Mhz machine, the statement # 'long(time.time() * 256)' will generate the same values # within a tenth of a millisecond (see Appendix #1 for a code # example). This can be fixed by doing a time.sleep(0.001). # See my #EM# comment. # Naturally this code needs to be adapted for ranlib's # generator, because this code uses the Wichmann-Hill generator. ---cut: Wichmann-Hill--- def seed(self, a=None): """Initialize internal state from hashable object. None or no argument seeds from current time. If a is not None or an int or long, hash(a) is used instead. If a is an int or long, a is used directly. Distinct values between 0 and 27814431486575L inclusive are guaranteed to yield distinct internal states (this guarantee is specific to the default Wichmann-Hill generator). """ if a is None: # Initialize from current time import time a = long(time.time() * 256) #EM# Guarantee unique a's between subsequent call's of seed() #EM# by sleeping one millisecond. This should not be harmful, #EM# because ordinarily, seed() will only be called once or so #EM# in a program. time.sleep(0.001) if type(a) not in (type(3), type(3L)): a = hash(a) a, x = divmod(a, 30268) a, y = divmod(a, 30306) a, z = divmod(a, 30322) self._seed = int(x)+1, int(y)+1, int(z)+1 ---cut: Wichmann-Hill--- elif x < 0 or y < 0: x = 1234567890L # or any other suitable 0 - 2**32-1 y = 123456789L ranlib.set_seeds(x,y) 3. Mersenne Twister, another PRNG: Bye-bye, Eric -- Eric Maryniak <e.maryniak@pobox.com> WWW homepage: http://pobox.com/~e.maryniak/ Mobile phone: +31 6 52047532, or (06) 520 475 32 in NL. In a grocery store, the Real Programmer is the one who insists on running the cans past the laser checkout scanner himself, because he never could trust keypunch operators to get it right the first time.