Eric Maryniak wrote:
According to the _Numpy_ manual for RandomArray.seed(x=0, y=0) (with /my/ emphasis):
The seed() function takes two integers and sets the two seeds of the random number generator to those values. If the default values of 0 are used for /both/ x and y, then a seed is generated from the current time, providing a /pseudo-random/ seed.
Note: in numarray, the RandomArray2 package is provided but it's description is not (yet) included in the numarray manual.
I have some questions about this:
- The implementation of seed(), which is, by the way, identical
both in Numeric's RandomArray.py and numarray's RandomArray2.py seems to contradict it's usage description:
The 2 in RandomArray2 is there to support side-by-side testing with Numeric, not to imply something new and improved. The point of providing RandomArray2 is to provide a migration path for current Numeric users. To that end, RandomArray2 should be functionally identical to RandomArray.
That should not, however, discourage you from writing a new and improved random number package for numarray.
---cut--- def seed(x=0,y=0): """seed(x, y), set the seed using the integers x, y; Set a random one from clock if y == 0 """ if type (x) != IntType or type (y) != IntType : raise ArgumentError, "seed requires integer arguments." if y == 0: import time t = time.time() ndigits = int(math.log10(t)) base = 10**(ndigits/2) x = int(t/base) y = 1 + int(t%base) ranlib.set_seeds(x,y) ---cut---
Shouldn't the second 'if' be:
if x == 0 and y == 0:
With the current implementation:
- 'seed(3)' will actually use the clock for seeding
- it is impossible to specify 0's (0,0) as seed: it might be better to use None as default values?
- With the current time.time() based default seeding, I wonder
if you can call that, from a mathematical point of view, pseudo-random:
---cut--- $ python Python 2.2.1 (#1, Jun 25 2002, 20:45:02) [GCC 2.95.3 20010315 (SuSE)] on linux2 Type "help", "copyright", "credits" or "license" for more information.
from numarray import * from RandomArray2 import * import time numarray.__version__
for i in range(5):
... time.time() ... RandomArray2.seed() ... RandomArray2.get_seed() ... time.sleep(1) ... print ... 1027434978.406238 (102743, 4979)
1027434979.400319 (102743, 4980)
1027434980.400316 (102743, 4981)
1027434981.40031 (102743, 4982)
1027434982.400308 (102743, 4983) ---cut---
It is incremental, and if you use default seeding within one (1) second, you get the same seed:
for i in range(5):
... time.time() ... RandomArray2.seed() ... RandomArray2.get_seed() ... time.sleep(0.1) ... print ... 1027436537.066677 (102743, 6538)
1027436537.160303 (102743, 6538)
1027436537.260363 (102743, 6538)
1027436537.360299 (102743, 6538)
1027436537.460363 (102743, 6538) ---cut---
- I wonder what the design philosophy is behind the decision
to use 'mathematically suspect' seeding as default behavior.
Using time for a seed is fairly common. Since it's an implementation detail, I doubt anyone would object if you can suggest a better default seed.
Apart from the fact that it can hardly be called 'random', I also have the following problems with it:
- The RandomArray2 module initializes with 'seed()' itself, too. Reload()'s of RandomArray2, which might occur outside the control of the user, will thus override explicit user's seeding. Or am I seeing ghosts here?
Overriding a user's explicit seed as a result of a reload sounds correct to me. All of the module's top level statements are re-executed during a reload.
- When doing repeated run's of one's neural net simulations that each take less than a second, one will get identical streams of random numbers, despite seed()'ing each time. Not quite what you would expect or want.
This is easy enough to work around: don't seed or re-seed. If you then need to make multiple simulation runs, make a separate module and call your simulation like:
RandomArray2.seed(something_deterministic, something_else_deterministic) for i in range(number_of_runs): simulation.main()
- From a purist software engineering point of view, I don't think automagical default behavior is desirable: one wants programs to be deterministic and produce reproducible behavior/output.
I don't know. I think by default, random numbers *should be* random.
If you use default seed()'ing now and re-run your program/model later with identical parameters, you will get different output.
When you care about this, you need to set the seed to something deterministic.
In Eiffel, object attributes are always initialized, and you will almost never have irreproducible runs. I found that this is a good thing for reproducing ones bugs, too ;-)
This sounds like a good design principle, but I don't see anything in RandomArray2 which is keeping you from doing this now.
To summarize, my recommendation would be to use None default arguments and use, when no user arguments are supplied, a hard (built-in) seed tuple, like (1,1) or whatever.
Unless there is a general outcry from the rest of the community, I think the (existing) numarray extensions (RandomArray2, LinearAlgebra2, FFT2) should try to stay functionally identical with Numeric.
Sometimes a paper on a random number generator suggests seeds (like 4357 for the MersenneTwister), but of course, a good random number generator should behave well independently of the initial seed/seed-tuple. I may be completely mistaken here (I'm not an expert on random number theory), but the random number generators (Ahrens, et. al) seem 'old'? After some studying, we decided to use the Mersenne Twister:
An array enabled version might make a good add-on package for numarray.
M. Matsumoto and T. Nishimura, "Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator", ACM Trans. on Modeling and Computer Simulation Vol. 8, No. 1, January pp.3-30 1998
There are some Python wrappers and it has good performance as well.