Mailman 3 Numarray: minor feature requests (setup.py and version info) - NumPy-Discussion

Numarray: minor feature requests (setup.py and version info)

Eric Maryniak

June 26, 2002

5:34 a.m.

Dear crunchers,

Please excuse me for dropping a feature request here as I'm new to the list and don't have the 'feel' of this list yet. Should feature requests be submitted to the bug tracker?

Anyways, I installed Numarray on a SuSE/Linux box, following the Numarray PDF manual's directions. Having installed Python packages (like, ehm, Numeric) before, here are a few impressions:

1. When running 'python setup.py' and 'python setup.py --help' I was surprised to see that already source generation took place:

    Using EXTRA_COMPILE_ARGS = []
    generating new version of Src/_convmodule.c
    ...
    generating new version of Src/_ufuncComplex64module.c

Normally, you would expect that at build/install time.

2. Because I'm running two versions of Python (because Zope and a lot of Zope/C products depend on a particular version) the 'development' Python is installed in /usr/local/bin (whereas SuSE's python is in /usr/bin). It probably wouldn't do any harm if the manual would include a hint at the '--prefix' option and mention an alternative Python installation like:

    /usr/local/bin/python ./setup.py install --prefix=/usr/local

3. After installation, I usually test the success of a library's import by looking at version info (especially with multiple installations, see [2]). However, numarray does not seem to have version info? :

    # python
    Python 2.2.1 (#1, Jun 25 2002, 20:45:02)
    [GCC 2.95.3 20010315 (SuSE)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.version
    '2.2.1 (#1, Jun 25 2002, 20:45:02) \n[GCC 2.95.3 20010315 (SuSE)]'
    >>> sys.version_info
    (2, 2, 1, 'final', 0)
    >>> import Numeric
    >>> Numeric.__version__
    '21.3'
    >>> import numarray
    >>> numarray.__version__
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: 'module' object has no attribute '__version__'
    >>> numarray.version
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: 'module' object has no attribute 'version'

The __doc__ string:

    'numarray: The big enchilada numeric module\n\n $Id: numarray.py,v 1.36 2002/06/17 14:00:20 jaytmiller Exp $\n'

does not seem to give a hint at the version (in this case 0.3.4), either.

Well, enough nitpicking for now I guess. Thanks to the Numarray developers for this project, it's much appreciated.

Bye-bye, Eric

--
Eric Maryniak <e.maryniak@pobox.com>
WWW homepage: http://pobox.com/~e.maryniak/
Mobile phone: +31 6 52047532, or (06) 520 475 32 in NL.

An error in the premise will appear in the conclusion.
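The import check from point 3 can be written so that it does not raise when a module carries no version info. This is only a minimal sketch: the helper name `find_version` is hypothetical, and the attribute names it probes (`__version__` and `version`) are simply the two tried in the session above. The stand-in modules are there so the example runs without Numeric or numarray installed.

```python
import sys
import types


def find_version(module, names=("__version__", "version")):
    """Return the first version-like attribute found on module, else None."""
    for name in names:
        value = getattr(module, name, None)
        if value is not None:
            return value
    return None


# Stand-in modules so the sketch runs without Numeric or numarray installed.
with_version = types.ModuleType("with_version")
with_version.__version__ = "21.3"
without_version = types.ModuleType("without_version")

print(find_version(with_version))     # 21.3
print(find_version(without_version))  # None
print(find_version(sys))              # the interpreter's sys.version string
```

Returning None instead of raising makes it easy to report "no version info" for several installations in one pass.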


Perry Greenfield

June 2002

6:30 a.m.

Hi Eric, Todd Miller should answer these but he is away for a few days.

...

1. When running 'python setup.py' and 'python setup.py --help' I was surprised to see that already source generation took place:

Using EXTRA_COMPILE_ARGS = [] generating new version of Src/_convmodule.c ... generating new version of Src/_ufuncComplex64module.c

Normally, you would expect that at build/install time.

Yes, it looks like it does the code generation regardless of the option. We should change that.

...

2. Because I'm running two versions of Python (because Zope and a lot of Zope/C products depend on a particular version) the 'development' Python is installed in /usr/local/bin (whereas SuSE's python is in /usr/bin). It probably wouldn't do any harm if the manual would include a hint at the '--prefix' option and mention an alternative Python installation like:

/usr/local/bin/python ./setup.py install --prefix=/usr/local

Good idea.

...

3. After installation, I usually test the success of a library's import by looking at version info (especially with multiple installations, see [2]). However, numarray does not seem to have version info? :

    # python
    Python 2.2.1 (#1, Jun 25 2002, 20:45:02)
    [GCC 2.95.3 20010315 (SuSE)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.version
    '2.2.1 (#1, Jun 25 2002, 20:45:02) \n[GCC 2.95.3 20010315 (SuSE)]'
    >>> sys.version_info
    (2, 2, 1, 'final', 0)
    >>> import Numeric
    >>> Numeric.__version__
    '21.3'
    >>> import numarray
    >>> numarray.__version__
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: 'module' object has no attribute '__version__'
    >>> numarray.version
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: 'module' object has no attribute 'version'

The __doc__ string: 'numarray: The big enchilada numeric module\n\n $Id: numarray.py,v 1.36 2002/06/17 14:00:20 jaytmiller Exp $\n' does not seem to give a hint at the version (in this case 0.3.4), either.

Well, I remember putting this on the to do list and thought it had been done, but obviously not. I'm sure Todd will take care of these. Thanks very much for the feedback. Perry

Eric Maryniak

7:48 a.m.

Hello Perry, On Wednesday 26 June 2002 19:29, Perry Greenfield wrote:

...

...

...
2. Because I'm running two versions of Python (because Zope and a lot of Zope/C products depend on a particular version) the 'development' Python is installed in /usr/local/bin (whereas SuSE's python is in /usr/bin). It probably wouldn't do any harm if the manual would include a hint at the '--prefix' option and mention an alternative Python installation like:

/usr/local/bin/python ./setup.py install --prefix=/usr/local

Good idea.

And perhaps another suggestion: no mention is made of the 'setupall.py' script... and setup.py does _not_ install the LinearAlgebra2 (including our favorite SVD ;-), Convolve, RandomArray2 and FFT2 packages. I successfully installed them with:

    python ./setupall.py install

Other minor notes:

#1: No FFT2.pth file is generated (the others are ok). It should just include the string 'FFT2'.

#2: While RandomArray2 etc. nicely stay away from a concurrently imported Numeric.RandomArray, shouldn't Convolve, for orthogonality, be named Convolve2? (cuz who knows, numarray's Convolve may be backported to Numeric in the future, for comparative testing etc.). Of course in the end, when numarray is to replace Numeric, the '2' could be dropped altogether (breaking some programs then ;-)

#3: LinearAlgebra2, RandomArray2 and Convolve have empty __doc__ 's. FFT and these 3 have no __version__ attributes, either (like numarray itself, too). Module sys uses a tuple 'version_info':

    >>> sys.version_info
    (2, 2, 1, 'final', 0)

allowing fine-grained version testing and e.g. conditional importing etc. based on that. This may be a good idea for numarray, where interfaces may change and you could thus allow your code to support multiple (or rather, evolving) versions of numarray. Btw: imho __versioninfo__ or just __version__ would be a better standard attribute (for all modules) allowing a standard way of testing for major/minor version number,

    if __version__[0] >= 2: etc()

Ideally, numarray's sub-packages' numbers would be in sync with that of numarray itself. Numeric's __version__ is a string, which is not so handy, either.

#4: It is very helpful that there are a large number of self-tests of the packages, together with expected values. E.g.:

    Average of 10000 chi squared random numbers with 11 degrees of freedom
    (should be about 11 ): 11.0404176623
    Variance of those random numbers (should be about 22 ): 21.6517761217
    Skewness of those random numbers (should be about 0.852802865422 ): 0.718573002875

But sometimes you wonder (e.g. 0.85 / 0.71) if deviations are not too serious. Perhaps a 95%-int or std.dev. could be added?
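The remark in note #3 that a string version "is not so handy" can be illustrated with a small sketch. The helper `version_tuple` is hypothetical, and it assumes every dotted component is purely numeric (so it would not cope with something like '1.0b2').

```python
def version_tuple(version):
    """Turn a dotted version string such as '0.3.4' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


print(version_tuple("0.3.4"))                         # (0, 3, 4)
print(version_tuple("21.3") >= (2,))                  # True

# Comparing the raw strings is lexicographic, which gives the wrong answer:
print("21.3" >= "3.0")                                # False
print(version_tuple("21.3") >= version_tuple("3.0"))  # True
```

With a tuple, a test such as `version_tuple(module_version)[0] >= 2` behaves the way the `sys.version_info` check does.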

...

...
... Thanks very much for the feedback.

Perry

You're welcome, they're just minor things one notices in the beginning and tends to ignore later; please say so if this kind of feedback should be postponed for later. Bye-bye, Eric -- Eric Maryniak <e.maryniak@pobox.com> WWW homepage: http://pobox.com/~e.maryniak/ Mobile phone: +31 6 52047532, or (06) 520 475 32 in NL. Puzzle: what's another word for synonym?

Paul F Dubois

July 2002

12:50 p.m.

New subject: words that must not be spoken

In the midst of a discussion Eric wrote:

...
... shouldn't Convolve, for orthogonality, be named Convolve2? (cuz who knows, numarray's Convolve may be backported to Numeric in the future, for comparative testing etc.).

Use of the phrase "backported to Numeric" will result in your subscription to numpy-discussion being cancelled. (:->

No backporting is ever going to happen. This is a short one-way street or there is no purpose to travel on it.

I am just back from Europython and had a chance to talk to a lot of users and have some thoughts which I will share with all of you shortly. However, since I just had to fill out a form and where it said "Date" I looked at my watch and wrote the time 11/16, I conclude that I have jet lag and can't trust myself to be lucid yet.

Todd Miller

June 2002

2:24 a.m.

Perry Greenfield wrote:

...

Hi Eric,

Todd Miller should answer these but he is away for a few days.

...
1. When running 'python setup.py' and 'python setup.py --help' I was surprised to see that already source generation took place:

Using EXTRA_COMPILE_ARGS = [] generating new version of Src/_convmodule.c ... generating new version of Src/_ufuncComplex64module.c

Normally, you would expect that at build/install time.

Yes, it looks like it does the code generation regardless of the option. We should change that.

I'll clean this up.

...

...
2. Because I'm running two versions of Python (because Zope and a lot of Zope/C products depend on a particular version) the 'development' Python is installed in /usr/local/bin (whereas SuSE's python is in /usr/bin). It probably wouldn't do any harm if the manual would include a hint at the '--prefix' option and mention an alternative Python installation like:

/usr/local/bin/python ./setup.py install --prefix=/usr/local

Good idea.

I'm actually surprised that this is necessary. I was under the impression that the distutils pick reasonable defaults simply based on the python that is running. In your case, I would expect numarray to install to /usr/local/lib/pythonX.Y/site-packages without specifying any prefix. What happens on SuSE?

...

...
3. After installation, I usually test the success of a library's import by looking at version info (especially with multiple installations, see [2]). However, numarray does not seem to have version info? :

    # python
    Python 2.2.1 (#1, Jun 25 2002, 20:45:02)
    [GCC 2.95.3 20010315 (SuSE)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.version
    '2.2.1 (#1, Jun 25 2002, 20:45:02) \n[GCC 2.95.3 20010315 (SuSE)]'
    >>> sys.version_info
    (2, 2, 1, 'final', 0)
    >>> import Numeric
    >>> Numeric.__version__
    '21.3'

In numarray, this is spelled:

    >>> import numinclude
    >>> numinclude.version
    '0.3.4'

I'll add __version__ to numarray as a synonym.

    >>> import numarray
    >>> numarray.__version__
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: 'module' object has no attribute '__version__'
    >>> numarray.version
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: 'module' object has no attribute 'version'

The __doc__ string: 'numarray: The big enchilada numeric module\n\n $Id: numarray.py,v 1.36 2002/06/17 14:00:20 jaytmiller Exp $\n' does not seem to give a hint at the version (in this case 0.3.4), either.

Well, I remember putting this on the to do list and thought it had been done, but obviously not. I'm sure Todd will take care of these.

Thanks very much for the feedback.

Perry

Thanks again, Todd


Eric Maryniak

9:48 p.m.

On Sunday 30 June 2002 15:31, Todd Miller wrote:

...

Perry Greenfield wrote:

...
...

...
2. Because I'm running two versions of Python (because Zope and a lot of Zope/C products depend on a particular version) the 'development' Python is installed in /usr/local/bin (whereas SuSE's python is in /usr/bin). It probably wouldn't do any harm if the manual would include a hint at the '--prefix' option and mention an alternative Python installation like:

/usr/local/bin/python ./setup.py install --prefix=/usr/local

Good idea.

I'm actually surprised that this is necessary. I was under the impression that the distutils pick reasonable defaults simply based on the python that is running. In your case, I would expect numarray to install to /usr/local/lib/pythonX.Y/site-packages without specifying any prefix. What happens on SuSE?

Yes, you're probably right. On SuSE I tested it out on my own machine ('test server'), because I did not want to do it on the production server. It runs Python 2.2.1 exclusively. I remembered that I had to do this in a previous Numeric installation, where 1.5.2 and 2.1 were running side-by-side (and at that time I also had to install distutils manually). So, yes, it may not be an issue (anymore) for at least recent Pythons if you call the Python explicitly like '/usr/local/bin/python ./setup.py' and '/usr/bin/python ./setup' (on SuSE python goes to /usr/bin).

...

...
...
...

Bye-bye, Eric -- Eric Maryniak <e.maryniak@pobox.com> WWW homepage: http://pobox.com/~e.maryniak/ Mobile phone: +31 6 52047532, or (06) 520 475 32 in NL. It said 'Insert disk #3', but only two will fit.

Paul F Dubois

July 2002

12:41 p.m.

distutils installs into the python used to run the setup.py by using the sys.exec_prefix and sys.prefix. You would not normally need to use any option unless you are trying to install something "off to the side" because, for example, you don't have write permission in that python's site-packages directory.
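To see which locations a given interpreter would pick, one can ask it directly. The following is a rough sketch for a present-day Python 3: it uses the standard `sysconfig` module rather than the distutils of the time, and the function name `install_locations` is made up for this example.

```python
import sys
import sysconfig


def install_locations():
    """Report where the running interpreter would place third-party packages."""
    paths = sysconfig.get_paths()
    return {
        "prefix": sys.prefix,
        "exec_prefix": sys.exec_prefix,
        "purelib": paths["purelib"],   # pure-Python packages
        "platlib": paths["platlib"],   # packages with compiled extensions
    }


for name, value in install_locations().items():
    print(f"{name:12} {value}")
```

Running this once with /usr/bin/python and once with /usr/local/bin/python should show two different sets of paths, which is why calling the intended interpreter explicitly is normally enough.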

Todd Miller

7:56 a.m.

New subject: Numarray: question on RandomArray2.seed(x=0, y=0) system clock default and possible bug

Eric Maryniak wrote:

...

Dear crunchers,

According to the _Numpy_ manual for RandomArray.seed(x=0, y=0) (with /my/ emphasis):

The seed() function takes two integers and sets the two seeds of the random number generator to those values. If the default values of 0 are used for /both/ x and y, then a seed is generated from the current time, providing a /pseudo-random/ seed.

Note: in numarray, the RandomArray2 package is provided but its description is not (yet) included in the numarray manual.

I have some questions about this:

1. The implementation of seed(), which is, by the way, identical both in Numeric's RandomArray.py and numarray's RandomArray2.py, seems to contradict its usage description:

The 2 in RandomArray2 is there to support side-by-side testing with Numeric, not to imply something new and improved. The point of providing RandomArray2 is to provide a migration path for current Numeric users. To that end, RandomArray2 should be functionally identical to RandomArray. That should not, however, discourage you from writing a new and improved random number package for numarray.

...

---cut---
def seed(x=0,y=0):
    """seed(x, y), set the seed using the integers x, y;
    Set a random one from clock if y == 0
    """
    if type (x) != IntType or type (y) != IntType :
        raise ArgumentError, "seed requires integer arguments."
    if y == 0:
        import time
        t = time.time()
        ndigits = int(math.log10(t))
        base = 10**(ndigits/2)
        x = int(t/base)
        y = 1 + int(t%base)
    ranlib.set_seeds(x,y)
---cut---

Shouldn't the second 'if' be:

if x == 0 and y == 0:

With the current implementation:

- 'seed(3)' will actually use the clock for seeding
- it is impossible to specify 0's (0,0) as seed: it might be better to use None as default values?

2. With the current time.time() based default seeding, I wonder if you can call that, from a mathematical point of view, pseudo-random:

---cut---
$ python
Python 2.2.1 (#1, Jun 25 2002, 20:45:02)
[GCC 2.95.3 20010315 (SuSE)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from numarray import *
>>> from RandomArray2 import *
>>> import time
>>> numarray.__version__
'0.3.5'
>>> for i in range(5):
...     time.time()
...     RandomArray2.seed()
...     RandomArray2.get_seed()
...     time.sleep(1)
...     print
...
1027434978.406238
(102743, 4979)

1027434979.400319
(102743, 4980)

1027434980.400316
(102743, 4981)

1027434981.40031
(102743, 4982)

1027434982.400308
(102743, 4983)
---cut---

It is incremental, and if you use default seeding within one (1) second, you get the same seed:

---cut---
>>> for i in range(5):
...     time.time()
...     RandomArray2.seed()
...     RandomArray2.get_seed()
...     time.sleep(0.1)
...     print
...
1027436537.066677
(102743, 6538)

1027436537.160303
(102743, 6538)

1027436537.260363
(102743, 6538)

1027436537.360299
(102743, 6538)

1027436537.460363
(102743, 6538)
---cut---
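The seed pairs printed in these sessions follow directly from the arithmetic in the quoted seed(). The sketch below replays that arithmetic in present-day Python 3 without touching ranlib. The function name `clock_seeds` and its `now` parameter are hypothetical additions, there only so that the timestamps from the sessions can be fed back in.

```python
import math
import time


def clock_seeds(x=0, y=0, now=None):
    """Mirror the seed() arithmetic quoted above and return the (x, y) pair.

    `now` is a hypothetical parameter added so a timestamp can be replayed.
    """
    if y == 0:
        t = time.time() if now is None else now
        ndigits = int(math.log10(t))
        base = 10 ** (ndigits // 2)   # '//' matches Python 2 integer '/'
        x = int(t / base)
        y = 1 + int(t % base)
    return x, y


print(clock_seeds(now=1027434978.406238))     # (102743, 4979)
# Two calls within the same second collapse to one seed pair:
print(clock_seeds(now=1027436537.066677))     # (102743, 6538)
print(clock_seeds(now=1027436537.460363))     # (102743, 6538)
# An explicit x is discarded whenever y is left at 0:
print(clock_seeds(3, now=1027434978.406238))  # (102743, 4979)
```

For a timestamp of this size the base is 10**4, so x only changes every 10000 seconds and y is the whole-second remainder plus one. That is why the pairs increment by one per second and repeat within a second.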

3. I wonder what the design philosophy is behind the decision to use 'mathematically suspect' seeding as default behavior.

Using time for a seed is fairly common. Since it's an implementation detail, I doubt anyone would object if you can suggest a better default seed.

...

Apart from the fact that it can hardly be called 'random', I also have the following problems with it:

- The RandomArray2 module initializes with 'seed()' itself, too. Reload()'s of RandomArray2, which might occur outside the control of the user, will thus override explicit user's seeding. Or am I seeing ghosts here?

Overriding a user's explicit seed as a result of a reload sounds correct to me. All of the module's top level statements are re-executed during a reload.

...

- When doing repeated runs of one's neural net simulations that each take less than a second, one will get identical streams of random numbers, despite seed()'ing each time. Not quite what you would expect or want.

This is easy enough to work around: don't seed or re-seed. If you then need to make multiple simulation runs, make a separate module and call your simulation like:

    import RandomArray2
    import simulation

    RandomArray2.seed(something_deterministic, something_else_deterministic)
    for i in range(number_of_runs):
        simulation.main()

...

- From a purist software engineering point of view, I don't think automagical default behavior is desirable: one wants programs to be deterministic and produce reproducible behavior/output.

I don't know. I think by default, random numbers *should be* random.

...

If you use default seed()'ing now and re-run your program/model later with identical parameters, you will get different output.

When you care about this, you need to set the seed to something deterministic.

...

In Eiffel, object attributes are always initialized, and you will almost never have irreproducible runs. I found that this is a good thing for reproducing one's bugs, too ;-)

This sounds like a good design principle, but I don't see anything in RandomArray2 which is keeping you from doing this now.

...

To summarize, my recommendation would be to use None default arguments and use, when no user arguments are supplied, a hard (built-in) seed tuple, like (1,1) or whatever.

Unless there is a general outcry from the rest of the community, I think the (existing) numarray extensions (RandomArray2, LinearAlgebra2, FFT2) should try to stay functionally identical with Numeric.

...

Sometimes a paper on a random number generator suggests seeds (like 4357 for the MersenneTwister), but of course, a good random number generator should behave well independently of the initial seed/seed-tuple.

I may be completely mistaken here (I'm not an expert on random number theory), but the random number generators (Ahrens, et al.) seem 'old'? After some studying, we decided to use the Mersenne Twister:

An array enabled version might make a good add-on package for numarray.

...

http://www-personal.engin.umich.edu/~wagnerr/MersenneTwister.html http://www.math.keio.ac.jp/~matumoto/emt.html

PDF article:

http://www.math.keio.ac.jp/~nisimura/random/doc/mt.pdf

M. Matsumoto and T. Nishimura, "Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator", ACM Trans. on Modeling and Computer Simulation, Vol. 8, No. 1, January 1998, pp. 3-30

There are some Python wrappers and it has good performance as well.

Bye-bye,

Eric

Bye, Todd

Eric Maryniak

9:03 a.m.

New subject: Numarray: question on RandomArray2.seed(x=0, y=0) system clock default and possible bug

On Tuesday 23 July 2002 20:54, Todd Miller wrote:

...

Eric Maryniak wrote:

...
... That should not, however, discourage you from writing a new and improved random number package for numarray.

Yes, thank you :-)

...

...
... 3. I wonder what the design philosophy is behind the decision to use 'mathematically suspect' seeding as default behavior.

Using time for a seed is fairly common. Since it's an implementation detail, I doubt anyone would object if you can suggest a better default seed.

Well, as said, a fixed seed, provided by the class implementation and therefore 'good', instead of a not-so-random 'random' seed. And imho it would be better not to (only) use the clock, but a /dev/random kinda thing.

Personally, I find the RNG setup much more appealing: there the default is:

    standard_generator = CreateGenerator(-1)

where

    seed < 0 ==> Use the default initial seed value.
    seed = 0 ==> Set a "random" value for the seed from the system clock.
    seed > 0 ==> Set seed directly (32 bits only).

And indeed 'void Mixranf(int *s,u32 s48[2])' uses a built-in constant as initial seed value (actually, two).

...

...

...
If you use default seed()'ing now and re-run your program/model later with identical parameters, you will get different output.

When you care about this, you need to set the seed to something deterministic.

Naturally, but how do I know what a 'good' seed is (or indeed its type, range, etc.)? I just would like, as e.g. RNG does, to let the number generator take care of this... (or at least provide the option to)

...

...

In the programs I've seen so far, including a lot of ours ahem, usually a program (simulation) is run multiple times with the same parameters and, in our case for neural nets, seeded each time with a clock generated seed and then the different simulations are compared and checked if they are similar or sensitive to chaotic influences. But I don't think this is the proper way to do this.

My point is, I guess, that the sequence of these clock-generated seeds itself is not random, because (as for RandomArray) the generated numbers are clearly not random. Better, and reproducible, would be to start the first simulation with a supplied seed, get the seed and pickle it after the first run and use the pickled seed for run 2 etc., or indeed have a kind of master script (as you suggest) that manages this. That way you would start with one seed only and are not re-seeding for each run. Because if the clock-seeds are not truly random, you will have a much greater chance of cycles in your overall sequence of numbers.

Bye-bye, Eric

--
Eric Maryniak <e.maryniak@pobox.com>
WWW homepage: http://pobox.com/~e.maryniak/
Mobile phone: +31 6 52047532, or (06) 520 475 32 in NL.

VME ERROR 37022: Hierarchic name syntax invalid taking into account starting points defined by initial context.
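The "seed once, pickle between runs" idea can be sketched with Python's standard `random` module standing in for RandomArray2. Note the substitution: this saves the full generator state via `getstate()` instead of a seed pair from `get_seed()`. The function `run_simulation` is a hypothetical placeholder for a real simulation.

```python
import pickle
import random


def run_simulation(rng, steps=3):
    """Stand-in for one simulation run: draw a few numbers from rng."""
    return [rng.random() for _ in range(steps)]


# Seed once, before the first run only.
rng = random.Random(12345)
first = run_simulation(rng)

# Save the generator state at the end of run 1 (to a file in real use).
saved_state = pickle.dumps(rng.getstate())

# Run 2, possibly in a new process: restore the state instead of re-seeding.
rng2 = random.Random()
rng2.setstate(pickle.loads(saved_state))
second = run_simulation(rng2)

# Both runs together reproduce one continuous stream from the single seed.
reference = random.Random(12345)
assert first + second == run_simulation(reference, steps=6)
```

Because run 2 continues where run 1 stopped, the runs use disjoint stretches of one stream, and the whole series is reproducible from the one initial seed.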

paul@pfdubois.com

9:14 a.m.

New subject: Numarray: question on RandomArray2.seed(x=0, y=0) system clock default and possible bug

RandomArray got a "special" position as part of Numeric simply by historical accident in being there first. I think in the conversion to Numarray we will be able to remove such things from the "core" and make more of a marketplace of equals for the "addons". As it is now there is some implication that somehow one is "better" than the other, which is unjustified either mathematically or in the sense of design. RNG's design is based on my experience with large codes needing many independent streams. The mathematics is from a well-tested Cray algorithm. I'm sure it could use fluffing up but a good case can be made for it.

Eric Maryniak

5:24 a.m.

New subject: Numarray: question on RandomArray2.seed(x=0, y=0) system clock default and possible bug

On Tuesday 23 July 2002 22:15, paul@pfdubois.com wrote:

...

RandomArray got a "special" position as part of Numeric simply by historical accident in being there first. I think in the conversion to Numarray we will be able to remove such things from the "core" and make more of a marketplace of equals for the "addons". As it is now there is some implication that somehow one is "better" than the other, which is unjustified either mathematically or in the sense of design.

RNG's design is based on my experience with large codes needing many independent streams. The mathematics is from a well-tested Cray algorithm. I'm sure it could use fluffing up but a good case can be made for it.

A famous quote from Linus is "Nice idea. Now show me the code." Perhaps a detailed example makes my problem clearer, because as it is now, RNG and RandomArray2 are not orthogonal in design, in the sense that RNG's default seed is fixed and RandomArray's is automagical (clock), not reproducible and mathematically suspect, which I think is not good for the more naive Python user. Below I will give intended usage in a provocative way, but please don't take me too seriously (I know, I don't ;-)

Let's say you have a master shell script that runs a neural net paradigm (size 20x20) 10 times, each time with the same parameters, to see if it's stable or chaotic, i.e. does not 'converge', or rather, the outcome depends on initial values (it should not be chaotic, but this should always be checked).

run10.sh

    tracelink.py 20 20 inputpat.dat > hippocamp01.out
    ... 8 more ...
    tracelink.py 20 20 inputpat.dat > hippocamp10.out

tracelink.py

    ...
    import numarray, RandomArray2 _or_ RNG
    ...
    # Case 1: RandomArray2
    # User uses default clock seed, which is the same
    # during 1 second (see my previous posting).
    # ignlgi(void)'s seeds 1234567890L,123456789L
    # are _not_ used (see com.c).
    RandomArray2.seed() # But if omitted, RandomArray2.py does it, too.
    ... calculations
    ... other program outcome _only_ if program runs > 1 second,
    ... otherwise the others will have the same result.

    # Case 2: RNG
    # A 'standard_generator = CreateGenerator(-1)' is automatically done.
    # seed < 0 ==> Use the default initial seed value.
    # seed = 0 ==> Set a "random" value for the seed from system clock.
    # seed > 0 ==> Set seed directly (32 bits only).
    # Thus, the fixed seeds used are 0,0 (see Mixranf() in ranf.c).
    ... calculations
    ... all 10 programs have the same outcome when using ranf(),
    ... because it always starts the same seed, the sequence is always:
    ... 0.58011364857958725, 0.95051273498076583, 0.78637142533060356 etc.

The problem with RandomArray's seed is, that it is not truly random itself. In its current (time.time based) implementation it is linearly auto incrementing every second, and therefore suffers from auto-correlation. Moreover, in the above example, if 10 separate .py runs complete in 1 second they'll all have the same seed (and outcome). This is not what the user, if accustomed to clock seeding, would expect.

But if the seed is different each time, a problem is that runs are not reproducible. Let's say that run hippocamp06.out produced some strange output: now unless the user saved the seed (with get_seed), it can never be reproduced.

Therefore, I think RNG's design is better and should be applied to RandomArray2, too, because RandomArray2's seeding is flawed anyways. A user should be aware of proper seeding, agreed, and now will be: when doing multiple identical runs, the same (and thus reproducible) output will result and so the user is made aware of the fact that, as an example, he or she should seed or pickle it between runs.

So my suggestion would be to re-implement RandomArray2.seed(x=0,y=0) as follows: if either the x or y seed:

    seed < 0     ==> Use the default initial seed value.
    seed = None  ==> Set a "random" value for the seed from the system clock.
    seeds >= 0   ==> Set seed directly (32 bits only).

and en-passant do a better job than clock-based seeding:

---cut---
def seed(x=None,y=None):
    """seed(x, y), set the seed using the integers x, y;
    ...
    """
    if (x != None and type (x) != IntType) or (y != None and type (y) != IntType) :
        raise ArgumentError, "seed requires integer arguments (or None)."
    if x == None or y == None:
        import dev_random_device # uses /dev/random or equivalent
        x = dev_random_device.nextvalue() # egd.sf.net is a user space
        y = dev_random_device.nextvalue() # alternative
    elif x < 0 or y < 0:
        x = 1234567890L
        y = 123456789L
    ranlib.set_seeds(x,y)
---cut---

But: I realize that this is different behavior from Python's standard random and whrandom, where no arg or None uses the clock. But, if that behavior is kept for RandomArray2 (and RNG should then be adapted, too) then I'd urge at least to use a better initial seed. In certain applications, e.g. generating session id's in crypto programs, non-predictability of initial seeds is crucial. But if you have a look at GPG's or OpenSSL's source for a PRNG (especially for Windows), it looks like an art in itself.

So perhaps RNG's 'clock code' should replace RandomArray2's: it uses microseconds (in gettimeofday), too, and thus will not have the 1-second problem.

Bye-bye, Eric

--
Eric Maryniak <e.maryniak@pobox.com>
WWW homepage: http://pobox.com/~e.maryniak/
Mobile phone: +31 6 52047532, or (06) 520 475 32 in NL.

Just because you're not paranoid, that doesn't mean that they're not after you.
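The point about microsecond resolution can be sketched as follows. This is not the code from RNG's Mixranf(): the split into seconds and microseconds, the name `seeds_from_ns`, and the `MAX_SEED` bound are all assumptions made for illustration. It is written for a present-day Python 3 (`time.time_ns()` needs 3.7 or later).

```python
import time

MAX_SEED = 2**31 - 2  # assumed upper bound for a positive 32-bit seed


def seeds_from_ns(ns):
    """Split a nanosecond timestamp into two positive integer seeds."""
    seconds, frac_ns = divmod(ns, 10**9)
    x = 1 + seconds % MAX_SEED        # whole seconds
    y = 1 + frac_ns // 1000           # microseconds within the second
    return x, y


# Two timestamps 0.1 s apart inside the same second now differ in y.
a = seeds_from_ns(1027436537_066677_000)
b = seeds_from_ns(1027436537_160303_000)
print(a)  # (1027436538, 66678)
print(b)  # (1027436538, 160304)

# In real use the timestamp would come from the clock:
print(seeds_from_ns(time.time_ns()))
```

This only removes the one-second collision. The seeds still grow with the clock, so the auto-correlation concern raised above remains.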

Chris Barker

6:01 a.m.

New subject: Numarray: question on RandomArray2.seed(x=0, y=0) system clock default and possible bug

Just to add my $.02:

I disagree with Eric about what the default behaviour should be. Every programming language/environment I have ever used uses some kind of "random" seed by default. When I want reproducible results (which I often do for testing) I can specify a seed. I find that the most useful behaviour. As Eric points out, it is not trivial to generate a "random" seed (from the time, or whatever), so it doesn't make sense to burden the naive user with this chore.

Therefore, I strongly support keeping the default behaviour of a "random" seed.

Eric Maryniak wrote:

...

then I'd urge at least to use a better initial seed. In certain applications, e.g. generating session id's in crypto programs, non-predictability of initial seeds is crucial. But if you have a look at GPG's or OpenSSL's source for a PRNG (especially for Windows), it looks like an art in itself. So perhaps RNG's 'clock code' should replace RandomArray2's: it uses microseconds (in gettimeofday), too, and thus will not have the 1-second problem.

This I agree with: a better default initial seed would be great. As someone said, "show me the code!". I don't imagine anyone would object to improving this.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

Chris.Barker@noaa.gov

Eric Maryniak

6:29 a.m.

New subject: Numarray: question on RandomArray2.seed(x=0, y=0) system clock default and possible bug

On Wednesday 24 July 2002 17:59, Chris Barker wrote:

...

Just to add my $.02:

I disagree with Eric about what the default behaviour should be. Every programming language/environment I have ever used uses some kind of "random" seed by default. When I want reproducible results (which I often do for testing) I can specify a seed. I find that the most useful behaviour. As Eric points out, it is not trivial to generate a "random" seed (from the time, or whatever), so it doesn't make sense to burden the naive user with this chore.

Therefore, I strongly support keeping the default behaviour of a "random" seed.

In that case, and if that is the general consensus, RNG should be adapted: it now uses a fixed seed by default (and not a clock generated one).

...

Eric Maryniak wrote:

...
then I'd urge at least to use a better initial seed. In certain applications, e.g. generating session id's in crypto programs, non-predictability of initial seeds is crucial. But if you have a look at GPG's or OpenSSL's source for a PRNG (especially for Windows), it looks like an art in itself. So perhaps RNG's 'clock code' should replace RandomArray2's: it uses microseconds (in gettimeofday), too, and thus will not have the 1-second problem.

This I agree with: a better default initial seed would be great. As someone said, "show me the code!". I don't imagine anyone would object to improving this.

The source is in Mixranf(), file Numerical/Packages/RNG/Src/ranf.c (when checked out with CVS), but it may be a good idea to check it with Python's own random/whrandom code (which I don't have at hand -- it may be more recent and/or portable for other OSes).

By the way, I realized in my code 'fix' for RandomArray2.seed(x=None,y=None) that I already anticipated this and that the default behavior is _not_ to use a fixed seed ;-) :

if either the x or y seed:
  seed < 0     ==> Use the default initial seed value.
  seed = None  ==> Set a "random" value for the seed from clock (default)
  seeds >= 0   ==> Set seed directly (32 bits only).

and en-passant do a better job than clock-based seeding:

---cut---
def seed(x=None,y=None):
    """seed(x, y), set the seed using the integers x, y;
    ...
    """
    if (x != None and type (x) != IntType) or (y != None and type (y) != IntType) :
        raise ArgumentError, "seed requires integer arguments (or None)."
    if x == None or y == None:
        # This would be the best, but is problematic under Windows/Mac.
        import dev_random_device             # uses /dev/random or equivalent
        x = dev_random_device.nextvalue()    # egd.sf.net is a user space
        y = dev_random_device.nextvalue()    # alternative
        # So best is to use Mixranf() from RNG/Src/ranf.c here.
    elif x < 0 or y < 0:
        x = 1234567890L
        y = 123456789L
    ranlib.set_seeds(x,y)
---cut---

Bye-bye,

Eric

--
Eric Maryniak <e.maryniak@pobox.com>
WWW homepage: http://pobox.com/~e.maryniak/
Mobile phone: +31 6 52047532, or (06) 520 475 32 in NL.

Unix was a trademark of AT&T.
AT&T is a modem test command.

peter.chang＠nottingham.ac.uk

7:08 a.m.

New subject: Numarray: question on RandomArray2.seed(x=0, y=0) system clock default and possible bug

Just to stick my oar in: I think Eric's preference is predicated by the lousiness (or otherwise?) of RandomArray's seeding mechanism. The random sequences generated by incremental seeds should, by design, be uncorrelated thus allowing the use of the system clock as a seed source. If you're running lots of simulations (as I do with Monte Carlos, though not in numpy) using PRNGs, the last thing you want is the task to find a (pseudo) random source of seeds. Using /dev/random is not particularly portable; the system clock is much easier to obtain and is fine as long as your iteration cycle is longer than its resolution. Peter
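Peter's condition, that the iteration cycle be longer than the clock's resolution, can be made concrete with a small sketch. It is written for present-day Python 3 rather than the 2.2 used in this thread, and `clock_seed`, `distinct_seeds` and the `ticks_per_second` parameter are hypothetical names for illustration, not part of RandomArray2 or RNG. The timestamps are passed in explicitly so the result does not depend on when the code runs.

```python
def clock_seed(timestamp, ticks_per_second):
    """Derive an integer seed from a timestamp (seconds since the epoch).

    ticks_per_second is the resolution of the seed: 1 for whole seconds,
    1_000_000 for microseconds. Hypothetical helper, for illustration only.
    """
    return int(timestamp * ticks_per_second)


def distinct_seeds(start, step, runs, ticks_per_second):
    """Count distinct seeds when `runs` simulations start `step` seconds apart."""
    seeds = {clock_seed(start + i * step, ticks_per_second) for i in range(runs)}
    return len(seeds)


# Ten runs started 0.25 s apart, i.e. faster than a 1-second clock can tell apart.
coarse = distinct_seeds(start=1000.0, step=0.25, runs=10, ticks_per_second=1)
# The same ten runs with a microsecond-resolution seed.
fine = distinct_seeds(start=1000.0, step=0.25, runs=10, ticks_per_second=1_000_000)

print(coarse, fine)  # 3 10
```

With whole-second resolution the ten runs share only three seeds, so several simulations would replay the same sequence; with microsecond resolution every run gets its own seed.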

Paul F Dubois

7:09 p.m.

New subject: Numarray: question on RandomArray2.seed(x=0, y=0) system clock default and possible bug

I'm not going to change the default seed on RNG. Existing users have the right to stability, and not to have things change because someone thinks a certain choice among several reasonable ones is better than the one previously made. There is the further issue here of RNG being advertised as similar to Cray's ranf() and that similarity extends to this default. Not to mention that for many purposes the current default is quite useful.

Eric Maryniak

2:02 a.m.

New subject: Numarray: Summary (seeding): personal code and manual suggestions on initial seeding in module RNG and RandomArray(2)

Dear crunchers,

Please see my personal thoughts on the past discussion about initial seeds some paragraphs down below, where I'd like to list concrete code and manual enhancements aimed at providing users with a clear understanding of its usage (and pitfalls e.g. w/r to cryptographic applications)...

==> Suggestions for code and manual changes w/r to initial seeding (down below)

But first a response to Paul's earlier message:

On Thursday 25 July 2002 08:08, Paul F Dubois wrote:

...

I'm not going to change the default seed on RNG. Existing users have the right to stability, and not to have things change because someone thinks a certain choice among several reasonable ones is better than the one previously made.

Well, I wasn't aware of the fact that things were completely set in stone for Numarray solely for backward compatibility. It was my impression that numarray and its accompanying xx2 packages were also open for redesign. I agree stability is important, but numarray already breaks with Numeric in other aspects so why should RNG (RNG2 in numarray?) or other packages not be? It's more a matter of well documenting changes I think. Users switching to numarray will already have to take into account some changes and verify their code.

It's not that I "think a certain choice among several reasonable ones is better" [although my favorite is still a fixed seed, as in RNG, for reasons of reproducibility in later re-runs of Monte Carlo's that are not possible now, because the naive user, using a clock seed, may not have saved the initial seed with get_seed], but that the different packages, i.c. RNG (RNG2 to be?) and RandomArray2, should be orthogonal in this respect. I.e. the same, so 'default always an automagical (clock whatever) random initial seed _or_ a fixed one'. Orthogonality is a very common and accepted design principle in computing science and for good reasons (usability). Users changing from one PRNG to another (and using the default seed) would otherwise be unwelcomely surprised by a sudden change in behavior of their program.

I try to give logical arguments and real code examples in this discussion and fail to see in Paul's reaction where I'm wrong.

By the way: in Python 2.1 alpha 2 seeding changed, too:

"""
- random.py's seed() function is new. For bit-for-bit compatibility with
  prior releases, use the whseed function instead. The new seed function
  addresses two problems:
  (1) The old function couldn't produce more than about 2**24 distinct
      internal states; the new one about 2**45 (the best that can be done
      in the Wichmann-Hill generator).
  (2) The old function sometimes produced identical internal states when
      passed distinct integers, and there was no simple way to predict when
      that would happen; the new one guarantees to produce distinct internal
      states for all arguments in [0, 27814431486576L).
"""
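The bound in point (2) can be checked by hand. The Wichmann-Hill seed() from Lib/random.py, quoted later in this message, splits the seed with three successive divmod calls by 30268, 30306 and 30322, and the product of those three moduli is exactly 27814431486576, a little under 2**45. A minimal sketch in present-day Python 3; `wh_state` is a hypothetical name that mirrors that arithmetic, not a function from the standard library:

```python
import math

MODULI = (30268, 30306, 30322)  # divisors used by the Wichmann-Hill seeding code


def wh_state(a):
    """Map an integer seed to the (x, y, z) state, mirroring the divmod chain."""
    a, x = divmod(a, MODULI[0])
    a, y = divmod(a, MODULI[1])
    a, z = divmod(a, MODULI[2])
    return int(x) + 1, int(y) + 1, int(z) + 1


bound = math.prod(MODULI)
print(bound)  # 27814431486576

# The largest seed below the bound reaches the last state; the bound itself
# wraps around to the same state as seed 0.
print(wh_state(0))          # (1, 1, 1)
print(wh_state(bound - 1))  # (30268, 30306, 30322)
print(wh_state(bound))      # (1, 1, 1)
```

The divmod chain is a mixed-radix decomposition, which is why every seed in [0, 27814431486576) lands on a different (x, y, z) triple and why the guarantee stops exactly at that number.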

...

There is the further issue here of RNG being advertised as similar to Cray's ranf() and that similarity extends to this default. Not to mention that for many purposes the current default is quite useful.

Perhaps I'm mistaken here, but RNG/Lib/__init__.py does (-1 -> uses fixed internal seed):

    standard_generator = CreateGenerator(-1)

and:

    def ranf():
        "ranf() = a random number from the standard generator."
        return standard_generator.ranf()

And indeed Mixranf in RNG/Src/ranf.c does set them to 0:

    ...
    if(*s < 0){ /* Set default initial value */
        s48[0] = s48[1] = 0;
        Setranf(s48);
        Getranf(s48);

And this code, or I'm missing the point, uses a standard generator from RNG, which demonstrates the same sequence of initial seeds in re-runs (note that it does not suffer from the "1-second problem" as RandomArray2 does, see the Appendix below for a demonstration of that, because RNG uses milliseconds). Note that 'ranf()' is listed in chapter 18 in Module RNG as one of the 'Generator objects':

$ python
Python 2.2.1 (#1, Jun 25 2002, 20:45:02)
...
>>> from numarray import *
>>> from RNG import *
>>> for i in range(3):
...     standard_generator.ranf()
...
0.58011364857958725
0.95051273498076583
0.78637142533060356
>>>
$ python
Python 2.2.1 (#1, Jun 25 2002, 20:45:02)
...
>>> from numarray import *
>>> from RNG import *
>>> for i in range(3):
...     standard_generator.ranf()
...
0.58011364857958725
0.95051273498076583
0.78637142533060356
>>>

Ok, now then my own (and possibly biased) personal summary of the past discussions and concrete code and manual recommendations:

==> Suggestions for code and manual changes w/r to initial seeding

Conclusions:

1. Default initial seeding should be random (and not fixed).
   This is the general consensus and while it may not win the beauty contest in purist software engineering circles, it also is the default behavior in Python's own Random/WHRandom modules.
   URL: http://web.pydoc.org/2.2/random.html

=> Recommendations:

- Like Python's random/whrandom module, default arguments to seed() should not be 0, but None, and this triggers the default behavior which is to use a random initial seed (ideally 'truly' random from e.g. /dev/random or otherwise clock or whatever based), because:
  o better usability: users changing from Python's own random to numarray's random facilities will find familiar seed() usage semantics
  o often 0 itself can be a legal seed (although the MersenneTwister does not recommend it)
- Like RNG provide support for using a built-in fixed seed by supplying negative seeds to seed(), rationale:
  o support for reproducible re-runs of Monte Carlo's without having to specify one's own initial seed
  o usability: naive users may not know what a 'good' seed is, like: can it be 0 or must it be >0, what is the maximum, etc.
- See my suggested code fix for RandomArray2.seed() in the Appendix below.
- Likewise, in RNG:
  o CreateGenerator (s, ...) should be changed to CreateGenerator (s=None)
    Also note Python's own:
      def create_generators(num, delta, firstseed=None)
    from random (random.py), url: http://web.pydoc.org/2.2/random.html
  o RNG's code should be changed from testing on 0 to testing on None first (which results in using the clock), then on < 0 (use built-in seed), and then using the user provided seed (which is thus >= 0, and hence can also be 0)
  o 'standard_generator = CreateGenerator(-1)' should be changed to 'standard_generator = CreateGenerator()' and results in using the clock
- Put some explicit warnings in the numarray manual, that the seeding of numarray's packages should _not_ be used in those parts of software where unpredictability of seeds is important, such as for example, cryptographical software for creating session keys, TCP sequence numbers etc. Attacks on crypto software usually center around these issues. Ideally, a /dev/random should be used, but with the current system clock based implementation, the seeds are not random, because the clock does not have deci-nanosecond precision (10**10 ~= 2**32) yet ;-)

Appendix
--------

** 1. "1-second problem" with RandomArray2:

$ python
Python 2.2.1 (#1, Jun 25 2002, 20:45:02)
...

>>> from numarray import *
>>> from RandomArray2 import *
>>> import time
>>> import sys
>>> sys.version
'2.2.1 (#1, Jun 25 2002, 20:45:02) \n[GCC 2.95.3 20010315 (SuSE)]'
>>> numarray.__version__
'0.3.5'
>>> for i in range(3):
...     time.time()
...     RandomArray2.seed()
...     RandomArray2.get_seed()
...     time.sleep(1)
...     print
...
1027591910.9043469
(102759, 1911)

1027591911.901091
(102759, 1912)

1027591912.901088
(102759, 1913)

>>> for i in range(3):
...     time.time()
...     RandomArray2.seed()
...     RandomArray2.get_seed()
...     time.sleep(0.3)
...     print
...
1027591966.260392
(102759, 1967)

1027591966.5510809
(102759, 1967)

1027591966.851079
(102759, 1967)

Note that Python (at least 2.2.1) own random() suffers much less from this (on my 450 MHz machine, every 10-th millisecond or so the seed will be different):

$ python
Python 2.2.1 (#1, Jun 25 2002, 20:45:02)
...

>>> from random import *
>>> import time
>>> for i in range(3):
...     print long(time.time() * 256)
...
263065231349
263065231349
263065231349
>>> for i in range(3):
...     print long(time.time() * 256)
...     time.sleep(.00001)
...
263065240314
263065240315
263065240317

By the way, Python's own random.seed() also suffers from this, but on a 10th-millisecond level (on my 450 Mhz i586 at least). For the implementation of seed() see Lib/random.py, basically a 'long(time.time() * 256)' is used:

$ python
Python 2.2.1 (#1, Jun 25 2002, 20:45:02)
...
>>> from random import *
>>> import time
>>> for i in range(3):
...     print long(time.time() * 256)
...
263065231349
263065231349
263065231349
>>> for i in range(3):
...     print long(time.time() * 256)
...     time.sleep(.00001)
...
263065240314
263065240315
263065240317
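The repeated values in the session follow from the arithmetic: long(time.time() * 256) truncates the clock to ticks of 1/256 s, which is about 3.9 ms, so two calls inside the same tick yield the same number. The sketch below uses present-day Python 3 and a fixed, hypothetical timestamp so that the output is reproducible; `time_seed` is an illustrative name, not the library function.

```python
TICKS_PER_SECOND = 256  # the factor in long(time.time() * 256)


def time_seed(timestamp):
    """Seed value for a given timestamp, computed as in the session above."""
    return int(timestamp * TICKS_PER_SECOND)


tick_ms = 1000.0 / TICKS_PER_SECOND
print(tick_ms)  # 3.90625

t0 = 1027591910.0  # hypothetical start time, chosen to sit on a tick boundary
same_tick = time_seed(t0 + 0.001) == time_seed(t0)       # 1 ms later
next_tick = time_seed(t0 + 0.004) == time_seed(t0) + 1   # 4 ms later

print(same_tick, next_tick)  # True True
```

By the same arithmetic, a pause shorter than one tick cannot by itself guarantee a fresh value. How long time.sleep really pauses depends on the platform, which may explain why the very short sleep in the session still produced different values.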

2. Proposed re-implementation of RandomArray2.seed():

def seed(x=None,y=None):
    """seed(x, y), set the seed using the integers x, y:
    x or y is None (or not specified):
        A random seed is used which in the current implementation may be
        based on the system's clock. Warning: do not use this seed in
        software where the initial seed must not be predictable, such as
        for example, in cryptographical software for creating session keys.
    x < 0 or y < 0:
        Use the module's fixed built-in seed which is the tuple
        (1234567890L, 123456789L) (or whatever)
    x >= 0 and y >= 0
        Use the seeds specified by the user. (Note: some random number
        generators do not recommend using 0)
    Note: based on Python 2.2.1's random.seed(a=None).
    ADAPTED for _2_ seeds as required by ranlib.set_seeds(x,y)
    """
    if (x != None and type (x) != IntType) or (y != None and type (y) != IntType) :
        raise ArgumentError, "seed requires integer arguments (or None)."
    if x == None or y == None:
        try:
            # This would be the best, but is problematic under Windows/Mac.
            # To my knowledge there isn't a portable lib_randdevice yet.
            # As GPG, OpenSSH and OpenSSL's code show, getting entropy
            # under Windows is problematic.
            # However, Python 2.2.1's socketmodule does wrap the ssl code.
            import dev_random_device             # uses /dev/random or equivalent
            x = dev_random_device.nextvalue()    # egd.sf.net is a user space
            y = dev_random_device.nextvalue()    # alternative
        except:
            # Use Mixranf() from RNG/Src/ranf.c here or, perhaps better,
            # use Python 2.2.1's code? At least it looks simpler and does not
            # have the platform dependencies and has possibly met wider testing
            # (and why not re-use code? ;-)
            # For Python 2.2.1's random.seed(a=None), see url:
            # http://web.pydoc.org/2.2/random.html
            # and file Lib/random.py.
            # Do note, however, that on my 450 Mhz machine, the statement
            # 'long(time.time() * 256)' will generate the same values
            # within a tenth of a millisecond (see Appendix #1 for a code
            # example). This can be fixed by doing a time.sleep(0.001).
            # See my #EM# comment.
            # Naturally this code needs to be adapted for ranlib's
            # generator, because this code uses the Wichmann-Hill generator.
---cut: Wichmann-Hill---
    def seed(self, a=None):
        """Initialize internal state from hashable object.

        None or no argument seeds from current time.

        If a is not None or an int or long, hash(a) is used instead.

        If a is an int or long, a is used directly. Distinct values between
        0 and 27814431486575L inclusive are guaranteed to yield distinct
        internal states (this guarantee is specific to the default
        Wichmann-Hill generator).
        """
        if a is None:
            # Initialize from current time
            import time
            a = long(time.time() * 256)
            #EM# Guarantee unique a's between subsequent calls of seed()
            #EM# by sleeping one millisecond. This should not be harmful,
            #EM# because ordinarily, seed() will only be called once or so
            #EM# in a program.
            time.sleep(0.001)
        if type(a) not in (type(3), type(3L)):
            a = hash(a)
        a, x = divmod(a, 30268)
        a, y = divmod(a, 30306)
        a, z = divmod(a, 30322)
        self._seed = int(x)+1, int(y)+1, int(z)+1
---cut: Wichmann-Hill---
    elif x < 0 or y < 0:
        x = 1234567890L    # or any other suitable 0 - 2**32-1
        y = 123456789L
    ranlib.set_seeds(x,y)

3. Mersenne Twister, another PRNG:

Bye-bye,

Eric

--
Eric Maryniak <e.maryniak@pobox.com>
WWW homepage: http://pobox.com/~e.maryniak/
Mobile phone: +31 6 52047532, or (06) 520 475 32 in NL.

In a grocery store, the Real Programmer is the one who insists on running the cans past the laser checkout scanner himself, because he never could trust keypunch operators to get it right the first time.
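For readers on a present-day Python, the three-way convention proposed in this message (None, negative, non-negative) can be sketched without the hypothetical dev_random_device module, because os.urandom, which was added to the standard library in Python 2.4 and so after this thread, reads from the operating system's entropy source on both Unix and Windows. This is a sketch under assumptions, not numarray code: `choose_seeds` and `DEFAULT_SEEDS` are made-up names, the function only returns the pair instead of calling ranlib.set_seeds, and the fixed pair is the one suggested in the proposal.

```python
import os

DEFAULT_SEEDS = (1234567890, 123456789)  # fixed pair suggested in the proposal


def choose_seeds(x=None, y=None):
    """Return the (x, y) seed pair following the proposed convention.

    None     -> unpredictable 32-bit values from the OS entropy source
    negative -> the fixed built-in pair (reproducible runs)
    >= 0     -> the caller's values, used directly (0 is allowed)
    """
    for value in (x, y):
        # bool is a subclass of int, so it is rejected explicitly
        if value is not None and (not isinstance(value, int) or isinstance(value, bool)):
            raise TypeError("seed requires integer arguments (or None).")
    if x is None or y is None:
        x = int.from_bytes(os.urandom(4), "big")
        y = int.from_bytes(os.urandom(4), "big")
    elif x < 0 or y < 0:
        x, y = DEFAULT_SEEDS
    return x, y


print(choose_seeds(-1, -1))  # (1234567890, 123456789)
print(choose_seeds(0, 42))   # (0, 42)
print(choose_seeds())        # differs from run to run
```

Testing for None before testing the sign is what lets 0 remain a legal user seed, which is the ordering argued for in the recommendations.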


