[Numpy-discussion] Purpose for bit-wise and'ing the initial mersenne twister key?

Robert Kern robert.kern at gmail.com
Fri Feb 13 18:14:08 EST 2009


On Fri, Feb 13, 2009 at 10:17, Michael S. Gilbert
<michael.s.gilbert at gmail.com> wrote:
> On Fri, 13 Feb 2009 17:04:48 +0100 Sturla Molden wrote:
>> So you have a simulation written in *Python*, and the major bottleneck
>> is the MT prng? Forgive me for not believing it.
>
> Yes, running a lot of Monte Carlo simulations back-to-back.  If the
> PRNG were twice as fast, my code would be twice as fast.  It isn't that
> unbelievable...

It is somewhat. If you run even fairly trivial code on top of getting
the raw bytes, the fraction of time actually spent in the PRNG is only
moderate. Making the PRNG twice as fast only speeds up that moderate
fraction, not the entire program. Only if you are doing nothing *but*
running the PRNG will speeding it up by a factor of 2 speed up your
program by that same factor.

For example, here is a script that will draw either random bytes,
standard Gaussian numbers, or random long ints. Using Instruments.app
on my Mac, I can measure the fraction of time actually spent in the
PRNG as opposed to the rest of Python or mtrand. For the bytes case,
it is around 96% of the time (although it drops to 80% or so with
1 KB blocks). For the Gaussian case, it is 75%. For the long ints, it
is actually only 54% (I'm not sure why this is the slowest case, but a
lot of time is being wasted in the method; worth looking into). In the
last case, a double-speed PRNG will only speed up your program by
about 25%. If you are doing actual computations with those random
numbers, these fractions will only get worse.


import os

from numpy import random

# Print the PID so an external sampling profiler (e.g. Instruments.app)
# can attach to this process.
print os.getpid()

# Seed explicitly so runs are reproducible.
prng = random.RandomState(1234567890)

# Loop forever; uncomment exactly one line to choose the case to profile.
while True:
    #x = prng.bytes(1024*1024*16)
    #x = prng.standard_normal(1024*1024*4)
    x = prng.tomaxint(1024*1024*4)
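
For reference, the arithmetic behind that last estimate is just
Amdahl's law. A minimal sketch (the fraction 0.54 is the long-int
measurement above, not a universal constant):

# Fraction of the original runtime remaining after speeding up only
# the PRNG, which accounts for a fraction f of the total time, by a
# factor s (Amdahl's law).
def remaining_fraction(f, s):
    return (1.0 - f) + f / s

print remaining_fraction(0.54, 2.0)  # ~0.73 of the original runtime,
                                     # roughly the ~25% figure above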


> Honestly, I don't feel like arguing about this anymore.  It's a matter
> of "show me the code," and when I have the time, I will "show you the
> code."

Before spending too much time on this, I highly recommend profiling
your code to see what is actually consuming the most time. Start with
the Python profiling tools:

  http://docs.python.org/library/profile
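
For instance, the standard cProfile module is the quickest way to see
where the time goes. A minimal sketch, where run_simulation is a
stand-in for your own entry point:

import cProfile

# Sort the report by cumulative time so the dominant call trees
# appear first.
cProfile.run('run_simulation()', sort='cumulative')

You can get a similar report without touching the code by running
"python -m cProfile -s cumulative my_simulation.py".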

If it appears that the methods in numpy.random are actually a
significant bottleneck, you may need to break out a C profiler, too,
to determine how much time is actually being spent in the PRNG itself
as opposed to the non-uniform distributions and the Pyrex wrappers.
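
A quick way to estimate that split from Python, before reaching for a
C profiler, is to time the raw uniform output against a non-uniform
distribution built on top of it. A rough sketch (the array size and
repeat count are arbitrary):

import timeit

setup = "from numpy import random; prng = random.RandomState(0)"
# Raw uniform doubles straight from the PRNG...
print timeit.Timer('prng.random_sample(1000000)', setup).timeit(100)
# ...versus Gaussians, which add a transformation on top of the PRNG.
print timeit.Timer('prng.standard_normal(1000000)', setup).timeit(100)

The gap between the two timings is a crude measure of what the
non-uniform machinery costs over the PRNG itself.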

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


