[CentralOH] random.seed

Mon Mar 2 18:13:49 CET 2009

On Mon, 2009-03-02 at 09:55 -0500, Mark Erbaugh wrote:
> I've written a Python program that generates tests and answer keys from
> a pool of questions.  It uses random.seed with a user entered value so
> that the user can regenerate the same test sequence if needed.
> 
> This morning, I discovered that the same seed doesn't produce the same
> random values on 32-bit and 64-bit systems.

This doesn't seem to be an issue for the systems I'm on:

On a 32 bit Linux system:
        $ lshw | egrep 'width'
             *-cpu
                  width: 32 bits
        $ python
        Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
        [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
        Type "help", "copyright", "credits" or "license" for more information.
        >>> import random
        >>> foo, bar = random.Random(), random.Random()
        >>> foo.seed(5)
        >>> [foo.randint(0,1000) for x in range(20)]
        [623, 742, 795, 943, 740, 923, 29, 466, 944, 649, 901, 113, 469,
        246, 544, 574, 13, 216, 279, 917]
        >>> bar.seed(2<<32 + 12345)
        >>> [bar.randint(0,1000) for x in range(20)]
        [337, 126, 741, 299, 335, 418, 786, 851, 810, 611, 987, 352,
        535, 427, 857, 601, 47, 553, 390, 294]

On a 64 bit Linux system:
        $ lshw | egrep 'cpu|width'
             *-cpu:0
                  width: 64 bits
        $ python
        Python 2.5.2 (r252:60911, Jul 31 2008, 17:31:22)
        [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
        Type "help", "copyright", "credits" or "license" for more information.
        >>> import random
        >>> foo, bar = random.Random(), random.Random()
        >>> foo.seed(5)
        >>> [foo.randint(0,1000) for x in range(20)]
        [623, 742, 795, 943, 740, 923, 29, 466, 944, 649, 901, 113, 469,
        246, 544, 574, 13, 216, 279, 917]
        >>> bar.seed(2<<32 + 12345)
        >>> [bar.randint(0,1000) for x in range(20)]
        [337, 126, 741, 299, 335, 418, 786, 851, 810, 611, 987, 352,
        535, 427, 857, 601, 47, 553, 390, 294]

> I think the problem is that hash(<str>) generates different values on
> the 32 and 64 bit systems. I found that by using 
> 
> random.seed(hash(<str>) & 0xffffffff)

This is where your problem is. hash() is not guaranteed to be consistent
across hosts. I don't think it's even guaranteed to be consistent across
invocations on the same host. The fact that it is stable across hosts on
the same platform is an implementation artifact which shouldn't be
depended upon. You can check out
http://docs.python.org/reference/datamodel.html#object.__hash__ for more
info on why hash() returns different values on 32-bit and 64-bit
systems. I would highly suggest using one of the hash functions in the
hashlib standard module, which are based on defined standards and are
guaranteed to yield a stable value across invocations, hosts, platforms,
and Python implementations. Alternatively, if you demand high performance
but not an extremely large set of possible seed values, you could look
into the binascii module, which has several routines to map a string
into an integer in a consistent manner (e.g. binascii.crc32).
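
For the binascii route, a minimal sketch might look like the following. It
assumes the user-entered value is text that can be encoded as UTF-8, and
seed_from_text_crc32 is a made-up name for illustration, not anything from
your program:

```python
import binascii
import random


def seed_from_text_crc32(text):
    """Map a string to a 32-bit unsigned integer using CRC-32.

    Hypothetical helper for illustration only.
    """
    # Encode explicitly so the bytes fed to crc32 are the same everywhere.
    data = text.encode("utf-8")
    # Mask to 32 bits so the result is always unsigned, since some
    # older Python releases could return a signed value here.
    return binascii.crc32(data) & 0xffffffff


# Two generators seeded from the same text start from the same integer.
first = random.Random(seed_from_text_crc32("spring-midterm"))
second = random.Random(seed_from_text_crc32("spring-midterm"))
```

CRC-32 only gives you 2**32 possible seeds, and different strings can collide,
which is the trade-off mentioned above.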

> I can get the same results.  Is there a better way? 

Yes. Use hashlib.
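
Something along these lines should do, as a rough sketch rather than a
drop-in fix. SHA-256 is just one reasonable choice of digest, and
seed_from_text and make_rng are hypothetical names I made up for the example:

```python
import hashlib
import random


def seed_from_text(text):
    """Derive an integer seed from a string via SHA-256.

    Hypothetical helper for illustration only.
    """
    # Encode explicitly so the digest input is the same bytes on every host.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    # The hex digest is defined by the SHA-256 standard, so this integer
    # does not depend on word size or on how hash() is implemented.
    return int(digest, 16)


def make_rng(text):
    """Return a private random.Random instance seeded from the text."""
    return random.Random(seed_from_text(text))
```

Calling make_rng twice with the same user-entered string gives two generators
that start from the same integer seed. Note that this only pins down the seed;
the sequence a given seed produces can still differ between Python versions.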

> Is my way "safe" and repeatable?

Nope. You have a workaround for an implementation artifact; however,
Jython, IronPython, or PyPy may choose to calculate hash() values of
strings differently than CPython. Even differing versions of CPython on
the same platform may introduce incompatible hash() values as well.

  -- William