[Tutor] PYTHONHASHSEED, -R
eryksun
eryksun at gmail.com
Sat Jul 27 23:49:11 CEST 2013
On Sat, Jul 27, 2013 at 3:19 PM, Albert-Jan Roskam <fomcl at yahoo.com> wrote:
> In the little script below, are k and v guaranteed to be equal?
>
> d = {i : i for i in range(50)}
> k, v = d.keys(), d.values()
> assert k == v, "keys and values not the same"
Yes, provided you compare list(k) and list(v). In 3.x, keys() and
values() return view objects, and a keys view never compares equal to
a values view.
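A minimal sketch of that comparison, assuming Python 3 (where keys() and values() return views rather than lists); the names views_equal and lists_equal are only for illustration:

```python
# Build the same dict as in the question.
d = {i: i for i in range(50)}
k, v = d.keys(), d.values()

# In 3.x a keys view does not compare equal to a values view,
# even when both yield the same elements in the same order.
views_equal = (k == v)

# Materializing both views as lists compares element by element.
lists_equal = (list(k) == list(v))

print(views_equal, lists_equal)
```

In 2.x, keys() and values() already return lists, so the original assert passes there without the list() calls.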
The dict isn't changing state, and the table traversal in each case is
a single pass, looking for non-NULL values. I'm no language lawyer,
but it seems the wording in the docs under the description of views
guarantees that this is true across Python implementations:
http://docs.python.org/2.7/library/stdtypes.html#dictionary-view-objects
    Keys and values are iterated over in an arbitrary order which is
    non-random, varies across Python implementations, and depends
    on the dictionary’s history of insertions and deletions.

    If keys, values and items views are iterated over with no intervening
    modifications to the dictionary, the order of items will directly
    correspond. This allows the creation of (value, key) pairs using
    zip(): pairs = zip(d.values(), d.keys()).
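The correspondence the docs describe can be checked with a small sketch. The dict contents and the names pairs, swapped, and rebuilt are made up for illustration:

```python
# A small dict that is not modified between the calls below.
d = {"spam": 1, "eggs": 2, "ham": 3}

# (value, key) pairs built from two separate traversals.
pairs = list(zip(d.values(), d.keys()))

# The same pairs built from a single traversal of items().
swapped = [(value, key) for key, value in d.items()]

# Zipping keys with values reconstructs the original mapping.
rebuilt = dict(zip(d.keys(), d.values()))

print(pairs == swapped, rebuilt == d)
```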
Threading and asynchronous signal handlers may be an issue. In CPython
2.x a thread holds the global interpreter lock for 100 virtual ops by
default (see sys.setcheckinterval); 3.2 and later use a time-based
switch interval instead (see sys.setswitchinterval). Either way, the
GIL might be released between calling list(d.keys()) and
list(d.values()), letting another thread modify the dict. In that case
use your own lock. Or better yet, avoid using shared state if possible.
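A rough sketch of the "use your own lock" approach, assuming every thread that touches the dict agrees to take the same lock. The names shared, shared_lock, add_entries, and snapshot are hypothetical:

```python
import threading

shared = {i: i for i in range(50)}
shared_lock = threading.Lock()  # guards every access to `shared`

def add_entries(start, stop):
    """Writer: insert i -> i, taking the lock for each modification."""
    for i in range(start, stop):
        with shared_lock:
            shared[i] = i

def snapshot():
    """Reader: copy keys and values with no modification in between."""
    with shared_lock:
        return list(shared.keys()), list(shared.values())

worker = threading.Thread(target=add_entries, args=(50, 5000))
worker.start()
snapshots = [snapshot() for _ in range(100)]
worker.join()

# Every snapshot pairs up, because the writer cannot run between the two copies.
all_consistent = all(keys == values for keys, values in snapshots)
print(all_consistent)
```

The lock only helps if the writers use it too; a thread that modifies the dict without taking it can still slip in between the two list() calls.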
> I tried:
> python -R rnd.py
>
> (-R means: turn on hash randomization)
The way the objects hash into the table isn't relevant here.
Also, hash randomization (_Py_HashSecret) isn't used by numbers. It's
used by strings, and anything that hashes a string. For example,
datetime objects use the hash of the string representation from
__reduce__:
>>> from datetime import date, datetime
>>> d1 = date(2013,7,28)
>>> hash(d1) == hash(d1.__reduce__()[1])
True
>>> d2 = datetime(2013,7,28,13,30)
>>> hash(d2) == hash(d2.__reduce__()[1][0])
True
date's hash uses the full tuple at index 1 of the __reduce__ result,
while datetime's hash uses only the string data inside that tuple.
CPython 3.3 defaults to enabling hash randomization. Set the
environment variable PYTHONHASHSEED=0 to disable it.
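One way to see the effect of PYTHONHASHSEED is to compute hashes in fresh interpreters, since the seed is read at startup. This is a sketch assuming CPython 3.3 or later; hash_in_child is a hypothetical helper, not part of any library:

```python
import os
import subprocess
import sys

def hash_in_child(expr, seed):
    """Return hash(<expr>) as computed by a new interpreter started
    with PYTHONHASHSEED set to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash(%s))" % expr], env=env)
    return int(out)

# Integer hashes do not depend on the seed.
int_hashes = {hash_in_child("42", seed) for seed in (0, 1, 2)}

# With a fixed seed, string hashes are reproducible from run to run.
str_hashes_seed0 = {hash_in_child("'spam'", 0) for _ in range(2)}

# With different seeds, string hashes will almost certainly differ.
str_hashes = [hash_in_child("'spam'", seed) for seed in (1, 2)]

print(int_hashes, len(str_hashes_seed0), str_hashes)
```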