[Tutor] PYTHONHASHSEED, -R

Albert-Jan Roskam fomcl at yahoo.com
Mon Jul 29 11:29:41 CEST 2013



----- Original Message -----

> From: eryksun <eryksun at gmail.com>
> To: Albert-Jan Roskam <fomcl at yahoo.com>
> Cc: Python Mailing List <tutor at python.org>
> Sent: Saturday, July 27, 2013 11:49 PM
> Subject: Re: [Tutor] PYTHONHASHSEED, -R
> 
> On Sat, Jul 27, 2013 at 3:19 PM, Albert-Jan Roskam <fomcl at yahoo.com> 
> wrote:
>>  In the little script below, are k and v guaranteed to be equal?
>> 
>>  d = {i : i for i in range(50)}
>>  k, v  = d.keys(), d.values()
>>  assert k == v, "keys and values not the same"
> 
> Yes, provided you compare list(k) and list(v), since in 3.x the views
> aren't equal.

Hi Eryksun,

Thank you for replying. Good to know these two things. The following question is almost new-thread-worthy, but: if I would like to make my app work for 2.x and 3.x, what is the best approach:
(a) use "if sys.version_info.major...." throughout the code
(b) use 2to3, hope for the best and fix manually whatever can't be fixed

In any case, it will be good to write code that works in both 2.x and 3.x (e.g., from __future__ import print_function, then print(...)). Approach (a) results in one code base (a big plus), but the code will be slightly slower, which may be relevant in e.g. busy loops. Approach (b) is not attractive with respect to maintenance: if you find a bug, you have to fix it twice, and you may need two sets of tests (and run each set every time you commit code).
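
To make approach (a) concrete, here is a minimal sketch of what I have in mind. The names PY3, text_type and to_text are hypothetical, made up for illustration. The version check runs once at import time rather than inside a loop, which should keep the overhead out of busy loops. I index sys.version_info with [0] because the named attribute .major only exists from 2.7 onwards.

```python
from __future__ import print_function  # no effect on 3.x, enables print() on 2.x

import sys

# Evaluated once at import time, so hot loops never repeat the version check.
PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (this name only exists on 2.x)


def to_text(value, encoding="utf-8"):
    """Return value as text on both 2.x and 3.x (hypothetical helper)."""
    if isinstance(value, bytes) and not isinstance(value, text_type):
        # Byte strings are decoded; everything else is converted directly.
        return value.decode(encoding)
    return text_type(value)


print(to_text(b"spam"), to_text(42))
```

The rest of the code would then call to_text() and never look at the version again.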
 
> The dict isn't changing state, 

So that's the criterion! Thanks! So as long as you don't use __setitem__ and __delitem__ (maybe also __setattr__, __delattr__, ...) the state does not change.

> and the table traversal in each case is
> a single pass, looking for non-NULL values. I'm no language lawyer,
> but it seems the wording in the docs under the description of views
> guarantees that this is true across Python implementations:
> 
> http://docs.python.org/2.7/library/stdtypes.html#dictionary-view-objects
> 
>     Keys and values are iterated over in an arbitrary order which is
>     non-random, 

That sounds like a contradictio in terminis to me. How can something be non-random and arbitrary at the same time?

>     varies across Python implementations, and depends
>     on the dictionary’s history of insertions and deletions.
> 
>     If keys, values and items views are iterated over with no intervening
>     modifications to the dictionary, the order of items will directly
>     correspond. This allows the creation of (value, key) pairs using
>     zip(): pairs = zip(d.values(), d.keys()).

This is indeed *exactly* what my question was about.
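
For the record, a small sketch of how I understand this, using the dict from my original script. The variable names same_as_lists, pairs and all_match are just mine. It compares the views as lists (as you suggested) and checks that the zip() pairs from the docs line up with the dict's own mapping.

```python
d = {i: i for i in range(50)}

k, v = d.keys(), d.values()

# Compare as lists: this works on 2.x (already lists) and 3.x (views).
same_as_lists = list(k) == list(v)

# No modification happens between the two calls, so the i-th value
# belongs to the i-th key.
pairs = list(zip(d.values(), d.keys()))
all_match = all(d[key] == value for value, key in pairs)

print(same_as_lists, all_match)
```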

> Threading and asynchronous signal handlers may be an issue. A CPython
> thread defaults to holding the global interpreter lock for 100 virtual
> ops (see sys.setcheckinterval). So it might release the GIL in between
> calling list(keys()) and list(values()). In that case use your own
> lock. Or better yet, avoid using shared state if possible.
> 
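
If I understand the locking advice correctly, it would look roughly like the sketch below. The names shared, shared_lock, snapshot and writer are hypothetical, and the sketch assumes every thread that touches the dict takes the same lock.

```python
import threading

shared = {i: i for i in range(50)}
shared_lock = threading.Lock()  # hypothetical name; guards every access to `shared`


def snapshot():
    """Copy keys and values while no other thread can modify the dict."""
    with shared_lock:
        return list(shared.keys()), list(shared.values())


def writer():
    """Keep inserting items, taking the lock for each modification."""
    for i in range(50, 1000):
        with shared_lock:
            shared[i] = i


t = threading.Thread(target=writer)
t.start()
keys, values = snapshot()  # both lists come from the same dict state
t.join()

print(keys == values)
```

Without the lock in snapshot(), the writer could insert an item between the two list() calls, and the two lists could end up with different lengths.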
>>  I tried:
>>  python -R rnd.py
>> 
>>  (-R means Turn on hash randomization
> 
> The way the objects hash into the table isn't relevant here.
> 
> Also, hash randomization (_Py_HashSecret) isn't used by numbers. It's
> used by strings, and anything that hashes a string. For example,
> datetime objects use the hash of the string representation from
> __reduce__:
> 
>     >>> from datetime import date, datetime
>     >>> d1 = date(2013, 7, 28)
>     >>> hash(d1) == hash(d1.__reduce__()[1])
>     True
> 
>     >>> d2 = datetime(2013, 7, 28, 13, 30)
>     >>> hash(d2) == hash(d2.__reduce__()[1][0])
>     True
> 
> date's hash is using the full tuple at index 1, while datetime's hash
> only uses the string data.
> 
> CPython 3.3 defaults to enabling hash randomization. Set the
> environment variable PYTHONHASHSEED=0 to disable it.

So in addition to my "2to3" question from above, it might be a good idea to already set PYTHONHASHSEED so Python 2.x behaves like Python 3.x in this respect, right? Given that environment variables are read when Python starts, what would be the approach to test this? Call os.putenv("PYTHONHASHSEED", "1"), and then run the tests in a subprocess (which would see the changed environment variable)?
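
Something like the sketch below is what I am thinking of. The function name hash_in_child is hypothetical. Instead of os.putenv it passes a modified copy of os.environ to the child process, which leaves the parent's environment alone. Note that environment values have to be strings.

```python
import os
import subprocess
import sys


def hash_in_child(seed):
    """Run a fresh interpreter with PYTHONHASHSEED=seed and return hash('spam')."""
    env = dict(os.environ)             # copy, so the parent environment is untouched
    env["PYTHONHASHSEED"] = str(seed)  # environment values must be strings
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash('spam'))"], env=env
    )
    return int(out.decode().strip())


# With a fixed seed, two separate interpreter runs should agree.
print(hash_in_child(1) == hash_in_child(1))
```

A real test run would replace the "-c" one-liner with the command that starts the test suite.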

Thanks again,

Best wishes,
Albert-Jan

