[Tutor] PYTHONHASHSEED, -R

Mon Jul 29 19:44:55 CEST 2013

> question is almost new-thread-worthy, but: if I would like to make
> my app work for 2.x and 3.x, what is the best approach:
> (a) use "if sys.version_info.major...." throughout the code
> (b) use 2to3, hope for the best and fix manually whatever can't be fixed

(c) Use "six".

BTW, sys.version_info.major doesn't work in versions before 2.7; the
named fields were added in 2.7/3.1. Use sys.version_info[0] if you
need to support older versions.
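For what it's worth, six exposes flags and aliases such as six.PY2, six.PY3, six.text_type and six.string_types. Here is a minimal hand-rolled sketch of that idea, just to show the pattern. The names mirror six's, but this is not six's code, and in practice you'd simply import six. It uses index 0 so it also works before 2.7:

```python
import sys

# Index 0 works on every version; the .major attribute only exists on 2.7/3.1+.
PY2 = sys.version_info[0] == 2
PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str
    string_types = (str,)
else:  # Python 2 only: these builtins don't exist on Python 3
    text_type = unicode  # noqa: F821
    string_types = (basestring,)  # noqa: F821


def is_text(value):
    """Return True for any text string type on either major version."""
    return isinstance(value, string_types)


print(PY3, is_text("abc"), is_text(b"abc"))
```

The point of the pattern is that the version test happens once, at import time, instead of being scattered through the code as in option (a).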

>> The dict isn't changing state,
>
> So that's the criterion! Thanks! So as long as you don't use
> __setitem__ and __delitem__ (maybe also __setattribute__,
> __delattribute__, ...) the state does not change.

It's __getattribute__, __getattr__, __setattr__ and __delattr__. I
guess that's relevant if we're talking about a dict that's functioning
as a namespace, but that's a bit meta. Setting attributes on the dict
itself (I guess it's a subclass we're talking about; normal dict
instances don't have a __dict__) wouldn't affect the hash table it
uses for the contained items.
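A quick check of that in Python 3. The subclass name Namespace is just for illustration. An attribute set on the subclass instance lands in the instance's __dict__ and never touches the items:

```python
class Namespace(dict):
    """Hypothetical dict subclass; unlike plain dicts, instances get a __dict__."""


d = Namespace(a=1)
d.note = "metadata"  # goes into d.__dict__, not into the item hash table

print(dict(d))                  # only the items: {'a': 1}
print(vars(d))                  # only the attributes: {'note': 'metadata'}
print(hasattr({}, "__dict__"))  # a plain dict instance has no __dict__
```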

BTW, these slot wrappers generally aren't called in CPython unless
you override the corresponding built-in slot function. They're a hook
back into the C API. They bind as a method-wrapper that has a function
pointer to a C wrapper function that calls the C slot function.

If you override __setattr__, your type's tp_setattro slot is set to
slot_tp_setattro, which gets called by PyObject_SetAttr. If the value
being 'set' is NULL, this function looks up "__delattr__" in your
type. Since you didn't override this, it finds and binds the
__delattr__ slot wrapper from your base class(es).
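You can see the effect from Python (3.x here). The class Tracked and its log attribute are made up for the example. Only __setattr__ is overridden, yet "del" still works, because the deletion goes to the inherited __delattr__ rather than to the override:

```python
class Tracked(dict):
    """Hypothetical dict subclass that overrides __setattr__ but not __delattr__."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Write to the instance dict directly so creating the log isn't logged.
        self.__dict__["log"] = []

    def __setattr__(self, name, value):
        self.log.append(("set", name))
        super().__setattr__(name, value)


t = Tracked()
t.x = 1  # routed through Tracked.__setattr__
del t.x  # no override, so the inherited __delattr__ handles it

print(t.log)                           # only the 'set' was recorded
print("__delattr__" in vars(Tracked))  # the subclass never defined it
```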

This also comes up with rich comparison functions, for which all 6
comparisons are handled by the single slot function, tp_richcompare.
So if you subclass dict and override __lt__, then slot_tp_richcompare
finds the slot wrapper for the other 5 comparisons by searching dict's
dict:

    >>> type(vars(dict)['__gt__'])
    <type 'wrapper_descriptor'>

And binds it to the instance as a method-wrapper:

    >>> type(vars(dict)['__gt__'].__get__({}))
    <type 'method-wrapper'>

After jumping through several hoops it ends up at dict_richcompare.
For a regular dict, PyObject_RichCompare simply jumps straight to
dict_richcompare. It doesn't use the slot wrapper.
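A small Python 3 sketch of that. SizeOrdered is hypothetical, and I use == rather than > because Python 3 dicts no longer support ordering comparisons. The override handles <, while == is served by dict's inherited slot wrapper:

```python
class SizeOrdered(dict):
    """Hypothetical dict subclass that overrides only __lt__."""

    def __lt__(self, other):
        return len(self) < len(other)


a = SizeOrdered(x=1)
b = SizeOrdered(x=1, y=2)

print(a < b)                           # handled by our override
print(a == {"x": 1})                   # handled by dict's __eq__ wrapper
print("__eq__" in vars(SizeOrdered))   # not defined on the subclass

# Same objects as in the interactive session above, for __eq__ (Python 3 names).
print(type(vars(dict)["__eq__"]).__name__)
print(type(vars(dict)["__eq__"].__get__({})).__name__)
```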

>>     Keys and values are iterated over in an arbitrary order which is
>>     non-random,
>
> That sounds like a contradictio in terminis to me. How can something
> be non-random and arbitrary at the same time?

It's just worded generally to be valid for all implementations. They
could have gone into the specifics of the open-addressing hash table
used by CPython's dict type, but it would have to be highlighted as an
implementation detail. Anyway, the table has a history (collisions
with other keys, dummy keys) and a size, and both affect the iteration
order. It isn't ontologically random; it's contingent. But that's
getting too philosophical I think.
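Since dicts in Python 3.7+ iterate in insertion order, a set is the easier way to see that history today. This relies on CPython implementation details: small ints hash to themselves, which also keeps hash randomization out of the picture, and a small set starts with 8 slots. So 1 and 9 both map to slot 1, and whichever arrives second gets probed to another slot:

```python
# CPython-specific illustration: hash(1) == 1 and hash(9) == 9, and in an
# 8-slot table both map to slot 1, so the second one inserted is displaced.
first = set([1, 9])
second = set([9, 1])

print(first == second)            # same contents
print(list(first), list(second))  # iteration order reflects insertion history
```

Nothing in the language guarantees the order you see here, which is exactly why the docs only say "arbitrary".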

>> CPython 3.3 defaults to enabling hash randomization. Set the
>> environment variable PYTHONHASHSEED=0 to disable it.
>
> So in addition to my "2to3" question from above, it might be a good
> idea to already set PYTHONHASHSEED so Python 2.x behaves like
> Python 3.x in this respect, right? Given that the environment
> variables are already loaded once Python has started, what would be
> the approach to test this? Call os.putenv("PYTHONHASHSEED", 1), and
> then run the tests in a subprocess (that would know about the
> changed environment variables)?

Sorry, I don't see the point of this. PYTHONHASHSEED is to be set by a
system administrator as a security measure. I think CPython is the
only implementation that has this feature.

With regard to tests that depend on the PYTHON* environment variables,
Python's own tests use subprocess.Popen to run sys.executable with a
modified "env" environment (e.g. checking the effect of
PYTHONIOENCODING).
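Here is a minimal sketch of that approach for PYTHONHASHSEED, written for Python 3.7+. It uses subprocess.run, which is a convenience wrapper around Popen. The helper name hash_in_child is hypothetical. The environment is copied and passed to the child, rather than changing the parent's with os.putenv, and the values must be strings:

```python
import os
import subprocess
import sys

CHILD_CODE = "import sys; print(sys.flags.hash_randomization, hash('abc'))"


def hash_in_child(seed):
    """Run a fresh interpreter with PYTHONHASHSEED=seed; return (flag, hash('abc'))."""
    env = dict(os.environ, PYTHONHASHSEED=seed)  # a copy; the parent is unchanged
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env, capture_output=True, text=True, check=True,
    )
    flag, value = result.stdout.split()
    return int(flag), int(value)


print(hash_in_child("0"))
print(hash_in_child("0") == hash_in_child("0"))  # a fixed seed is reproducible
```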