
On Sun, Nov 27, 2022 at 11:36 AM Yoni Lavi <yoni.lavi.p@gmail.com> wrote:
I wrote a doc stating my case here:
https://docs.google.com/document/d/1et5x5HckTJhUQsz2lcC1avQrgDufXFnHMin7GlI5...
Briefly,
1. The main motivation for it is to allow users to get a predictable result on a given input (for programs that are doing pure compute, in domains like operations research / compilation), any time they run their program. Having stable repro is important for debugging. Notebooks with statistical analysis are another similar case where this is needed: you might want other people to run your notebook and get the same result you did.
But the language does not guarantee that the hash of an object is stable, so I would argue that anyone who needs a predictable result should convert random-access data structures to ones with a consistent order when necessary (e.g. sorted lists).
2. The reason the hash non-determinism of None matters in practice is that it can infect commonly used mapping key types, such as frozen dataclasses containing `Optional[int]` fields.
I don't see why the hashing within a dict needs to be consistent as that's not a guarantee we make with Python.
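To make point 2 concrete, here is a minimal sketch of how a frozen dataclass key picks up whatever `hash(None)` happens to be. The `Record` class and its `count` field are hypothetical names used only for illustration, and the tuple-based hash reflects how CPython's `dataclasses` module currently generates `__hash__`, which is an implementation detail rather than a language guarantee.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:  # hypothetical key type, not from the thread
    count: Optional[int]


key = Record(None)

# With eq=True and frozen=True, dataclasses generates a __hash__ that hashes
# a tuple of the fields, so the result depends directly on hash(None).
hash_matches_tuple = hash(key) == hash((None,))

# Any set or dict keyed on Record therefore inherits whatever stability
# (or instability) hash(None) has across interpreter runs.
lookup = {key: "missing count"}
```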
3. Non-determinism emerging from other value types like `str` can be disabled by the user using `PYTHONHASHSEED`, but there's no such protection against `None`.
If I remember correctly, PYTHONHASHSEED was added to help folks migrate when we added randomness to hashing as they had accidentally come to expect a consistent iteration order on dictionary keys. I wouldn't take its existence to suggest that PYTHONHASHSEED is meant to make **all** hashing consistent (e.g. people who implement their own __hash__ don't have to follow that expectation).
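As a rough illustration of what `PYTHONHASHSEED` does cover, the sketch below starts two fresh interpreters with the seed fixed to `0` (which disables hash randomization) and compares the hash of the same string. The helper name `str_hash_in_fresh_interpreter` is made up for this example; it says nothing about types that define their own `__hash__`.

```python
import os
import subprocess
import sys


def str_hash_in_fresh_interpreter(seed: str) -> str:
    """Return hash('abc') as printed by a new interpreter run with the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('abc'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Two separate processes with the same fixed seed agree on the str hash.
first = str_hash_in_fresh_interpreter("0")
second = str_hash_in_fresh_interpreter("0")
```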
All it takes is for your program to compute a set somewhere with affected keys, and iterate on it - and determinism is lost.
That's actually by design. Sets are not meant to be deterministic conceptually as they are essentially a bag of stuff. If you want deterministic ordering you should convert it to a list and sort the list.
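A minimal sketch of the "convert it to a list and sort" advice, assuming the set holds `Optional[int]` values as in the example from point 2. Because `None` cannot be compared with `int`, the hypothetical helper `stable_order` sorts on a key that places `None` first.

```python
from typing import Iterable, List, Optional


def stable_order(items: Iterable[Optional[int]]) -> List[Optional[int]]:
    """Return the items in a fixed order that does not depend on hashing."""
    # Sort on (is_not_none, value): None sorts first, then ints ascending.
    return sorted(items, key=lambda v: (v is not None, 0 if v is None else v))


seen = {3, None, 1}
ordered = stable_order(seen)  # same list on every run, whatever the set's order
```

Iterating over `ordered` instead of `seen` gives the same sequence on every run, at the cost of an O(n log n) sort at each point where order matters.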
The need to modify None itself is caused by two factors:
- `Optional` being implemented effectively as `T | None` in Python as a strongly established practice
- The fact that `__hash__` is an intrinsic property of a type in Python, the hashing function cannot be externally supplied to its builtin container types.
So we have to modify the type None itself, rather than write some alternative hasher that we could use if we care about deterministic behavior across runs.
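For illustration only, the closest thing to an "externally supplied hasher" with builtin containers is to wrap every key in a type that carries its own `__hash__`. The sketch below assumes `Optional[int]` keys; `StableKey` and `NONE_HASH` are hypothetical names, and the constant is arbitrary. It shows why this is awkward: every insertion and lookup has to go through the wrapper.

```python
from typing import Optional

NONE_HASH = 0x5EED  # arbitrary fixed value chosen for this sketch


class StableKey:
    """Hypothetical wrapper giving Optional[int] keys a run-independent hash."""

    __slots__ = ("value",)

    def __init__(self, value: Optional[int]) -> None:
        self.value = value

    def __eq__(self, other: object) -> bool:
        return isinstance(other, StableKey) and self.value == other.value

    def __hash__(self) -> int:
        # int hashes do not depend on the hash seed; None gets a fixed constant.
        return NONE_HASH if self.value is None else hash(self.value)


keys = {StableKey(None), StableKey(7)}
has_none = StableKey(None) in keys
```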
This was debated at length over the forum and in discord. I also posted a PR for it, and it was closed, see:
https://github.com/python/cpython/issues/99540
https://github.com/python/cpython/pull/99541
Asking for opinions, and to re-open the PR, provided there is enough support for such a change to take place.
I personally agree with the arguments made in the issue, so I'm afraid I don't support making the change: we worked hard to stop people from relying on consistent hashing/iteration from random-access data structures like dict and set.
-Brett
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/KUH4HZYK...
Code of Conduct: http://python.org/psf/codeofconduct/