Re: Python-Dev Digest, Vol 232, Issue 10
I can't see any, but then I couldn't see the security consequences of predictable string hashes until they were pointed out to me. So it would be really good to have some security experts comment on whether this is safe or not.
I can't either. I can point out that the complexity attack via hash collisions is not possible on None specifically because it is a singleton, so: 1. There may only be one None in a dict at most, 2. For composite types that depend on None and other things, like say a tuple of string and an optional int, they become as resistant to attack as the same types without None, in this case string and int. 3. Python only explicitly added this security (hash randomization) feature to string and bytes hashes, and if your composite key type depends on those types, it will still be protected regardless of what None does
Because this entire discussion is motivated by the OP who wants consistent set order across multiple runs of his Python application. That's what he needs; having None hash to a constant value is just a means to that end.
Not entirely. I do explain in my doc why there is a foundational reason why, once the choice was made to have None represent the `None` of Optional[T], it entered the family of types that you would expect to have deterministic behavior. And as the hashing function is intrinsic to types in Python, it is included in that.
Even if we grant None a constant hash, that still does not guarantee consistent set order across runs. At best, we might get such consistent order as an undocumented and changeable implementation detail, until we change the implementation of hashing, or of sets, or of something seemingly unrelated like address randomisation.
ASLR will not cause any trouble so long as I keep objects with identity based hashing out of my sets. Or at least, any sets I later iterate on and run imperative code with side-effects under the loop. The possibility of disabling ASLR is not a reason to dismiss this change. No one guarantees a user of Python is in a position to make infosec related decisions on the computer system they are working on. Regarding the possibilities of hashes changing for the worse (in this regard) - sure. Anything is possible. Regarding sets - the set implementation is deterministic, and always has been (for decades). Yes, in theory it is possible that a new version or implementation of Python will make sets non-deterministic - shuffle themselves in the background independently from their history of operations, or what have you, and then my change loses all of its value. Like I said, anything is possible. But I think these last two points are essentially FUD. I made my proposal because I believe the FUD scenarios are strongly unlikely. (and even then, at worst we end up with a "practically useless" behavior on None, that can also be freely reverted along with such other changes anyway) On Tue, Nov 29, 2022 at 5:34 AM <python-dev-request@python.org> wrote:
Send Python-Dev mailing list submissions to python-dev@python.org
To subscribe or unsubscribe via the World Wide Web, visit https://mail.python.org/mailman3/lists/python-dev.python.org/ or, via email, send a message with subject or body 'help' to python-dev-request@python.org
You can reach the person managing the list at python-dev-owner@python.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of Python-Dev digest..."Today's Topics:
1. Re: A proposal to modify `None` so that it hashes to a constant (Steven D'Aprano) 2. Re: A proposal to modify `None` so that it hashes to a constant (Oscar Benjamin) 3. Re: A proposal to modify `None` so that it hashes to a constant (Chris Angelico) 4. Re: A proposal to modify `None` so that it hashes to a constant (Steven D'Aprano)
---------- Forwarded message ---------- From: "Steven D'Aprano" <steve@pearwood.info> To: python-dev@python.org Cc: Bcc: Date: Mon, 28 Nov 2022 15:35:44 +1100 Subject: [Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant On Tue, Nov 29, 2022 at 01:34:54PM +1300, Greg Ewing wrote:
I got the impression that there were some internal language reasons to want stable dicts, e.g. so that the class dict passed to __prepare__ preserves the order in which names are assigned in the class body. Are there any such use cases for stable sets?
Some people wanted order preserving kwargs, I think for web frameworks. There was even a discussion for a while about using OrderedDict for kwargs and leaving dicts unordered.
For me, the biggest advantage of preserving input order in dicts is that it is now easier to write doctests using the dict's repr. It would be nice to be able to do the same for sets, but not nice enough to justify making them bigger or slower.
-- Steve
---------- Forwarded message ---------- From: Oscar Benjamin <oscar.j.benjamin@gmail.com> To: Cc: python-dev@python.org Bcc: Date: Tue, 29 Nov 2022 02:07:34 +0000 Subject: [Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant On Tue, 29 Nov 2022 at 01:33, Steven D'Aprano <steve@pearwood.info> wrote:
On Mon, Nov 28, 2022 at 11:13:34PM +0000, Oscar Benjamin wrote:
On Mon, 28 Nov 2022 at 22:56, Brett Cannon <brett@python.org> wrote:
As I understand it, we could make sets ordered, but only at the cost of space (much more memory) or time (slower) or both.
I am sure that Guido is correct that **if** somebody comes up with a fast, efficient ordered set implementation that doesn't perform worse than the current implementation, we will happily swap to giving sets a predictable order, as we did with dicts. (Practicality beats purity -- even if sets are *philosophically* unordered, preserving input order is too useful to give up unless we gain something in return.)
Let's split this into two separate questions:
1. Is it *innately* good that set order is non-deterministic? 2. Are there some other reasons why it is good to choose a model that implies *non-deterministic* set order?
The answer to 1. is emphatically NO. In fact the question itself is badly posed: why are we even asking about "set order" rather than the benefits of determinism in general? If I want my code to be deterministic then that's just something that I want regardless of whether sets, dicts, floats etc are involved.
As for point 2. the fact that sets are currently non-deterministic is actually a relatively new thing in Python. Before hash-randomisation set and dict order *was* deterministic but with an arbitrary order. That was only changed because of a supposed security issue with hash collisions. Prior to that it was well understood that determinism was beneficial (honestly I don't understand why I have to state this point explicitly: determinism is almost always best in our context).
Please everyone don't confuse arbitrary order, implementation defined order and non-deterministic order. There is no reason why sets in Python need to have a *non-deterministic* order or at least why there shouldn't be a way to control that. There is no performance penalty in making the order *deterministic*. (If you think that there might be a performance penalty then you haven't understood the suggestion!)
It would be useful to have a straight-forward way to sort a set into a deterministic ordering but no such feature exists after the Py3K changes (sorted used to do this in Python 2.x).
`sorted()` works fine on homogeneous sets. It is only heterogeneous sets that are a problem, and in practice, that usually means None mixed in with some other type.
That is of course precisely the context for this thread!
-- Oscar
---------- Forwarded message ---------- From: Chris Angelico <rosuav@gmail.com> To: python-dev@python.org Cc: Bcc: Date: Tue, 29 Nov 2022 13:20:30 +1100 Subject: [Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant On Tue, 29 Nov 2022 at 13:12, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
As for point 2. the fact that sets are currently non-deterministic is actually a relatively new thing in Python. Before hash-randomisation set and dict order *was* deterministic but with an arbitrary order. That was only changed because of a supposed security issue with hash collisions. Prior to that it was well understood that determinism was beneficial (honestly I don't understand why I have to state this point explicitly: determinism is almost always best in our context).
To clarify: The hash collision attack is a very real one, but specific to dictionaries of string keys, since there are quite a few ways for an attacker to send a string that gets automatically parsed into such a dictionary (eg web app frameworks where the request parameters are made available as a dictionary). But since that attack surface is *so* specific, randomization of non-string hashes is unimportant.
ChrisA
---------- Forwarded message ---------- From: "Steven D'Aprano" <steve@pearwood.info> To: python-dev@python.org Cc: Bcc: Date: Mon, 28 Nov 2022 17:19:22 +1100 Subject: [Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant On Tue, Nov 29, 2022 at 02:07:34AM +0000, Oscar Benjamin wrote:
Let's split this into two separate questions:
Let's not. Your first question about non-deterministic set order being "innately good" is a straw man: as we've already discussed, set order is not non-deterministic (except in the informal sense of "hard to predict") and I don't think anyone is arguing in favour of keeping set order unpredictable even if there are faster, more compact, simpler implementations which preserve order.
Talking about determinism is just muddying the waters. Sets are deterministic: their order is determinied by the implementation, the set history, and potentially any environmental sources of entropy used in address randomisation. Deterministic does not mean predictable.
If we want to get this discussion onto a more useful path, we should start with a security question:
The hash of None changes because of address randomisation. Address randomisation is enabled as a security measure. Are there any security consequences of giving None a constant hash?
I can't see any, but then I couldn't see the security consequences of predictable string hashes until they were pointed out to me. So it would be really good to have some security experts comment on whether this is safe or not.
why are we even asking about "set order" rather than the benefits of determinism in general?
Because this entire discussion is motivated by the OP who wants consistent set order across multiple runs of his Python application. That's what he needs; having None hash to a constant value is just a means to that end.
His sets contain objects whose hashes depend on `Optional[int]`, which means sometimes they include None, and when he runs under an interpreter built with address randomisation, the hash of None can change.
Even if we grant None a constant hash, that still does not guarantee consistent set order across runs. At best, we might get such consistent order as an undocumented and changeable implementation detail, until we change the implementation of hashing, or of sets, or of something seemingly unrelated like address randomisation.
-- Steve _______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/
participants (1)
-
Yoni Lavi