> I can't see any, but then I couldn't see the security consequences of
> predictable string hashes until they were pointed out to me. So it would
> be really good to have some security experts comment on whether this is
> safe or not.

I can't either. I can point out that the complexity attack via hash collisions is not possible on None specifically because it is a singleton, so:
1. There may only be one None in a dict at most, 
2. For composite types that depend on None and other things, like say a tuple of string and an optional int, they become as resistant to attack as the same types without None, in this case string and int.
3. Python only explicitly added this security (hash randomization) feature to string and bytes hashes, and if your composite key type depends on those types, it will still be protected regardless of what None does


> Because this entire discussion is motivated by the OP who wants
> consistent set order across multiple runs of his Python application.
> That's what he needs; having None hash to a constant value is just a
> means to that end.

Not entirely. I do explain in my doc why there is a foundational reason why, once the choice was made to have None represent the `None` of Optional[T], it entered the family of types that you would expect to have deterministic behavior. And as the hashing function is intrinsic to types in Python, it is included in that.


> Even if we grant None a constant hash, that still does not guarantee
> consistent set order across runs. At best, we might get such consistent
> order as an undocumented and changeable implementation detail, until we
> change the implementation of hashing, or of sets, or of something
> seemingly unrelated like address randomisation.

ASLR will not cause any trouble so long as I keep objects with identity based hashing out of my sets. Or at least, any sets I later iterate on and run imperative code with side-effects under the loop.
The possibility of disabling ASLR is not a reason to dismiss this change. No one guarantees a user of Python is in a position to make infosec related decisions on the computer system they are working on.

Regarding the possibilities of hashes changing for the worse (in this regard) - sure. Anything is possible.

Regarding sets - the set implementation is deterministic, and always has been (for decades).
Yes, in theory it is possible that a new version or implementation of Python will make sets non-deterministic - shuffle themselves in the background independently from their history of operations, or what have you, and then my change loses all of its value.
Like I said, anything is possible. But I think these last two points are essentially FUD.

I made my proposal because I believe the FUD scenarios are strongly unlikely. (and even then, at worst we end up with a "practically useless" behavior on None, that can also be freely reverted along with such other changes anyway)



On Tue, Nov 29, 2022 at 5:34 AM <python-dev-request@python.org> wrote:
Send Python-Dev mailing list submissions to
        python-dev@python.org

To subscribe or unsubscribe via the World Wide Web, visit
        https://mail.python.org/mailman3/lists/python-dev.python.org/
or, via email, send a message with subject or body 'help' to
        python-dev-request@python.org

You can reach the person managing the list at
        python-dev-owner@python.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Python-Dev digest..."Today's Topics:

   1. Re: A proposal to modify `None` so that it hashes to a constant
      (Steven D'Aprano)
   2. Re: A proposal to modify `None` so that it hashes to a constant
      (Oscar Benjamin)
   3. Re: A proposal to modify `None` so that it hashes to a constant
      (Chris Angelico)
   4. Re: A proposal to modify `None` so that it hashes to a constant
      (Steven D'Aprano)



---------- Forwarded message ----------
From: "Steven D'Aprano" <steve@pearwood.info>
To: python-dev@python.org
Cc: 
Bcc: 
Date: Mon, 28 Nov 2022 15:35:44 +1100
Subject: [Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant
On Tue, Nov 29, 2022 at 01:34:54PM +1300, Greg Ewing wrote:

> I got the impression that there were some internal language reasons
> to want stable dicts, e.g. so that the class dict passed to __prepare__
> preserves the order in which names are assigned in the class body. Are
> there any such use cases for stable sets?

Some people wanted order preserving kwargs, I think for web frameworks.
There was even a discussion for a while about using OrderedDict for
kwargs and leaving dicts unordered.

For me, the biggest advantage of preserving input order in dicts is that
it is now easier to write doctests using the dict's repr. It would be
nice to be able to do the same for sets, but not nice enough to justify
making them bigger or slower.


--
Steve



---------- Forwarded message ----------
From: Oscar Benjamin <oscar.j.benjamin@gmail.com>
To: 
Cc: python-dev@python.org
Bcc: 
Date: Tue, 29 Nov 2022 02:07:34 +0000
Subject: [Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant
On Tue, 29 Nov 2022 at 01:33, Steven D'Aprano <steve@pearwood.info> wrote:
>
> On Mon, Nov 28, 2022 at 11:13:34PM +0000, Oscar Benjamin wrote:
> > On Mon, 28 Nov 2022 at 22:56, Brett Cannon <brett@python.org> wrote:
>
> As I understand it, we could make sets ordered, but only at the cost of
> space (much more memory) or time (slower) or both.
>
> I am sure that Guido is correct that **if** somebody comes up with a
> fast, efficient ordered set implementation that doesn't perform worse
> than the current implementation, we will happily swap to giving sets a
> predictable order, as we did with dicts. (Practicality beats purity --
> even if sets are *philosophically* unordered, preserving input order is
> too useful to give up unless we gain something in return.)

Let's split this into two separate questions:

1. Is it *innately* good that set order is non-deterministic?
2. Are there some other reasons why it is good to choose a model that
implies *non-deterministic* set order?

The answer to 1. is emphatically NO. In fact the question itself is
badly posed: why are we even asking about "set order" rather than the
benefits of determinism in general? If I want my code to be
deterministic then that's just something that I want regardless of
whether sets, dicts, floats etc are involved.

As for point 2. the fact that sets are currently non-deterministic is
actually a relatively new thing in Python. Before hash-randomisation
set and dict order *was* deterministic but with an arbitrary order.
That was only changed because of a supposed security issue with hash
collisions. Prior to that it was well understood that determinism was
beneficial (honestly I don't understand why I have to state this point
explicitly: determinism is almost always best in our context).

Please everyone don't confuse arbitrary order, implementation defined
order and non-deterministic order. There is no reason why sets in
Python need to have a *non-deterministic* order or at least why there
shouldn't be a way to control that. There is no performance penalty in
making the order *deterministic*. (If you think that there might be a
performance penalty then you haven't understood the suggestion!)

> > It would be useful to have a straight-forward way to sort a set into a
> > deterministic ordering but no such feature exists after the Py3K
> > changes (sorted used to do this in Python 2.x).
>
> `sorted()` works fine on homogeneous sets. It is only heterogeneous sets
> that are a problem, and in practice, that usually means None mixed in
> with some other type.

That is of course precisely the context for this thread!

--
Oscar



---------- Forwarded message ----------
From: Chris Angelico <rosuav@gmail.com>
To: python-dev@python.org
Cc: 
Bcc: 
Date: Tue, 29 Nov 2022 13:20:30 +1100
Subject: [Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant
On Tue, 29 Nov 2022 at 13:12, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
> As for point 2. the fact that sets are currently non-deterministic is
> actually a relatively new thing in Python. Before hash-randomisation
> set and dict order *was* deterministic but with an arbitrary order.
> That was only changed because of a supposed security issue with hash
> collisions. Prior to that it was well understood that determinism was
> beneficial (honestly I don't understand why I have to state this point
> explicitly: determinism is almost always best in our context).

To clarify: The hash collision attack is a very real one, but specific
to dictionaries of string keys, since there are quite a few ways for
an attacker to send a string that gets automatically parsed into such
a dictionary (eg web app frameworks where the request parameters are
made available as a dictionary). But since that attack surface is *so*
specific, randomization of non-string hashes is unimportant.

ChrisA



---------- Forwarded message ----------
From: "Steven D'Aprano" <steve@pearwood.info>
To: python-dev@python.org
Cc: 
Bcc: 
Date: Mon, 28 Nov 2022 17:19:22 +1100
Subject: [Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant
On Tue, Nov 29, 2022 at 02:07:34AM +0000, Oscar Benjamin wrote:

> Let's split this into two separate questions:

Let's not. Your first question about non-deterministic set order being
"innately good" is a straw man: as we've already discussed, set order is
not non-deterministic (except in the informal sense of "hard to
predict") and I don't think anyone is arguing in favour of keeping set
order unpredictable even if there are faster, more compact, simpler
implementations which preserve order.

Talking about determinism is just muddying the waters. Sets are
deterministic: their order is determinied by the implementation, the set
history, and potentially any environmental sources of entropy used in
address randomisation. Deterministic does not mean predictable.

If we want to get this discussion onto a more useful path, we should
start with a security question:

The hash of None changes because of address randomisation. Address
randomisation is enabled as a security measure. Are there any security
consequences of giving None a constant hash?

I can't see any, but then I couldn't see the security consequences of
predictable string hashes until they were pointed out to me. So it would
be really good to have some security experts comment on whether this is
safe or not.


> why are we even asking about "set order" rather than the
> benefits of determinism in general?

Because this entire discussion is motivated by the OP who wants
consistent set order across multiple runs of his Python application.
That's what he needs; having None hash to a constant value is just a
means to that end.

His sets contain objects whose hashes depend on `Optional[int]`, which
means sometimes they include None, and when he runs under an interpreter
built with address randomisation, the hash of None can change.

Even if we grant None a constant hash, that still does not guarantee
consistent set order across runs. At best, we might get such consistent
order as an undocumented and changeable implementation detail, until we
change the implementation of hashing, or of sets, or of something
seemingly unrelated like address randomisation.


--
Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/