Send Python-Dev mailing list submissions to
python-dev@python.org
To subscribe or unsubscribe via the World Wide Web, visit
https://mail.python.org/mailman3/lists/python-dev.python.org/
or, via email, send a message with subject or body 'help' to
python-dev-request@python.org
You can reach the person managing the list at
python-dev-owner@python.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Python-Dev digest..."Today's Topics:
1. Re: A proposal to modify `None` so that it hashes to a constant
(Steven D'Aprano)
2. Re: A proposal to modify `None` so that it hashes to a constant
(Oscar Benjamin)
3. Re: A proposal to modify `None` so that it hashes to a constant
(Chris Angelico)
4. Re: A proposal to modify `None` so that it hashes to a constant
(Steven D'Aprano)
---------- Forwarded message ----------
From: "Steven D'Aprano" <steve@pearwood.info>
To: python-dev@python.org
Cc:
Bcc:
Date: Mon, 28 Nov 2022 15:35:44 +1100
Subject: [Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant
On Tue, Nov 29, 2022 at 01:34:54PM +1300, Greg Ewing wrote:
> I got the impression that there were some internal language reasons
> to want stable dicts, e.g. so that the class dict passed to __prepare__
> preserves the order in which names are assigned in the class body. Are
> there any such use cases for stable sets?
Some people wanted order preserving kwargs, I think for web frameworks.
There was even a discussion for a while about using OrderedDict for
kwargs and leaving dicts unordered.
For me, the biggest advantage of preserving input order in dicts is that
it is now easier to write doctests using the dict's repr. It would be
nice to be able to do the same for sets, but not nice enough to justify
making them bigger or slower.
--
Steve
---------- Forwarded message ----------
From: Oscar Benjamin <oscar.j.benjamin@gmail.com>
To:
Cc: python-dev@python.org
Bcc:
Date: Tue, 29 Nov 2022 02:07:34 +0000
Subject: [Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant
On Tue, 29 Nov 2022 at 01:33, Steven D'Aprano <steve@pearwood.info> wrote:
>
> On Mon, Nov 28, 2022 at 11:13:34PM +0000, Oscar Benjamin wrote:
> > On Mon, 28 Nov 2022 at 22:56, Brett Cannon <brett@python.org> wrote:
>
> As I understand it, we could make sets ordered, but only at the cost of
> space (much more memory) or time (slower) or both.
>
> I am sure that Guido is correct that **if** somebody comes up with a
> fast, efficient ordered set implementation that doesn't perform worse
> than the current implementation, we will happily swap to giving sets a
> predictable order, as we did with dicts. (Practicality beats purity --
> even if sets are *philosophically* unordered, preserving input order is
> too useful to give up unless we gain something in return.)
Let's split this into two separate questions:
1. Is it *innately* good that set order is non-deterministic?
2. Are there some other reasons why it is good to choose a model that
implies *non-deterministic* set order?
The answer to 1. is emphatically NO. In fact the question itself is
badly posed: why are we even asking about "set order" rather than the
benefits of determinism in general? If I want my code to be
deterministic then that's just something that I want regardless of
whether sets, dicts, floats etc are involved.
As for point 2. the fact that sets are currently non-deterministic is
actually a relatively new thing in Python. Before hash-randomisation
set and dict order *was* deterministic but with an arbitrary order.
That was only changed because of a supposed security issue with hash
collisions. Prior to that it was well understood that determinism was
beneficial (honestly I don't understand why I have to state this point
explicitly: determinism is almost always best in our context).
Please everyone don't confuse arbitrary order, implementation defined
order and non-deterministic order. There is no reason why sets in
Python need to have a *non-deterministic* order or at least why there
shouldn't be a way to control that. There is no performance penalty in
making the order *deterministic*. (If you think that there might be a
performance penalty then you haven't understood the suggestion!)
> > It would be useful to have a straight-forward way to sort a set into a
> > deterministic ordering but no such feature exists after the Py3K
> > changes (sorted used to do this in Python 2.x).
>
> `sorted()` works fine on homogeneous sets. It is only heterogeneous sets
> that are a problem, and in practice, that usually means None mixed in
> with some other type.
That is of course precisely the context for this thread!
--
Oscar
---------- Forwarded message ----------
From: Chris Angelico <rosuav@gmail.com>
To: python-dev@python.org
Cc:
Bcc:
Date: Tue, 29 Nov 2022 13:20:30 +1100
Subject: [Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant
On Tue, 29 Nov 2022 at 13:12, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
> As for point 2. the fact that sets are currently non-deterministic is
> actually a relatively new thing in Python. Before hash-randomisation
> set and dict order *was* deterministic but with an arbitrary order.
> That was only changed because of a supposed security issue with hash
> collisions. Prior to that it was well understood that determinism was
> beneficial (honestly I don't understand why I have to state this point
> explicitly: determinism is almost always best in our context).
To clarify: The hash collision attack is a very real one, but specific
to dictionaries of string keys, since there are quite a few ways for
an attacker to send a string that gets automatically parsed into such
a dictionary (eg web app frameworks where the request parameters are
made available as a dictionary). But since that attack surface is *so*
specific, randomization of non-string hashes is unimportant.
ChrisA
---------- Forwarded message ----------
From: "Steven D'Aprano" <steve@pearwood.info>
To: python-dev@python.org
Cc:
Bcc:
Date: Mon, 28 Nov 2022 17:19:22 +1100
Subject: [Python-Dev] Re: A proposal to modify `None` so that it hashes to a constant
On Tue, Nov 29, 2022 at 02:07:34AM +0000, Oscar Benjamin wrote:
> Let's split this into two separate questions:
Let's not. Your first question about non-deterministic set order being
"innately good" is a straw man: as we've already discussed, set order is
not non-deterministic (except in the informal sense of "hard to
predict") and I don't think anyone is arguing in favour of keeping set
order unpredictable even if there are faster, more compact, simpler
implementations which preserve order.
Talking about determinism is just muddying the waters. Sets are
deterministic: their order is determinied by the implementation, the set
history, and potentially any environmental sources of entropy used in
address randomisation. Deterministic does not mean predictable.
If we want to get this discussion onto a more useful path, we should
start with a security question:
The hash of None changes because of address randomisation. Address
randomisation is enabled as a security measure. Are there any security
consequences of giving None a constant hash?
I can't see any, but then I couldn't see the security consequences of
predictable string hashes until they were pointed out to me. So it would
be really good to have some security experts comment on whether this is
safe or not.
> why are we even asking about "set order" rather than the
> benefits of determinism in general?
Because this entire discussion is motivated by the OP who wants
consistent set order across multiple runs of his Python application.
That's what he needs; having None hash to a constant value is just a
means to that end.
His sets contain objects whose hashes depend on `Optional[int]`, which
means sometimes they include None, and when he runs under an interpreter
built with address randomisation, the hash of None can change.
Even if we grant None a constant hash, that still does not guarantee
consistent set order across runs. At best, we might get such consistent
order as an undocumented and changeable implementation detail, until we
change the implementation of hashing, or of sets, or of something
seemingly unrelated like address randomisation.
--
Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/