
On Tue, 29 Nov 2022 at 01:33, Steven D'Aprano <steve@pearwood.info> wrote:
On Mon, Nov 28, 2022 at 11:13:34PM +0000, Oscar Benjamin wrote:
On Mon, 28 Nov 2022 at 22:56, Brett Cannon <brett@python.org> wrote:
As I understand it, we could make sets ordered, but only at the cost of space (much more memory) or time (slower) or both.
I am sure that Guido is correct that **if** somebody comes up with a fast, efficient ordered set implementation that doesn't perform worse than the current implementation, we will happily swap to giving sets a predictable order, as we did with dicts. (Practicality beats purity -- even if sets are *philosophically* unordered, preserving input order is too useful to give up unless we gain something in return.)
Let's split this into two separate questions:

1. Is it *innately* good that set order is non-deterministic?
2. Are there some other reasons why it is good to choose a model that implies *non-deterministic* set order?

The answer to 1. is emphatically NO. In fact the question itself is badly posed: why are we even asking about "set order" rather than the benefits of determinism in general? If I want my code to be deterministic then that's just something that I want regardless of whether sets, dicts, floats etc are involved.

As for point 2. the fact that sets are currently non-deterministic is actually a relatively new thing in Python. Before hash-randomisation set and dict order *was* deterministic but with an arbitrary order. That was only changed because of a supposed security issue with hash collisions. Prior to that it was well understood that determinism was beneficial (honestly I don't understand why I have to state this point explicitly: determinism is almost always best in our context).

Please everyone don't confuse arbitrary order, implementation defined order and non-deterministic order. There is no reason why sets in Python need to have a *non-deterministic* order or at least why there shouldn't be a way to control that. There is no performance penalty in making the order *deterministic*. (If you think that there might be a performance penalty then you haven't understood the suggestion!)
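To make the distinction between arbitrary and non-deterministic order concrete, here is a rough sketch, assuming CPython 3.7+ with hash randomisation of `str` enabled by default. It runs the same one-line program in child interpreters with the `PYTHONHASHSEED` environment variable pinned. The helper name `set_order_with_seed` and the sample strings are made up for illustration, and this only shows one existing knob; it is not necessarily the mechanism being suggested in this thread.

```python
import os
import subprocess
import sys

# Program run in a child interpreter: build a set of strings, print its order.
SNIPPET = "print(list({'apple', 'banana', 'cherry', 'date', 'elderberry'}))"


def set_order_with_seed(seed):
    """Return the printed iteration order of the set under a given hash seed.

    `seed` is passed through PYTHONHASHSEED, so it may be an integer
    or the string "random".
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Same seed, same build: the order is arbitrary but repeatable.
    print(set_order_with_seed(0))
    print(set_order_with_seed(0))
    # With "random" the order may change from one run to the next.
    print(set_order_with_seed("random"))
    print(set_order_with_seed("random"))
```

With a fixed seed the order is still arbitrary (it is not insertion order or sorted order), but it is the same on every run of the same interpreter build, which is the pre-randomisation behaviour described above.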
It would be useful to have a straightforward way to sort a set into a deterministic ordering but no such feature exists after the Py3K changes (`sorted` used to do this in Python 2.x).
`sorted()` works fine on homogeneous sets. It is only heterogeneous sets that are a problem, and in practice, that usually means None mixed in with some other type.
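For the common case of None mixed in with one other orderable type, a key function is one possible workaround. This is a minimal sketch, assuming Python 3; the name `sort_with_none_first` is hypothetical and nothing like it exists in the standard library.

```python
def sort_with_none_first(items):
    """Sort values of one orderable type, placing None (if present) first.

    The key is a (bool, value) tuple. The None entry has False in the first
    slot and every other entry has True, so the tuple comparison is decided
    on the first slot and None is never compared against another value.
    """
    return sorted(items, key=lambda x: (x is not None, x))


if __name__ == "__main__":
    mixed = {3, None, 1}
    try:
        sorted(mixed)  # plain sorted() has to compare None with an int
    except TypeError as exc:
        print("sorted() failed:", exc)
    print(sort_with_none_first(mixed))
```

This only covers None plus a single orderable type; a set mixing, say, `int` and `str` would still raise `TypeError` with this key.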
That is of course precisely the context for this thread!

-- Oscar