When I quoted 10% as a rough measurement, I was referring more to set
operations like union, for which being free to start from the larger
operand can be a very real performance advantage.
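To illustrate (Python-level pseudocode only, not what CPython actually
does in C): an unordered union can bulk-copy whichever operand is larger
and then insert only the smaller one's elements. An order-preserving
union loses that freedom, since the result's order has to follow the
left operand.

    def unordered_union(a, b):
        # Free to start from the larger operand, so fewer inserts:
        big, small = (a, b) if len(a) >= len(b) else (b, a)
        result = set(big)      # one bulk copy of the larger set
        result.update(small)   # only len(small) individual insertions
        return result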
For membership tests that come back false, it may be quicker, since less
memory is accessed: the index-table slots are smaller (1, 2, 4, or 8
bytes, versus 8 or 16 bytes for a full bucket on 32-bit vs. 64-bit
builds). But to compare the entry's hash, an indirection into the
entries array is required... The heuristics become less clear once you
consider how many collisions there are. They may also change under
different tuning parameters (e.g. a lower load factor would alleviate
the pointer dereferences required to compare items, but at the cost of
more memory).
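For concreteness, here is a rough Python sketch of the kind of compact,
dict-style layout I mean. The names and the fixed 8-slot index table are
mine, purely for illustration, and resizing is omitted:

    FREE = -1

    class CompactSet:
        """Illustrative compact ordered set: a sparse index table over
        a dense, insertion-ordered entries array. No resizing, so this
        toy only works while fewer than 8 items are stored."""

        def __init__(self):
            self.indices = [FREE] * 8   # in C, slots of 1/2/4/8-byte ints
            self.entries = []           # dense (hash, key) pairs, in order

        def __contains__(self, key):
            h = hash(key)
            mask = len(self.indices) - 1
            i = h & mask
            while True:
                idx = self.indices[i]
                if idx == FREE:
                    return False        # quick rejection: only the small
                                        # index table was touched
                entry_hash, entry_key = self.entries[idx]  # the indirection
                if entry_hash == h and (entry_key is key or entry_key == key):
                    return True
                i = (i + 1) & mask      # linear probing on collision

        def add(self, key):
            if key in self:
                return
            h = hash(key)
            mask = len(self.indices) - 1
            i = h & mask
            while self.indices[i] != FREE:
                i = (i + 1) & mask
            self.indices[i] = len(self.entries)
            self.entries.append((h, key))

The point being: a miss that lands on a FREE slot is rejected after
touching only the small index table, while any occupied slot forces the
extra hop into the entries array to compare hashes.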
It may very well be the case that the target load factor only decreases a
small amount (so that the ordered implementation still uses less memory),
but the performance benefits of quick rejections make overall performance
better. I don't know how 'objective' we can be here without discussing
specific data sets people may be using.
On Mon, Aug 24, 2020, 7:15 PM David Mertz wrote:
The main purpose of sets is FAST membership testing. Everything I've seen in prior discussions convinced me that preserving insertion order would slow that down.
If membership testing lost even 10% speed, I would be -1000 on the idea. If someone comes up with actual working code that is no more than 1% slower, I'd move to -0.
If speed miraculously got faster in such a rewrite (analogous to dictionaries, whose ordering was initially a happy side-effect, though I've seen explanations of why that wouldn't happen here), I'd be +1.
But in that last case, the entirety of my support would be for the speed-up, and exactly none of it would be for the ordering.