When I quoted 10% as a rough measurement, I was referring more to set
operations like union, for which being free to start from the larger
operand can be a very real performance advantage.
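To illustrate (Python-level pseudocode only, not what CPython actually
does in C): an unordered union can bulk-copy whichever operand is larger
and then insert only the smaller one's elements. An order-preserving
union loses that freedom, since the result's order has to follow the
left operand.

    def unordered_union(a, b):
        # Free to start from the larger operand, so fewer inserts:
        big, small = (a, b) if len(a) >= len(b) else (b, a)
        result = set(big)      # one bulk copy of the larger set
        result.update(small)   # only len(small) individual insertions
        return result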
For membership tests that come back false, it may be quicker, since less
memory is accessed: the index-table slots are smaller (1, 2, 4, or 8
bytes, versus 8 or 16 bytes for a full bucket on 32-bit vs. 64-bit
builds). But to compare the entry's hash, an indirection into the
entries array is required... The heuristics become less clear once you
consider how many collisions there are. They may also change under
different tuning parameters (e.g. a lower load factor would alleviate
the pointer dereferences required to compare items, but at the cost of
more memory).
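For concreteness, here is a rough Python sketch of the kind of compact,
dict-style layout I mean. The names and the fixed 8-slot index table are
mine, purely for illustration, and resizing is omitted:

    FREE = -1

    class CompactSet:
        """Illustrative compact ordered set: a sparse index table over
        a dense, insertion-ordered entries array. No resizing, so this
        toy only works while fewer than 8 items are stored."""

        def __init__(self):
            self.indices = [FREE] * 8   # in C, slots of 1/2/4/8-byte ints
            self.entries = []           # dense (hash, key) pairs, in order

        def __contains__(self, key):
            h = hash(key)
            mask = len(self.indices) - 1
            i = h & mask
            while True:
                idx = self.indices[i]
                if idx == FREE:
                    return False        # quick rejection: only the small
                                        # index table was touched
                entry_hash, entry_key = self.entries[idx]  # the indirection
                if entry_hash == h and (entry_key is key or entry_key == key):
                    return True
                i = (i + 1) & mask      # linear probing on collision

        def add(self, key):
            if key in self:
                return
            h = hash(key)
            mask = len(self.indices) - 1
            i = h & mask
            while self.indices[i] != FREE:
                i = (i + 1) & mask
            self.indices[i] = len(self.entries)
            self.entries.append((h, key))

The point being: a miss that lands on a FREE slot is rejected after
touching only the small index table, while any occupied slot forces the
extra hop into the entries array to compare hashes.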
It may very well be the case that the target load factor only decreases a
small amount (so that the ordered implementation still uses less memory),
but the performance benefits of quick rejections make overall performance
better. I don't know how 'objective' we can be here without discussing
specific data sets people may be using.
On Mon, Aug 24, 2020, 7:15 PM David Mertz wrote:
The main purpose of sets is FAST membership testing. Everything I've seen in prior discussions convinced me that preserving insertion order would slow that down.
If membership testing lost even 10% speed, I would be -1000 on the idea. If someone comes up with actual working code that is no more than 1% slower, I'd move to -0.
If speed miraculously got faster in such a rewrite (analogous to dictionaries, whose ordering was initially a happy side-effect, though I've seen explanations of why that wouldn't happen here), I'd be +1.
But in that last case, the entirety of my support would be for the speed-up, and exactly none of it would be for the ordering.