When I quoted 10% as a rough measurement I was referring more to set operations like union, etc, which choosing the larger set may have a very real performance advantage.


For membership tests which are false, it may be quicker, as less memory is accessed. The buckets will be smaller (1,2,4, or 8 bytes vs 8 or 16 (32bit vs 64bit). But, to compare the entry's hash, an indirection into the entries array is required... The hueristics become less clear when considering how many collisions there are. It may also change for different tuning parameters (i.e. a lower load factor would alleviate the pointer dereferences require to compare items, but at the cost of more memory). 

It may very well be the case that the target load factor only decreases a small amount (so that the ordered implementation still uses less memory), but the performance benefits of quick rejections make overall performance better. I don't know how 'objective' we can be here without discussing specific data sets people may be using.

On Mon, Aug 24, 2020, 7:15 PM David Mertz <mertz@gnosis.cx> wrote:
The main purpose of sets is FAST membership testing. Everything I've seen in prior discussions convinced me that preserving insertion order would slow that down.

If membership testing lost even 10% speed, I would be -1000 on the idea. If someone comes up with actual working code that is no more than 1% slower, is move to -0.

If speed miraculously got faster in such a rewrite (analogous with dictionaries whose order was initially a happy side-effect, but I've seen explanations of why that wouldn't happen) I'd be +1.

But in the last case, the entirely of my support would be for the speed up, and exactly none of it would be for the ordering.