[Python-Dev] Re: Should set objects maintain insertion order too?

24 Dec 2019

[Kyle]
...
...
> For some reason, I had assumed in the back of my head (without
> giving it much thought) that the average collision rate would be the
> same for set items and dict keys. Thanks for the useful information.

I know the theoretical number of probes for dicts, but not for sets
anymore.  The latter use a mix of probe strategies now, "randomish"
jumps (same as for dicts) but also purely linear ("up by 1") probing
to try to exploit L1 cache.

It's not _apparent_ to me that the mix actually helps ;-)  I just
trust that Raymond timed stuff until he was sure it does.
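
To make the two probing styles concrete, here is a rough Python sketch.
It is an illustration only: the constants LINEAR_PROBES and PERTURB_SHIFT,
the recurrence, and the function names are assumptions modeled on the
CPython C sources, and the real code differs in detail.

```python
from itertools import islice

# Assumed constants, modeled on CPython's C sources (not authoritative).
LINEAR_PROBES = 9
PERTURB_SHIFT = 5


def dict_style_probes(h, mask):
    """Yield slot indices using only "randomish" jumps."""
    perturb = h & (2**64 - 1)  # treat the hash as an unsigned value
    i = perturb & mask
    while True:
        yield i
        perturb >>= PERTURB_SHIFT
        i = (i * 5 + perturb + 1) & mask


def set_style_probes(h, mask):
    """Yield slot indices mixing short linear runs with "randomish" jumps."""
    perturb = h & (2**64 - 1)
    i = perturb & mask
    while True:
        yield i
        # Linear "up by 1" run, skipped when it would run off the table end.
        if i + LINEAR_PROBES <= mask:
            for step in range(1, LINEAR_PROBES + 1):
                yield i + step
        perturb >>= PERTURB_SHIFT
        i = (i * 5 + perturb + 1) & mask


# First few slots visited for hash 0 in a 64-slot table (mask == 63).
dict_slots = list(islice(dict_style_probes(0, 63), 4))
set_slots = list(islice(set_style_probes(0, 63), 11))
print(dict_slots)
print(set_slots)
```

The linear run touches adjacent slots, which is the part meant to be
friendly to L1 cache; whether that wins overall is a matter for timing.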
...
...
> Ah, I forgot to consider how the hash function actually works for consecutive
> integers. A better comparison to demonstrate the collision differences would
> likely use random strings.

And, at an extreme, a class with a __hash__ that always returns the
same integer.
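
A minimal sketch of that extreme case follows. The class name
ConstantHash, the value 42, and the eq_calls counter are made up for
illustration; nothing here comes from the thread.

```python
class ConstantHash:
    """Hypothetical key type whose instances all hash to the same value."""

    eq_calls = 0  # counts __eq__ invocations across all instances

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 42  # every instance collides with every other

    def __eq__(self, other):
        ConstantHash.eq_calls += 1
        return isinstance(other, ConstantHash) and self.value == other.value


items = [ConstantHash(i) for i in range(200)]
unique = set(items)

print(len(unique))                  # all items are kept, they are unequal
print(len({hash(x) for x in items}))  # but they share a single hash value
print(ConstantHash.eq_calls)        # equality checks needed to sort it out
```

Since the hashes never differ, the container has to fall back on __eq__
to tell entries apart, so the comparison count grows quickly with size.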
...
> Also, I believe that max "reasonable" integer range of no collision
> is (-2305843009213693951, 2305843009213693951), ...

Any range that does _not_ contain both -2 and -1 (-1 is an annoying
special case, with hash(-1) == hash(-2) == -2), and spans no more than
sys.hash_info.modulus integers.  Apart from that, the sign and
magnitude of the start of the range don't matter; e.g.,
...
...
...
>>> len(set(hash(i) for i in range(10**5000, 10**5000 + 1000000)))
1000000
>>> len(set(hash(i) for i in range(-10**5000, -10**5000 + 1000000)))
1000000
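
The rule can be spot-checked with a small model of how ints hash. The
function int_hash_model is a hypothetical helper written for this
illustration, assuming the hash is the magnitude reduced modulo
sys.hash_info.modulus, with the sign restored and -1 remapped to -2.

```python
import sys

M = sys.hash_info.modulus  # 2**61 - 1 on typical 64-bit builds


def int_hash_model(n):
    """Hypothetical model of hash() for ints, for illustration only."""
    h = abs(n) % M
    if n < 0:
        h = -h
    # -1 is reserved as an error code in C, so it is remapped to -2.
    return -2 if h == -1 else h


samples = [0, 1, -1, -2, M - 1, M, M + 1, -M, 10**5000, -10**5000 + 12345]
model_matches = all(hash(n) == int_hash_model(n) for n in samples)
print(model_matches)

# A range spanning more than M integers must wrap around and collide:
print(hash(M + 1) == hash(1))

# A short range far from zero still hashes without collisions:
distinct = len({hash(i) for i in range(10**50, 10**50 + 1000)})
print(distinct)
```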

Tim Peters