[Kyle]
... For some reason, I had assumed in the back of my head (without giving it much thought) that the average collision rate would be the same for set items and dict keys. Thanks for the useful information.
I know the theoretical number of probes for dicts, but not for sets anymore. The latter use a mix of probe strategies now, "randomish" jumps (same as for dicts) but also purely linear ("up by 1") probing to try to exploit L1 cache. It's not _apparent_ to me that the mix actually helps ;-) I just trust that Raymond timed stuff until he was sure it does.
... Ah, I forgot to consider how the hash function actually works for continuous integers. A better comparison to demonstrate the collision differences would likely use random strings.
And, at an extreme, a class with a __hash__ that always returns the same integer.
Also, I believe that max "reasonable" integer range of no collision is (-2305843009213693951, 2305843009213693951), ...
Any range that does _not_ contain both -2 and -1 (-1 is an annoying special case, with hash(-1) == hash(-2) == -2), and spans no more than sys.hash_info.modulus integers. Apart from that, the sign and magnitude of the start of the range don't matter; e.g.,
len(set(hash(i) for i in range(10**5000, 10**5000 + 1000000))) 1000000 len(set(hash(i) for i in range(-10**5000, -10**5000 + 1000000))) 1000000