[Python-Dev] Re: Should set objects maintain insertion order too?

23 Dec 2019

Sorry!  A previous attempt to reply got sent before I typed anything :-(

Very briefly:
...
...
...
...
timeit.timeit("set(i for i in range(1000))", number=100_000)
[and other examples using a range of integers]

The collision resolution strategy for sets evolved to be fancier than
for dicts, to reduce cache misses.  This is important for sets because
the _only_ interesting thing about an element with respect to a set is
whether or not the set contains it.  Lookup speed is everything for sets.

If you use a contiguous range of "reasonable" integers for keys, the
integer hash function is perfect:  there's never a collision.  So any
such test misses all the work Raymond did to speed set lookups.
String keys have sufficiently "random" hashes to reliably create
collisions, though.  Cache misses can, of course, have massive effects
on timing.
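To make the "perfect hash" point concrete, here is a rough sketch that
counts how many keys would land on an already-used first-probe slot.  It
assumes the first slot is just the low bits of the hash in a
power-of-two table, which is a simplification and not the real set
implementation; `initial_slots`, `count_collisions`, and the table size
of 2048 are made up for illustration.

```python
def initial_slots(keys, table_size):
    """First-probe slot per key, ASSUMING slot = hash(key) & (table_size - 1)."""
    mask = table_size - 1  # table_size must be a power of two
    return [hash(k) & mask for k in keys]


def count_collisions(keys, table_size):
    """Number of keys whose first-probe slot was already taken by another key."""
    slots = initial_slots(keys, table_size)
    return len(slots) - len(set(slots))


n = 1000
table_size = 2048  # hypothetical size, picked only so the table is sparse

# Small non-negative ints hash to themselves in CPython, so a contiguous
# range fills distinct slots.
int_collisions = count_collisions(range(n), table_size)

# String hashes are scattered (and randomized per process), so the exact
# count differs from run to run.
str_collisions = count_collisions([str(i) for i in range(n)], table_size)

print("int keys:", int_collisions)
print("str keys:", str_collisions)
```

The int count comes out as zero, while the string count is some
run-dependent positive number, so a benchmark built only on
`range(1000)` never exercises collision resolution at all.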
...
Add (much faster for dicts):
...
...
...
timeit.timeit("s = set(); s.add(0)", number=100_000_000)
13.330938750001224
timeit.timeit("d = {}; d[0] = None", number=100_000_000)
5.788865385999088

In the former case you're primarily measuring the time to look up the
"add" method of sets (that's more expensive than adding 0 to the set).
A better comparison would, e.g., move `add = s.add` to a setup line,
and use plain `add(0)` in the loop.
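A minimal sketch of that setup-line variant follows.  The iteration
count is cut way down from the quoted runs so it finishes quickly, and
the variable names are mine.  Note it also moves container creation into
setup, so after the first pass each statement is re-adding a key that's
already present; that's a different measurement from the quoted one, not
a drop-in replacement.

```python
import timeit

N = 200_000  # far fewer iterations than the quoted runs, to keep this quick

# Quoted form: creates a new set and looks up .add on every iteration.
t_set_naive = timeit.timeit("s = set(); s.add(0)", number=N)

# Suggested form: bind the method once in setup, time only the call.
t_set_bound = timeit.timeit("add(0)", setup="s = set(); add = s.add", number=N)

# Comparable dict statement, with the dict also created in setup.
t_dict = timeit.timeit("d[0] = None", setup="d = {}", number=N)

print(f"set, naive:        {t_set_naive:.4f}s")
print(f"set, bound method: {t_set_bound:.4f}s")
print(f"dict store:        {t_dict:.4f}s")
```

The absolute numbers depend on the machine and Python version, so treat
any single run as a hint and not as a result.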

That's it!

Tim Peters