[Python-Dev] Compact ordered set
raymond.hettinger at gmail.com
Tue Feb 26 16:02:36 EST 2019
Quick summary of what I found when I last ran experiments with this idea:
* To get the same lookup performance, the density of index table would need to go down to around 25%. Otherwise, there's no way to make up for the extra indirection and the loss of cache locality.
* There was a small win on iteration performance because its cheaper to loop over a dense array than a sparse array (fewer memory access and elimination of the unpredictable branch). This is nice because iteration performance matters in some key use cases.
* I gave up on ordering right away. If we care about performance, keys can be stored in the order added; but no effort should be expended to maintain order if subsequent deletions occur. Likewise, to keep set-to-set operations efficient (i.e. looping over the smaller input), no order guarantee should be given for those operations. In general, we can let order happen but should not guarantee it and work to maintain it or slow-down essential operations to make them ordered.
* Compacting does make sets a little smaller but does cost an indirection and incurs a cost for switching index sizes between 1-byte arrays, 2-byte arrays, 4-byte arrays, and 8-byte arrays. Those don't seem very expensive; however, set lookups are already very cheap when the hash values are known (when they're not, the cost of computing the hash value tends to dominate anything done by the setobject itself).
* I couldn't find any existing application that would notice the benefit of making sets a bit smaller. Most applications use dictionaries (directly or indirectly) everywhere, so compaction was an overall win. Sets tend to be used more sparsely (no pun intended) and tend to be only a small part of overall memory usage. I had to consider this when bumping the load factor down to 60%, prioritizing speed over space.
More information about the Python-Dev