
Martin v. Löwis wrote:
The problem arises because we're systematically unbalancing the hash table. The iterator returns the first valid entry in the hash table, which we remove. Repeat several times and pretty soon the iterator has to spend O(n) time scanning through empty entries to find the first remaining valid entry.
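A minimal sketch of the pattern being described (the set size and names are chosen purely for illustration):

    s = set(range(10000))
    while s:
        x = next(iter(s))  # scans from the start of the table for the first live entry
        s.remove(x)        # leaves a dummy slot behind, so later scans get longer

Each removal takes the entry nearest the front of the table, so the leading run of empty/dummy slots keeps growing and the loop as a whole degrades towards O(n**2).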
Interesting. Something seems to go wrong here: if items get removed over and over again, I would expect the set to shrink (I'm not sure whether it actually does). Then you should end up with amortized O(1) for selecting an element (assuming the underlying hashes don't collide).
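One hedged way to check on CPython is to watch sys.getsizeof(), which reflects the allocated table (sizes here are illustrative; if deletion never shrinks the table, the two numbers should match):

    import sys

    s = set(range(10000))
    before = sys.getsizeof(s)
    for x in list(s):
        s.discard(x)        # remove every element, one at a time
    after = sys.getsizeof(s)
    print(before, after)    # 'after' equal to 'before' means no shrink on delete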
The behaviour is inherited (in spirit) from dicts. Courtesy of dictnotes.txt:

"""
* Shrinkage rate upon exceeding maximum sparseness.  The current CPython code
never even checks sparseness when deleting a key.  When a new key is added, it
resizes based on the number of active keys, so that the addition may trigger
shrinkage rather than growth.
"""

"""
Dictionary operations involving only a single key can be O(1) unless resizing
is possible.  By checking for a resize only when the dictionary can grow (and
may *require* resizing), other operations remain O(1), and the odds of resize
thrashing or memory fragmentation are reduced.  In particular, an algorithm
that empties a dictionary by repeatedly invoking .pop will see no resizing,
which might not be necessary at all because the dictionary is eventually
discarded entirely.
"""

So the rationale is that only add operations ever trigger a resize, which keeps
a sequence of pop() calls from incurring repeated resizing costs.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia
---------------------------------------------------------------