[Python-Dev] Retrieve an arbitrary element from asetwithoutremoving it

Nick Coghlan ncoghlan at gmail.com
Mon Nov 9 14:32:15 CET 2009


Martin v. Löwis wrote:
>> The problem arises because we're systematically unbalancing the hash
>> table.  The iterator returns the first valid entry in the hash table,
>> which we remove.  Repeat several times and pretty soon the iterator has
>> to spend O(n) time scanning through empty entries to find the first
>> remaining valid entry.
> 
> Interesting. Something goes wrong, it seems: if items get removed over
> and over again, I think the set should shrink (not sure whether it
> actually does). Then, I think you should end up with an amortized O(1)
> for selecting an element (assuming that the underlying hashes don't
> collide).

The behaviour is inherited (in spirit) from dicts. Courtesy of
dictnotes.txt:

"""
* Shrinkage rate upon exceeding maximum sparseness.  The current
    CPython code never even checks sparseness when deleting a
    key.  When a new key is added, it resizes based on the number
    of active keys, so that the addition may trigger shrinkage
    rather than growth.
"""

"""
Dictionary operations involving only a single key can be O(1) unless
resizing is possible.  By checking for a resize only when the
dictionary can grow (and may *require* resizing), other operations
remain O(1), and the odds of resize thrashing or memory fragmentation
are reduced. In particular, an algorithm that empties a dictionary
by repeatedly invoking .pop will see no resizing, which might
not be necessary at all because the dictionary is eventually
discarded entirely.
"""

So the rationale is to ensure that only add operations perform a resize
and so that sequential pop() operations don't incur excessive resizing
costs.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------


More information about the Python-Dev mailing list