finding items that occur more than once in a list
Hrvoje Niksic
hniksic at xemacs.org
Tue Mar 18 11:18:04 EDT 2008
Ninereeds <stephenhorne100 at aol.com> writes:
> The dictionary version Chris suggests (and the essentially
> equivalent set-based approach) is doing essentially the same thing
> in a way, but using hashing rather than ordering to organise the
> list and spot duplicates. This is *not* O(n) due to the rate of
> collisions increasing as the hash table fills. If collisions are
> handled by building a linked list for each hash table entry, the
> performance overall is still O(n^2).
This doesn't apply to Python, which implements dict storage as an
open-addressed table and automatically (and exponentially) grows the
table when the number of entries approaches 2/3 of the table size.
Assuming a good hash function, filling the dict should yield amortized
constant time for individual additions.
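To make that concrete, here is a minimal sketch of the hash-based scan the thread is discussing (the function name is mine, and it assumes the list elements are hashable):

    def find_duplicates(items):
        """Return the items that occur more than once, in first-seen order."""
        seen = set()
        dups = set()
        result = []
        for item in items:
            # each membership test and add is an amortized O(1) hash operation
            if item in seen:
                if item not in dups:
                    dups.add(item)
                    result.append(item)
            else:
                seen.add(item)
        return result

    >>> find_duplicates([1, 2, 3, 2, 1, 2])
    [2, 1]

Since every element is touched once and each set operation is amortized constant time, the whole pass is expected O(n), not O(n^2).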
> I suspect Python handles collisions using linked lists,
Why suspect? It's trivial to check. dictobject.c says:
    The basic lookup function used by all operations.
    This is based on Algorithm D from Knuth Vol. 3, Sec. 6.4.
    Open addressing is preferred over chaining since the link overhead for
    chaining would be substantial (100% with typical malloc overhead).
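For illustration only, here is a toy open-addressed table with linear probing that grows at the 2/3 threshold. The class and its probe sequence are deliberately simpler than the perturb-based scheme CPython actually uses, but they show the idea: a collision is resolved by trying another slot, not by appending to a per-slot linked list.

    class ToyDict:
        """Toy open-addressed hash table with linear probing (illustrative only)."""

        def __init__(self):
            self.size = 8
            self.used = 0
            self.slots = [None] * self.size      # each slot is None or a (key, value) pair

        def _probe(self, key):
            # Start at the hashed slot and walk forward until we find either
            # an empty slot or the slot already holding this key.
            i = hash(key) % self.size
            while self.slots[i] is not None and self.slots[i][0] != key:
                i = (i + 1) % self.size
            return i

        def __setitem__(self, key, value):
            i = self._probe(key)
            if self.slots[i] is None:
                self.used += 1
            self.slots[i] = (key, value)
            if self.used * 3 >= self.size * 2:   # keep the load factor under 2/3
                self._resize()

        def __getitem__(self, key):
            slot = self.slots[self._probe(key)]
            if slot is None:
                raise KeyError(key)
            return slot[1]

        def _resize(self):
            # Grow geometrically (the factor here is arbitrary) and reinsert.
            entries = [s for s in self.slots if s is not None]
            self.size *= 4
            self.used = 0
            self.slots = [None] * self.size
            for key, value in entries:
                self[key] = value

Because the table is resized geometrically before it gets crowded, the expected probe length stays bounded, which is where the amortized constant-time insertion comes from.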