finding items that occur more than once in a list
arnodel at googlemail.com
Thu Mar 20 10:43:26 CET 2008
On Mar 19, 3:08 am, Bryan Olson <fakeaddr... at nowhere.org> wrote:
> Arnaud Delobelle wrote:
> > Ninereeds wrote:
> >> Hrvoje Niksic wrote:
> >>> This doesn't apply to Python, which implements dict storage as an
> >>> open-addressed table and automatically (and exponentially) grows the
> >>> table when the number of entries approaches 2/3 of the table size.
> >>> Assuming a good hash function, filling the dict should yield amortized
> >>> constant time for individual additions.
> > Isn't this average constant time rather than amortized?
> This is expected amortized constant time. Is that really the same
> thing as average constant time? Hmmm... that's subtle.
I am not sure what the difference between expected amortized time
complexity and average time complexity is (I know that they are
defined as different notions but I don't know whether they reduce to
the same thing or not). Anyway both are average case complexities and
AFAIK worst case time complexity of insertion / lookup in a hashtable
is still O(n).
> >> OK. I obviously need to look up open-addressed tables. I thought this
> >> was just, in effect, implicit linked listing - ie it still needs a
> >> linear search to handle collisions, it just avoids the need for
> >> explicitly stored link fields. Perhaps I'm mistaken.
> The amortized doubling breaks that.
> > I don't think you are mistaken, but if I'm wrong I'd be grateful for a
> > link to further details.
> Arnaud Delobelle offered a good Wikipedia link, and for more background
> look up "amortized analysis.
Hrvoje Niksic provided the link :). I still think two unrelated
things are being compared in this thread when people say that method A
(using dictionaries / sets) is O(n) and method B (sorting the list)
Isn't it the case that:
| Worst case | Average case
Method A | O(n^2) | O(n)
Method B | O(nlogn) | O(nlogn)
Which method is the best would then be determined by the distribution
of the hash values of your data, and whether you want to have a
guarantee the method will have a less-than-quadratic behaviour.
More information about the Python-list