sorting a dictionary

Alex Martelli aleax at aleax.it
Wed Feb 5 09:20:14 EST 2003


Giovanni Bajo wrote:

> 
> "Alex Martelli" <aleax at aleax.it> ha scritto nel messaggio
> news:nXL%9.174528$AA2.6987601 at news2.tin.it...
> 
>> def get_highest_in_linear_time_without_explicit_iterators(d):
>>     for k in d:
>>         maxkey, maxval = k, d[k]
>>         break
>>     else:
>>         raise ValueError, "empty dictionary has no highest key"
>>     for k in d:
>>         v = d[k]
>>         if v>maxval: maxkey, maxval = k, v
>>     return maxkey
> 
> Doesn't d[k] require O(logN) too? Or is Python smart enough to optimize
> this away since we are going through the dictionary?

d[k] is roughly constant time (dictionaries are hash tables) in
terms of number of entries in the dictionary.  If the KEYS take
non-O(1) time to hash, as I believe strings do [hash(x) is O(len(x))
if x is a string -- I *think*], then that might be an issue, yes.

So, this is O(N) where N is the number of entries in d, but only
assuming each of d's keys is of bounded length (or some other kind
of data type which takes bounded time per hash computation, i.e.,
_most_ of them but I believe not all).

However, I think this issue (which often does get ignored when
analyzing algorithms, said he to lighten the burden of his guilt;-)
applies to other "atoms" here.  For example, the v>maxval comparison
is ASSUMED to be O(1) -- but it MIGHT depend on the v and maxval
values and types; if v and maxval are strings of len K that only
differs towards the end, then the comparison takes O(K) [and as
far as I can see there is no way to safeguard against this by
changing algorithms, since the comparisons _will_ need to be done].

> Your previous solution with explicit iterators didn't have this "issue".

Yes, that solution never indexed the dictionary (but it did still
compare the values, inevitably).  Here's another O(N) solution,
similar to this one and not using explicit iterators (works in
old Python versions too):

def get_highest_in_linear_time_with_items(d):
    dit = d.items()
    maxkey, maxval = dit[0]
    for k, v in dit:
        if v>maxval: maxkey, maxval = k, v
    return maxkey

This does require extra *space* that is O(N), though (to store
dit).  This doesn't _directly_ translate to using iteritems,
because it does one access via dit[0].  Avoiding that access
can bring us back closer to the previous solution:

def get_highest_in_linear_time_with_iteritems(d):
    idit = d.iteritems()
    for maxkey, maxval in idit:
        break
    else:
        raise ValueError, "empty dictionary has no highest key"
    for k, v in idit:
        if v>maxval: maxkey, maxval = k, v
    return maxkey

However, getting maxkey and maxval with a call to idit.next()
is clearly a more direct approach than that "for which runs
only once because it immediately breaks" -- and that in turn
gets us all the way around to "my previous solution" you mention.


Alex





More information about the Python-list mailing list