Sorted and reversed on huge dict ?

Fredrik Lundh fredrik at pythonware.com
Fri Nov 3 14:25:08 EST 2006


vd12005 at yahoo.fr wrote:

> i would like to sort(ed) and reverse(d) the result of many huge
> dictionaries (a single dictionary will contain ~ 150000 entries). Keys
> are words, values are count (integer).

not sure 150k entries qualify as huge, though, unless you're short on 
memory.

> i'm wondering if i can have a 10s of these in memory, or if i should
> proceed one after the other.

why not just try it out?

> but moreover i'm interested in saving theses as values, keys sorted and
> reversed (ie most frequent first), i can do it with sort from unix
> command but i wonder how i should do it with python to be memory
> friendly.
> 
> can it be done by using :
> 
> from itertools import izip
> pairs = izip(d.itervalues(), d.iterkeys())
> for v, k in reversed(sorted(pairs)):
>     print k, v
> 
> or will it be the same as building the whole list ?

sorted() needs access to all data, so it'll build a list, even if you 
feed it a generator.  you will have to test it yourself, but I suspect 
that the most memory-efficient way to do what you want could be:

     items = d.items()
     items.sort(key=operator.itemgetter(1), reverse=True)

the items list would require a couple of megabytes for 150k dictionary 
entries, or so.  the key map needs some memory too, but the rest of the 
sort is done in place.

</F>




More information about the Python-list mailing list