Scalable python dict {'key_is_a_string': [count, some_val]}

Jonathan Gardner jgardner at
Sat Feb 20 08:27:56 CET 2010

On Fri, Feb 19, 2010 at 10:36 PM, krishna <krishna.k.0001 at> wrote:
> I have to manage a couple of dicts holding a huge dataset (larger than
> feasible with the memory on my system). Each basically has a key which
> is a string (actually a tuple converted to a string) and a two-item
> list as its value, with one element in the list being a count related to
> the key. At the end I have to sort this dictionary by the count.
> The platform is Linux. I am planning to implement it by setting a
> threshold beyond which I write the data into files (3 columns: 'key
> count some_val') and later merge those files (I plan to sort the
> individual files by the key column and walk through the files with one
> pointer per file and merge them; I would add up the counts when
> entries from two files match by key), doing the sorting with the 'sort'
> command. Thus the bottleneck is the 'sort' command.
> Any suggestions, comments?
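
For reference, the spill-and-merge plan quoted above can be sketched in
Python roughly as follows. This is only a sketch under my own
assumptions: the function names (spill, read_run, merge_runs) are made
up, the columns are tab-separated so that keys containing spaces
survive, keys are assumed to contain no tabs or newlines, and the first
some_val seen for a key is kept because the post does not say how to
combine them. Each run is sorted inside Python here instead of with the
'sort' command.

```python
import heapq
import itertools
import os
import tempfile


def spill(counts, directory):
    """Write one in-memory dict to a key-sorted run file and return its path."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run", text=True)
    with os.fdopen(fd, "w") as out:
        for key in sorted(counts):
            count, some_val = counts[key]
            # Assumption: tab-separated columns, no tabs or newlines in keys.
            out.write(f"{key}\t{count}\t{some_val}\n")
    return path


def read_run(path):
    """Yield (key, count, some_val) tuples from one sorted run file."""
    with open(path) as run:
        for line in run:
            key, count, some_val = line.rstrip("\n").split("\t")
            yield key, int(count), some_val


def merge_runs(paths):
    """Merge sorted runs, adding up the counts of entries that share a key."""
    runs = (read_run(p) for p in paths)
    # heapq.merge keeps one "pointer" per run and never loads a whole file.
    merged = heapq.merge(*runs, key=lambda row: row[0])
    for key, rows in itertools.groupby(merged, key=lambda row: row[0]):
        rows = list(rows)
        # Assumption: keep the first some_val seen for this key.
        yield key, sum(row[1] for row in rows), rows[0][2]
```

The output of merge_runs is ordered by key, so it still has to be
written out and sorted by the count column as a last step.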

You should be using Berkeley DB (BDB) or even something like PostgreSQL.
The indexes there will give you the scalability you need. I doubt you will
be able to write anything that selects, updates, inserts, or deletes
data better than BDB or PostgreSQL already does.
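
To give an idea of what that looks like, here is a minimal sketch. It
uses the standard library's sqlite3 module purely as a stand-in so that
it runs without a server; it is not BDB or PostgreSQL, though the same
SQL idea carries over to PostgreSQL. The table name, column names, and
function names are all my own assumptions.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the store; pass a file path to keep data on disk."""
    conn = sqlite3.connect(path)
    # The PRIMARY KEY gives an index on item_key, so lookups stay fast.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counts ("
        "item_key TEXT PRIMARY KEY, n INTEGER NOT NULL, some_val TEXT)"
    )
    return conn


def add(conn, key, increment, some_val):
    """Add to the count for key, inserting the row if it is new."""
    cur = conn.execute(
        "UPDATE counts SET n = n + ? WHERE item_key = ?", (increment, key)
    )
    if cur.rowcount == 0:
        # No existing row was updated, so this key has not been seen yet.
        conn.execute(
            "INSERT INTO counts (item_key, n, some_val) VALUES (?, ?, ?)",
            (key, increment, some_val),
        )


def by_count(conn):
    """Iterate over rows from highest count to lowest."""
    return conn.execute(
        "SELECT item_key, n, some_val FROM counts ORDER BY n DESC, item_key"
    )
```

With an on-disk database you would call conn.commit() every so often
rather than after each add(), and the final sort by count becomes a
single ORDER BY instead of a separate pass with 'sort'.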

Jonathan Gardner
jgardner at
