splitting a large dictionary into smaller ones

per perfreem at gmail.com
Sun Mar 22 22:32:23 EDT 2009


Hi all,

I have a very large dictionary object that is built from a text file
of about 800 MB; it contains several million keys. Ideally I would
like to pickle this object so that I wouldn't have to parse this
large file to compute the dictionary every time I run my program.
However, the pickled file is currently over 300 MB and takes a very
long time to write to disk, even longer than recomputing the
dictionary from scratch.
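Before splitting anything, it may be worth checking the pickle protocol. In Python 2, `pickle.dump` defaults to protocol 0, a text format that is typically much slower and larger than the binary protocols (and `cPickle` is faster than the pure-Python `pickle` module). A minimal Python 3 sketch, with hypothetical function names, that asks for the newest binary protocol:

```python
import pickle


# Hypothetical helper names, for illustration only.
def save_dict(d, path):
    # HIGHEST_PROTOCOL selects the newest binary pickle protocol available,
    # instead of the slow text protocol 0 that was the Python 2 default.
    with open(path, "wb") as f:
        pickle.dump(d, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_dict(path):
    # Binary pickles must be read back in binary mode.
    with open(path, "rb") as f:
        return pickle.load(f)
```

For example, call `save_dict(large_dict, "large_dict.pkl")` once, then `load_dict("large_dict.pkl")` on later runs (the file name is only an illustration).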

I would like to split the dictionary into smaller ones, each containing
only hundreds of thousands of keys, and then try to pickle them. Is
there an easy way to do this? That is, is there an easy way to make a
wrapper so that I can access this dictionary as a single object while,
underneath, it is split into several? I want to be able to write
my_dict[k] to get a value, or set my_dict[m] to some value, without
knowing which sub-dictionary the key is in.
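A minimal sketch of such a wrapper, assuming Python 3 and string keys (the class name `ShardedDict`, the `num_shards` and `data` parameters, and the `shards` attribute are all hypothetical). Subclassing `collections.abc.MutableMapping` means only five methods need to be written; `get`, `setdefault`, `update`, `pop`, `in` and the rest come from the mixin. Keys are routed with `zlib.crc32` instead of `hash()`, because string hashes are randomized per process in Python 3.3 and later, and a layout that is pickled and reloaded must still send each key to the same sub-dictionary.

```python
import zlib
from collections.abc import MutableMapping


class ShardedDict(MutableMapping):
    """Hypothetical dict-like wrapper spreading str keys over plain dicts."""

    def __init__(self, num_shards=16, data=None):
        # Each shard is an ordinary dict, so it can be pickled on its own.
        self.shards = [dict() for _ in range(num_shards)]
        if data is not None:
            self.update(data)  # update() is supplied by MutableMapping

    def _shard(self, key):
        # Stable routing (assumes str keys): the same key always maps to
        # the same sub-dict, even in a different process.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def __getitem__(self, key):
        return self._shard(key)[key]

    def __setitem__(self, key, value):
        self._shard(key)[key] = value

    def __delitem__(self, key):
        del self._shard(key)[key]

    def __iter__(self):
        # Iterate over every key in every shard.
        for shard in self.shards:
            yield from shard

    def __len__(self):
        return sum(len(shard) for shard in self.shards)
```

With it, `my_dict = ShardedDict(num_shards=32)` followed by `my_dict[k] = v` and `my_dict[k]` behaves like one dict, while each element of `my_dict.shards` is an ordinary dict that can be pickled separately.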

If there aren't known ways to do this, I would greatly appreciate any
advice or examples on how to write this data structure from scratch,
reusing as much of the dict class as possible.
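Because each piece can simply be an ordinary dict, it can also be pickled on its own. The sketch below (Python 3, str keys assumed; `split_dict`, `save_shards`, `load_shards` and the `shard_NNNN.pkl` file naming are hypothetical) partitions a dict with stable `crc32` routing and writes one pickle file per piece.

```python
import os
import pickle
import zlib


# Hypothetical helpers, for illustration only; keys are assumed to be str.
def split_dict(d, num_shards):
    """Partition d into num_shards plain dicts using stable crc32 routing."""
    shards = [{} for _ in range(num_shards)]
    for key, value in d.items():
        shards[zlib.crc32(key.encode("utf-8")) % num_shards][key] = value
    return shards


def _shard_path(directory, i):
    return os.path.join(directory, "shard_%04d.pkl" % i)


def save_shards(shards, directory):
    # One binary pickle file per shard.
    os.makedirs(directory, exist_ok=True)
    for i, shard in enumerate(shards):
        with open(_shard_path(directory, i), "wb") as f:
            pickle.dump(shard, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_shards(directory, num_shards):
    # Reload the shards in index order so routing still matches.
    shards = []
    for i in range(num_shards):
        with open(_shard_path(directory, i), "rb") as f:
            shards.append(pickle.load(f))
    return shards
```

Writing every shard still writes roughly the same total data as one big pickle; the gain comes from rewriting only the shards whose contents changed, or loading only the ones a given run needs.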

Thanks.




More information about the Python-list mailing list