splitting a large dictionary into smaller ones
tjreedy at udel.edu
Mon Mar 23 04:10:21 CET 2009
> Hi all,
> I have a very large dictionary object that is built from a text file
> that is about 800 MB; it contains several million keys. Ideally I
> would like to pickle this object so that I wouldn't have to parse this
> large file to compute the dictionary every time I run my program.
> However, the pickled file is currently over 300 MB and takes a very
> long time to write to disk, even longer than recomputing the
> dictionary from scratch.
But you only write it once. How does the read and reconstruct time
compare to the recompute time?
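One way to answer that is to time all three steps on the same data. In this sketch, `build_dict()` is a hypothetical stand-in for parsing the real file, and `time_roundtrip()` and the file name are likewise illustrative. It passes `pickle.HIGHEST_PROTOCOL` because Python 2's default pickle protocol 0 is a text format; a binary protocol is typically smaller and faster, but that is worth checking on your own data.

```python
import os
import pickle
import tempfile
import time


def build_dict(n):
    # Hypothetical stand-in for parsing the large text file.
    return {f"key{i}": i for i in range(n)}


def time_roundtrip(n, path):
    """Return (build, write, read) times in seconds for an n-key dict."""
    t0 = time.perf_counter()
    d = build_dict(n)
    t_build = time.perf_counter() - t0

    t0 = time.perf_counter()
    with open(path, "wb") as f:
        pickle.dump(d, f, protocol=pickle.HIGHEST_PROTOCOL)
    t_write = time.perf_counter() - t0

    t0 = time.perf_counter()
    with open(path, "rb") as f:
        loaded = pickle.load(f)
    t_read = time.perf_counter() - t0

    assert loaded == d  # sanity check: the round trip preserved the data
    return t_build, t_write, t_read


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "big_dict.pickle")
    build, write, read = time_roundtrip(1_000_000, path)
    print(f"build {build:.2f}s  write {write:.2f}s  read {read:.2f}s")
```

The number to compare is the read time against the build time; the write is paid only once.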
> I would like to split the dictionary into smaller ones, containing
> only hundreds of thousands of keys, and then try to pickle them.
Do you have any evidence that this would really be faster?
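If you want to gather that evidence, the split itself is easy with `itertools.islice`. The helpers `split_dict()`, `dump_chunks()` and `load_chunks()` and the `<prefix>.<n>.pickle` naming below are hypothetical. Since the chunks together hold the same keys and values as the original dict, the total amount written stays roughly the same, so whether this is faster is something to measure rather than assume.

```python
import itertools
import pickle


def split_dict(d, chunk_size):
    """Yield sub-dicts holding at most chunk_size items each."""
    it = iter(d.items())
    while True:
        chunk = dict(itertools.islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def dump_chunks(d, prefix, chunk_size=100_000):
    """Pickle each chunk to '<prefix>.<n>.pickle' and return the file names."""
    names = []
    for n, chunk in enumerate(split_dict(d, chunk_size)):
        name = f"{prefix}.{n}.pickle"
        with open(name, "wb") as f:
            pickle.dump(chunk, f, protocol=pickle.HIGHEST_PROTOCOL)
        names.append(name)
    return names


def load_chunks(names):
    """Rebuild a single dict from the chunk files."""
    merged = {}
    for name in names:
        with open(name, "rb") as f:
            merged.update(pickle.load(f))
    return merged
```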
> Is there a way to easily do this? I.e., is there an easy way to make a
> wrapper for this such that I can access this dictionary as just one
> object, but underneath it's split into several? So that I can write
> my_dict[k] and get a value, or set my_dict[m] to some value without
> knowing which sub-dictionary it's in.
Searching for a key in, say, 10 dicts will be slower than searching for
it in just one. The only reason I would do this would be if the dict
had to be split, say, over several machines. But then, you could query
them in parallel.
> If there aren't known ways to do this, I would greatly appreciate any
> advice/examples on how to write this data structure from scratch,
> reusing as much of the dict() class as possible.
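For the wrapper itself, `collections.abc.MutableMapping` (`collections.MutableMapping` in Python 2.6) does much of the work: define `__getitem__`, `__setitem__`, `__delitem__`, `__iter__` and `__len__`, and it supplies `get()`, `update()`, `keys()`, `items()`, `pop()`, `setdefault()` and the rest. The `ShardedDict` class below is a hypothetical sketch (Python 3), not an established library type. It routes each key with `hash(key) % num_shards`, so a lookup consults exactly one sub-dict instead of searching all of them in turn; the extra call and modulo per access still make it somewhat slower than a plain dict.

```python
from collections.abc import MutableMapping


class ShardedDict(MutableMapping):
    """Dict-like wrapper that spreads its keys over several plain dicts."""

    def __init__(self, num_shards=10, data=()):
        self._shards = [{} for _ in range(num_shards)]
        self.update(data)  # update() is inherited from MutableMapping

    def _shard(self, key):
        # hash() picks exactly one shard, so each access touches one dict.
        return self._shards[hash(key) % len(self._shards)]

    def __getitem__(self, key):
        return self._shard(key)[key]

    def __setitem__(self, key, value):
        self._shard(key)[key] = value

    def __delitem__(self, key):
        del self._shard(key)[key]

    def __contains__(self, key):
        return key in self._shard(key)

    def __iter__(self):
        for shard in self._shards:
            yield from shard

    def __len__(self):
        return sum(len(shard) for shard in self._shards)

    # str hashes are randomized per process in Python 3, so a pickled
    # shard layout may be wrong in a new process: re-route every key.
    def __getstate__(self):
        return self._shards

    def __setstate__(self, shards):
        self._shards = [{} for _ in range(len(shards))]
        for shard in shards:
            self.update(shard)


if __name__ == "__main__":
    my_dict = ShardedDict(4, {"a": 1, "b": 2})
    my_dict["c"] = 3
    print(my_dict["a"], len(my_dict), sorted(my_dict))
```

Pickling a `ShardedDict` still writes all of the same data, so on its own it will not shrink the pickle; any scheme that saves shards to separate files would need a stable routing function for the same hash-randomization reason.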
Terry Jan Reedy