splitting a large dictionary into smaller ones
steve at holdenweb.com
Mon Mar 23 14:57:29 CET 2009
> hi all,
> i have a very large dictionary object that is built from a text file
> that is about 800 MB -- it contains several million keys. ideally i
> would like to pickle this object so that i wouldn't have to parse this
> large file to compute the dictionary every time i run my program.
> however currently the pickled file is over 300 MB and takes a very
> long time to write to disk - even longer than recomputing the
> dictionary from scratch.
> i would like to split the dictionary into smaller ones, containing
> only hundreds of thousands of keys, and then try to pickle them. is
> there a way to easily do this? i.e. is there an easy way to make a
> wrapper for this such that i can access this dictionary as just one
> object, but underneath it's split into several? so that i can write
> my_dict[k] and get a value, or set my_dict[m] to some value without
> knowing which sub dictionary it's in.
> if there aren't known ways to do this, i would greatly appreciate any
> advice/examples on how to write this data structure from scratch,
> reusing as much of the dict() class as possible.
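On the wrapper itself, here is a minimal sketch of one way it could look, not a tested solution for a dictionary of this size. The class name ShardedDict, the shard count, and the shard_NNNN.pkl file naming are all made up for illustration. It subclasses collections.abc.MutableMapping rather than dict, so my_dict[k] lookups and my_dict[m] = v assignments are routed to a sub-dictionary chosen by hash(key). Because string hashes can differ between interpreter runs, load() re-inserts every key instead of trusting which file it came from.

```python
import os
import pickle
import tempfile
from collections.abc import MutableMapping


class ShardedDict(MutableMapping):
    """Dict-like wrapper that spreads its keys over several plain dicts.

    Hypothetical helper for illustration; not part of the standard library.
    """

    def __init__(self, num_shards=16):
        self._shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        # Within one process a given key always maps to the same shard.
        return self._shards[hash(key) % len(self._shards)]

    def __getitem__(self, key):
        return self._shard_for(key)[key]

    def __setitem__(self, key, value):
        self._shard_for(key)[key] = value

    def __delitem__(self, key):
        del self._shard_for(key)[key]

    def __iter__(self):
        for shard in self._shards:
            yield from shard

    def __len__(self):
        return sum(len(shard) for shard in self._shards)

    def dump(self, directory):
        """Pickle each shard to its own file inside `directory`."""
        for i, shard in enumerate(self._shards):
            path = os.path.join(directory, "shard_%04d.pkl" % i)
            with open(path, "wb") as f:
                pickle.dump(shard, f, pickle.HIGHEST_PROTOCOL)

    @classmethod
    def load(cls, directory, num_shards=16):
        """Rebuild from the shard files written by dump().

        Keys are re-inserted one by one, so the result stays correct even
        if hash() gives different values in this interpreter run.
        """
        obj = cls(num_shards)
        for name in sorted(os.listdir(directory)):
            if name.startswith("shard_") and name.endswith(".pkl"):
                with open(os.path.join(directory, name), "rb") as f:
                    obj.update(pickle.load(f))
        return obj


if __name__ == "__main__":
    my_dict = ShardedDict(num_shards=4)
    for i in range(1000):
        my_dict["key%d" % i] = i
    with tempfile.TemporaryDirectory() as tmp:
        my_dict.dump(tmp)
        restored = ShardedDict.load(tmp, num_shards=4)
    print(len(restored), restored["key42"])
```

Note that this only splits the pickling work into smaller files; it does not by itself make the total dump any faster, and load() as written still reads every shard up front.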
You aren't by any chance running this on Python 3.0, are you? The I/O
implementation for that release is known to be slow, and that would
slow down pickle dump/load.
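A rough way to check, sketched under the assumption that a small sample dictionary is representative enough: print the interpreter version, then time pickle.dump to a temporary file with the text protocol (0) and with the highest binary protocol. The function name time_pickle_dump and the sample size are invented for this example; the binary protocol is usually both smaller and quicker, but measure on your own data.

```python
import pickle
import sys
import tempfile
import time


def time_pickle_dump(obj, protocol):
    """Pickle obj to a temporary file; return (seconds, bytes written)."""
    with tempfile.TemporaryFile() as f:
        start = time.perf_counter()
        pickle.dump(obj, f, protocol)
        f.flush()
        elapsed = time.perf_counter() - start
        return elapsed, f.tell()


if __name__ == "__main__":
    print("Running on Python %d.%d.%d" % sys.version_info[:3])
    # Stand-in data; the real dictionary would be far larger.
    sample = {"key%d" % i: i for i in range(200000)}
    for protocol in (0, pickle.HIGHEST_PROTOCOL):
        seconds, size = time_pickle_dump(sample, protocol)
        print("protocol %d: %.3f s, %d bytes" % (protocol, seconds, size))
```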
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
Want to know? Come to PyCon - soon! http://us.pycon.org/