splitting a large dictionary into smaller ones

odeits odeits at gmail.com
Mon Mar 23 03:56:05 CET 2009


On Mar 22, 7:32 pm, per <perfr... at gmail.com> wrote:
> hi all,
>
> i have a very large dictionary object that is built from a text file
> that is about 800 MB -- it contains several million keys.  ideally i
> would like to pickle this object so that i wouldn't have to parse this
> large file to compute the dictionary every time i run my program.
> however currently the pickled file is over 300 MB and takes a very
> long time to write to disk - even longer than recomputing the
> dictionary from scratch.
>
> i would like to split the dictionary into smaller ones, containing
> only hundreds of thousands of keys, and then try to pickle them. is
> there a way to easily do this? i.e. is there an easy way to make a
> wrapper for this such that i can access this dictionary as just one
> object, but underneath it's split into several? so that i can write
> my_dict[k] and get a value, or set my_dict[m] to some value without
> knowing which sub dictionary it's in.
>
> if there aren't known ways to do this, i would greatly appreciate any
> advice/examples on how to write this data structure from scratch,
> reusing as much of the dict() class as possible.
>
> thanks.
>
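The wrapper asked for above could be sketched (in Python 3) by subclassing collections.abc.MutableMapping. Implementing __getitem__, __setitem__, __delitem__, __iter__ and __len__ is enough to get get(), update(), pop(), keys() and the rest for free, and each shard stays an ordinary dict that can be pickled on its own. This is a minimal sketch, not a tested library. The class name ShardedDict and the default of 16 shards are hypothetical, and it assumes string keys. It picks a shard with zlib.crc32 rather than the built-in hash(), because str hashes are randomized per process in Python 3, so hash()-based routing would not survive saving and reloading the shards.

```python
import zlib
from collections.abc import MutableMapping


class ShardedDict(MutableMapping):
    """Behaves like one dict but stores items in several plain dicts.

    Hypothetical sketch; assumes all keys are str.
    """

    def __init__(self, n_shards=16):
        # Each shard is an ordinary dict and can be pickled separately.
        self.shards = [{} for _ in range(n_shards)]

    def _shard(self, key):
        # Stable across processes, unlike hash() on str in Python 3.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def __getitem__(self, key):
        return self._shard(key)[key]

    def __setitem__(self, key, value):
        self._shard(key)[key] = value

    def __delitem__(self, key):
        del self._shard(key)[key]

    def __iter__(self):
        # Iterate over the keys of every shard in turn.
        for shard in self.shards:
            yield from shard

    def __len__(self):
        return sum(len(shard) for shard in self.shards)

    def __contains__(self, key):
        # Only look in the one shard the key could live in.
        return key in self._shard(key)
```

Usage would then be `my_dict = ShardedDict(32)`, `my_dict[k] = v`, `my_dict[k]`, with each `my_dict.shards[i]` written out by pickle individually.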

So that I understand the goal: is the reason you wish to split the
dictionary into smaller ones so that you don't have to write all of
the dictionaries to disk if they haven't changed? Or are you trying to
speed up the initial load time?
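If the goal is the first one, the saving side might look like the minimal sketch below. It assumes the shards are kept as a list of plain dicts and that the caller records the indices of the shards it has modified in a set. The helper names and the shard_N.pkl file layout are hypothetical. It also passes pickle.HIGHEST_PROTOCOL, since the binary protocols are typically much faster and smaller than the text protocol 0 that was Python 2's default, which may matter for a 300 MB pickle.

```python
import os
import pickle


def shard_path(directory, index):
    # Hypothetical naming scheme: one pickle file per shard.
    return os.path.join(directory, "shard_%d.pkl" % index)


def save_changed(shards, dirty, directory):
    """Pickle only the shards whose indices are in `dirty`, then clear it."""
    for i in sorted(dirty):
        with open(shard_path(directory, i), "wb") as f:
            # Binary protocol instead of the old text protocol 0.
            pickle.dump(shards[i], f, protocol=pickle.HIGHEST_PROTOCOL)
    dirty.clear()


def load_all(directory, n_shards):
    """Load every shard back; this still reads all of the data."""
    shards = []
    for i in range(n_shards):
        with open(shard_path(directory, i), "rb") as f:
            shards.append(pickle.load(f))
    return shards
```

Note that load_all() still has to read every shard, so this only saves time on writes.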

If you are trying to speed up the initial load time, I don't think this
will help: you would still have to unpickle every piece, so the total
work is about the same. If the save time is what you are after, maybe
you want to check out memory-mapped files.
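For the memory-mapped route, Python's mmap module maps a file into memory so that writing into a slice of the map changes the file in place, and flush() pushes those changes back to disk; only the pages that were touched need to be written, not the whole file. This minimal sketch assumes fixed-size records (one little-endian 64-bit unsigned integer each, a hypothetical layout chosen for illustration). A real dictionary with variable-size keys and values would need additional indexing on top of it.

```python
import mmap
import struct

# Hypothetical record layout: one little-endian unsigned 64-bit integer.
RECORD = struct.Struct("<Q")


def create_file(path, n_records):
    # Preallocate a zero-filled file large enough for n_records.
    with open(path, "wb") as f:
        f.write(b"\x00" * (RECORD.size * n_records))


def update_record(path, index, value):
    # Map the whole file and overwrite one record in place.
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:
            RECORD.pack_into(mm, index * RECORD.size, value)
            mm.flush()  # write the modified pages back to the file


def read_record(path, index):
    # Read-only mapping; unpack a single record without loading the rest.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return RECORD.unpack_from(mm, index * RECORD.size)[0]
```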




More information about the Python-list mailing list