writing large dictionaries to file using cPickle

Aaron Brady castironpi at gmail.com
Thu Jan 29 00:08:40 CET 2009


On Jan 28, 4:43 pm, perfr... at gmail.com wrote:
> On Jan 28, 5:14 pm, John Machin <sjmac... at lexicon.net> wrote:
>
>
>
> > On Jan 29, 3:13 am, perfr... at gmail.com wrote:
>
> > > hello all,
>
> > > i have a large dictionary which contains about 10 keys, each key has a
> > > value which is a list containing about 1 to 5 million (small)
> > > dictionaries. for example,
>
> > > mydict = {key1: [{'a': 1, 'b': 2, 'c': 'hello'}, {'d': 3, 'e': 4, 'f':
> > > 'world'}, ...],
> > >           key2: [...]}
>
> > > in total there are about 10 to 15 million lists if we concatenate
> > > together all the values of every key in 'mydict'. mydict is a
> > > structure that represents data in a very large file (about 800
> > > megabytes).
snip

> in reply to the other poster: i thought 'shelve' simply calls pickle.
> if that's the case, it wouldn't be any faster, right?

Yes, shelve uses pickle, but not on the whole structure at once: it
pickles each top-level value separately.  That makes it a clear winner
if you need to update any of the values later, but for write-once,
read-many access it's about the same.
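A minimal sketch of the update case, using the stdlib shelve module in
modern Python 3 syntax (the thread predates this; in 2009 you would
open and close the shelf explicitly).  The file path and key names are
illustrative, taken from the example dictionary above:

```python
# Sketch: shelve stores each top-level key as a separately pickled
# entry, so one value can be rewritten without re-serializing the rest.
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mydict")

# Initial write: each assignment pickles only that key's value.
with shelve.open(path) as db:
    db["key1"] = [{"a": 1, "b": 2, "c": "hello"}]
    db["key2"] = [{"d": 3, "e": 4, "f": "world"}]

# Later update: only key1's pickle on disk is rewritten;
# key2's entry is untouched.
with shelve.open(path) as db:
    records = db["key1"]
    records.append({"a": 5, "b": 6, "c": "again"})
    db["key1"] = records
```

With plain pickle you would have to re-dump the entire 800 MB
structure to change one entry; shelve limits the rewrite to the one
value you touched.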

You said you have a million dictionaries.  Even if each took only one
byte, you would still have a million bytes.  Do you expect faster I/O
than the time it takes to write a million bytes?
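One way to see the byte-count floor is to serialize a sample of small
dicts and measure the output size and elapsed time; scaling the rate
to 10-15 million records gives a rough lower bound on the total write
time.  The record shape and the 100,000-record sample size here are
assumptions for illustration, not figures from the thread:

```python
# Measure bytes produced and time taken pickling a sample of small
# dicts into an in-memory buffer (isolates serialization from disk).
try:
    import cPickle as pickle  # Python 2, as in the original thread
except ImportError:
    import pickle             # Python 3: the C implementation is default
import io
import time

records = [{"a": i, "b": 2 * i, "c": "hello"} for i in range(100000)]

buf = io.BytesIO()
start = time.time()
# HIGHEST_PROTOCOL is a binary protocol, much faster and more compact
# than the default text protocol of the cPickle era.
pickle.dump(records, buf, protocol=pickle.HIGHEST_PROTOCOL)
elapsed = time.time() - start

nbytes = len(buf.getvalue())
print("%d bytes in %.3f s" % (nbytes, elapsed))
```

Whatever the measured rate, the on-disk size of millions of records
sets a hard floor that no serializer can beat.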

I want to second John's worry about RAM, unless you have several GB
or more, as you say.  You are not dealing with small numbers.
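A back-of-the-envelope check of the RAM worry, using one record from
the example dictionary above.  Note that sys.getsizeof counts only the
dict container itself, not the keys and values it references, so this
is a lower bound on the real footprint:

```python
# Rough lower bound on memory for 10 million small dicts.
import sys

record = {'a': 1, 'b': 2, 'c': 'hello'}
per_record = sys.getsizeof(record)  # container only; exact value varies
                                    # by Python version and platform

# Scale to 10 million records, in GiB:
estimate_gib = per_record * 10 * 10**6 / float(2**30)
print("%d bytes each, >= %.2f GiB total" % (per_record, estimate_gib))
```

Even this undercount lands in the gigabyte range, which is why holding
the whole structure in memory while pickling it is the real constraint.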


