pickle vs .pyc
Michael Hudson
mwh21 at cam.ac.uk
Wed Jun 2 17:02:32 EDT 1999
Michael Hudson <mwh21 at cam.ac.uk> writes:
> Michael Vezie <mlv at pobox.com> writes:
>
> > I need to be able to read a couple very complex (dictionary of arrays
> > of dictionaries, and array of dictionaries of array of dictionaries)
> > data structures into python. To generate it by hand takes too long,
> > so I want to generate it once, and read it each time (the data doesn't
> > change).
> >
> > The obvious choice is, of course pickle, or some flavor thereof.
> > But can someone tell me why this wouldn't be faster:
> >
> > In the code that does the "pickling", simply do:
> > f = open("cache.py", "w")
> > f.write("# cache file for fast,slow\n")
> > f.write("fast = "+`fast`+'\n')
> > f.write("slow = "+`slow`+'\n')
> > f.close()
> > import cache
> >
> > Then, later, when I want the data, I just do:
> >
> > from cache import fast,slow
> >
> > and it's right there. It's compiled, and seems really fast (loading a
> > 50k file in .12 seconds). I just tried the same data using cPickle, and
> > it took 1.4 seconds. It's also not as portable. There is a space savings
> > with pickle, but it's only 5% (well, 56% if you count both the .py and
> > .pyc files), but that doesn't really matter to me.
> >
> > Am I missing something here? This sounds like an obvious, and fast,
> > way to do things. True, the caching part may take longer. But I
> > really don't care about that, since it's done only once, and in the
> > background.
> >
> > Michael
>
> Hmm, you're relying on all the data you're storing having faithful
> __repr__ methods. This certainly isn't universally true. I'd regard
> this method as too fragile.
>
> If you're only storing simple data (by which I mean simple types of
> data, not that the data is simple) (and I think you must be for the
> approach you're using to work) give the marshal module a whirl.
>
> I think it will be substantially faster than your repr-based method
> (cryptic hint: if it wasn't, the marshal module probably wouldn't
> exist).
>
> Eg:
>
> import marshal
>
> complex_data_structure = {'key1':['nested list'],9:"mixed types"}
>
> marshal.dump(complex_data_structure,open('/tmp/foo','wb'))
>
> print marshal.load(open('/tmp/foo','rb'))
>
> HTH
> Michael
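The fragility concern in the quoted reply can be sketched as follows. This is only an illustration, written in modern Python 3 syntax rather than the 1999-era syntax used in the thread; the Point class and the repr_round_trips helper are hypothetical names invented for the example.

```python
class Point:
    """Hypothetical class that relies on the default __repr__."""
    def __init__(self, x, y):
        self.x = x
        self.y = y


def repr_round_trips(value):
    """Return True if evaluating repr(value) rebuilds an equal value."""
    try:
        return eval(repr(value)) == value
    except Exception:
        # The repr was not valid Python source (or named something undefined).
        return False


# Built-in containers of simple types have a faithful repr ...
simple = {'key1': ['nested list'], 9: "mixed types"}

# ... but the default repr of an instance looks like <... object at 0x...>,
# which cannot be read back as source code.
fragile = {'origin': Point(0, 0)}

print(repr_round_trips(simple))
print(repr_round_trips(fragile))
```

A cache.py written from the second structure would fail on import, which is the sort of breakage the reply is warning about.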
Duh! Of course, once you've imported cache.py once, it's compiled to a
.pyc file and all the literals within it will be marshalled
anyway. Still, using marshal directly is certainly more robust and
probably faster...
one-day-I'll-learn-to-read-the-subject-as-part-of-the-message-ly y'rs
Michael
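For anyone wanting to try the marshal route directly, here is a minimal sketch in modern Python 3 syntax (not the syntax of the original posts). The helper names save, load and can_marshal, and the file name cache.marshal, are made up for illustration. The files are opened in binary mode, which the marshal documentation asks for.

```python
import marshal
import os
import tempfile


def save(data, path):
    """Write data to path in marshal format (binary mode)."""
    with open(path, 'wb') as f:
        marshal.dump(data, f)


def load(path):
    """Read back a value previously written by save()."""
    with open(path, 'rb') as f:
        return marshal.load(f)


def can_marshal(value):
    """Return True if marshal accepts the value, False if it refuses it."""
    try:
        marshal.dumps(value)
        return True
    except ValueError:
        # marshal raises ValueError for types it does not support.
        return False


data = {'key1': ['nested list'], 9: "mixed types"}
path = os.path.join(tempfile.mkdtemp(), 'cache.marshal')

save(data, path)
restored = load(path)
print(restored == data)

# Unsupported objects are rejected when writing, rather than producing
# a file that breaks later on import.
print(can_marshal(object()))
```

Note that the marshal format is not guaranteed to be stable across Python versions, so a cache written this way is best treated as disposable and regenerated after an upgrade.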