pickle vs .pyc

Michael Hudson mwh21 at cam.ac.uk
Wed Jun 2 17:02:32 EDT 1999


Michael Hudson <mwh21 at cam.ac.uk> writes:

> Michael Vezie <mlv at pobox.com> writes:
> 
> > I need to be able to read a couple of very complex (dictionary of arrays 
> > of dictionaries, and array of dictionaries of arrays of dictionaries) 
> > data structures into python.  To generate it by hand takes too long, 
> > so I want to generate it once, and read it each time (the data doesn't 
> > change).
> > 
> > The obvious choice is, of course pickle, or some flavor thereof.
> > But can someone tell me why this wouldn't be faster:
> > 
> > In the code that does the "pickling", simply do:
> > f = open("cache.py", "w")
> > f.write("# cache file for fast,slow\n")
> > f.write("fast = "+`fast`+'\n')
> > f.write("slow = "+`slow`+'\n')
> > f.close()
> > import cache
> > 
> > Then, later, when I want the data, I just do:
> > 
> > from cache import fast,slow
> > 
> > and it's right there.  It's compiled, and seems really fast (loading a 
> > 50k file in .12 seconds).  I just tried the same data using cPickle, and 
> > it took 1.4 seconds.  It's also not as portable.  Pickle does save some
> > space, but only 5% (well, 56% if you count both the .py and .pyc
> > files), and that doesn't really matter to me.
> > 
> > Am I missing something here?  This sounds like an obvious, and fast, 
> > way to do things.  True, the caching part may take longer.  But I 
> > really don't care about that, since it's done only once, and in the 
> > background.  
> > 
> > Michael
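A minimal sketch of the cache-module approach described above, written for modern Python 3, where backticks no longer exist and `repr()` replaces them. It assumes the data contains only literals whose `repr()` evaluates back to an equal value. The helper `write_cache_module` and the temporary directory placed on `sys.path` are hypothetical choices for illustration, not part of the original post.

```python
import importlib
import os
import sys
import tempfile

def write_cache_module(directory, name, **values):
    """Hypothetical helper: write a module assigning each value's repr() to a name."""
    path = os.path.join(directory, name + ".py")
    with open(path, "w") as f:
        f.write("# cache file for %s\n" % ",".join(values))
        for var, value in values.items():
            # Assumes repr() yields source text that evaluates back to value.
            f.write("%s = %r\n" % (var, value))
    return path

fast = {"a": [{"x": 1}, {"y": 2}]}
slow = [{"k": [{"n": 3.5}]}]

cache_dir = tempfile.mkdtemp()
write_cache_module(cache_dir, "cache", fast=fast, slow=slow)
sys.path.insert(0, cache_dir)
importlib.invalidate_caches()  # let the import system see the new file

# The first import compiles cache.py and writes a .pyc under __pycache__.
cache = importlib.import_module("cache")
print(cache.fast == fast, cache.slow == slow)
```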
> 
> Hmm, you're relying on all the data you're storing having faithful
> __repr__ methods. This certainly isn't universally true. I'd regard
> this method as too fragile.
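To make the fragility concrete, here is a small Python 3 sketch (the `Point` class is hypothetical) of an object without a faithful `__repr__`. The default repr looks like `<__main__.Point object at 0x...>`, which is not valid Python source, so the generated cache file would fail to compile.

```python
class Point:
    """Hypothetical class that does not define __repr__."""
    def __init__(self, x, y):
        self.x, self.y = x, y

data = {"origin": Point(0, 0)}
source = repr(data)
print(source)  # e.g. {'origin': <__main__.Point object at 0x...>}

try:
    eval(source)  # what importing the generated module would effectively do
    round_trips = True
except SyntaxError:
    round_trips = False
print(round_trips)
```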
> 
> If you're only storing simple data (by which I mean simple types of
> data, not that the data is simple) (and I think you must be for the
> approach you're using to work) give the marshal module a whirl.
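The "simple types" restriction is easy to check. This Python 3 sketch round-trips a nested structure of built-in types through `marshal`, then shows that an instance of a user-defined class (the hypothetical `Record`) is rejected with `ValueError`. Note also that the marshal format is deliberately undocumented and may change between Python versions, so a marshalled cache should be regenerated when the interpreter changes.

```python
import marshal

# Built-in types (dict, list, tuple, str, int, float, None) are supported.
data = {"key1": ["nested list"], 9: "mixed types", "nums": (1, 2.5, None)}
restored = marshal.loads(marshal.dumps(data))
print(restored == data)

class Record:
    """Hypothetical user-defined class."""

# Instances of user-defined classes are not supported.
try:
    marshal.dumps(Record())
    supported = True
except ValueError:
    supported = False
print(supported)
```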
> 
> I think it will be substantially faster than your repr-based method
> (cryptic hint: if it wasn't, the marshal module probably wouldn't
> exist).
> 
> E.g.:
> 
> import marshal
> 
> complex_data_structure = {'key1':['nested list'],9:"mixed types"}
> 
> marshal.dump(complex_data_structure,open('/tmp/foo','wb'))
> 
> print marshal.load(open('/tmp/foo','rb'))
> 
> HTH
> Michael
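For anyone wanting to repeat the speed comparison, here is a Python 3 timing harness. The test data is hypothetical (shaped like the post's nested dicts and lists), and no results are claimed: timings depend on machine and Python version. In Python 3, `pickle` uses its C accelerator automatically, so it plays the role cPickle did here. `eval(repr)` recompiles the text on every call, so it approximates loading the `.py` file without a cached `.pyc`.

```python
import marshal
import pickle
import timeit

# Hypothetical test data: a dict of lists of dicts.
data = {i: [{"name": str(j), "value": j * 1.5} for j in range(20)]
        for i in range(100)}

marshalled = marshal.dumps(data)
pickled = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
source = repr(data)

loaders = {
    "marshal": lambda: marshal.loads(marshalled),
    "pickle": lambda: pickle.loads(pickled),
    # eval() parses and compiles the text on every call.
    "eval(repr)": lambda: eval(source),
}

for name, load in loaders.items():
    assert load() == data  # each method must rebuild an equal structure
    seconds = timeit.timeit(load, number=10)
    print("%-10s %.4f s for 10 loads" % (name, seconds))
```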

Duh! Of course, the first time cache.py is imported it's compiled to a
.pyc file, and all the literals within it will be marshalled
anyway. Still, using marshal directly is certainly more robust and
probably faster...
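To illustrate the .pyc point, this Python 3 sketch compiles a source string standing in for a generated `cache.py` (hypothetical contents), marshals the resulting code object, then loads and executes it. A real .pyc adds a small header (magic number plus timestamp or hash fields) before the marshalled code, which this sketch omits. It also shows one reason direct marshalling can be faster: the code object stores the literal strings and numbers as constants, but the dict and list themselves are rebuilt by executing bytecode on every load.

```python
import marshal

# Hypothetical contents of a generated cache.py.
source = "fast = {'key1': ['nested list'], 9: 'mixed types'}\n"

# Importing compiles the source to a code object; the body of a .pyc
# file is marshal.dumps() of that code object.
code = compile(source, "cache.py", "exec")
blob = marshal.dumps(code)

# Loading a .pyc amounts to marshal.loads() plus executing the code,
# which builds the dict and list from the bytecode and its constants.
namespace = {}
exec(marshal.loads(blob), namespace)
print(namespace["fast"])  # {'key1': ['nested list'], 9: 'mixed types'}
```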

one-day-I'll-learn-to-read-the-subject-as-part-of-the-message-ly y'rs
Michael
