pickle vs .pyc
Michael Hudson
mwh21 at cam.ac.uk
Wed Jun 2 16:57:06 EDT 1999
Michael Vezie <mlv at pobox.com> writes:
> I need to be able to read a couple very complex (dictionary of arrays
> of dictionaries, and array of dictionaries of array of dictionaries)
> data structures into python. To generate it by hand takes too long,
> so I want to generate it once, and read it each time (the data doesn't
> change).
>
> The obvious choice is, of course pickle, or some flavor thereof.
> But can someone tell me why this wouldn't be faster:
>
> In the code that does the "pickling", simply do:
> f = open("cache.py", "w")
> f.write("# cache file for fast,slow\n")
> f.write("fast = "+`fast`+'\n')
> f.write("slow = "+`slow`+'\n')
> f.close()
> import cache
>
> Then, later, when I want the data, I just do:
>
> from cache import fast,slow
>
> and it's right there. It's compiled, and seems really fast (loading a
> 50k file in .12 seconds). I just tried the same data using cPickle, and
> it took 1.4 seconds. It's also not as portable. There is a space savings
> with pickle, but it's only 5% (well, 56% if you count both the .py and
> .pyc files), but that doesn't really matter to me.
>
> Am I missing something here? This sounds like an obvious, and fast,
> way to do things. True, the caching part may take longer. But I
> really don't care about that, since it's done only once, and in the
> background.
>
> Michael
Hmm, you're relying on all the data you're storing having faithful
__repr__ methods. This certainly isn't universally true. I'd regard
this method as too fragile.
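A minimal sketch of that fragility, written in present-day Python 3 syntax (an assumption; this post predates it). The `Point` class and the `round_trips` helper are hypothetical names used only for illustration, and `ast.literal_eval` stands in for importing the generated cache module:

```python
import ast

class Point:
    """Hypothetical class that does not define its own __repr__."""
    def __init__(self, x, y):
        self.x, self.y = x, y

def round_trips(value):
    """Return True if repr(value) parses back to an equal value."""
    try:
        return ast.literal_eval(repr(value)) == value
    except (ValueError, SyntaxError):
        # repr() produced text that is not a valid Python literal
        return False

plain = {"key1": ["nested list"], 9: "mixed types"}
fragile = {"origin": Point(0, 0)}   # repr looks like <...Point object at 0x...>
non_literal = [float("inf")]        # repr is [inf], which is a name, not a literal

plain_ok = round_trips(plain)
fragile_ok = round_trips(fragile)
non_literal_ok = round_trips(non_literal)
```

Plain dictionaries, lists, strings and numbers survive; anything whose repr is not valid source text does not.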
If you're only storing simple data (by which I mean simple types of
data, not that the data is simple) (and I think you must be for the
approach you're using to work) give the marshal module a whirl.
I think it will be substantially faster than your repr-based method
(cryptic hint: if it wasn't, the marshal module probably wouldn't
exist).
Eg:
import marshal
complex_data_structure = {'key1':['nested list'],9:"mixed types"}
# marshal data is binary, so open the file in binary mode
marshal.dump(complex_data_structure,open('/tmp/foo','wb'))
print marshal.load(open('/tmp/foo','rb'))
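The same round trip can be sketched in present-day Python 3 syntax (an assumption; the snippet above uses the syntax of its time). The `save` and `load` helpers and the cache file name are hypothetical, and the last few lines check that marshal refuses an object it does not support:

```python
import marshal
import os
import tempfile

def save(data, path):
    """Write data to path in marshal's binary format."""
    with open(path, "wb") as f:
        marshal.dump(data, f)

def load(path):
    """Read back a value previously written by save()."""
    with open(path, "rb") as f:
        return marshal.load(f)

complex_data_structure = {"key1": ["nested list"], 9: "mixed types"}

# Hypothetical cache location inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "cache.marshal")
save(complex_data_structure, path)
loaded = load(path)

# An arbitrary object is outside marshal's supported types
try:
    marshal.dumps(object())
    rejected = False
except ValueError:
    rejected = True
```

Note that the marshal format is specific to the Python version that wrote it, so a cache like this is best regenerated after an upgrade.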
HTH
Michael
Random aside: something fishy's going on when I try to marshal
*arrays* (as opposed to mere lists):
>>> import array,marshal
>>> marshal.loads(marshal.dumps(array.array('f',[0,1])))
'\000\000\000\000\000\000\200?'
>>>
That shouldn't be happening, should it? Surely that should be raising
an unmarshalable object exception? Oh well...
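The eight-byte string in that session matches the raw storage of two 4-byte floats, 0.0 and 1.0, in little-endian order, so the array's contents come back as a string rather than as an array. One way to sidestep this, sketched in present-day Python 3 syntax (an assumption, not something tested on the interpreter used above), is to store the typecode and a plain list explicitly and rebuild the array on load:

```python
import array
import marshal
import struct

original = array.array('f', [0, 1])

# Two little-endian 4-byte floats, 0.0 and 1.0, give the bytes seen above
raw_floats = struct.pack('<2f', 0.0, 1.0)

# Store only types marshal is documented to handle: a string and a list
payload = marshal.dumps((original.typecode, original.tolist()))

# Rebuild the array from the stored typecode and values
typecode, values = marshal.loads(payload)
restored = array.array(typecode, values)
```

This costs a conversion to and from a list, but what comes back is an array again.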