marshal vs pickle

Wed Oct 31 15:27:16 EDT 2007

On Oct 31, 1:37 pm, Raymond Hettinger <pyt... at rcn.com> wrote:
> On Oct 31, 6:45 am, Aaron Watters <aaron.watt... at gmail.com> wrote:
>
> >  I like to use
> > marshal a lot because it's the absolutely fastest
> > way to store and load data to/from Python....
>
> I believe this FUD is somewhat out-of-date.  Marshalling
> became smarter about repeated and shared objects.  The
> pickle module (using mode 2) has a similar implementation
> to marshal

Raymond: happy days!  We are both right!
I just ran some tests from the test suite for
http://nucular.sourceforge.net with marshalling
and pickling switched in and out.  To my
surprise I didn't find much difference
on the "load" end (marshalling was about 10% faster),
but for "bigLtreeTest.py" I found that
the build ("dump") process was about 1/3
slower with cPickle (protocol 2, Python 2.4).  For
the more complex tests (mondial and gutenberg)
I found that the speedup from using marshal was
in the 1-2% range (and sometimes inverted,
I think because of processor load on a shared
hosting machine).
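
For anyone who wants to try a comparison like this themselves, here is a
minimal timing sketch.  It is not the nucular test suite: the sample data
and the helper name time_roundtrip are made up for illustration, and it
targets Python 3, where the old cPickle accelerator is used automatically
by the plain pickle module.  Numbers will vary with the data and the machine.

```python
import marshal
import pickle
import time


def time_roundtrip(dumps, loads, data, repeats=5):
    """Return (best_dump_seconds, best_load_seconds, restored_data)."""
    dump_best = load_best = float("inf")
    restored = None
    for _ in range(repeats):
        t0 = time.perf_counter()
        blob = dumps(data)
        t1 = time.perf_counter()
        restored = loads(blob)
        t2 = time.perf_counter()
        # Keep the best time to reduce noise from other load on the machine.
        dump_best = min(dump_best, t1 - t0)
        load_best = min(load_best, t2 - t1)
    return dump_best, load_best, restored


def pickle2_dumps(obj):
    # Protocol 2 is the binary protocol discussed in this thread.
    return pickle.dumps(obj, 2)


# Made-up sample data: only simple built-in types, which marshal supports.
sample = {"key%d" % i: [i, str(i), float(i)] for i in range(20000)}

results = {
    "marshal": time_roundtrip(marshal.dumps, marshal.loads, sample),
    "pickle": time_roundtrip(pickle2_dumps, pickle.loads, sample),
}

for name in sorted(results):
    dump_s, load_s, _ = results[name]
    print("%-8s dump %.4fs  load %.4fs" % (name, dump_s, load_s))
```

Taking the best of several runs is one way to dampen the kind of
processor-load noise mentioned above; it does not remove it.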

I'm pretty sure things were much worse for cPickle
many moons ago.  Nice to see that some things
get better :).  It makes sense that the
"dump" side would be slower because that's
where you need to remember all the objects
in case you see them again...
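
The "remembering" on the dump side can be seen from the outside with a
small sketch.  The variable names here are made up for illustration; the
only thing it shows is that pickle keeps track of objects it has already
written, so a list referenced twice comes back as one shared object.

```python
import pickle

shared = ["a", "b", "c"]
container = [shared, shared]   # the same list object, referenced twice

# Protocol 2, as in the tests discussed in this thread.
restored = pickle.loads(pickle.dumps(container, 2))

# The pickler recorded the first occurrence and emitted a back-reference
# for the second, so identity is preserved after loading.
same_object = restored[0] is restored[1]
print(same_object)
```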

Anyway, since it's easy and makes sense, I think
the next version of nucular will have a
switchable option between marshal and cPickle
for persistent storage.
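
A switch like that could look roughly like the sketch below.  This is
only an illustration, not nucular code: SERIALIZERS and get_serializer
are hypothetical names, and it uses the Python 3 pickle module in place
of cPickle.

```python
import marshal
import pickle


def _pickle2_dumps(obj):
    # Protocol 2, matching the configuration discussed in this thread.
    return pickle.dumps(obj, 2)


# Hypothetical registry: option name -> (dumps, loads) pair.
SERIALIZERS = {
    "marshal": (marshal.dumps, marshal.loads),
    "pickle": (_pickle2_dumps, pickle.loads),
}


def get_serializer(name="marshal"):
    """Return the (dumps, loads) pair for the named option."""
    try:
        return SERIALIZERS[name]
    except KeyError:
        raise ValueError("unknown serializer: %r" % (name,))


record = {"id": 7, "words": ["marshal", "pickle"]}
for name in sorted(SERIALIZERS):
    dumps, loads = get_serializer(name)
    roundtrip_ok = loads(dumps(record)) == record
    print(name, roundtrip_ok)
```

The two formats are not interchangeable, so whichever option wrote a
store would have to be the one used to read it back.  The marshal format
is also not guaranteed to stay the same between Python versions, which
matters for long-lived storage.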

Thanks!  -- Aaron Watters

===
The pursuit of hypothetical performance
improvements is the root of all evil.
   -- Bill Tutt
http://www.xfeedme.com/nucular/pydistro.py/go?FREETEXT=tutt