Pickling speed [was: Re: eval(repr(x)) == x]

François Pinard pinard at iro.umontreal.ca
Mon Jan 28 13:35:44 CET 2002

[Jonathan Hogg]

> I once wrote an interpreter for a domain specific language in Haskell. It
> would parse and typecheck the input language to produce a data structure
> suitable for interpreting. To save the time of parsing and typechecking
> repeatedly I wrote the entire data structure out to a file.  Unfortunately,
> the resulting file was twice as big as the original input and took four
> times longer to read back in again as it had taken to generate in the
> first place.

In my little Pymacs, there is an example named `rebox.py', meant to refill
comments within boxes, or switch between various box styles.  It may be
used either from within Emacs, interactively, or in batch over a single box.

The main effort of `rebox.py' is recognising the box style in use, before
starting the reformatting work.  So, it organises and compiles a lot of
regexps as part of its initialisation, which takes more than one second on
my home machine.  When used from within Emacs, this second is bearable,
because the initialisation is done for the first refilled box only,
not for all succeeding ones in the same Emacs session.  But in batch,
the initialisation is restarted for each and every box, and `rebox.py'
is not as instantaneous as I would like it to be.

Wanting to save this initialisation time, I added some mechanics to
`rebox.py' so it cPickles all its recognition/rebuilding data to disk after
generation, and just reads this data back from the pickle if it exists.
The goal was to skip the long initialisation.  To my surprise, I did not get
any speed improvement this way: unpickling the data took a bit longer than
initialising it afresh, even though initialising means a lot of `re.compile'
calls.  So, I merely removed the pickling mechanics and decided to live with
the initialisation time.  I guess I learned (:-) that pickling is not worth
doing unless one has very lengthy initialisations, that is, much longer
than in the `rebox.py' case.
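
For readers who want to try the experiment, here is a minimal sketch of the
load-or-build scheme described above, written for current Python (plain
`pickle' rather than `cPickle').  The pattern list and the names
`build_tables', `load_or_build' and `timed' are hypothetical stand-ins, not
taken from `rebox.py'.  One thing worth knowing when reading the timings:
as far as I can tell, CPython pickles a compiled pattern as just its source
string and flags, and compiles it again on load, which would be consistent
with the lack of speedup reported here.

```python
import os
import pickle
import re
import tempfile
import time

# Hypothetical stand-ins for box-style templates; not the real rebox.py data.
PATTERN_SOURCES = [r"^\s*/\*+\s*$", r"^\s*#+\s*$", r"^\s*;+\s*$", r"^\s*\*+/\s*$"]


def build_tables():
    """The 'expensive' initialisation: compile every recognition regexp."""
    return [re.compile(source) for source in PATTERN_SOURCES]


def load_or_build(cache_path):
    """Return the tables, reading them back from the pickle if it exists."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)
    tables = build_tables()
    with open(cache_path, "wb") as handle:
        pickle.dump(tables, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return tables


def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    cache = os.path.join(tempfile.mkdtemp(), "rebox-tables.pickle")
    _, cold = timed(load_or_build, cache)  # builds the tables, writes the pickle
    _, warm = timed(load_or_build, cache)  # reads the pickle back
    print("build + dump: %.6fs, load: %.6fs" % (cold, warm))
```

The two calls above run in one process, where the `re' module's internal
cache of compiled patterns flatters the second one.  To reproduce the batch
case, time two separate runs of the script against the same cache file.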

Does this match the experience of others, with regard to cPickle speed?

François Pinard   http://www.iro.umontreal.ca/~pinard
