New subject: The memo of pickle

9 Aug 2002

      ...
Text-mode pickle is slow because of the extremely expensive way it
unpickles strings: it invokes eval() (well, PyRun_String()) to parse
the string literal!  (After first checking that there's really only a
string literal there, to prevent trojan horses.)

Re-reading the quoted thread, I see that another phenomenon was
remarked upon there: the slow text-mode pickle used less memory.  I
noticed this too when I ran the test program.  The explanation is that
the strings in the test program were "key0", "key1", ... "key24" and
"value0" ... "value24", over and over (each test dict has the same
keys and values).  Because these literals look like identifiers, they
are interned, so the unpickled data structure shares the string
references -- while the original test data has 10,000 copies of each
string!
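
The sharing effect described above can be reproduced with a small
sketch.  It assumes Python 3, where interning is exposed as
sys.intern(); the helper names build_strings() and distinct_objects()
are made up for this illustration and are not part of the test program
from the thread.

```python
import sys

def build_strings(copies, n_keys=25):
    # Build each string at run time so that every copy is normally a
    # separate object, as in the original test data.
    return ["key" + str(i) for _ in range(copies) for i in range(n_keys)]

def distinct_objects(strings):
    # Count distinct string objects by identity, not by value.
    return len({id(s) for s in strings})

raw = build_strings(copies=100)
shared = [sys.intern(s) for s in raw]

print(distinct_objects(raw))     # many objects; on CPython typically one per copy
print(distinct_objects(shared))  # one object per distinct value
```

The two lists compare equal by value; only the number of underlying
string objects differs, which is where the memory saving comes from.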

If we really want this as a feature, a call to
PyString_InternFromString() could be made under certain conditions in
load_short_binstring() (e.g. when the length is at most 10 and
all_name_chars() from compile.c returns true).
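
As a rough Python-level sketch of that condition (not the C code in
cPickle): maybe_intern() is a hypothetical name, and all_name_chars()
here is only a stand-in that assumes the C helper accepts ASCII
letters, digits and underscores.

```python
import string
import sys

NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")
MAX_INTERN_LEN = 10  # the "at most 10" threshold suggested above

def all_name_chars(s):
    # Hypothetical stand-in for the C helper of the same name:
    # true when every character could appear in an identifier.
    return all(c in NAME_CHARS for c in s)

def maybe_intern(s):
    # Intern only short, identifier-like strings; return anything
    # else unchanged.
    if len(s) <= MAX_INTERN_LEN and all_name_chars(s):
        return sys.intern(s)
    return s
```

Strings that fail either check come back as the same object that was
passed in, so the cost for long or arbitrary data is one length test
and a scan that stops at the first non-name character.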

I'm not sure that this is a desirable feature though.

--Guido van Rossum (home page: http://www.python.org/~guido/)
