[Python-bugs-list] [ python-Bugs-502503 ] pickle interns strings

noreply@sourceforge.net noreply@sourceforge.net
Fri, 11 Jan 2002 13:21:51 -0800


Bugs item #502503, was opened at 2002-01-11 13:21
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=502503&group_id=5470

Category: Documentation
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Brian Kelley (wc2so1)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: pickle interns strings

Initial Comment:
Pickle (and cPickle) use eval to reconstruct strings
from the stored format.  Eval is used because
it correctly turns the repr of a string back
into the original string object by translating all the
escaped characters such as "\t" and "\n".
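
For illustration, the round trip described above can be sketched as
follows.  This is not pickle's actual code; it assumes a modern
Python 3 interpreter, and the variable names are made up for the
example.  ast.literal_eval is shown next to eval as a safer stand-in.

```python
import ast

# A string containing characters that repr() has to escape.
original = "tab\there\nnewline and a quote: '"

# The stored form is the repr: a quoted literal with escape sequences.
stored = repr(original)

# Evaluating the literal rebuilds the original string.
restored_eval = eval(stored)

# literal_eval accepts only literals, so it cannot run arbitrary code.
restored_literal = ast.literal_eval(stored)
```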

A side effect is that eval interns strings
for faster lookup.
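
As a minimal sketch of what interning means, assuming a modern
Python 3 interpreter where the function is spelled sys.intern (it was
the builtin intern at the time of this report): two equal strings,
once interned, resolve to a single shared object held in the
interpreter's intern table.  As far as I know, interpreters of this
era kept interned strings alive for the life of the process, which
would explain the growth reported here.

```python
import sys

# Build two equal strings at run time so they start as separate objects.
a = "".join(["pickle", "_", "demo"])
b = "".join(["pickle", "_", "demo"])

# Interning maps equal strings to one canonical object.
ia = sys.intern(a)
ib = sys.intern(b)

same_object = ia is ib  # both names refer to the canonical copy
```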

This causes the following sample code to unexpectedly
grow in memory consumption:

import pickle
import random
import string

def genstring(length=100):
    s = [random.choice(string.letters) for x in range(length)]
    return "".join(s)

def test():
    while 1:
        s = genstring()
        dump = pickle.dumps(s)
        s2 = pickle.loads(dump)
        assert s == s2

test()

Note that not all strings are interned, only those that,
as Tim Peters once said, "look like" variable names.
The above example is contrived to generate a lot of
different names that "look like" variable names, but
since this has happened in practice it should probably be
documented.

Interestingly, inserting
 s.append(" ")
before
 return "".join(s)
makes the memory growth disappear, because the names no
longer "look like" variable names.
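
A rough sketch of the "looks like a variable name" test, assuming it
amounts to checking that every character is an ASCII letter, a digit,
or an underscore.  The helper name looks_like_name is hypothetical and
the interpreter's real check may differ in detail.

```python
import string

# Characters assumed to be allowed in a name-like string.
NAME_CHARS = set(string.ascii_letters + string.digits + "_")

def looks_like_name(s):
    """Return True if every character of s is a letter, digit, or underscore."""
    return all(ch in NAME_CHARS for ch in s)

plain = looks_like_name("abcXYZ")         # letters only
with_space = looks_like_name("abcXYZ ")   # trailing space breaks the pattern
```

Under this assumption, the strings from genstring() qualify for
interning, while the same strings with a space appended do not.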

