[Python-bugs-list] [ python-Bugs-502503 ] pickle interns strings

noreply@sourceforge.net noreply@sourceforge.net
Fri, 11 Jan 2002 13:36:07 -0800


Bugs item #502503, was opened at 2002-01-11 13:21
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=502503&group_id=5470

Category: Documentation
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Brian Kelley (wc2so1)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: pickle interns strings

Initial Comment:
Pickle (and cPickle) use eval to reconstruct string
variables from the stored format.  Eval is used because
it correctly reconstructs the repr of a string back
into the original string object by translating all the
appropriate escape sequences, such as "\t" and "\n".

There is a side effect: eval interns string
variables for faster lookup.
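
For readers unfamiliar with interning, here is a minimal sketch of
what it means for two equal strings to be interned.  It assumes a
present-day Python 3 interpreter, where the function lives at
sys.intern (in the Python of this report it was the builtin intern);
the sample values are made up for illustration.

```python
import sys

# Build two equal strings at run time from different pieces.
a = "".join(["spam", "_eggs"])
b = "".join(["spam_", "eggs"])

# Interning maps equal strings onto one shared object held in an
# interpreter-wide table, so later comparisons can use identity.
ia = sys.intern(a)
ib = sys.intern(b)

print(ia == ib)  # equal by value
print(ia is ib)  # and the very same object after interning
```

Every distinct string that gets interned adds an entry to that
table, which is why a stream of unique random strings can make it
grow.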

This causes the following sample code to unexpectedly
grow in memory consumption:

import pickle
import random
import string

def genstring(length=100):
    s = [random.choice(string.letters) for x in range(length)]
    return "".join(s)

def test():
    while 1:
        s = genstring()
        dump = pickle.dumps(s)
        s2 = pickle.loads(dump)
        assert s == s2

test()

Note that not all strings are interned, only those that,
as Tim Peters once said, "look like" variable names.
The above example is contrived to generate a lot of
different names that "look like" variable names, but
since this has happened in practice it should probably
be documented.

Interestingly, inserting
 s.append(" ")
before
 return "".join(s)
makes the memory growth disappear, because the names no
longer "look like" variable names.
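
A rough sketch of the "looks like a variable name" test follows.
It assumes the rule is "every character is an ASCII letter, digit,
or underscore"; that rule and the helper name looks_like_name are
assumptions made for illustration, not taken from the interpreter
source.

```python
import string

# Assumption: a string "looks like" a name when it consists only of
# ASCII letters, digits, and underscores.
NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")

def looks_like_name(s):
    """Return True if every character of s is a letter, digit, or underscore."""
    return all(ch in NAME_CHARS for ch in s)

letters_only = "QwErTy"           # like the output of genstring()
with_space = letters_only + " "   # like genstring() after s.append(" ")

print(looks_like_name(letters_only))
print(looks_like_name(with_space))
```

Under that assumption the all-letters string qualifies and the one
with the trailing space does not, which matches the behaviour
described above.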


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-01-11 13:36

Message:
Logged In: YES 
user_id=31435

Noting that Security Geeks are uncomfortable with using 
eval() for this purpose regardless.  Would be good if Python 
got refactored so that pickle and cPickle and the front end 
all called a new routine that simply parsed the escape 
sequences in a character buffer, returning a Python string 
object.

Don't ask me about Unicode <wink>.
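
The kind of routine suggested above might look roughly like the
following sketch.  The name parse_escapes and the small set of
escapes it handles are assumptions for illustration only; this is
not the routine pickle uses, and it covers neither octal escapes
nor Unicode.

```python
# Hypothetical table of single-character escapes handled by this sketch.
SIMPLE_ESCAPES = {"\\": "\\", "'": "'", '"': '"',
                  "n": "\n", "t": "\t", "r": "\r"}
HEX_DIGITS = "0123456789abcdefABCDEF"

def parse_escapes(body):
    """Translate backslash escapes in body without evaluating any code."""
    out = []
    i = 0
    n = len(body)
    while i < n:
        ch = body[i]
        if ch != "\\":
            # Ordinary character: copy it through unchanged.
            out.append(ch)
            i += 1
            continue
        if i + 1 >= n:
            raise ValueError("trailing backslash")
        nxt = body[i + 1]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        elif nxt == "x":
            # \xHH: exactly two hexadecimal digits.
            digits = body[i + 2:i + 4]
            if len(digits) != 2 or any(d not in HEX_DIGITS for d in digits):
                raise ValueError("bad \\x escape")
            out.append(chr(int(digits, 16)))
            i += 4
        else:
            raise ValueError("unsupported escape: \\" + nxt)
    return "".join(out)

print(repr(parse_escapes(r"line1\nline2")))
```

Because the input is only ever scanned character by character,
text such as __import__('os') comes back as plain data instead of
being executed, and nothing is compiled, so nothing is interned as
a side effect.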

----------------------------------------------------------------------
