[Python-bugs-list] [ python-Bugs-502503 ] pickle interns strings
noreply@sourceforge.net
Fri, 11 Jan 2002 13:36:07 -0800
Bugs item #502503, was opened at 2002-01-11 13:21
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=502503&group_id=5470
Category: Documentation
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Brian Kelley (wc2so1)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: pickle interns strings
Initial Comment:
Pickle (and cPickle) use eval to reconstruct strings
from the stored format. Eval is used because it correctly
turns the repr of a string back into the original string
object by translating all the appropriate escape
sequences, such as "\t" and "\n".
There is a side effect: eval interns strings for faster
lookup.
This causes the following sample code to unexpectedly
grow in memory consumption:
import pickle
import random
import string

def genstring(length=100):
    s = [random.choice(string.letters) for x in range(length)]
    return "".join(s)

def test():
    while 1:
        s = genstring()
        dump = pickle.dumps(s)
        s2 = pickle.loads(dump)
        assert s == s2

test()
Note that not all strings are interned, only those that,
as Tim Peters once said, "look like" variable names.
The above example is contrived to generate a lot of
different strings that "look like" variable names, but
since this has happened in practice it should probably be
documented.
Interestingly, inserting
    s.append(" ")
before
    return "".join(s)
makes the memory growth disappear, because the strings no
longer "look like" variable names.
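
The "looks like a variable name" test can be sketched roughly
as below. The exact rule lives in the interpreter source and is
not spelled out in this report, so this assumes it means "every
character is a letter, digit or underscore". The helper name
looks_like_name is hypothetical.

```python
import string

# Assumption: only ASCII letters, digits and "_" count as name characters.
NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")


def looks_like_name(s):
    """Return True if every character of s is a name character."""
    return all(ch in NAME_CHARS for ch in s)
```

Under that assumption the 100-letter strings from genstring()
pass the check, while the same strings with a trailing " "
do not, which matches the behaviour described above.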
----------------------------------------------------------------------
>Comment By: Tim Peters (tim_one)
Date: 2002-01-11 13:36
Message:
Logged In: YES
user_id=31435
Noting that Security Geeks are uncomfortable with using
eval() for this purpose regardless. It would be good if
Python were refactored so that pickle, cPickle, and the
front end all called a new routine that simply parsed the
escape sequences in a character buffer and returned a
Python string object.
Don't ask me about Unicode <wink>.
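
A minimal sketch of what such a routine might look like, in pure
Python rather than C. The name unescape and the set of supported
escapes are assumptions made for illustration. It handles only a
small subset of Python's string escapes and, in line with the
remark above, does not attempt Unicode escapes.

```python
# Hypothetical helper: decode backslash escapes without calling eval().
# Covers only a small, assumed subset of Python's string escapes.
_SIMPLE_ESCAPES = {
    "\\": "\\",
    "'": "'",
    '"': '"',
    "n": "\n",
    "t": "\t",
    "r": "\r",
}
_HEX_DIGITS = "0123456789abcdefABCDEF"


def unescape(text):
    """Return text with the supported backslash escapes decoded."""
    out = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch != "\\":
            # Ordinary character: copy it through unchanged.
            out.append(ch)
            i += 1
            continue
        if i + 1 >= n:
            raise ValueError("trailing backslash")
        code = text[i + 1]
        if code in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[code])
            i += 2
        elif code == "x":
            # \xHH: exactly two hexadecimal digits must follow.
            digits = text[i + 2:i + 4]
            if len(digits) != 2 or any(d not in _HEX_DIGITS for d in digits):
                raise ValueError("bad \\x escape")
            out.append(chr(int(digits, 16)))
            i += 4
        else:
            raise ValueError("unsupported escape: \\" + code)
    return "".join(out)
```

Because the input is only scanned character by character and
never executed, a routine along these lines avoids both the
security concern and any interning done by the compiler.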
----------------------------------------------------------------------