[Python-bugs-list] [ python-Bugs-502503 ] pickle interns strings
noreply@sourceforge.net
Wed, 14 Aug 2002 01:00:05 -0700
Bugs item #502503, was opened at 2002-01-11 22:21
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=502503&group_id=5470
Category: Documentation
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Brian Kelley (wc2so1)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: pickle interns strings
Initial Comment:
Pickle (and cPickle) use eval to reconstruct string
objects from the stored format. Eval is used because
it correctly turns the repr of a string back
into the original string object by translating all the
appropriate escape sequences, such as "\t" and "\n".
There is a side effect: eval interns the resulting
strings for faster lookup.
This causes the following sample code to unexpectedly
grow in memory consumption:
import pickle
import random
import string

def genstring(length=100):
    s = [random.choice(string.letters) for x in range(length)]
    return "".join(s)

def test():
    while 1:
        s = genstring()
        dump = pickle.dumps(s)
        s2 = pickle.loads(dump)
        assert s == s2

test()
Note that not all strings are interned, only those that,
as Tim Peters once said, "look like" variable names.
The above example is contrived to generate a lot of
different strings that "look like" variable names, but
since this has happened in practice it should probably
be documented.
Interestingly, inserting
    s.append(" ")
before
    return "".join(s)
makes the memory growth disappear, because the strings no
longer "look like" variable names.
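
The difference between the two cases comes down to which characters the
generated strings contain. The following is only a rough sketch in modern
Python 3: the helper name looks_like_name is hypothetical, and the character
set (ASCII letters, digits, underscore) is an assumption about what "looks
like" a variable name. The real check lives inside the interpreter and may
differ between versions.

```python
import string

# Assumed character set: what a string may contain and still
# "look like" a variable name. This is an approximation only.
NAME_CHARS = set(string.ascii_letters + string.digits + "_")


def looks_like_name(s):
    """Return True if every character of s could appear in an identifier."""
    return all(ch in NAME_CHARS for ch in s)


letters_only = "QwErTyUiOp"          # like the output of genstring()
with_space = letters_only + " "      # like genstring() after s.append(" ")

print(looks_like_name(letters_only))
print(looks_like_name(with_space))
```

Under that assumption, a string made only of letters qualifies for interning,
while the same string with a trailing space does not.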
----------------------------------------------------------------------
>Comment By: Martin v. Löwis (loewis)
Date: 2002-08-14 10:00
Message:
Logged In: YES
user_id=21627
This is fixed with patch 505705.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 03:03
Message:
Logged In: YES
user_id=35752
See patch 505705 for a slightly different solution.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-23 22:21
Message:
Logged In: YES
user_id=35752
Attached is a first stab at documentation.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-22 21:27
Message:
Logged In: YES
user_id=35752
Okay, I moved unquote to a method of str. I also fixed
pickle.py and the tests (no need to test for insecure
strings).
Fred, do you have time to write documentation for
_PyString_Unquote and str.unquote?
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2002-03-22 20:12
Message:
Logged In: YES
user_id=31435
I haven't tried the patch, but yes, it looks sane to me. A
deprecated module is definitely a poor place to add a new
feature <wink>; it makes as much sense as a string method
as, say, .upper(), right? That is, why not? String in,
string out.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-22 19:35
Message:
Logged In: YES
user_id=35752
Attached is a patch that implements _PyString_Unquote
and strop.unquote. strop is probably the wrong place since
it's deprecated but I'm not sure if unquote as a string method
is right either. cPickle.c is changed to use
_PyString_Unquote instead of calling eval. pickle.py
still needs to be fixed, documentation added, tests fixed.
Before all that, does this patch look sane?
----------------------------------------------------------------------
Comment By: paul rubin (phr)
Date: 2002-02-16 02:26
Message:
Logged In: YES
user_id=72053
I agree about eval being dangerous. Also, the memory leak
is itself a security concern: if an attacker can stuff
enough strings into the unpickler to exhaust memory, that's
a denial of service attack.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2002-01-11 22:36
Message:
Logged In: YES
user_id=31435
Noting that Security Geeks are uncomfortable with using
eval() for this purpose regardless. Would be good if Python
got refactored so that pickle and cPickle and the front end
all called a new routine that simply parsed the escape
sequences in a character buffer, returning a Python string
object.
Don't ask me about Unicode <wink>.
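
A routine of the kind described above can be sketched in a few lines. This is
a minimal illustration in modern Python 3, not the code from any patch on this
item: the function name unquote, the table _SIMPLE_ESCAPES, and the small set
of supported escapes are all assumptions made for the example. It parses the
escape sequences directly and never evaluates its input.

```python
import string

# Hypothetical table of single-character escapes; a real routine
# would need to cover more cases (octal escapes, for instance).
_SIMPLE_ESCAPES = {
    "\\": "\\",
    "'": "'",
    '"': '"',
    "n": "\n",
    "t": "\t",
    "r": "\r",
}


def unquote(text):
    """Turn a quoted, backslash-escaped literal back into a plain string.

    Nothing is evaluated, so input that is not a string literal is
    rejected with ValueError instead of being executed.
    """
    if len(text) < 2 or text[0] not in "'\"" or text[-1] != text[0]:
        raise ValueError("expected a quoted string literal")
    quote = text[0]
    body = text[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            if ch == quote:
                raise ValueError("unescaped quote inside literal")
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("trailing backslash")
        code = body[i + 1]
        if code == "x":
            # \xHH: exactly two hexadecimal digits
            digits = body[i + 2:i + 4]
            if len(digits) != 2 or any(d not in string.hexdigits for d in digits):
                raise ValueError("malformed \\x escape")
            out.append(chr(int(digits, 16)))
            i += 4
        elif code in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[code])
            i += 2
        else:
            raise ValueError("unsupported escape: \\" + code)
    return "".join(out)


original = "tab\there, newline\nand a backslash \\"
assert unquote(repr(original)) == original
```

Because the parser only ever builds a string from characters, handing it
something like "__import__('os')" raises ValueError rather than running code.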
----------------------------------------------------------------------