[Python-bugs-list] [ python-Bugs-502503 ] pickle interns strings

noreply@sourceforge.net noreply@sourceforge.net
Wed, 14 Aug 2002 01:00:05 -0700


Bugs item #502503, was opened at 2002-01-11 22:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=502503&group_id=5470

Category: Documentation
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Brian Kelley (wc2so1)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: pickle interns strings

Initial Comment:
Pickle (and cPickle) use eval to reconstruct string
variables from the stored format.  Eval is used because
it correctly turns the repr of a string back into the
original string object by translating all the escape
sequences such as "\t" and "\n".

There is a side effect: eval interns string
variables for faster lookup.

This causes the following sample code to unexpectedly
grow in memory consumption:

import pickle
import random
import string

def genstring(length=100):
    s = [random.choice(string.letters) for x in range(length)]
    return "".join(s)

def test():
    while 1:
        s = genstring()
        dump = pickle.dumps(s)
        s2 = pickle.loads(dump)
        assert s == s2

test()

Note that not all strings are interned, just the ones
that, as Tim Peters once said, "look like" variable names.
The above example is contrived to generate a lot of
different strings that "look like" variable names, but
since this has happened in practice it should probably
be documented.

Interestingly, inserting
 s.append(" ")
before
 return "".join(s)

makes the memory growth disappear, because the strings no
longer "look like" variable names.
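
The "looks like a variable name" rule mentioned above can be
sketched roughly as follows.  This is an illustration only, written
for Python 3: the helper name looks_like_identifier is hypothetical,
and the check (ASCII letters, digits, and underscores only) is an
assumption about the heuristic, not a copy of the interpreter's code.

```python
import string

# Characters assumed to make a string "look like" a variable name.
NAME_CHARS = set(string.ascii_letters + string.digits + "_")

def looks_like_identifier(s):
    """Return True if every character of s is a letter, digit, or underscore."""
    return all(ch in NAME_CHARS for ch in s)

print(looks_like_identifier("abcXYZ"))    # True
print(looks_like_identifier("abcXYZ "))   # False: the trailing space breaks the rule
```

Under this assumption, appending a single space is enough to move a
generated string out of the "looks like a name" category.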


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-08-14 10:00

Message:
Logged In: YES 
user_id=21627

This is fixed with patch 505705.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 03:03

Message:
Logged In: YES 
user_id=35752

See patch 505705 for a slightly different solution.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-23 22:21

Message:
Logged In: YES 
user_id=35752

Attached is a first stab at documentation.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-22 21:27

Message:
Logged In: YES 
user_id=35752

Okay, I moved unquote to a method of str.  I also fixed
pickle.py and the tests (no need to test for insecure
strings).

Fred, do you have time to write documentation for 
_PyString_Unquote and str.unquote?

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-22 20:12

Message:
Logged In: YES 
user_id=31435

I haven't tried the patch, but yes, it looks sane to me.  A 
deprecated module is definitely a poor place to add a new 
feature <wink>; it makes as much sense as a string method 
as, say, .upper(), right?  That is, why not?  String in, 
string out.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-22 19:35

Message:
Logged In: YES 
user_id=35752

Attached is a patch that implements _PyString_Unquote
and strop.unquote.  strop is probably the wrong place since
it's deprecated but I'm not sure if unquote as a string method
is right either.  cPickle.c is changed to use
_PyString_Unquote instead of calling eval.  pickle.py
still needs to be fixed, documentation added, tests fixed.
Before all that, does this patch look sane?
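
For readers who want a feel for what an eval-free "unquote" does, here
is a minimal sketch in Python 3.  It is not the code from the patch:
the function name unquote, the table _SIMPLE_ESCAPES, and the supported
subset of escapes (a few simple ones plus \xHH) are all assumptions
made for illustration.  Anything it does not recognize raises
ValueError instead of being evaluated.

```python
import string

# Hypothetical table of simple escapes; a real routine would cover more.
_SIMPLE_ESCAPES = {
    "\\": "\\", "'": "'", '"': '"',
    "n": "\n", "t": "\t", "r": "\r",
}

def unquote(quoted):
    """Turn the repr of a plain string back into the string, without eval."""
    # The text must be wrapped in one matching pair of quote characters.
    if len(quoted) < 2 or quoted[0] not in "'\"" or quoted[-1] != quoted[0]:
        raise ValueError("not a quoted string")
    quote = quoted[0]
    body = quoted[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == quote:
            raise ValueError("unescaped quote inside string")
        if ch != "\\":
            # Ordinary character: copy it through unchanged.
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        esc = body[i + 1]
        if esc in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[esc])
            i += 2
        elif esc == "x":
            # \xHH: exactly two hexadecimal digits.
            digits = body[i + 2:i + 4]
            if len(digits) != 2 or any(d not in string.hexdigits for d in digits):
                raise ValueError("malformed \\x escape")
            out.append(chr(int(digits, 16)))
            i += 4
        else:
            raise ValueError("unsupported escape: \\" + esc)
    return "".join(out)

original = "tab\there, newline\n, quote ' and backslash \\"
print(unquote(repr(original)) == original)   # True
```

Because the input is only scanned character by character, text such as
__import__('os') is rejected as "not a quoted string" rather than
executed, which is the property the security comments in this thread
are after.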

----------------------------------------------------------------------

Comment By: paul rubin (phr)
Date: 2002-02-16 02:26

Message:
Logged In: YES 
user_id=72053

I agree about eval being dangerous.  Also, the memory leak
is itself a security concern: if an attacker can stuff
enough strings into the unpickler to exhaust memory, that's
a denial of service attack.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-01-11 22:36

Message:
Logged In: YES 
user_id=31435

Noting that Security Geeks are uncomfortable with using
eval() for this purpose regardless.  Would be good if Python 
got refactored so that pickle and cPickle and the front end 
all called a new routine that simply parsed the escape 
sequences in a character buffer, returning a Python string 
object.

Don't ask me about Unicode <wink>.

----------------------------------------------------------------------
