[Python-bugs-list] [ python-Bugs-471893 ] Security review of pickle/marshal docs

noreply@sourceforge.net noreply@sourceforge.net
Sat, 10 Nov 2001 14:07:19 -0800


Bugs item #471893, was opened at 2001-10-16 15:42
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=471893&group_id=5470

Category: Extension Modules
Group: None
Status: Open
Resolution: None
Priority: 7
Submitted By: Tim Peters (tim_one)
Assigned to: Barry Warsaw (bwarsaw)
Summary: Security review of pickle/marshal docs

Initial Comment:
Paul Rubin points out that the security implications 
of using marshal and/or pickle aren't clear from the 
docs.  Assigning to Jeremy as he's more sensitive to 
such issues than I am; maybe Barry would like to get 
paranoid too <wink>.

A specific example:  the pickle docs say that pickle 
doesn't support code objects, and "at least this 
avoids the possibility of smuggling Trojan horses into 
a program".  However,

1) The marshal docs don't mention this vulnerability 
at all.

while

2) The pickle docs don't spell out possible dangers 
due to things pickle does that marshal doesn't (like 
importing modules, and running class constructors).

----------------------------------------------------------------------

>Comment By: Jeremy Hylton (jhylton)
Date: 2001-11-10 14:07

Message:
Logged In: YES 
user_id=31392

I don't think we should be doing anything about marshal.  
Maybe we should name in pyclib or something <0.9 wink>.  It 
works fine for .pyc files, but I don't see a reason for it 
to do anymore than is necessary for that purpose.

I think the notion of an unpickler that only handles 
builtin datatypes is the most attractive option you offer, 
but Paul has a good point about eval for strings.  (It 
currently has some hacks that address specific exploits, 
but I doubt they are sufficient.)  I also wonder how hard 
it is to handle builtin types and avoid subclasses of 
builtin types.

If there are any changes to pickle, I think we need to be 
careful about how it is described.  If we claim that an 
unpickler is safe for untrusted pickles, we've made a 
fairly strong claim.  I still think such a design change 
requires a PEP that includes some requirements and use 
cases and a thorough analysis of potential exploits. 


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2001-11-10 11:42

Message:
Logged In: NO 

See bug #467384 for discussion about marshal.  Besides the
recursion issue, marshal's format is explicitly undocumented
and subject to change--you can't rely on it to interoperate
between two different Python versions, so it's no good as
an RPC serializer.  The format has kludges (e.g. the
representation of long ints) that make it undesirable to
freeze and document it and force future versions to be
backward compatible.

Adding a pickle.loads flag to prevent instance unpickling
isn't perfect but is probably the best alternative on 
your list.  Perhaps the flag can have a value that allows
unpickling the instances by restoring the instance 
attributes rather than calling the initializer.  That's
not always the right way to unpickle an instance (that's
why the unpickler no longer works that way) but it's good
enough a lot of the time.  

There's another issue with pickle/cPickle which is that they
unpickle quoted strings by evaling them.  This is scary.
While I don't see an immediate exploit, I also haven't
examined the 1000's of lines of code I'd need to examine
to convince myself that there's NOT an exploit.  I think
the unpickler should be changed to never call eval but just
parse the string as it needs to.  

Guido seemed to think pickle might have other possible
exploits.  I don't know what he had in mind but before
declaring it safe for untrusted data I think it needs to
be gone over with a fine toothed comb.

Paul


----------------------------------------------------------------------

Comment By: Barry Warsaw (bwarsaw)
Date: 2001-11-10 09:37

Message:
Logged In: YES 
user_id=12800

I'm going to agree with Paul that this is a problem needing
fixing, however there are really several issues.

1. Cookie module makes it too easy to code exploits.  Cookie
exports a class, also called Cookie, which is aliased to
SmartCookie, so that a naive program will simply pass cookie
data to Cookie.Cookie() and you're screwed.  So, Cookie
module's defaults should be for more security rather than
less, and Cookie.Cookie should be aliased to SimpleCookie
instead.

2. There is no built-in safe mechanism for de-serializing
data from untrusted sources.  You can't use pickle without
overloading a "magic" method.  You can't use cPickle because
you can't do the overloading trick.  You can't use marshal
because it isn't bulletproof against recursive
datastructures.  So how /do/ you do it?

I think it just may be serious enough to deal with in Python
2.2, and I volunteer to address it (so I'll steal this bug
report).  Without looking at the code, or the level of
effort necessary, I would make the following suggestions:

1. Add to the public interface of pickle and cPickle, a flag
that either disables the unpickling of instances altogether,
or disables calling any code with unpickled data, e.g.
constructors.

2. Fix marshal to be bulletproof against recursive
datastructures.
3. Update the docs for both pickle/cPickle and marshal to
explain how to safely write de-serializers of untrusted strings.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2001-11-09 01:21

Message:
Logged In: NO 

Well, Guido and Tim agree with you that it's not a pickle
bug.  I still feel it is one, because its docs currently
make you think you can securely load untrusted pickles, and
because it's a natural, non-obscure thing to want to do
(see pyro and the cookie module), but whatever.  If it's
not a code bug then I feel it's a significant functionality
shortcoming in the python library.

Pyro uses pickle to serialize data for RPC calls over the
internet.  A malicious client could make a hostile pickle
take over the server.  The cookie module lets web
applications store user session state in browser cookies.
Its SerialCookie and SmartCookie classes let you put
arbitrary Python objects into the user session, and
serializes them when pickle.  Again, a malicious client
can make a hostile pickle, send it in a cookie header to
the http server, and take over the server when the 
application unpickles the cookie.

The current documentation for the pickle module makes it
very clear to me that the doc writer thought it was safe
to unpickle untrusted cookies.  If pickle wasn't designed
for that, then there was a communication failure between
the designer and the doc writer.

Yes, I'm willing to help with a PEP for fixing this
situation.

Paul


----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2001-11-08 09:37

Message:
Logged In: YES 
user_id=31392

I don't think of the issue you describe as a bug in the
code.  You're suggesting a new feature for pickle.  As far
as I can tell, the original design requirements for pickle
did not include the ability to securely load a pickle from
an untrusted source.

It may be a legitimate feature request, but it's too late to
make it into Python 2.2.  I suggest we look at the design
issues for Python 2.3 and decide if it's a feature we want
to support.  I imagine a PEP may be necessary to lay out the
issues and the solution.  Do you want to have a hand in that
PEP?

I still don't understand what it means that Pyro and cookie
were bit by a bug.  It sounds like they were using pickle in
ways that pickle was not intended to support.  A careful
analysis of how those two applications use pickle would be
helpful for generating requirements.


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2001-11-07 15:54

Message:
Logged In: NO 

IMO it's a code bug that you can't unpickle strings from
untrusted sources.  Pyro and the cookie module are examples
of programs that got bitten by this bug.  Whether it's
really a bug is a matter of opinion--I had a big email
exchange with Guido and Tim about it, and they felt it
was enough to fix the pickle documentation.

Pickle has the same problem as cPickle, but with pickle
you can subclass the pickler and override the method that
unpickles class objects, and work around the (IMO) bug.
The workaround doesn't help cPickle since cPickle can't
be subclassed.  See bug #467384 for some related discussion.

Paul


----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2001-11-07 14:02

Message:
Logged In: YES 
user_id=31392

What's the code bug?  Your last message has a lot of gloom
and doom <wink>, but I'm not sure what specific problem
you'd like to see fixed.  Are you saying that something
needs to be fixed in cPickle and not in pickle?

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2001-11-07 12:08

Message:
Logged In: NO 

Irmen de Jong points out that the standard cookie module
uses pickling for serial and smart cookies.  The 2.1.1
cookie module docs explicitly say not to use those
classes because of the security hole--that they're provided
for backward compatibility only (but with what?!  Any
server that uses those classes on open ports needs to be
changed right away).

Irmen's library, http://pyro.sourceforge.net, also uses
unpickle insecurely (he's aware of this now and is figuring
out a fix).

IMO this is really a code bug rather than a documentation
bug, and should be fixed in the code rather than the docs.
Documenting the bug rather than fixing it leaves a
deficiency in the Python library: obvious uses of pickling,
like Pyro and the cookie module, can't be implemented
using cPickle and have to resort to a slower Python
deserializer, or use marshal and have possible compatibility
problems between versions (and not be able to serialize
class instances).

Paul



----------------------------------------------------------------------

Comment By: paul rubin (phr)
Date: 2001-10-16 16:06

Message:
Logged In: YES 
user_id=72053

Certainly anyone unserializing potentially malicious data
with pickle, marshal, or anything else, should check the
results before doing anything dangerous with them (like
executing code).  However, unpickle can potentially do
damage before it even returns, by creating loading modules
and creating initialized class instances.  So pickle.loads
should never be used on untrusted strings, except possibly
in a rexec wrapper (as proposed by Tim).  Another
possibility (also by Tim) is to override the load_inst
method of the Pickler class, though I don't think you can
do that for cPickle.

A sample exploit for unpickle can be found at
<http://www.nightsong.com/phr/python/pickletest.py>.
Unpickling the test string runs penguin.__init__ contrary
to the doc's saying no initialization unless there's a
__getinitargs__ method in the class def.

The "exploding penguin" class is artificial, but
applications are vulnerable if there's an unsafe
constructor anywhere in any class of the application or in
the python library (example: the NNTP constructor opens an
IP connection to an arbitrary address, so a malicious
imported string can send a message through your firewall
when you import it).


----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=471893&group_id=5470