[Python-bugs-list] [ python-Bugs-471893 ] Security review of pickle/marshal docs

noreply@sourceforge.net noreply@sourceforge.net
Thu, 15 Nov 2001 16:12:46 -0800


Bugs item #471893, was opened at 2001-10-16 15:42
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=471893&group_id=5470

Category: Extension Modules
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 7
Submitted By: Tim Peters (tim_one)
Assigned to: Barry Warsaw (bwarsaw)
Summary: Security review of pickle/marshal docs

Initial Comment:
Paul Rubin points out that the security implications 
of using marshal and/or pickle aren't clear from the 
docs.  Assigning to Jeremy as he's more sensitive to 
such issues than I am; maybe Barry would like to get 
paranoid too <wink>.

A specific example:  the pickle docs say that pickle 
doesn't support code objects, and "at least this 
avoids the possibility of smuggling Trojan horses into 
a program".  However,

1) The marshal docs don't mention this vulnerability 
at all.

while

2) The pickle docs don't spell out possible dangers 
due to things pickle does that marshal doesn't (like 
importing modules, and running class constructors).

----------------------------------------------------------------------

>Comment By: Barry Warsaw (bwarsaw)
Date: 2001-11-15 16:12

Message:
Logged In: YES 
user_id=12800

I have rewritten the pickle documentation, and updated the
marshal, cPickle, and copy_reg documentation.  I believe all
the documentation issues raised here have been fixed.  

Implementation issues will be pushed to Python 2.3, and the
plan is to write a PEP to plan future work.

----------------------------------------------------------------------

Comment By: Barry Warsaw (bwarsaw)
Date: 2001-11-13 13:39

Message:
Logged In: YES 
user_id=12800

Assigning to Jeremy so he'll remember to forward me Jim's
email.  Jeremy can ping-pong it to Fred if he wants, and
then reassign to me after forwarding the email.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2001-11-12 22:18

Message:
Logged In: NO 

The find_global variable sounds encouraging and if it
fixes this problem, that's great.  I'd still feel better
if the eval call goes away.  Is removing it allowed
under the 2.2 feature freeze?  It doesn't affect anything
visible, so no tests would have to change, etc.

Paul


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2001-11-12 16:38

Message:
Logged In: NO 

It's possible that the eval is safe, but validating that
takes a line by line code review of all the paths that
evaling a string can go through.  It's also brittle in
that maybe someone will change the evaluator (say to
support Perl-like interpolation) in a future Python version
and not remember to change the unpickler.  Something like
that has already happened with the Cookie module.  My
guess is that it happened with the pickle module--the
whole point of that quoted string check is that the 
original pickle implementer didn't trust the input. 
The stuff about unpickling class instances was added
later (maybe by another person) without remembering the
security issue. 

Using eval this way is like storing a vat of cyanide in
your child's bedroom.  Sure, maybe if you check the seals
and locks on the vat carefully enough, you can convince
yourself that your child won't be able to get to the
cyanide.  But wouldn't you feel safer just not having the
vat there at all?  That's basic safety-consciousness.
Security consciousness works the same way.  Try to keep
dangerous ingredients and attackers as far away from each
other as possible.

Paul



----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-11-12 10:52

Message:
Logged In: YES 
user_id=31435

Why are people (Paul, Jeremy) concerned about eval'ing 
strings?  cPickle and pickle both check that they're 
properly quoted, and this isn't sh or Perl:  Python has 
no "dynamic" gimmicks buried in string literals.  All 
eval'ing a string literal can do is produce a binary blob 
via interpeting simple escape sequences.  They're like C 
strings this way -- maybe we'll run out of memory, but 
that's it.

I would agree that Python should be refactored internally 
to supply a clean function for changing string literals 
into binary blobs, but that would be for clarity and 
efficiency, not security.

----------------------------------------------------------------------

Comment By: Jim Fulton (dcjim)
Date: 2001-11-12 10:14

Message:
Logged In: YES 
user_id=73023

It sounds like there are some documentation bugs:

- The security ramifications are not discussed, nor are the
remedies.

- The cPickle module isn't documented very well. I submitted
some
   documentation a long time ago that never got incorporated
AFAIK.
   I wish I still had it. :)

- cPickle has a feature for turning off instance support and
for 
   restricting which classes can be unpickled. You can set
the find_global
   attribute on a cPickle.Unpickler. The find_global
attribute can be 
   a function or None.  If it is None, then no instances can
be 
   unpickled. If it is a function, then it should accept a
module and name
   and return the corresponding global. It is responsible
for looking
   up the global and can raise an error to prevent a
global's use.

   See the ZEO storage server implementation for an example
   of using this hook.


  



----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2001-11-11 22:33

Message:
Logged In: NO 

I like marshal and think it's much cleaner than pickle.
I'd like to see its format cleaned up and documented
so it can be a portable transport format.  Even the
recursion issue isn't so compelling--there's only a danger
on marshalling, not unmarshalling, and the application
has control over what it marshals.  So I think the idea
of documenting marshal (bug #467384) should be kept alive.
However, it shouldn't be changed in 2.2.  

If the Cookie class is still really aliased by default to
SmartCookie, that's a bad bug and should definitely be
fixed ASAP.  The Cookie docs clearly say not to use
SmartCookie or SerialCookie.  In fact I think SmartCookie
and SerialCookie should be disabled altogether.  Any
applications using them on the open internet have a security
but and should be fixed immediately.

Here's a possible temporary approach to fixing SmartCookie
and SerialCookie: append a cryptographic message
authentication code (MAC) to each pickled cookie.  That
means the application has some secret string K as a
configuration parameter (it should be 10 or more random
8-bit characters, or 20 or more random hex digits, etc). 
Then a smart cookie would be a pickle p, followed by
SHA(p+K).  The Cookie module would validate the MAC before
calling unpickle, and raise an exception if the MAC wasn't
valid.

The security of this scheme rests on K being kept secret
from attackers, which is generally not an easy thing to
manage, but it's something to think about.

Paul

----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2001-11-10 14:07

Message:
Logged In: YES 
user_id=31392

I don't think we should be doing anything about marshal.  
Maybe we should name in pyclib or something <0.9 wink>.  It 
works fine for .pyc files, but I don't see a reason for it 
to do anymore than is necessary for that purpose.

I think the notion of an unpickler that only handles 
builtin datatypes is the most attractive option you offer, 
but Paul has a good point about eval for strings.  (It 
currently has some hacks that address specific exploits, 
but I doubt they are sufficient.)  I also wonder how hard 
it is to handle builtin types and avoid subclasses of 
builtin types.

If there are any changes to pickle, I think we need to be 
careful about how it is described.  If we claim that an 
unpickler is safe for untrusted pickles, we've made a 
fairly strong claim.  I still think such a design change 
requires a PEP that includes some requirements and use 
cases and a thorough analysis of potential exploits. 


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2001-11-10 11:42

Message:
Logged In: NO 

See bug #467384 for discussion about marshal.  Besides the
recursion issue, marshal's format is explicitly undocumented
and subject to change--you can't rely on it to interoperate
between two different Python versions, so it's no good as
an RPC serializer.  The format has kludges (e.g. the
representation of long ints) that make it undesirable to
freeze and document it and force future versions to be
backward compatible.

Adding a pickle.loads flag to prevent instance unpickling
isn't perfect but is probably the best alternative on 
your list.  Perhaps the flag can have a value that allows
unpickling the instances by restoring the instance 
attributes rather than calling the initializer.  That's
not always the right way to unpickle an instance (that's
why the unpickler no longer works that way) but it's good
enough a lot of the time.  

There's another issue with pickle/cPickle which is that they
unpickle quoted strings by evaling them.  This is scary.
While I don't see an immediate exploit, I also haven't
examined the 1000's of lines of code I'd need to examine
to convince myself that there's NOT an exploit.  I think
the unpickler should be changed to never call eval but just
parse the string as it needs to.  

Guido seemed to think pickle might have other possible
exploits.  I don't know what he had in mind but before
declaring it safe for untrusted data I think it needs to
be gone over with a fine toothed comb.

Paul


----------------------------------------------------------------------

Comment By: Barry Warsaw (bwarsaw)
Date: 2001-11-10 09:37

Message:
Logged In: YES 
user_id=12800

I'm going to agree with Paul that this is a problem needing
fixing, however there are really several issues.

1. Cookie module makes it too easy to code exploits.  Cookie
exports a class, also called Cookie, which is aliased to
SmartCookie, so that a naive program will simply pass cookie
data to Cookie.Cookie() and you're screwed.  So, Cookie
module's defaults should be for more security rather than
less, and Cookie.Cookie should be aliased to SimpleCookie
instead.

2. There is no built-in safe mechanism for de-serializing
data from untrusted sources.  You can't use pickle without
overloading a "magic" method.  You can't use cPickle because
you can't do the overloading trick.  You can't use marshal
because it isn't bulletproof against recursive
datastructures.  So how /do/ you do it?

I think it just may be serious enough to deal with in Python
2.2, and I volunteer to address it (so I'll steal this bug
report).  Without looking at the code, or the level of
effort necessary, I would make the following suggestions:

1. Add to the public interface of pickle and cPickle, a flag
that either disables the unpickling of instances altogether,
or disables calling any code with unpickled data, e.g.
constructors.

2. Fix marshal to be bulletproof against recursive
datastructures.
3. Update the docs for both pickle/cPickle and marshal to
explain how to safely write de-serializers of untrusted strings.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2001-11-09 01:21

Message:
Logged In: NO 

Well, Guido and Tim agree with you that it's not a pickle
bug.  I still feel it is one, because its docs currently
make you think you can securely load untrusted pickles, and
because it's a natural, non-obscure thing to want to do
(see pyro and the cookie module), but whatever.  If it's
not a code bug then I feel it's a significant functionality
shortcoming in the python library.

Pyro uses pickle to serialize data for RPC calls over the
internet.  A malicious client could make a hostile pickle
take over the server.  The cookie module lets web
applications store user session state in browser cookies.
Its SerialCookie and SmartCookie classes let you put
arbitrary Python objects into the user session, and
serializes them when pickle.  Again, a malicious client
can make a hostile pickle, send it in a cookie header to
the http server, and take over the server when the 
application unpickles the cookie.

The current documentation for the pickle module makes it
very clear to me that the doc writer thought it was safe
to unpickle untrusted cookies.  If pickle wasn't designed
for that, then there was a communication failure between
the designer and the doc writer.

Yes, I'm willing to help with a PEP for fixing this
situation.

Paul


----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2001-11-08 09:37

Message:
Logged In: YES 
user_id=31392

I don't think of the issue you describe as a bug in the
code.  You're suggesting a new feature for pickle.  As far
as I can tell, the original design requirements for pickle
did not include the ability to securely load a pickle from
an untrusted source.

It may be a legitimate feature request, but it's too late to
make it into Python 2.2.  I suggest we look at the design
issues for Python 2.3 and decide if it's a feature we want
to support.  I imagine a PEP may be necessary to lay out the
issues and the solution.  Do you want to have a hand in that
PEP?

I still don't understand what it means that Pyro and cookie
were bit by a bug.  It sounds like they were using pickle in
ways that pickle was not intended to support.  A careful
analysis of how those two applications use pickle would be
helpful for generating requirements.


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2001-11-07 15:54

Message:
Logged In: NO 

IMO it's a code bug that you can't unpickle strings from
untrusted sources.  Pyro and the cookie module are examples
of programs that got bitten by this bug.  Whether it's
really a bug is a matter of opinion--I had a big email
exchange with Guido and Tim about it, and they felt it
was enough to fix the pickle documentation.

Pickle has the same problem as cPickle, but with pickle
you can subclass the pickler and override the method that
unpickles class objects, and work around the (IMO) bug.
The workaround doesn't help cPickle since cPickle can't
be subclassed.  See bug #467384 for some related discussion.

Paul


----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2001-11-07 14:02

Message:
Logged In: YES 
user_id=31392

What's the code bug?  Your last message has a lot of gloom
and doom <wink>, but I'm not sure what specific problem
you'd like to see fixed.  Are you saying that something
needs to be fixed in cPickle and not in pickle?

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2001-11-07 12:08

Message:
Logged In: NO 

Irmen de Jong points out that the standard cookie module
uses pickling for serial and smart cookies.  The 2.1.1
cookie module docs explicitly say not to use those
classes because of the security hole--that they're provided
for backward compatibility only (but with what?!  Any
server that uses those classes on open ports needs to be
changed right away).

Irmen's library, http://pyro.sourceforge.net, also uses
unpickle insecurely (he's aware of this now and is figuring
out a fix).

IMO this is really a code bug rather than a documentation
bug, and should be fixed in the code rather than the docs.
Documenting the bug rather than fixing it leaves a
deficiency in the Python library: obvious uses of pickling,
like Pyro and the cookie module, can't be implemented
using cPickle and have to resort to a slower Python
deserializer, or use marshal and have possible compatibility
problems between versions (and not be able to serialize
class instances).

Paul



----------------------------------------------------------------------

Comment By: paul rubin (phr)
Date: 2001-10-16 16:06

Message:
Logged In: YES 
user_id=72053

Certainly anyone unserializing potentially malicious data
with pickle, marshal, or anything else, should check the
results before doing anything dangerous with them (like
executing code).  However, unpickle can potentially do
damage before it even returns, by creating loading modules
and creating initialized class instances.  So pickle.loads
should never be used on untrusted strings, except possibly
in a rexec wrapper (as proposed by Tim).  Another
possibility (also by Tim) is to override the load_inst
method of the Pickler class, though I don't think you can
do that for cPickle.

A sample exploit for unpickle can be found at
<http://www.nightsong.com/phr/python/pickletest.py>.
Unpickling the test string runs penguin.__init__ contrary
to the doc's saying no initialization unless there's a
__getinitargs__ method in the class def.

The "exploding penguin" class is artificial, but
applications are vulnerable if there's an unsafe
constructor anywhere in any class of the application or in
the python library (example: the NNTP constructor opens an
IP connection to an arbitrary address, so a malicious
imported string can send a message through your firewall
when you import it).


----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=471893&group_id=5470