An official complaint regarding the marshal and pickle documentation
I just checked the python site documentation on marshal and pickle and I consider them to be irresponsibly and dangerously misleading. For example. Suppose Mercurial is implemented using pickle.load (I sure hope it isn't -- is it?). 1) I send someone a "patch" for their software claiming it makes their package run faster. 2) That person uses mercurial to "unpack" the patch and mercurial uses pickle.load. BAM! That person's filesystem is GONE! AND I'M NOT ASSUMING THAT THERE IS ANY BUG IN MERCURIAL! Now: suppose Mercurial is implemented using marshal: no such scenario is possible unless there is a security bug in mercurial where they explicitly execute something. RESOLVED: pickle should come with a large red label: WARNING: LARK'S VOMIT -- NEVER USE PICKLE TO IMPLEMENT UNTRUSTED ARCHIVING OF ANY KIND. It doesn't have one. Marshal needs no such label: but it has one: *Warning:* The marshal module is not intended to be secure against erroneous or maliciously constructed data. Never unmarshal data received from an untrusted or unauthenticated source. This is bullshit. Sorry, for the french and the caps, but this is REALLY IMPORTANT. -- Aaron Watters
On Wed, Mar 5, 2008 at 10:11 AM, Aaron Watters <aaron.watters@gmail.com> wrote:
I just checked the python site documentation on marshal and pickle and I consider them to be irresponsibly and dangerously misleading. RESOLVED: pickle should come with a large red label:
WARNING: LARK'S VOMIT -- NEVER USE PICKLE TO IMPLEMENT UNTRUSTED ARCHIVING OF ANY KIND.
It doesn't have one.
So what is this [1] ? ''' Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source. ''' You may want to check your facts better next time you go on a rampage. George [1] http://docs.python.org/lib/node314.html
On Wed, Mar 05, 2008 at 10:11:48AM -0500, Aaron Watters wrote:
RESOLVED: pickle should come with a large red label:
WARNING: LARK'S VOMIT -- NEVER USE PICKLE TO IMPLEMENT UNTRUSTED ARCHIVING OF ANY KIND.
http://docs.python.org/lib/node314.html "Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source." Enough for me, though it is not as big or as red... Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd@phd.pp.ru Programmers don't die, they just GOSUB without RETURN.
In response to Oleg and George. Yes apparently there is an acknowledgement in some subordinate page somewhere that there might be some problem with security and pickle. This should be on the first page in bold face like the unneeded one for marshal. I missed it just now because I just looked at the first page for marshal and pickle, like most people probably would, sorry. Also this line from the marshal doc has got to go: "For general persistence and transfer of Python objects through RPC calls, see the modules pickle <http://docs.python.org/lib/module-pickle.html> and shelve <http://docs.python.org/lib/module-shelve.html>. " http://docs.python.org/lib/module-marshal.html which should read "For RPC calls never use pickle." And the security warning for marshal benieth it should be removed because it is nonsense. The implication of the current documentation is that most of my public projects contain serious security holes when they don't. And if you don't read the documentation carefully (like the implementers of Plone apparently didn't) the docs seem to suggest that pickle is somehow "safer" when it is about as unsafe as it could be. -- Aaron Watters
I'm assuming that someone confronted you with this security issue somehow? Otherwise I don't understand why you'd be so upset about it. BTW the warning for marshal is legit -- the C code that unpacks marshal data has not been carefully analyzed against buffer overflows and so on. Remember the first time someone broke into a system through a malicious JPEG? The same could happen with marshal. Seriously. I agree that the pickle module's warning needs to be moved to a more prominent place (Georg has probably aready done this by the time I'm finished typing this message :-). But I see no reason to get so upset about it as to use all caps. --Guido On Wed, Mar 5, 2008 at 8:11 AM, Aaron Watters <aaron.watters@gmail.com> wrote:
In response to Oleg and George.
Yes apparently there is an acknowledgement in some subordinate page somewhere that there might be some problem with security and pickle. This should be on the first page in bold face like the unneeded one for marshal. I missed it just now because I just looked at the first page for marshal and pickle, like most people probably would, sorry.
Also this line from the marshal doc has got to go:
"For general persistence and transfer of Python objects through RPC calls, see the modules pickle and shelve. " http://docs.python.org/lib/module-marshal.html
which should read "For RPC calls never use pickle."
And the security warning for marshal benieth it should be removed because it is nonsense.
The implication of the current documentation is that most of my public projects contain serious security holes when they don't. And if you don't read the documentation carefully (like the implementers of Plone apparently didn't) the docs seem to suggest that pickle is somehow "safer" when it is about as unsafe as it could be.
-- Aaron Watters
_______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
-- --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum schrieb:
I'm assuming that someone confronted you with this security issue somehow? Otherwise I don't understand why you'd be so upset about it.
BTW the warning for marshal is legit -- the C code that unpacks marshal data has not been carefully analyzed against buffer overflows and so on. Remember the first time someone broke into a system through a malicious JPEG? The same could happen with marshal. Seriously.
I agree that the pickle module's warning needs to be moved to a more prominent place (Georg has probably aready done this by the time I'm finished typing this message :-). But I see no reason to get so upset about it as to use all caps.
I used the time machine :) Though the warning is at the same location in <http://docs.python.org/dev/library/pickle>, since all pickle docs are on the same page it's visible enough in my opinion. cheers, Georg
I'd also agree, that the warning should be really prominent (especially since I just saw someone saying "for game states: Just pickle them", which could result in people getting problems when they get a mail saying "hey, look, I got to the 14th level"), but I don't think the warning was irresponsibly small. At least I saw it, when I began to learn python (but I had forgotten it until now). Maybe it could be replaced by yaml at some point, though, which offers a mode that doesn't execute everything (safe_load): http://pyyaml.org/wiki/PyYAMLDocumentation#LoadingYAML "safe_load(stream) parses the given stream and returns a Python object constructed from for the first document in the stream. If there are no documents in the stream, it returns None. safe_load recognizes only standard YAML tags and cannot construct an arbitrary Python object." And there's also a C implementation: http://pyyaml.org/browser/libyaml/trunk Which can be relicensed under the Python License: http://pyyaml.org/browser/libyaml/trunk/LICENSE Or pickle could get a safe_load function itself (if it doesn't yet have it). Best wishes, Arne El Wednesday, 5 de March de 2008 18:36:56 Guido van Rossum escribió:
I'm assuming that someone confronted you with this security issue somehow? Otherwise I don't understand why you'd be so upset about it.
BTW the warning for marshal is legit -- the C code that unpacks marshal data has not been carefully analyzed against buffer overflows and so on. Remember the first time someone broke into a system through a malicious JPEG? The same could happen with marshal. Seriously.
I agree that the pickle module's warning needs to be moved to a more prominent place (Georg has probably aready done this by the time I'm finished typing this message :-). But I see no reason to get so upset about it as to use all caps.
--Guido
On Wed, Mar 5, 2008 at 8:11 AM, Aaron Watters <aaron.watters@gmail.com> wrote:
In response to Oleg and George.
Yes apparently there is an acknowledgement in some subordinate page somewhere that there might be some problem with security and pickle. This should be on the first page in bold face like the unneeded one for marshal. I missed it just now because I just looked at the first page for marshal and pickle, like most people probably would, sorry.
Also this line from the marshal doc has got to go:
"For general persistence and transfer of Python objects through RPC calls, see the modules pickle and shelve. " http://docs.python.org/lib/module-marshal.html
which should read "For RPC calls never use pickle."
And the security warning for marshal benieth it should be removed because it is nonsense.
The implication of the current documentation is that most of my public projects contain serious security holes when they don't. And if you don't read the documentation carefully (like the implementers of Plone apparently didn't) the docs seem to suggest that pickle is somehow "safer" when it is about as unsafe as it could be.
-- Aaron Watters
_______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
-- Unpolitisch sein Heißt politisch sein Ohne es zu merken. - Arne Babenhauserheide ( http://draketo.de ) -- Weblog: http://blog.draketo.de -- Mein öffentlicher Schlüssel (PGP/GnuPG): http://draketo.de/inhalt/ich/pubkey.txt
Guido van Rossum wrote:
BTW the warning for marshal is legit -- the C code that unpacks marshal data has not been carefully analyzed against buffer overflows and so on.
I thought the main issue with marshal is that it's happy to create code objects, which pickle doesn't do -- ostensibly for security reasons. But if pickle is inherently insecure anyway, does the exclusion of code objects really make much difference? BTW, I only consider pickle suitable for quick and dirty uses anyway, because it ties the external representation very closely to internal details of your program, which can make it difficult to evolve the program without invalidating previously written files. For long-term use, it's better to invest time in a properly-thought-out external format for the task, designed with extensibility in mind. -- Greg
On Thu, Mar 06, 2008, Greg Ewing wrote:
BTW, I only consider pickle suitable for quick and dirty uses anyway, because it ties the external representation very closely to internal details of your program, which can make it difficult to evolve the program without invalidating previously written files.
For long-term use, it's better to invest time in a properly-thought-out external format for the task, designed with extensibility in mind.
Maybe so, but my company has been using pickle as a primary long-term storage mechanism for more than a decade. We only rarely have problems with code changes causing pickle problems (less than once per year). OTOH, we mostly only have a growing internal format -- we almost never change the internal format. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "All problems in computer science can be solved by another level of indirection." --Butler Lampson
On 06/03/2008, at 17:25, Aahz wrote:
Maybe so, but my company has been using pickle as a primary long-term storage mechanism for more than a decade. We only rarely have problems with code changes causing pickle problems (less than once per year). OTOH, we mostly only have a growing internal format -- we almost never change the internal format.
And then tons of companies are using ZODB wich uses pickle... so no biggie... but using pickle directly can be a problem if you have lots of data. -- Leonardo Santagada
On 05/03/2008, at 13:11, Aaron Watters wrote:
like the implementers of Plone apparently didn't
I know this is just a strange angry rant, but why do you say this? Do you mean that ZODB and ZRPC should be implemented using marshal? Can this even be done? Now if you want to do secure pickles, just sign them with a cripto method (completely secure). Simple and I think, the only way to do this being secure. Or using a secure transport layer like ssl to transfer things with signatures to identify your peers. Not a reason to rant, but to install the crypto package. -- Leonardo Santagada
What follows is a brief summary of offline discussions with Guido and Leonardo (I hope represented correctly, please complain if not): Guido pointed out that previous versions of marshal could crash python. I replied that that is a bug and all known instances have been fixed. Pickle executes arbitrary code by design -- which is much worse than just crashing a program. Leonardo mentioned that pickle security concerns could be addressed using crypto tricks. I replied that I would be comfortable unmarshalling a file from a known hostile party -- no crypto verification required, because the worst that could happen is that it would crash the interpreter. With pickle I'd be handing my keyboard to a villian. In summary: I think marshal.loads(s) is just as safe as unicode(s) or file.read(). pickle.loads(s) is morally equivalant to __import__(s) or eval(s). I think the security warning for marshal and the implied recommendation that pickle is okay for RPC should be removed. alright already, 'nuff said. whatever. -- Aaron Watters
Aaron Watters wrote:
In summary: I think marshal.loads(s) is just as safe as unicode(s) or file.read(). pickle.loads(s) is morally equivalant to __import__(s) or eval(s).
According to the docs, you can use a customised unpickler to restrict the set of things it can use as constructors. It might be worth mentioning that in a prominent place near the security warning as well. -- Greg
On 05/03/2008, at 16:03, Aaron Watters wrote:
Guido pointed out that previous versions of marshal could crash python.
I replied that that is a bug and all known instances have been fixed. Pickle executes arbitrary code by design -- which is much worse than just crashing a program.
Just read carefully what Guido said, if there is a bug it can not just crash your program, it can execute any kind of code, as bad or even worse than pickle... that is what is called a buffer overflow Talking about it the pypy project has a directory somewhere with lots of snippets of ways to crash cpython... Not just the set recursion limit and overflow the stack one.
Leonardo mentioned that pickle security concerns could be addressed using crypto tricks.
For some uses, for others some modified version of pure python pickle could be used, so you have a controled and almost safe pickle.
I replied that I would be comfortable unmarshalling a file from a known hostile party -- no crypto verification required, because the worst that could happen is that it would crash the interpreter. With pickle I'd be handing my keyboard to a villian.
In summary: I think marshal.loads(s) is just as safe as unicode(s) or file.read(). pickle.loads(s) is morally equivalant to __import__(s) or eval(s).
No marshall load do lots of stuff in pure unverified C code... anything could happen, as guido pointed out.
I think the security warning for marshal and the implied recommendation that pickle is okay for RPC should be removed.
No, AFAIK marshal can only load ints and simple objects... and that will give you a very poor rpc (for example it could never be used to replace pickle as it is used in ZODB and ZRPC). -- Leonardo Santagada
On Wed, Mar 5, 2008 at 8:33 PM, Leonardo Santagada <santagada@gmail.com> wrote:
On 05/03/2008, at 16:03, Aaron Watters wrote:
Guido pointed out that previous versions of marshal could crash python.
I replied that that is a bug and all known instances have been fixed. Pickle executes arbitrary code by design -- which is much worse than just crashing a program.
Just read carefully what Guido said, if there is a bug it can not just crash your program, it can execute any kind of code, as bad or even worse than pickle... that is what is called a buffer overflow
I'd like to know the actual number of successful buffer overflow attacks that have ever happened on the planet in the wild. Maybe one? Okay, according to Wikipedia there have been 4. I don't really know but I think an overflowing buffer in marshal is not very likely to be somewhere near where a code segment could jump to because almost everything in marshal is dynamically allocated. The known attacks have been where the arrays were in static locations, I believe. And it's not worse than pickle because pickle is perfectly capable of compiling and loading an assembly language component without you knowing anything about it. Pickle can do anything that the computer can do. Also it's not worse than pickle because you have to be a highly experienced and perverted assembly language programmer to construct an overflow attack and there has to be a bug in marshal to allow it. To abuse pickle requires almost no skill at all, and you don't have to be perverted, you just have to be stupid. In fact pickle is designed to execute arbitrary code, and even documented. For all I know it's just as feasible to stage buffer overflow attacks in many other places in python as it is in marshal -- like maybe unicode.join or anyplace else where an array is constructed. Which is to say it's not very feasible in those places either. I was clearly off my medication to start this discussion. I suppose misleading people into thinking marshal is dangerous is better than suggesting pickle is safe. Peace and love everyone. bye now. -- Aaron Watters
Leonardo Santagada wrote:
I replied that that is a bug and all known instances have been fixed. Pickle executes arbitrary code by design -- which is much worse than just crashing a program.
Just read carefully what Guido said, if there is a bug it can not just crash your program, it can execute any kind of code, as bad or even worse than pickle... that is what is called a buffer overflow
marshal is *ONLY* designed to store and load trusted pyc files. It's not desinged for anything else. It *CAN* be used for simple stuff, too. But it doesn't support fancy stuff and it can easily be broken. IIRC it doesn't support nested structured like a list containing a reference to itself. Use it on your own risk. Christian
Boris Borcic wrote:
Aaron Watters wrote: [...]
Sorry, for the french and the caps, but this is REALLY IMPORTANT.
I see nothing french in your post.
http://en.wikipedia.org/wiki/Pardon_my_French HTH Christian
Boris Borcic wrote:
Aaron Watters wrote: [...]
Sorry, for the french and the caps, but this is REALLY IMPORTANT.
I see nothing french in your post.
I could have sworn I read something like "I fart in your general direction". Neil
participants (12)
-
Aahz
-
Aaron Watters
-
Arne Babenhauserheide
-
Boris Borcic
-
Christian Heimes
-
Georg Brandl
-
George Sakkis
-
Greg Ewing
-
Guido van Rossum
-
Leonardo Santagada
-
Neil Toronto
-
Oleg Broytmann