[pypy-dev] Use of marshal in the sandbox: is stdlib marshal OK?

Ned Batchelder ned at nedbatchelder.com
Wed Dec 28 15:34:28 CET 2011


I guess that is a possibility, but another principle is to use widely used 
and well-reviewed code where possible, no?  I suppose the problem is 
that the built-in marshal isn't trying hard to protect itself against 
malicious data?
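The distinction drawn in the quoted message below can be checked directly. The untrusted program's output becomes a string inside a marshaled message. It is not marshal data of its own. A minimal sketch in Python 3; the ("write", 1, data) request shape is an assumption for illustration, not sandlib's actual message format:

```python
import marshal

# Hypothetical program output that happens to be valid marshal data:
# '(' + 4-byte count 2 + 'N' 'N' encodes the tuple (None, None).
hostile_output = b"(\x02\x00\x00\x00NN"
assert marshal.loads(hostile_output) == (None, None)  # parsed raw, it is structure

# Assumed request shape for illustration only: (function name, fd, data).
request = ("write", 1, hostile_output)
wire = marshal.dumps(request)

# Inside the request, the bytes are written as a type code plus an explicit
# length and copied verbatim, so they come back as opaque data.
decoded = marshal.loads(wire)
assert decoded == request
assert decoded[2] == hostile_output
```

Parsed on its own, the output is structure. Carried inside the request, it is only data. The remaining risk is the one lahwran raises below: a bug in the code that produces the marshaled bytes.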

The problem with "bundling pypy's marshal.py" is that it pulls in a lot 
of infrastructure modules, which bulks up the calling process.  Maybe 
there's some low-hanging fruit there that we can trim.
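On trimming the dependencies: suppose the sandbox messages only ever contain a few simple types. That is an assumption, and the real protocol may need more. In that case, a reader for just those types needs nothing beyond struct. The sketch below is hypothetical: MiniUnmarshaller and safe_loads are made-up names. It is written in Python 3 and targets the version-2 marshal format that CPython's marshal.dumps(obj, 2) produces. It is not a copy of lib_pypy/marshal.py. Because it is pure Python, malformed input ends in a ValueError rather than a segfault.

```python
import marshal
import struct


class MiniUnmarshaller:
    """Hypothetical reader for a subset of marshal format version 2:
    None, bools, 32-bit ints, bytes ('s'), str ('u') and tuples ('(').
    Any other type code is rejected."""

    def __init__(self, data: bytes, max_depth: int = 32):
        self.data = data
        self.pos = 0
        self.max_depth = max_depth

    def _take(self, n: int) -> bytes:
        # Check the length before slicing, so a forged size cannot
        # make us allocate or read past the end of the buffer.
        if n < 0 or self.pos + n > len(self.data):
            raise ValueError("truncated or oversized marshal data")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def _int32(self) -> int:
        # marshal stores 32-bit values as signed little-endian.
        return struct.unpack("<i", self._take(4))[0]

    def load(self, depth: int = 0):
        if depth > self.max_depth:
            raise ValueError("nesting too deep")
        code = self._take(1)
        if code == b"N":
            return None
        if code == b"T":
            return True
        if code == b"F":
            return False
        if code == b"i":
            return self._int32()
        if code == b"s":
            return self._take(self._int32())
        if code == b"u":
            # UnicodeDecodeError is a subclass of ValueError.
            return self._take(self._int32()).decode("utf-8")
        if code == b"(":
            count = self._int32()
            if count < 0:
                raise ValueError("negative tuple size")
            # Elements are read one at a time; a huge forged count runs
            # out of input quickly instead of pre-allocating anything.
            return tuple(self.load(depth + 1) for _ in range(count))
        raise ValueError(f"unsupported type code {code!r}")


def safe_loads(data: bytes):
    """Decode exactly one object and reject trailing bytes."""
    reader = MiniUnmarshaller(data)
    obj = reader.load()
    if reader.pos != len(data):
        raise ValueError("trailing bytes after marshal data")
    return obj
```

This reader rejects every type code outside the assumed subset and checks lengths before slicing. That keeps the code to review down to a few dozen lines.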

Any thoughts on the second issue?
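The second issue is that marshal.load(f) blocks with the GIL held while the pipe is empty. One hypothetical way to sidestep it, whatever marshal.load does internally, is to read each message with os.read(). In CPython, os.read() releases the GIL while it waits. Only then is the complete buffer decoded with marshal.loads(). This assumes both sides agree on a 4-byte length prefix, which the sandboxed process would also have to emit. read_exact, write_message and read_message are made-up names:

```python
import marshal
import os
import struct

# Hypothetical framing (assumed, not the real sandbox protocol):
# a 4-byte little-endian payload length followed by marshal data.
HEADER = struct.Struct("<I")


def read_exact(fd: int, n: int) -> bytes:
    """Read exactly n bytes; os.read() releases the GIL while it blocks."""
    chunks = []
    while n > 0:
        chunk = os.read(fd, n)
        if not chunk:
            raise EOFError("pipe closed in the middle of a message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def write_message(fd: int, obj) -> None:
    """Marshal obj and write it with its length prefix."""
    payload = marshal.dumps(obj)
    data = HEADER.pack(len(payload)) + payload
    while data:
        written = os.write(fd, data)  # may be a partial write
        data = data[written:]


def read_message(fd: int):
    """Collect one complete frame, then decode it in memory."""
    (size,) = HEADER.unpack(read_exact(fd, HEADER.size))
    return marshal.loads(read_exact(fd, size))


if __name__ == "__main__":
    r, w = os.pipe()
    write_message(w, ("write", 1, b"hello"))
    print(read_message(r))  # ('write', 1, b'hello')
    os.close(w)
    os.close(r)
```

Here marshal.loads() never touches the pipe, so all the waiting happens inside os.read(). While it waits, another thread such as the _waiting_thread mentioned below is free to run.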

--Ned.

On 12/27/2011 10:09 PM, lahwran wrote:
> It will become an issue if there is a bug in the marshal code inside
> pypy-c-sandbox, which is /creating/ the marshalled data: a bug that
> would let a sandboxed program alter the marshalled data so that it
> exploits a vulnerability in the stdlib marshal.
> That doesn't sound too likely, but in the spirit of having as many layers
> of security as possible, I propose simply bundling pypy's marshal.py
> with the sandbox.
>
> -- lahwran
>
> On Tue, Dec 27, 2011 at 7:30 PM, Ned Batchelder <ned at nedbatchelder.com> wrote:
>> The sandbox uses pypy's own implementation of marshal.  In
>> pypy/translator/sandbox/sandlib.py is this comment:
>>
>> # Note: we use lib_pypy/marshal.py instead of the built-in marshal
>> # for two reasons.  The built-in module could be made to segfault
>> # or be attackable in other ways by sending malicious input to
>> # load().  Also, marshal.load(f) blocks with the GIL held when
>> # f is a pipe with no data immediately available, preventing the
>> # _waiting_thread from running.
>>
>> I'd like to remove as many dependencies as possible from the sandbox code,
>> so I'd like to explore the possibility of using the standard library marshal
>> module.
>>
>> The first reason above is about crashing marshal with malicious input.  To
>> my thinking, we are in control of what data is marshaled, so we don't have
>> to worry about malicious input.  The untrusted Python code running in the
>> sandbox doesn't have a way of sending marshaled data, so we don't have to
>> worry that it will be used to attack the marshal module.  The stdout of the
>> untrusted Python code will become a string that is marshaled, but that
>> doesn't provide a way for the untrusted code to attack the marshal module.
>> Or have I missed something?
>>
>> I can't evaluate the second reason: is this still a problem?  What bad
>> effects will we see if it is?
>>
>> --Ned.
>> _______________________________________________
>> pypy-dev mailing list
>> pypy-dev at python.org
>> http://mail.python.org/mailman/listinfo/pypy-dev

