[pypy-dev] Use of marshal in the sandbox: is stdlib marshal OK?
ned at nedbatchelder.com
Wed Dec 28 15:34:28 CET 2011
I guess that is a possibility, but another principle is to use well-used
and widely-reviewed code where possible, no? I guess the problem is
that built-in marshal isn't trying hard to protect itself against
The problem with "bundling pypy's marshal.py" is that it pulls in a lot
of infrastructure modules, which bulks up the calling process. Maybe
there's some low-hanging fruit there that we can trim.
Any thoughts on the second issue?
On 12/27/2011 10:09 PM, lahwran wrote:
> it will become an issue if there is a bug in the marshal code inside
> pypy-c-sandbox which is /creating/ the marshalled data, a bug that
> would allow a sandboxed program to alter the marshalled data in such a
> way that it can exploit the vulnerability of the stdlib marshal.
> Doesn't sound too likely, but in the spirit of having as many layers
> of security as possible, I propose simply bundling pypy's marshal.py
> with the sandbox.
> -- lahwran
> On Tue, Dec 27, 2011 at 7:30 PM, Ned Batchelder <ned at nedbatchelder.com> wrote:
>> The sandbox uses pypy's own implementation of marshal. In
>> pypy/translator/sandbox/sandlib.py is this comment:
>> # Note: we use lib_pypy/marshal.py instead of the built-in marshal
>> # for two reasons. The built-in module could be made to segfault
>> # or be attackable in other ways by sending malicious input to
>> # load(). Also, marshal.load(f) blocks with the GIL held when
>> # f is a pipe with no data immediately available, preventing the
>> # _waiting_thread from running.
>> I'd like to remove as many dependencies as possible from the sandbox code,
>> so I'd like to explore the possibility of using the standard library marshal
>> The first reason above is about crashing marshal with malicious input. To
>> my thinking, we are in control of what data is marshaled, so we don't have
>> to worry about malicious input. The untrusted Python code running in the
>> sandbox doesn't have a way of sending marshaled data, so we don't have to
>> worry that it will be used to attack the marshal module. The stdout of the
>> untrusted Python code will become a string that is marshaled, but that
>> doesn't provide a way for the untrusted code to attack the marshal module.
>> Or have I missed something?
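To make the first argument concrete, here is a minimal sketch, assuming the trusted side always wraps the untrusted program's output as a single bytes object before it goes on the wire. The helper name wrap_untrusted_output is hypothetical and is not part of sandlib.py; it only illustrates that the marshal framing is produced by trusted code, whatever bytes the payload contains.

```python
import marshal


def wrap_untrusted_output(data: bytes) -> bytes:
    """Hypothetical helper: the trusted side marshals the output as one bytes object.

    The untrusted bytes are only the payload; the type code and length
    prefix are written by marshal.dumps, not by the untrusted program.
    """
    return marshal.dumps(data)


# Bytes that look nothing like valid marshal data still round-trip
# unchanged, because they are never interpreted as marshal structure.
hostile = b"\x00[({<not valid marshal data\xff"
wire = wrap_untrusted_output(hostile)
assert marshal.loads(wire) == hostile
```

This says nothing about the scenario lahwran describes, where a bug in the code that creates the marshalled data lets the framing itself be altered.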
>> I can't address the second reason. Is this still a problem? What bad
>> effects will we see if it is?
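For the second reason, one way to picture the situation is a reader that checks whether the pipe has data before it tries to unmarshal anything. The following is only a sketch under stated assumptions: it is not what sandlib.py does, the function name load_if_ready is hypothetical, it assumes a Unix-like platform where select works on pipe descriptors, and it assumes one complete marshalled message arrives in a single read.

```python
import marshal
import os
import select


def load_if_ready(fd, timeout=0.0):
    """Hypothetical helper: unmarshal one message only if fd is readable.

    Returns (False, None) when no data is available within `timeout`
    seconds, otherwise (True, value). Assumes the whole message fits in
    one os.read call, which a real reader could not rely on.
    """
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return (False, None)
    chunk = os.read(fd, 65536)
    # An empty chunk means EOF; marshal.loads raises EOFError in that case.
    return (True, marshal.loads(chunk))


read_fd, write_fd = os.pipe()

# Nothing has been written yet, so the call returns without blocking.
print(load_if_ready(read_fd))

# "example_call" is a made-up message, not a real sandbox request name.
os.write(write_fd, marshal.dumps(("example_call", (1, b"hi"))))
print(load_if_ready(read_fd))
```

The point of the sketch is only that the wait happens in select rather than inside marshal.load, so the unmarshalling step never starts on an empty pipe.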