[pypy-dev] Use of marshal in the sandbox: is stdlib marshal OK?

Dec. 27, 2011

The sandbox uses PyPy's own implementation of marshal.  In 
pypy/translator/sandbox/sandlib.py is this comment:

# Note: we use lib_pypy/marshal.py instead of the built-in marshal
# for two reasons.  The built-in module could be made to segfault
# or be attackable in other ways by sending malicious input to
# load().  Also, marshal.load(f) blocks with the GIL held when
# f is a pipe with no data immediately available, preventing the
# _waiting_thread to run.

I'd like to remove as many dependencies as possible from the sandbox 
code, so I'd like to explore the possibility of using the standard 
library marshal module.

The first reason above is about crashing marshal with malicious input.  
To my thinking, we are in control of what data is marshaled, so we don't 
have to worry about malicious input.  The untrusted Python code running 
in the sandbox doesn't have a way of sending marshaled data, so we don't 
have to worry that it will be used to attack the marshal module.  The 
stdout of the untrusted Python code will become a string that is 
marshaled, but that doesn't provide a way for the untrusted code to 
attack the marshal module.  Or have I missed something?
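
To make the argument concrete, here is a minimal sketch using only the 
stdlib marshal module.  The helper name wrap_output is hypothetical and 
is not taken from sandlib.py; it stands in for whatever trusted code 
does the marshaling.  The sketch assumes the trusted side is the only 
one that ever calls dumps(), which is exactly the assumption I'm asking 
about.

```python
import marshal


def wrap_output(untrusted_stdout):
    """Hypothetical helper: the trusted side marshals the bytes it captured."""
    # The bytes become the payload of a single marshaled bytes object.
    # The framing (type code and length) is produced by trusted code.
    return marshal.dumps(untrusted_stdout)


# Bytes chosen by the untrusted program; they are not valid marshal data.
hostile = b"\x00\xff{not a valid marshal stream"

framed = wrap_output(hostile)

# loads() parses only the trusted framing and hands the payload back
# unchanged; the hostile bytes are never interpreted as marshal opcodes.
recovered = marshal.loads(framed)
assert recovered == hostile
```

If the untrusted side could ever write raw bytes straight into the 
stream that load() reads, this reasoning would no longer hold.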

The second reason I can't address.  Is this still a problem?  What bad 
effects will we see if it is?
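
One way to check the second point on a given interpreter is a small 
probe like the one below.  This is only a sketch, not code from the 
sandbox: the function name probe_blocking_load and the 0.3 second delay 
are my own choices, and it assumes Python 3 (it uses sys.stdout.buffer 
in the child).  A child process writes one marshaled value to a pipe 
after a delay, the parent blocks in marshal.load() on that pipe, and a 
second thread counts how often it gets to run in the meantime.

```python
import marshal
import subprocess
import sys
import threading
import time


def probe_blocking_load(delay=0.3):
    """Return (value, ticks).

    ticks counts how many times a second thread ran while the main
    thread was blocked in marshal.load() on a pipe with no data yet.
    """
    # The child sleeps first, so the pipe is empty when load() starts.
    child_code = (
        "import marshal, sys, time\n"
        "time.sleep(%r)\n"
        "sys.stdout.buffer.write(marshal.dumps('ready'))\n" % delay
    )
    proc = subprocess.Popen([sys.executable, "-c", child_code],
                            stdout=subprocess.PIPE)

    ticks = [0]
    done = threading.Event()

    def ticker():
        # Needs the GIL to make progress; it is starved if load() holds it.
        while not done.is_set():
            ticks[0] += 1
            time.sleep(0.01)

    thread = threading.Thread(target=ticker)
    thread.start()
    try:
        value = marshal.load(proc.stdout)  # blocks until the child writes
    finally:
        done.set()
        thread.join()
        proc.stdout.close()
        proc.wait()
    return value, ticks[0]


if __name__ == "__main__":
    value, ticks = probe_blocking_load()
    print("loaded %r; other thread ran %d times while waiting" % (value, ticks))
```

A tick count close to delay / 0.01 would suggest the other thread kept 
running while load() waited; a count of 0 or 1 would suggest it was 
starved, which is the effect the comment describes.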

--Ned.

Ned Batchelder