The sandbox uses PyPy's own implementation of marshal. In pypy/translator/sandbox/sandlib.py is this comment:

    # Note: we use lib_pypy/marshal.py instead of the built-in marshal
    # for two reasons.  The built-in module could be made to segfault
    # or be attackable in other ways by sending malicious input to
    # load().  Also, marshal.load(f) blocks with the GIL held when
    # f is a pipe with no data immediately available, preventing the
    # _waiting_thread to run.

I'd like to remove as many dependencies as possible from the sandbox code, so I'd like to explore the possibility of using the standard library marshal module.

The first reason above is about crashing marshal with malicious input. To my thinking, we are in control of what data is marshaled, so we don't have to worry about malicious input. The untrusted Python code running in the sandbox has no way of sending marshaled data itself, so it can't be used to attack the marshal module. The stdout of the untrusted Python code will become a string that is marshaled, but that doesn't give the untrusted code a way to attack the marshal module either. Or have I missed something?

The second reason I can't address. Is this still a problem? What bad effects will we see if it is?

--Ned.
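
To illustrate the claim about stdout being carried as a marshaled string, here is a minimal sketch using the standard library marshal module under Python 3. The `roundtrip` helper and the `hostile` payload are hypothetical names made up for this example and are not part of sandlib.py. It shows only that bytes wrapped as a single string value come back unchanged, even when they themselves look like marshal data; it does not say anything about whether the subprocess could write to the pipe by some other route.

```python
import marshal


def roundtrip(untrusted_output):
    """Marshal untrusted bytes as one string value, then load them back.

    Hypothetical helper for illustration; not taken from sandlib.py.
    """
    # The bytes are written as a single length-prefixed string object.
    wire = marshal.dumps(untrusted_output)
    # Loading yields the original bytes; the payload is not parsed further.
    return marshal.loads(wire)


# A payload that itself looks like marshal data, followed by junk bytes.
hostile = marshal.dumps([1, 2, 3]) + b"\xff\x00garbage"

result = roundtrip(hostile)
assert isinstance(result, bytes)
assert result == hostile
```

The inner bytes are never interpreted as marshal data in this round trip, which is the property the argument above relies on.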