[pypy-dev] Re: Mixed modules for both PyPy and CPython

15 Apr 2006

      Hello,

holger krekel wrote:
...
...
> > Second, comments on py3k list indicated that secure python is difficult 
> > because of a) introspection, b) type inference, and c) GIL acquisition.
> Hum, this list looks a bit weird to me.  Could you state what
> the actual attacks are for which security measures are discussed? 
> Or which use cases are people on py3k having in mind?

This is an amalgam of several different posts (and maybe different 
threads) but here goes:

In the thread "Will we have a true restricted exec environment for 
python 3000," Vineet Jain asked for a restricted mode which would

"1. Limit the memory consumed by the script
2. Limit access to file system and other system resources
3. Limit cpu time that the script will take
4. Be able to specify which modules are available for import."
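
As a rough illustration of item 4 only, here is a minimal sketch of an import whitelist built on a replacement `__import__` hook. The names `ALLOWED_MODULES`, `restricted_import` and `run_restricted` are made up for this example, it is written in present-day Python 3 syntax, and it is not a secure sandbox: it does nothing about items 1-3 or about introspection.

```python
import builtins

# Hypothetical whitelist, chosen only for this example.
ALLOWED_MODULES = {"math", "string"}

def restricted_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Import hook that refuses any top-level module not on the whitelist."""
    if name.split(".")[0] not in ALLOWED_MODULES:
        raise ImportError("import of %r is not allowed" % name)
    return builtins.__import__(name, globals, locals, fromlist, level)

def run_restricted(source):
    """Execute source with a reduced builtins table and return its namespace."""
    safe_builtins = {"__import__": restricted_import, "len": len, "range": range}
    namespace = {"__builtins__": safe_builtins}
    exec(source, namespace)
    return namespace

# A whitelisted import works as usual.
ns = run_restricted("import math\nroot = math.sqrt(16)")

# A non-whitelisted import is rejected by the hook.
try:
    run_restricted("import os")
    blocked = False
except ImportError:
    blocked = True
```

The hook only controls what the `import` statement can reach; it does not stop the script from finding already-loaded objects by other routes.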

In responses to that request, various people commented on the 
difficulties of implementing such a restricted mode.  On that thread, 
several people had the same idea I had, to try to use PyPy for this 
purpose - however, it didn't look like many people were up-to-date 
reading both lists (and thus familiar-ish with PyPy's execution model).

A) Introspection

Nick Coghlan stated that:

"I'm interested, but I'm also aware of how much work it would be. I'm 
disinclined to trust any mechanism which allows the untrusted code to 
run in the same process, as the implications of being able to do:

self.__class__.__mro__[-1].__subtypes__()

are somewhat staggering, and designing an in-process sandbox to cope 
with that is a big ask (and demonstrating that the sandbox actually 
*achieves* that goal is even tougher)."
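
To make the concern concrete, here is a small sketch of that traversal in present-day Python 3. In CPython the method is spelled `__subclasses__()`; the helper name `reachable_types` is just for this example.

```python
def reachable_types(obj):
    """Walk from any object to the root of its MRO and list its direct subclasses."""
    root = type(obj).__mro__[-1]      # the last MRO entry is always `object`
    return root, root.__subclasses__()

# Starting from nothing more than the integer 1 ...
root, reachable = reachable_types(1)

# ... we arrive at `object` and from there at every type derived directly from it.
type_names = {t.__name__ for t in reachable}
```

The exact contents of `reachable` depend on the interpreter and on which modules have been loaded, which is part of why the result is hard to reason about.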

Vineet proposed starting a "light" Python subinterpreter, which would 
be controlled by the main interpreter.

Nick countered, "But will it allow you to use numbers or strings?

If yes, then you can get to object(), and hence to pretty much whatever 
C builtins you want. So it's not enough to try to hide dangerous builtins 
like file(), you want to remove them from the light version entirely 
(routing all file system and network access requests through the main 
application). But if the file objects are gone, what happens to the 
Python machinery that relies on them (like import)?

Python's powerful introspection is a severe drawback from a security POV 
- it is *really* hard to make a user stay in a box you put them in 
without crippling some part of the language as a side effect."
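
Nick's point that hiding builtins is not enough can be checked with a short sketch (present-day Python 3; the variable names are illustrative). The snippet runs with an empty builtins table, so it cannot name `object`, `type` or anything else, yet a plain integer literal is enough to get back to the type hierarchy.

```python
# Code that the "restricted" script would run: no names used, only a literal.
source = "found = (1).__class__.__mro__[-1].__subclasses__()"

# An empty builtins table: no open(), no __import__, no type().
namespace = {"__builtins__": {}}
exec(source, namespace)

# Attribute access needs no builtins, so the traversal still succeeds.
found_names = sorted(t.__name__ for t in namespace["found"])
```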

Thus, in CPython, allowing someone to access a C type effectively opens up 
all the C types.  In PyPy, however, each type is effectively in its own 
box.  Further, PyPy already has a structure that can deal with these 
sorts of accesses: the flowgraph.  Operations in PyPy come about because 
of traversals of the graph - certain branches of the graph could be 
restricted or proxied out to a trusted interpreter.
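
To show what "restricting certain branches" might look like, here is a toy sketch in which every operation on a guest object goes through a single dispatcher that can refuse it. This is not PyPy's API; `RestrictedSpace`, `OperationDenied` and the blocked-attribute set are all invented for illustration, and a real design would have to cover every operation, not just attribute lookup.

```python
class OperationDenied(Exception):
    """Raised when the policy refuses an operation."""

class RestrictedSpace:
    """Toy dispatcher: guest code asks the space to perform each operation."""

    def __init__(self, blocked_attrs):
        self.blocked_attrs = frozenset(blocked_attrs)

    def getattr(self, obj, name):
        # The single choke point where an attribute lookup can be refused.
        if name in self.blocked_attrs:
            raise OperationDenied(name)
        return getattr(obj, name)

    def add(self, left, right):
        # Harmless operations are simply passed through.
        return left + right

space = RestrictedSpace({"__class__", "__mro__", "__subclasses__"})

# Ordinary attribute access still works.
upper = space.getattr("abc", "upper")()

# The introspection route is cut off at the dispatcher.
try:
    space.getattr(1, "__class__")
    denied = False
except OperationDenied:
    denied = True
```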

B) GIL Acquisition

Another person suggested leveraging the multiple subinterpreter code 
which already exists in CPython to create a restricted-exec interpreter. 
MvL noted that GIL acquisition made that difficult:

"Part of the problem is that it doesn't really work. Some objects *are* 
shared across interpreters, such as global objects in extension modules 
(extension modules are initialized only once). I believe that the GIL 
management code (for acquiring the GIL out of nowhere) breaks if there 
are multiple interpreters."

C) Type inference

I tried to find the thread for this one - it's not from the Py3K list - 
but I recall a couple of years ago someone attempting to make an rexec 
version of Python.  One of the comments that I recall from that 
discussion had to do with understanding what types were being 
manipulated.  I believe there was an example somewhat like

# operator.add is trusted

class A:
    def __add__(self, other):
        pass  # ... something evil here ...

a, b = A(), 1

a + b
# [something evil happens]

However, this is a foggy memory that I have so far been unable to 
substantiate.
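
The mechanism behind that half-remembered example is easy to reproduce. The sketch below (present-day Python 3) stands in a harmless logging call for the "evil" part; the `calls` list is only there to make the dispatch visible.

```python
import operator

calls = []  # records what the untrusted method was able to do

class A:
    def __add__(self, other):
        # Stand-in for arbitrary untrusted code.
        calls.append(("A.__add__", other))
        return 0

# The "trusted" function has no say in what runs: it dispatches on the
# runtime type of its left operand and ends up calling A.__add__.
result = operator.add(A(), 1)
```

Without knowing the types of the operands in advance, there is no way to tell from the call site alone whether `operator.add` will run built-in code or user code.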

Thanks,

VanL