[pypy-dev] Sandboxing in Pypy and Crunchy

Mon Dec 10 12:09:11 CET 2007

In a message of Sun, 09 Dec 2007 18:56:17 -0400, "Andre Roberge" writes:
>Hi Laura,
>Hmm, I'm not sure I can do that very well...   the best I can do I
>think is to describe what it does.
>
>1. Crunchy retrieves an html page.
>2. It processes it, removing pre-existing javascript and various
>undesired html tags
>3. It identifies where it needs to add custom elements (new html tags
>& javascript code)
>4. It feeds the page to the browser, leaving a line of communication
>open, waiting for user instruction.
>
>At step 3. above, a new thread is started for each place in the page
>where an interaction with a Python interpreter is required.
>Following user interaction (click of a button or entering some code in
>an html input box), the user code is fed back to the appropriate
>interpreter (thread) and the result is sent back to the browser.  If I
>recall correctly, the interpreter used is a small variation from
>code.py included in Python standard library.    It is this part (I
>believe) that needs to be sandboxed - a single module.

I'm still confused.  Once you have figured out "each place in the page
where an interaction with a python interpreter is wanted" -- where do
you want the python interpreter which runs that code to itself run?
Locally, on the same machine that has the browser?  Inside the
browser?  Or do you open up a connection to some other machine --
yours, perhaps -- send the code there, run it, and then send some
sort of result back to the machine where the student is running
the browser?

Wherever this is, this is where you need to run pypy.  And all of
these are possible.  You can run a sandboxed version of javascript --
a pypy with a javascript front end -- in your browser.  You can even
embed a sandboxed version of a python console in a browser, too.

>>  Where does
>> the student's code run?  On the student's machine? or on the teacher's
>> server machine?
>
>Right now Crunchy is primarily used in a single user environment.  It
>would be possible to host it on a server, but it would be very
>insecure to do so.  Ideally it should be hosted in a secure way on a
>server in most situations.

Let me check one more thing -- are you running crunchy with the
idea that every student has one host machine, and every host machine
has one student?  And crunchy runs standalone on each and every
machine?  So your sandbox is to protect the student from himself or
herself?  Because that is what this sounds like to me.  If so,
then each student needs a pypy interpreter.

But, usually, you want to sandbox somebody because the code they
are running is on a machine they share with other users, especially
a time sharing system where the other users could be using the
machine simultaneously.  Is this what you plan to do?  Because, in
that case, it is only the timesharing system that needs to be running
pypy.  And the students could connect to it using a crunchy-client
that ran under CPython.

What is the part you want to sandbox?  The code that parses the
html page, looking for python code to run?  (I don't think so, but
I could be confused.)  Or the actual python code it finds?

>> The ability to sandbox is a property of the architecture of pypy.
>> It's not a module that you could port to CPython.  The person you
>> want to sandbox has to be running pypy.
>>
>
>Darn :(    I was hoping I could somehow just call a sandboxed
>interpreter module ....    

Aha.  Sandboxing is not something that you construct around the outside
of arbitrary code.  It is a function of the translation process itself.
You need to construct a sandboxed version of pypy, and use that to
run any code that you want sandboxed.

When you run the special "pypy-c-sandbox" executable, it does not
perform any library or system calls itself.  Instead, it marshals the
operation name and the arguments to its stdout and waits for the
marshalled result on its stdin.  Which means there has to be an 'other
side' -- a separate process that reads the stdout, does something with
the request, and then marshals up the result and sends it back.  This
has two implications.  First of all, if the other side hasn't been
written with the ability to do something appropriate with a particular
library or system call that is of interest to you, then we will have
to write that part before the program will be of use.  Not every
python system call is supported now.  We'd have to investigate to see
if you need one that hasn't been written yet.
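
To make that round trip concrete, here is a minimal sketch in Python.
It is not the real pypy sandbox protocol or its controller code: the
function names, the HANDLERS table and the ("ok" / "error", value)
reply shape are all made up for illustration.  It only shows the
shape of the idea, a request marshalled on one side, looked up in a
table of supported operations on the other side, and a marshalled
answer coming back.

```python
import marshal

# Hypothetical allow-list: the only operations this "other side" performs.
# Anything missing from the table has simply not been written yet.
HANDLERS = {
    "getcwd": lambda: "/virtual/home",   # a made-up, virtualised answer
    "strlen": lambda s: len(s),
}

def encode_request(name, *args):
    """Sandboxed side: turn a would-be system call into bytes for stdout."""
    return marshal.dumps((name, args))

def handle_request(raw):
    """Other side: decode one request, perform it if supported, and reply.

    Caveat: CPython's marshal module is documented as unsafe for untrusted
    input, so a real controller would need a stricter decoder than this.
    """
    name, args = marshal.loads(raw)
    handler = HANDLERS.get(name)
    if handler is None:
        return marshal.dumps(("error", "unsupported operation: " + name))
    return marshal.dumps(("ok", handler(*args)))

def decode_reply(raw):
    """Sandboxed side: read the marshalled result that arrives on stdin."""
    return marshal.loads(raw)

if __name__ == "__main__":
    for call in [("getcwd",), ("strlen", "abc"), ("unlink", "/etc/passwd")]:
        print(call, "->", decode_reply(handle_request(encode_request(*call))))
```

The third call shows the first implication: an operation the other
side does not know about gets an error back until somebody writes a
handler for it.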

And secondly, it is not the case that this second program, the other
side, is written in such a way that it, in effect, says 'using my
Godlike powers of understanding, I know that you, nasty person, just
tried to break out of the sandbox using a buffer overflow!  I won't
allow that!!'  Instead, the reason that things are secure is that no
matter what horrible things the other side is asked to deal with, the
user's code is still just waiting for something to come back on its
stdin.  If all you can do is read from stdin and write to stdout,
then there is no way to exploit whatever bug you are interested in.
You discover that when some other process attempted to run the nasty
code you typed in, it generated a fatal RPython error.  That doesn't
help you exploit the error.  Thus, no cleverness and Godlike
understanding is required.
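
The plumbing can be sketched with two ordinary processes.  Big
caveat: the child below is a plain CPython process, so it is NOT
sandboxed at all; in pypy-c-sandbox it is the translation that
guarantees the process has no other way out.  The CHILD script, the
one-line text messages and run_controller() are assumptions for
illustration only.  The point is that the child asks, blocks on its
stdin, and sees only whatever the other side chooses to send back.

```python
import subprocess
import sys

# Stand-in for the sandboxed program.  It talks only via stdin/stdout.
# (Illustration only: a real CPython child could do anything it likes.)
CHILD = r"""
import sys
sys.stdout.write("open /etc/passwd\n")   # request goes to the other side
sys.stdout.flush()
reply = sys.stdin.readline()             # block until an answer arrives
sys.stdout.write("got: " + reply)
"""

def run_controller():
    """Other side: read one request, answer it, and collect the outcome."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    request = proc.stdout.readline().strip()
    # No cleverness needed: this controller refuses every request it gets.
    proc.stdin.write("denied\n")
    proc.stdin.flush()
    outcome = proc.stdout.readline().strip()
    proc.stdin.close()
    proc.stdout.close()
    proc.wait()
    return request, outcome

if __name__ == "__main__":
    print(run_controller())
```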

>Then again, it means that I'll have to try
>pypy myself, and play with it - something I meant to do ... but did
>not for lack of time. It also makes it more of a burden on potential
>users  if they have to install pypy in addition to Crunchy.

Maybe over Christmas -- if there is no pyContest for the shortest
python program to do something -- we could work together on getting a
sandboxed-server version of crunchy working.  You wouldn't have any
more trips to Switzerland planned, either, would you?  Because we
could probably get together over a weekend around a trip like that, too.
And there is always PyCon.  

>Thanks for your clarifications,

Any time.

>
>> Laura
>>
>
>PS.  Yes, Laura, it is cold and there is snow (unusual at this time of
>the year) in your beloved Nova Scotia ;-)

You _really_ know how to make a person want to go home. :-(
Enjoy it for those of us who cannot.  Nothing but rain around here
in Göteborg.  Bah!  Where's the real winter?

PS. It's a _very polite and friendly_ irc channel.  I promise.