Running untrusted code in pypy - pypy-dev

Vinj Vinj

Feb. 19, 2007

4:05 p.m.

Hi,

I've been following pypy-dev for several years and this is my first post here. Thanks for all the hard work you guys have put in; it is truly exciting to see what PyPy has accomplished so far.

I'm currently building a distributed financial trading application that allows users to write trading models in Python and Lua. I had to introduce Lua, since there is no way to completely "secure" user models written in CPython.

I'm working with a modified Lua core which allows me to restrict the max memory and max CPU cycles available for each VM. I then have Python-Lua bridge code that allows you to exchange data and function calls between the two.

Would I be able to do something similar with PyPy? Would I be able to ensure that no malicious user is able to bring my hosted application down?

Are any of you guys going to be at PyCon 2007?

Thanks,
Vineet


holger krekel

February 2007

4:27 p.m.

Hi Vinj,

On Mon, Feb 19, 2007 at 08:05 -0800, Vinj Vinj wrote:

> I've been following pypy dev for several years and this is my first post here. Thanks for all the hard work you guys have put in, it is truly exciting to see what pypy has accomplished so far.
>
> I'm currently building a distributing financial trading application that allows users to write trading models in python and lua. I had to introduce lua, since there is no way to completely "secure" user models written in cPython.
>
> I'm working with a modified Lua core which allows me to restrict the max memory and max CPU cycles available for each vm. I then have a python-lua bridge code that allows you exchange data and function calls between the two.
>
> Would I be able to do something similar with pypy?

PyPy does not (currently) aim at offering CPU/memory restrictions, but you could use virtual hosts (Xen or vserver) for that; both offer such restriction settings. PyPy itself may help with the Taint Object Space: http://codespeak.net/pypy/dist/pypy/doc/objspace-proxies.html#the-taint-obje... to track the flow of sensitive data in your application and prevent it from leaking accidentally.

> Would I be able to ensure that no malicious user is able to bring my hosted application down?

I'd probably use kernel-level security for that, maybe in combination with VM-provided features. (I'm not sure whether you are referring to the processing of user input, to DoS attacks or to some other security aspect; it obviously all depends a bit on concrete use cases and intentions.)
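Holger's suggestion of kernel-level limits can be illustrated with a modern CPython on a Unix host. This is a minimal sketch under stated assumptions: `run_model_in_child` and its parameters are hypothetical names (nothing here is a PyPy API), and each model is assumed to run in its own interpreter process. The kernel enforces the CPU-time (`RLIMIT_CPU`) and address-space (`RLIMIT_AS`) limits, so a runaway model is killed even while it is stuck inside one long C-level operation. It does not by itself address sharing large time series between models.

```python
import resource
import subprocess
import sys


def run_model_in_child(source, cpu_seconds=2, mem_bytes=1024 * 1024 * 1024,
                       wall_seconds=30):
    """Run untrusted Python source in a separate interpreter process.

    Hypothetical helper (Unix only): the child gets kernel-enforced limits on
    CPU time and address space, so the OS kills a runaway model.
    """
    def set_limits():
        # Runs in the child after fork() and before exec(); the kernel
        # enforces these limits for the lifetime of the child process.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    return subprocess.run(
        [sys.executable, "-I", "-c", source],  # -I: isolated mode
        preexec_fn=set_limits,
        capture_output=True,
        text=True,
        timeout=wall_seconds,  # wall-clock backstop; raises TimeoutExpired
    )


if __name__ == "__main__":
    ok = run_model_in_child("print(sum(range(10)))")
    print(ok.returncode, ok.stdout.strip())  # 0 45
    spin = run_model_in_child("while True: pass", cpu_seconds=1)
    print(spin.returncode)  # negative: the child was killed by a signal
```

`preexec_fn` is documented as unsafe when the parent process has threads, so a threaded host application would need to apply the limits some other way.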

> Are any of you guys going to be at pycon-2007?

Michael and Christian are going to be there. I myself am busy preparing for the upcoming sprints, but I am happy to discuss possibilities some time.

best,
holger

--
merlinux GmbH Steinbergstr. 42 31139 Hildesheim
http://merlinux.de tel +49 5121 20800 75 (fax 77)

Vinj Vinj

4:47 p.m.

> PyPy does not (currently) aim at offering cpu/mem restrictions, but you could use virtual hosts (XEN or vserver) for that, both offer such restriction settings. PyPy itself may help with the Taint Object Space:

Unfortunately, for my use cases, using virtual hosts will not work. All the user models work on time series price data. This data can get very large and has to be shared by all user models. It is not practical for each VMware guest to have its own copy of the data. With Lua, I'm able to share this time series with all user models and still ensure that all the models are run securely.

> obviously all depends a bit on concrete use cases and intentions).

Among many things, will I be able to restrict the user from doing:

1. a = []*10000000000000000000000000
2. a = 23**3294832098980989898
3. disable recursive calls
4. writing while loops which never end
5. etc.

I'm trying to get a feel for whether this kind of thing would be possible with PyPy in the future. With CPython, I've been told that it is just not going to be possible, which is why I moved to Lua for user models. I would much rather use Python so that I don't have to maintain the Python-Lua bridge.

> Michael and Christian are going to be there,

Cool.

Thanks,
Vineet

Vinj Vinj

4:54 p.m.

> 1. a = []*10000000000000000000000000

What I meant was: a = [None]*10000000000000000000000000

Carl Friedrich Bolz

5:51 p.m.

Hi Vineet!

Vinj Vinj wrote:

>> PyPy does not (currently) aim at offering cpu/mem restrictions, but you could use virtual hosts (XEN or vserver) for that, both offer such restriction settings. PyPy itself may help with the Taint Object Space:
>
> Unfortunately, for my use cases, using virtual hosts will not work. All the user models work on time series price data. This data can get very large and has to be shared by all user models. It is not practical for each vmware client to have its own copy of user data. With lua, I'm able to share this time series with all user models and still ensure that all the models are run securely.

How is the data shared? Using files or somehow differently?

>> obviously all depends a bit on concrete use cases and intentions).
>
> Among many things, will I be able to restrict the user from doing:
>
> 1. a = []*10000000000000000000000000

CPython already notices that this will use too much memory. But yes, in PyPy you could impose an upper limit on the memory used by using our custom mark-and-sweep garbage collector. This collector gathers quite a bit of information while it is running, especially how much non-dead memory is currently in use. This would make it possible to impose a hard limit there.
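Carl's first point is easy to check in CPython. A small sketch, assuming a 64-bit CPython 3 build; which exception is raised is an implementation detail, and PyPy may behave differently. The size checks happen before any list storage is allocated, so none of these calls actually consumes the memory. `classify_huge_repeat` is a hypothetical helper name.

```python
def classify_huge_repeat(n):
    """Report how CPython reacts to building [None] * n."""
    try:
        result = [None] * n
    except OverflowError:
        # n does not even fit in a machine-sized index (Py_ssize_t).
        return "OverflowError"
    except MemoryError:
        # n fits in an index, but n item pointers cannot be allocated.
        return "MemoryError"
    return f"allocated {len(result)} items"


if __name__ == "__main__":
    print(classify_huge_repeat(10**25))  # Vineet's corrected example: OverflowError
    print(classify_huge_repeat(2**62))   # MemoryError on a 64-bit build
    print(classify_huge_repeat(3))       # allocated 3 items
```

Only this one pattern is rejected up front; a model that grows a list gradually is bounded only by whatever external memory cap is in place.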

> 2. a = 23**3294832098980989898

Doable, but harder (and I guess you mean this in a more general way than just checking during long computations). You would need a transformation that inserts checks into the PyPy graphs to see whether something is running too long without ever reaching the interpreter main loop. You might be better served with having the program run with a timeout.

> 3. disable recursive calls

You could fix the recursion limit.
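In CPython the closest existing knob is `sys.setrecursionlimit`. A minimal sketch, assuming the model runs in the same interpreter as the host application; `call_with_recursion_limit` is a hypothetical helper. As Vineet notes later in the thread, the limit is process-wide and also counts the host's own frames, so it can only be lowered temporarily around a model call, not set per model.

```python
import sys


def call_with_recursion_limit(func, *args, limit=800):
    """Call func(*args) with the interpreter-wide recursion limit lowered.

    The limit also counts the caller's frames and is shared by all threads,
    so this is a coarse guard rather than a per-model one.
    """
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        return func(*args)
    finally:
        sys.setrecursionlimit(old_limit)  # always restore the host's setting


def runaway(n):
    # A user model that never stops recursing.
    return runaway(n + 1)


def factorial(n):
    return 1 if n <= 1 else n * factorial(n - 1)


if __name__ == "__main__":
    print(call_with_recursion_limit(factorial, 10))  # 3628800
    try:
        call_with_recursion_limit(runaway, 0)
    except RecursionError as exc:
        print("stopped:", exc)
```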

> 4. writing while loops which never end

You could limit the maximum number of bytecode instructions executed. Again, a timeout might be the better solution.
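CPython has no built-in instruction budget, but the idea can be approximated with a trace function. A minimal sketch using `sys.settrace` that counts executed source lines rather than bytecode instructions; `run_with_line_budget` and `BudgetExceeded` are hypothetical names. Like the interpreter-level check Carl describes, it only fires between lines, so it does not stop a single long operation such as `23**3294832098980989898`, and tracing slows the model down considerably.

```python
import sys


class BudgetExceeded(Exception):
    """Raised when traced code runs more source lines than allowed."""


def run_with_line_budget(func, *args, max_lines=100_000):
    """Call func(*args), aborting once it has executed max_lines source lines.

    Counts 'line' trace events in the current thread: a coarse, CPython-level
    stand-in for the bytecode-count limit discussed in the thread.
    """
    remaining = max_lines

    def tracer(frame, event, arg):
        nonlocal remaining
        if event == "line":
            remaining -= 1
            if remaining < 0:
                # The exception propagates into the traced frame; CPython
                # also unsets a trace function that raises.
                raise BudgetExceeded(f"more than {max_lines} lines executed")
        return tracer  # returning tracer keeps line events coming for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        return func(*args)
    finally:
        sys.settrace(previous)


def spin():
    # A user model with a loop that never ends.
    n = 0
    while True:
        n += 1


if __name__ == "__main__":
    try:
        run_with_line_budget(spin, max_lines=10_000)
    except BudgetExceeded as exc:
        print("stopped:", exc)
```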

> 5. etc.

Another thing I can see there is accessing the file system in malicious ways. That can be fixed at the OS level, I guess. You could also simply leave modules like socket out of your PyPy interpreter executable.

> I'm trying to get a feel for whether this kind of thing would be (in the future) possible with pypy.

As Holger said, it depends very much on what exactly you want. In PyPy usually many things are possible and you have to choose the right possibilities.

> With cPython, I've been told that it is just not going to be possible. Which is why, I moved to lua for user models. I would much rather use python so that I don't have to maintain the python-lua bridge.

Do you know about lunatic Python? http://labix.org/lunatic-python

Cheers,
Carl Friedrich

Vinj Vinj

6:49 p.m.

> How is the data shared? Using files or somehow differently?

Slices of numeric python arrays

> custom mark-and-sweep garbage collector. This collector collects quite a bit information while it is running, especially how much non-dead memory is used currently. This would make it possible to impose a hard limit there.

Ok. But this limit would be for the entire app and not per user model. This should be fine; I would just take the penalty of the OS/interpreter then releasing all the unused memory back.

> Doable, but harder (and I guess you mean this in a more general way than just checking during long computations). You would need a transformation that inserts checks into the PyPy graphs to see whether something is running too long without ever reaching the interpreter main loop. You might be better served with having the program run with a timeout.

Again, I think OS-based timeout interrupts would work fine? Do you see any downside to using OS-level interrupts? Is there any way that the application would not be able to catch them?
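An OS-level timeout of the kind Vineet asks about can be sketched with a POSIX interval timer. A minimal sketch, assuming a Unix host and that models run in the main thread (CPython runs Python-level signal handlers only there); `run_with_timeout` and `ModelTimeout` are hypothetical names. The handler raises an exception inside the running model, which a model could swallow with a bare `except`, and, as Carl's reply points out, the handler only runs once the interpreter loop regains control.

```python
import signal


class ModelTimeout(Exception):
    """Raised inside a user model when its time budget runs out."""


def _raise_timeout(signum, frame):
    raise ModelTimeout("user model exceeded its time budget")


def run_with_timeout(func, seconds, *args):
    """Run func(*args), interrupting it after `seconds` of wall-clock time.

    Unix-only; must be called from the main thread.
    """
    previous_handler = signal.signal(signal.SIGALRM, _raise_timeout)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func(*args)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel a pending alarm
        # signal.signal() returns None if the old handler was not set from Python.
        signal.signal(signal.SIGALRM,
                      previous_handler if previous_handler is not None
                      else signal.SIG_DFL)


def forever():
    # A user model whose loop never ends.
    while True:
        pass


if __name__ == "__main__":
    print(run_with_timeout(max, 1.0, 3, 7))  # 7
    try:
        run_with_timeout(forever, 0.5)
    except ModelTimeout as exc:
        print("stopped:", exc)
```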

> You could fix the recursion limit.

Again, this would be for the entire application and not per user model.

> Another thing I can see there is accessing the file system in malicious ways. Can be fixed on the OS level, I guess. You could not include things like socket into your PyPy interpreter executable.

This is the tricky part. The main Python application uses a lot of CPython libraries, so not including them in the interpreter was not an option. I was hoping there would be some other way to tell the PyPy interpreter, before it executes a certain piece of code, that access to the following list of modules ([x, y, z...]) is allowed.

> As Holger said, it depends very much on what exactly you want. In PyPy usually many things are possible and you have to choose the right possibilities.

> Do you know about lunatic Python? http://labix.org/lunatic-python

Yes. I'm using a modified version of this library.

Vineet

Carl Friedrich Bolz

8:27 p.m.

Vinj Vinj wrote:

>> How is the data shared? Using files or somehow differently?
>
> Slices of numeric python arrays

Then the foremost problem will probably be that PyPy is not close to supporting numeric arrays :-). I guess that will change at some point.

>> custom mark-and-sweep garbage collector. This collector collects quite a bit information while it is running, especially how much non-dead memory is used currently. This would make it possible to impose a hard limit there.
>
> Ok. But this limit would be for the entire app and not per user model. This should be fine, I would just take the penalty of the OS/interpretor than releasing back all the unused memory.

There can be more advanced solutions: The GC has "memory pools" and you could have a solution where the main app has its own (unlimited) pool and the user models have limited pools.

> I think again os based timeout interrupts would work fine? Do you see any downside of using os level interrupts? Any way that the application would not be able to catch them?

There is a downside to OS-level interrupts (both in CPython and in PyPy): only the interpreter main loop checks for interrupts, which means that, if I see it correctly, this does not work against 2**(something big). I fear I cannot think of a good way to fix this.

>> You could fix the recursion limit.
>
> Again this would be for the entire application and not per user model.

>> Another thing I can see there is accessing the file system in malicious ways. Can be fixed on the OS level, I guess. You could not include things like socket into your PyPy interpreter executable.
>
> This is the tricky part. The main python application used a lot of cPython libraries, so not including them in the interpreter was not an option. I was hoping that there would be some other way which could tell the pypy interpreter, before it executes a certain piece of code, that access to the following list of modules ([x, y, z...]) is allowed.

This is something which is quite hard to enforce, given Python's very introspective nature. There are some ideas for supporting a rather strict distinction between two different sorts of code within the same process with PyPy. You would have two interpreters in the same executable, one for trusted and one for sandboxed code. The sandboxed interpreter would only get access to a very limited set of modules and builtins. The trusted interpreter could somehow "control" what sort of operations the untrusted part would be allowed to do. This is quite a mess to implement correctly (but easier than in CPython, I suppose), but it might give a general solution to this set of problems.

Cheers,
Carl Friedrich
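The "very limited set of modules and builtins" idea can be imitated in plain CPython with a custom `__import__`, and doing so also shows why Carl calls it hard to enforce. A sketch with hypothetical helper names (`restricted_builtins`, `run_model`, `SAFE_BUILTIN_NAMES`); it is explicitly not a security boundary, because untrusted code can still reach other objects through introspection, for example via `object.__subclasses__()`.

```python
import builtins

# Builtins a user model may use; everything else (open, eval, exec, ...) is left out.
SAFE_BUILTIN_NAMES = ("abs", "len", "max", "min", "range", "round", "sum",
                      "ArithmeticError", "ImportError", "ValueError")


def restricted_builtins(allowed_modules):
    """Return a builtins namespace whose __import__ only admits allowed_modules.

    Illustration only: this is NOT a sandbox.
    """
    real_import = builtins.__import__

    def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
        top_level = name.partition(".")[0]
        if level != 0 or top_level not in allowed_modules:
            raise ImportError(f"import of {name!r} is not allowed")
        return real_import(name, globals, locals, fromlist, level)

    namespace = {n: getattr(builtins, n) for n in SAFE_BUILTIN_NAMES}
    namespace["__import__"] = guarded_import
    return namespace


def run_model(source, allowed_modules):
    """Execute model source with the restricted builtins; return its globals."""
    model_globals = {"__builtins__": restricted_builtins(frozenset(allowed_modules))}
    exec(source, model_globals)
    return model_globals


if __name__ == "__main__":
    ns = run_model("import math\nresult = math.sqrt(16)", ["math"])
    print(ns["result"])  # 4.0
    try:
        run_model("import socket", ["math"])
    except ImportError as exc:
        print(exc)  # import of 'socket' is not allowed
```

This is roughly the kind of approach the old `rexec` module relied on, and it shares that module's weakness: the whitelist only guards the `import` statement, not the object graph.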

Jeff Rush

8:30 p.m.

Vinj Vinj wrote:

> With cPython, I've been told that it is just not going to be possible. Which is why, I moved to lua for user models.

It sounds like you're going to be at PyCon, so be sure not to miss the talk on Saturday afternoon:

Securing Python: "Protecting the interpreter from code wielding fresh fruit." (#41) by Brett Cannon

"Python currently has no security model. This talk discusses why this is and how I am fixing the problem."

-Jeff

James Matthews

2:51 a.m.

How can you detect such code running (all the bad code)?

_______________________________________________ pypy-dev@codespeak.net http://codespeak.net/mailman/listinfo/pypy-dev

Jeff Rush

3:14 a.m.

James Matthews wrote:

> How can you detect such code running ( all the bad code)

A complicated topic that can't be covered in a brief email, but Python used to have a security model/features:

http://www.python.org/doc/2.3.5/lib/restricted.html

but there were ways to escape the sandbox. Perhaps they could be closed but no one had the time to carefully study the matter, so it was disabled in 2.3 and I believe removed in 2.5.

Brett Cannon is re-opening the matter. You can read about his approach at:

http://tinyurl.com/2sh55f

Many in the Python community are excited because it will finally bring capability-based security to Python, if it works.

There is also some cross-pollination of ideas re capabilities with the one-laptop-per-child project, who recently published their security model. They have a lot of Python code to secure, in a potentially hostile laptop/network environment. You can read about their model at:

http://wiki.laptop.org/go/Bitfrost

Ask Ivan Krstić about Bitfrost, whose development he led. He is giving the opening keynote at PyCon on Friday morning.

-Jeff


Gangadhar NPK

6:19 p.m.

Vineet,

Though not directly related to this post, I would be interested to know how you have modified the Lua core to provide individual VMs for users to operate within. The problem you are trying to solve is to enable Pythonic access within Lua, but I am more interested in the restricted user-level execution model and would like to know more about it. Maybe you can drop a note (either to the list or to my mail id) when you get some time.

Thank You
Gangadhar

participants (6)

Carl Friedrich Bolz
Gangadhar NPK
holger krekel
James Matthews
Jeff Rush
Vinj Vinj
