Challenge: Please break this! (a.k.a. restricted mode revisited)

I've made another attempt at Python sandboxing, which does something I've not seen tried before - using the 'ast' module to do static analysis of the untrusted code before it's executed, to prevent most of the sneaky tricks that have been used to break out of past attempts at sandboxes. In short, I'm turning Python's usual "gentleman's agreement" that you should not access names and attributes that are indicated as private by starting with an underscore into a rigidly enforced rule: try and access anything starting with an underscore and your code will not be run.

Anyway, the code is at https://github.com/jribbens/unsafe - it requires Python 3.4 or later (it could probably be made to work on Python 2.7 as well, but it would need some changes).

I would be very interested to see if anyone can manage to break it. Bugs which are trivially fixable are of course welcomed, but the real question is: is this approach basically sound, or is it fundamentally unworkable?
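For illustration, a minimal sketch of the kind of AST gate being described here - the helper names and the exact rule are assumptions for the example, not the project's actual code:

import ast

class PrivateNameChecker(ast.NodeVisitor):
    # Illustrative sketch - not the actual unsafe.py implementation.
    def check_name(self, name):
        if name.startswith("_"):
            raise ValueError("access to %r is not allowed" % name)

    def visit_Name(self, node):
        self.check_name(node.id)
        self.generic_visit(node)

    def visit_Attribute(self, node):
        self.check_name(node.attr)
        self.generic_visit(node)

def safe_exec(source, namespace):
    tree = ast.parse(source)
    PrivateNameChecker().visit(tree)   # static check, before anything runs
    exec(compile(tree, "<untrusted>", "exec"), namespace)

safe_exec("x = 40 + 2", {"__builtins__": {}})     # runs fine
# safe_exec("().__class__", {"__builtins__": {}}) raises ValueError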

On 8 April 2016 at 15:18, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
I would be very interested to see if anyone can manage to break it. Bugs which are trivially fixable are of course welcomed, but the real question is: is this approach basically sound, or is it fundamentally unworkable?
What are the limitations? It seems to even block "import" which seems over-zealous (no import math?) Paul

On Fri, Apr 08, 2016 at 03:37:45PM +0100, Paul Moore wrote:
On 8 April 2016 at 15:18, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
I would be very interested to see if anyone can manage to break it. Bugs which are trivially fixable are of course welcomed, but the real question is: is this approach basically sound, or is it fundamentally unworkable?
What are the limitations? It seems to even block "import" which seems over-zealous (no import math?)
The restrictions are:

- Of the builtins, __import__, compile, globals, input, locals, memoryview, open, print, type and vars are unavailable (and some of the exceptions, but mostly because they're irrelevant).
- You cannot access any name or attribute which starts with "_", or is called "gi_frame" or "gi_code".
- You cannot use the "with" statement (although it's possible it might be safe for me to add that back in if I also disallow access to attributes called "tb_frame").

Importing modules is fundamentally unsafe because the untrusted code might alter the module, and the altered version would then be used by the containing application. My code has a "_copy_module" function which copies (some of) the contents of modules, so some sort of import functionality for a white-list of modules could be added using this, but there's no point in me working out which modules are safe to white-list until I'm vaguely confident that my approach isn't fundamentally broken in the first place.
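Hypothetically, a module-copying helper along those lines might look something like this (the name, signature and details are illustrative assumptions, not the project's actual "_copy_module"):

import types

def copy_module(module, allowed_names):
    # Illustrative sketch only - not the project's actual code.
    # Build a fresh module object so untrusted code can only mutate
    # the copy, never the real module seen by the host application.
    clone = types.ModuleType(module.__name__)
    for name in allowed_names:
        setattr(clone, name, getattr(module, name))
    return clone

# e.g. a restricted 'math' exposing only a few names:
import math
safe_math = copy_module(math, ("sin", "cos", "sqrt", "pi"))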

On 8 April 2016 at 16:18, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
I've made another attempt at Python sandboxing, which does something I've not seen tried before - using the 'ast' module to do static analysis of the untrusted code before it's executed, to prevent most of the sneaky tricks that have been used to break out of past attempts at sandboxes.
In short, I'm turning Python's usual "gentleman's agreement" that you should not access names and attributes that are indicated as private by starting with an underscore into a rigidly enforced rule: try and access anything starting with an underscore and your code will not be run.
Anyway, the code is at https://github.com/jribbens/unsafe - it requires Python 3.4 or later (it could probably be made to work on Python 2.7 as well, but it would need some changes).
I would be very interested to see if anyone can manage to break it. Bugs which are trivially fixable are of course welcomed, but the real question is: is this approach basically sound, or is it fundamentally unworkable?
If I'm not mistaken, this breaks out:
exec('open("out", "w").write("a")', {})
because if the second argument of exec does not contain a __builtins__ key, then a copy of the original builtins module is inserted: https://docs.python.org/3/library/functions.html#exec
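This behaviour is standard CPython semantics and easy to confirm (a quick demonstration, not part of the sandbox code):

ns = {}
exec("x = len('abc')", ns)     # no __builtins__ key supplied...
print("__builtins__" in ns)    # ...so CPython inserted one: True
print(ns["x"])                 # 3 - the full builtins were available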

On Fri, Apr 08, 2016 at 05:21:38PM +0200, Arthur Darcet wrote:
If I'm not mistaken, this breaks out:

exec('open("out", "w").write("a")', {})

because if the second argument of exec does not contain a __builtins__ key, then a copy of the original builtins module is inserted: https://docs.python.org/3/library/functions.html#exec
Ah, that's a good point. I did think allowing eval/exec was a bit ambitious. I've updated it to disallow passing namespace arguments to them.

On 08/04/16 16:18, Jon Ribbens wrote:
I've made another attempt at Python sandboxing, which does something I've not seen tried before - using the 'ast' module to do static analysis of the untrusted code before it's executed, to prevent most of the sneaky tricks that have been used to break out of past attempts at sandboxes.
In short, I'm turning Python's usual "gentleman's agreement" that you should not access names and attributes that are indicated as private by starting with an underscore into a rigidly enforced rule: try and access anything starting with an underscore and your code will not be run.
Anyway, the code is at https://github.com/jribbens/unsafe - it requires Python 3.4 or later (it could probably be made to work on Python 2.7 as well, but it would need some changes).
I would be very interested to see if anyone can manage to break it. Bugs which are trivially fixable are of course welcomed, but the real question is: is this approach basically sound, or is it fundamentally unworkable?
That one is trivially fixable, but here goes:

async def a():
    global c
    c = b.cr_frame.f_back.f_back.f_back

b = a()
b.send(None)
c.f_builtins['print']('broken')

Also, if the point of giving me a subclass of datetime is to prevent access to the actual class, that can be circumvented:
>>> real_datetime = datetime.datetime.mro()[1]
>>> real_datetime
<class 'datetime.datetime'>
But I'm not sure what good that is.

On Fri, Apr 08, 2016 at 05:49:12PM +0200, Marcin Kościelnicki wrote:
On 08/04/16 16:18, Jon Ribbens wrote:
That one is trivially fixable, but here goes:

async def a():
    global c
    c = b.cr_frame.f_back.f_back.f_back

b = a()
b.send(None)
c.f_builtins['print']('broken')
Ah, I've not used Python 3.5, and I can't find any documentation on this cr_frame business, but I've added cr_frame and f_back to the disallowed attributes list.
Also, if the point of giving me a subclass of datetime is to prevent access to the actual class, that can be circumvented:
>>> real_datetime = datetime.datetime.mro()[1]
>>> real_datetime
<class 'datetime.datetime'>
But I'm not sure what good that is.
It means you can alter the datetime class that is used by the containing application, which is bad - you could lie to it about what day it is for example ;-) I've made it so instead of a direct subclass it now makes an intermediate subclass which makes mro() return an empty list.
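A minimal sketch of that intermediate-subclass trick (an assumption about the mechanism, not the project's exact code: shadowing type.mro() with a class-level attribute, which wins the lookup because type.mro is a non-data descriptor):

import datetime

class hidden_datetime(datetime.datetime):
    # Shadow type.mro() so sandboxed code can't walk back up to
    # the real datetime.datetime via .mro(); sketch of the idea,
    # the project's actual implementation may differ.
    @staticmethod
    def mro():
        return []

print(hidden_datetime.mro())        # []
print(hidden_datetime(2016, 4, 8))  # still works as a datetime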

On Sat, Apr 9, 2016 at 12:18 AM, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
Anyway, the code is at https://github.com/jribbens/unsafe - it requires Python 3.4 or later (it could probably be made to work on Python 2.7 as well, but it would need some changes).
Not being a security expert, I'm not the best one to try to break it maliciously; but I can break things accidentally. Pull request sent through. :) ChrisA

On Sat, Apr 09, 2016 at 02:20:49AM +1000, Chris Angelico wrote:
On Sat, Apr 9, 2016 at 12:18 AM, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
Anyway, the code is at https://github.com/jribbens/unsafe - it requires Python 3.4 or later (it could probably be made to work on Python 2.7 as well, but it would need some changes).
Not being a security expert, I'm not the best one to try to break it maliciously; but I can break things accidentally. Pull request sent through. :)
Thanks, I've merged that in.

Please don't lose time trying yet another sandbox inside CPython. It's just a waste of time. It's broken by design.

Please read my email about my attempt (pysandbox): https://lwn.net/Articles/574323/

And the LWN article: https://lwn.net/Articles/574215/

There are a lot of safe ways to run CPython inside a sandbox (and not the opposite). I started like you, adding more and more things to a blacklist, but it doesn't work.

See the pysandbox test suite for a lot of ways to escape a sandbox. CPython has a list of known code to crash CPython (I don't recall the directory in the sources), even with the latest version of CPython.

Victor

I'm with Victor here. In fact I tried (and failed) to convince Victor that the approach is entirely unworkable when he was starting, don't be the next one :-) On Sat, Apr 9, 2016 at 3:43 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
Please don't lose time trying yet another sandbox inside CPython. It's just a waste of time. It's broken by design.
Please read my email about my attempt (pysandbox): https://lwn.net/Articles/574323/
And the LWN article: https://lwn.net/Articles/574215/
There are a lot of safe ways to run CPython inside a sandbox (and not the opposite).
I started like you, adding more and more things to a blacklist, but it doesn't work.
See the pysandbox test suite for a lot of ways to escape a sandbox. CPython has a list of known code to crash CPython (I don't recall the directory in the sources), even with the latest version of CPython.
Victor

On 9 April 2016 at 22:43, Victor Stinner <victor.stinner@gmail.com> wrote:
Please don't lose time trying yet another sandbox inside CPython. It's just a waste of time. It's broken by design.
Please read my email about my attempt (pysandbox): https://lwn.net/Articles/574323/
And the LWN article: https://lwn.net/Articles/574215/
There are a lot of safe ways to run CPython inside a sandbox (and not the opposite).
I started like you, adding more and more things to a blacklist, but it doesn't work.
See the pysandbox test suite for a lot of ways to escape a sandbox. CPython has a list of known code to crash CPython (I don't recall the directory in the sources), even with the latest version of CPython.
They're at https://hg.python.org/cpython/file/tip/Lib/test/crashers

There's also https://hg.python.org/cpython/file/tip/Lib/test/test_crashers.py which was designed to run them regularly to catch when they were resolved, but it was too fragile and tended to hang the buildbots.

Even without those considerations though, there are system level denial of service attacks that untrusted code can perform without even trying to break out of the sandbox - the most naive is "while 1: pass", but there are more interesting ones like "from itertools import count; sum(count())", or even "sum(iter(int, 1))" and "list(iter(int, 1))".

Operating system level security sandboxes still aren't particularly easy to use correctly, but they're a lot more reliable than language runtime level sandboxes, can be used to defend against many more attack vectors, and even offer increased flexibility (e.g. "can write to these directories, but no others", "can read these files, but no others", "can contact these IP addresses, but no others").

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, Apr 10, 2016 at 02:51:23PM +1000, Nick Coghlan wrote:
On 9 April 2016 at 22:43, Victor Stinner <victor.stinner@gmail.com> wrote:
See the pysandbox test suite for a lot of ways to escape a sandbox. CPython has a list of known code to crash CPython (I don't recall the directory in the sources), even with the latest version of CPython.
They're at https://hg.python.org/cpython/file/tip/Lib/test/crashers
Thanks. I take your point that sandboxing Python requires CPython to be free of code execution bugs. However I will note that none of the crashers in that directory will work inside my experiment (except "infinite_loop_re.py", which isn't a crasher, just a long loop).
Even without those considerations though, there are system level denial of service attacks that untrusted code can perform without even trying to break out of the sandbox - the most naive is "while 1: pass", but there are more interesting ones like "from itertools import count; sum(count())", or even "sum(iter(int, 1))" and "list(iter(int, 1))".
Yes, of course. I have already explicitly noted that infinite loops and memory exhaustion are not preventable.
Operating system level security sandboxes still aren't particularly easy to use correctly, but they're a lot more reliable than language runtime level sandboxes, can be used to defend against many more attack vectors, and even offer increased flexibility (e.g. "can write to these directories, but no others", "can read these files, but no others", "can contact these IP addresses, but no others").
I don't entirely trust operating system sandboxes either - I generally assume that if someone can execute arbitrary code on my machine, then they can do anything they want to that machine. What I *might* trust, though, would be a "sandbox Python" that is itself running inside an operating system sandbox...

On 10.04.16 19:51, Jon Ribbens wrote:
On Sun, Apr 10, 2016 at 02:51:23PM +1000, Nick Coghlan wrote:
On 9 April 2016 at 22:43, Victor Stinner <victor.stinner@gmail.com> wrote:
See the pysandbox test suite for a lot of ways to escape a sandbox. CPython has a list of known code to crash CPython (I don't recall the directory in the sources), even with the latest version of CPython.
They're at https://hg.python.org/cpython/file/tip/Lib/test/crashers
Thanks. I take your point that sandboxing Python requires CPython to be free of code execution bugs. However I will note that none of the crashers in that directory will work inside my experiment (except "infinite_loop_re.py", which isn't a crasher, just a long loop).
Try the following example:

it = iter([1])
for i in range(1000000):
    it = filter(None, it)
next(it)

On Mon, Apr 11, 2016 at 12:07:48AM +0300, Serhiy Storchaka wrote:
On 10.04.16 19:51, Jon Ribbens wrote:
On Sun, Apr 10, 2016 at 02:51:23PM +1000, Nick Coghlan wrote:
On 9 April 2016 at 22:43, Victor Stinner <victor.stinner@gmail.com> wrote:
See the pysandbox test suite for a lot of ways to escape a sandbox. CPython has a list of known code to crash CPython (I don't recall the directory in the sources), even with the latest version of CPython.
They're at https://hg.python.org/cpython/file/tip/Lib/test/crashers
Thanks. I take your point that sandboxing Python requires CPython to be free of code execution bugs. However I will note that none of the crashers in that directory will work inside my experiment (except "infinite_loop_re.py", which isn't a crasher, just a long loop).
Try the following example:

it = iter([1])
for i in range(1000000):
    it = filter(None, it)
next(it)
That does indeed segfault. I guess you should report that as a bug!

On 10 Apr 2016 22:55, "Jon Ribbens" <jon+python-dev@unequivocal.co.uk> wrote:
On Mon, Apr 11, 2016 at 12:07:48AM +0300, Serhiy Storchaka wrote:
On 10.04.16 19:51, Jon Ribbens wrote:
On Sun, Apr 10, 2016 at 02:51:23PM +1000, Nick Coghlan wrote:
On 9 April 2016 at 22:43, Victor Stinner <victor.stinner@gmail.com> wrote:
See the pysandbox test suite for a lot of ways to escape a sandbox. CPython has a list of known code to crash CPython (I don't recall the directory in the sources), even with the latest version of CPython.
They're at https://hg.python.org/cpython/file/tip/Lib/test/crashers
Thanks. I take your point that sandboxing Python requires CPython to be free of code execution bugs. However I will note that none of the crashers in that directory will work inside my experiment (except "infinite_loop_re.py", which isn't a crasher, just a long loop).
Try the following example:

it = iter([1])
for i in range(1000000):
    it = filter(None, it)
next(it)
That does indeed segfault. I guess you should report that as a bug!
There will always be obscure ways to crash the interpreter. That one can be fixed, but if someone really wants to break your sandbox this way then they will be able to. Remember that exploits are often based on bugs, and any codebase the size of CPython has bugs.

I haven't looked at your sandbox but for a different approach try this one:

L = [None]
L.extend(iter(L))

On my Linux machine that doesn't just crash Python.

-- Oscar

On Sun, Apr 10, 2016 at 7:02 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
I haven't looked at your sandbox but for a different approach try this one:
L = [None]
L.extend(iter(L))
On my Linux machine that doesn't just crash Python.
For the record: don't try this if you have unsaved files open on your computer, because you will lose them. When I typed these two lines into the Py3.5 interactive prompt, it completely and totally froze Windows to the point that nothing would respond and I had to resort to the old trick of holding the power button down for five seconds to forcibly shut the computer down. Fortunately, I made extra certain everything was fully saved before I opened the Python interpreter, so I'm not TOTALLY dumb. :-P

On Sun, Apr 10, 2016 at 08:12:30PM -0400, Jonathan Goble wrote:
On Sun, Apr 10, 2016 at 7:02 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
I haven't looked at your sandbox but for a different approach try this one:
L = [None]
L.extend(iter(L))
On my Linux machine that doesn't just crash Python.
For the record: don't try this if you have unsaved files open on your computer, because you will lose them. When I typed these two lines into the Py3.5 interactive prompt, it completely and totally froze Windows to the point that nothing would respond and I had to resort to the old trick of holding the power button down for five seconds to forcibly shut the computer down.
I think this might improve matters: http://bugs.python.org/issue26351 although I must admit I don't understand why the entire OS is affected. -- Steve

On Mon, Apr 11, 2016 at 01:09:19PM +1000, Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, Apr 10, 2016 at 08:12:30PM -0400, Jonathan Goble wrote:
On Sun, Apr 10, 2016 at 7:02 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
I haven't looked at your sandbox but for a different approach try this one:
L = [None]
L.extend(iter(L))
On my Linux machine that doesn't just crash Python.
For the record: don't try this if you have unsaved files open on your computer, because you will lose them. When I typed these two lines into the Py3.5 interactive prompt, it completely and totally froze Windows to the point that nothing would respond and I had to resort to the old trick of holding the power button down for five seconds to forcibly shut the computer down.
I think this might improve matters:
http://bugs.python.org/issue26351
although I must admit I don't understand why the entire OS is affected.
Memory exhaustion?
-- Steve
Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On Sun, Apr 10, 2016 at 10:50 PM, Oleg Broytman <phd@phdru.name> wrote:
On Mon, Apr 11, 2016 at 01:09:19PM +1000, Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, Apr 10, 2016 at 08:12:30PM -0400, Jonathan Goble wrote:
On Sun, Apr 10, 2016 at 7:02 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
I haven't looked at your sandbox but for a different approach try this one:

L = [None]
L.extend(iter(L))

On my Linux machine that doesn't just crash Python.
For the record: don't try this if you have unsaved files open on your computer, because you will lose them. When I typed these two lines into the Py3.5 interactive prompt, it completely and totally froze Windows to the point that nothing would respond and I had to resort to the old trick of holding the power button down for five seconds to forcibly shut the computer down.
I think this might improve matters:
http://bugs.python.org/issue26351
although I must admit I don't understand why the entire OS is affected.
Memory exhaustion?
* https://docs.docker.com/compose/compose-file/#cpu-shares-cpu-quota-cpuset-do... * https://github.com/jupyter/dockerspawner/blob/master/systemuser/Dockerfile
-- Steve
Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On Mon, Apr 11, 2016 at 12:42:47AM -0500, Wes Turner <wes.turner@gmail.com> wrote:
On Sun, Apr 10, 2016 at 10:50 PM, Oleg Broytman <phd@phdru.name> wrote:
On Mon, Apr 11, 2016 at 01:09:19PM +1000, Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, Apr 10, 2016 at 08:12:30PM -0400, Jonathan Goble wrote:
On Sun, Apr 10, 2016 at 7:02 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
I haven't looked at your sandbox but for a different approach try this one:

L = [None]
L.extend(iter(L))

On my Linux machine that doesn't just crash Python.
For the record: don't try this if you have unsaved files open on your computer, because you will lose them. When I typed these two lines into the Py3.5 interactive prompt, it completely and totally froze Windows to the point that nothing would respond and I had to resort to the old trick of holding the power button down for five seconds to forcibly shut the computer down.
I think this might improve matters:
http://bugs.python.org/issue26351
although I must admit I don't understand why the entire OS is affected.
Memory exhaustion?
* https://docs.docker.com/compose/compose-file/#cpu-shares-cpu-quota-cpuset-do...
* https://github.com/jupyter/dockerspawner/blob/master/systemuser/Dockerfile
I think memory control groups in Linux can be used to limit memory usage. I have memory control groups configured and I'll try to find time to experiment with the code above.
-- Steve
Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On Mon, Apr 11, 2016 at 08:06:34AM +0200, Oleg Broytman <phd@phdru.name> wrote:
On Mon, Apr 11, 2016 at 12:42:47AM -0500, Wes Turner <wes.turner@gmail.com> wrote:
* https://docs.docker.com/compose/compose-file/#cpu-shares-cpu-quota-cpuset-domainname-hostname-ipc-mac-address-mem-limit-memswap-limit-privileged-read-only-restart-stdin-open-tty-user-working-dir
* https://github.com/jupyter/dockerspawner/blob/master/systemuser/Dockerfile
I think memory control groups in Linux can be used to limit memory usage. I have memory control groups configured and I'll try to find time to experiment with the code above.

With limited memory it was fast:

$ ulimit -d 50000 -m 80000 -s 10000 -v 100000
$ python
Python 2.7.9 (default, Mar  1 2015, 18:22:53)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> L = [None]
>>> L.extend(iter(L))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

Memory control groups don't help because they don't limit virtual memory, so the process simply starts thrashing.

Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.
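The same cap that ulimit applies here can also be set from inside a POSIX Python process before running untrusted code, via the standard library's resource module (a sketch - RLIMIT_AS is platform-specific and the 100 MB figure is arbitrary):

import resource

# Sketch: cap the address space at ~100 MB (soft and hard limits,
# in bytes). Allocations beyond the cap raise MemoryError instead
# of pushing the machine into swap thrashing.
limit = 100 * 1024 * 1024
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

L = [None]
L.extend(iter(L))   # now fails quickly with MemoryError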

On 11.04.16 00:53, Jon Ribbens wrote:
Try the following example:

it = iter([1])
for i in range(1000000):
    it = filter(None, it)
next(it)
That does indeed segfault. I guess you should report that as a bug!
There is an old issue that doesn't have an adequate solution. And this is only one example; you can get a segfault with other recursive iterators.

On Sat, Apr 09, 2016 at 02:43:19PM +0200, Victor Stinner wrote:
Please don't lose time trying yet another sandbox inside CPython. It's just a waste of time. It's broken by design.
Please read my email about my attempt (pysandbox): https://lwn.net/Articles/574323/
And the LWN article: https://lwn.net/Articles/574215/
There are a lot of safe ways to run CPython inside a sandbox (and not the opposite).
I started like you, adding more and more things to a blacklist, but it doesn't work.
That's the opposite of my approach though - I'm starting small and adding things, not starting with everything and removing stuff. Even if what we end up with is an extremely restricted subset of Python, there are still cases where that could be a useful tool to have.

I've read your links above, and indeed everything I can find written by anyone about historical attempts to sandbox Python. I'm aware that others have tried and failed at this in the past, so it's certainly true that there is room for suspicion that this simply cannot be done. However on the other hand, nobody has tried before to do what I am doing (static code analysis), so it's not necessarily a safe assumption that the idea is doomed. For example, as far as I can see, none of the methods used to break out of your pysandbox would work to break out of my experiment.

On Apr 10 2016, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
On Sat, Apr 09, 2016 at 02:43:19PM +0200, Victor Stinner wrote:
Please don't lose time trying yet another sandbox inside CPython. It's just a waste of time. It's broken by design.
Please read my email about my attempt (pysandbox): https://lwn.net/Articles/574323/
And the LWN article: https://lwn.net/Articles/574215/
There are a lot of safe ways to run CPython inside a sandbox (and not the opposite).
I started like you, adding more and more things to a blacklist, but it doesn't work.
That's the opposite of my approach though - I'm starting small and adding things, not starting with everything and removing stuff.
That contradicts what you said in another mail: On Apr 08 2016, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
Ah, I've not used Python 3.5, and I can't find any documentation on this cr_frame business, but I've added cr_frame and f_back to the disallowed attributes list.
Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«

On Sun, Apr 10, 2016 at 02:08:16PM -0700, Nikolaus Rath wrote:
On Apr 10 2016, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
On Sat, Apr 09, 2016 at 02:43:19PM +0200, Victor Stinner wrote:
That's the opposite of my approach though - I'm starting small and adding things, not starting with everything and removing stuff.
That contradicts what you said in another mail:
On Apr 08 2016, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
Ah, I've not used Python 3.5, and I can't find any documentation on this cr_frame business, but I've added cr_frame and f_back to the disallowed attributes list.
No, you've just misunderstood my meaning. Obviously I'm not only allowing access to whitelisted variable and property names, that would be ridiculous ("your code may only use variables called 'foo', 'bar' and 'baz'...").

The point is that we can start with, say, only allowing expressions and not statements, and a __builtins__ that contains literally nothing. We can even limit ourselves to disallow, say, lambda and yield and generator expressions if we like. Can this minimal language be made "safe"? If so, we have already won something - the ability to use "eval" as a powerful calculator function. Then, can we allow statements? Can we allow user-defined classes? Can we allow try/except? etc.

With regard to names, by the way, I suspect that disallowing just anything starting with "_" and the names of the properties of frame objects would be good enough. Unless someone knows a way to get to an object's __dict__ or its type without using vars() or type() or underscore attributes...
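A minimal sketch of that "powerful calculator" starting point - eval with an empty __builtins__. Note that on its own this is famously insufficient, which is exactly why the attribute restrictions above are needed as well:

namespace = {"__builtins__": {}}   # no builtins at all
print(eval("1 + 2 * 3", namespace))                   # 7
print(eval("[x * x for x in (1, 2, 3)]", namespace))  # [1, 4, 9]
# eval("open('x')", namespace) -> NameError: name 'open' is not defined
# but eval("().__class__", namespace) still works - hence the AST checks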

On 04/10/2016 06:31 PM, Jon Ribbens wrote:
Unless someone knows a way to get to an object's __dict__ or its type without using vars() or type() or underscore attributes...
Hmm, 'classmethod'-wrapped functions get passed the type.

Tres.

-- Tres Seaver +1 540-429-0999 tseaver@palladion.com Palladion Software "Excellence by Design" http://palladion.com

On 11 April 2016 at 13:49, Tres Seaver <tseaver@palladion.com> wrote:
On 04/10/2016 06:31 PM, Jon Ribbens wrote:
Unless someone knows a way to get to an object's __dict__ or its type without using vars() or type() or underscore attributes...
Hmm, 'classmethod'-wrapped functions get passed the type.
yeah, but to access that you need to assign the descriptor to the type - circular loop. If you can arrange that assignment it's easy:

thetype = []
class gettype:
    def __get__(self, obj, type=None):
        thetype.append((obj, type))
        return None

classIwant.query = gettype()
classIwant().query
thetype[0][1]

... but you've already gotten to classIwant there.

-Rob

-- Robert Collins <rbtcollins@hpe.com> Distinguished Technologist HP Converged Cloud

2016-04-10 18:43 GMT+02:00 Jon Ribbens <jon+python-dev@unequivocal.co.uk>:
On Sat, Apr 09, 2016 at 02:43:19PM +0200, Victor Stinner wrote:
Please don't lose time trying yet another sandbox inside CPython. It's just a waste of time. It's broken by design.
Please read my email about my attempt (pysandbox): https://lwn.net/Articles/574323/
And the LWN article: https://lwn.net/Articles/574215/
There are a lot of safe ways to run CPython inside a sandbox (and not the opposite).
I started like you, adding more and more things to a blacklist, but it doesn't work.
That's the opposite of my approach though - I'm starting small and adding things, not starting with everything and removing stuff. Even if what we end up with is an extremely restricted subset of Python, there are still cases where that could be a useful tool to have.
Your design relies on the assumption that CPython is only pure Python. That's wrong. A *lot* of Python features are implemented in C and "ignore" your sandboxing code. Quick reminder: 50% of CPython is written in the C language.

It means that your protections like hiding builtin functions from the Python scope don't work. If an attacker gets access to a C function giving access to the hidden builtin, the game is over.

pysandbox is based on the idea of tav (his project safelite.py): remove features in the dictionary of builtin C types like FrameType, CodeObject, etc. See sandbox/attributes.py. It's not enough to be 100% safe - a C function can still access fields of the C structure directly - but it was enough to protect "most" C functions.

It's hard to list all the features of the C code which are indirectly accessible from the Python scope. Some examples: warnings and tracebacks. These features killed the pysandbox project because they open files on the filesystem directly; it's not possible to control these features from the Python scope.

Another example which exposes a vulnerability of your sandbox: str.format() gets object attributes directly, without the getattr() builtin function, so it's possible to escape your sandbox. Example: "{0.__class__}".format(obj) shows the type of an object. Think also about the new f-string which allows arbitrary Python code: f"{code}".
However on the other hand, nobody has tried before to do what I am doing (static code analysis),
You're wrong. Zope Security ("RestrictedPython") has a similar design. Analyzing the AST is a common design for building a sandbox. But it's not safe. The "See also" section of my pysandbox has a long list of Python sandboxes with various designs.
so it's not necessarily a safe assumption that the idea is doomed. For example, as far as I can see, none of the methods used to break out of your pysandbox would work to break out of my experiment.
What I see is that you asked people to break your sandbox, and less than 1 hour later, a first vulnerability was found (exec called with two parameters). A few hours later, a second vulnerability was found (async generator and cr_frame). By the way, are you sure that you fixed the vulnerability? You blacklisted "cb_frame", not cr_frame ;-)

You should look closer: pysandbox is very close to your project. It also uses whitelists for some protections (ex: builtins) and blacklists for other protections (ex: hiding sensitive attributes). You are using a blacklist for attributes. By the way, you hide cr_frame but not cr_code. I'm quite sure that it's possible to execute arbitrary bytecode in your sandbox, I just don't have enough time to dig into the code. Your sandbox is not fully based on whitelists.

Victor
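The str.format() leak Victor describes is easy to confirm in any CPython (a quick demonstration):

class Secret:
    pass

# No getattr() and no underscores appear in the analysed source text -
# the attribute access happens inside str.format's C implementation.
print("{0.__class__}".format(Secret()))   # <class '__main__.Secret'>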

On Mon, 11 Apr 2016, Victor Stinner wrote:
2016-04-10 18:43 GMT+02:00 Jon Ribbens <jon+python-dev@unequivocal.co.uk>:
That's the opposite of my approach though - I'm starting small and adding things, not starting with everything and removing stuff. Even if what we end up with is an extremely restricted subset of Python, there are still cases where that could be a useful tool to have.
Your design relies on the assumption that CPython is only pure Python. That's wrong. A *lot* of Python features are implemented in C and "ignore" your sandboxing code. Quick reminder: 50% of CPython is written in the C language.
It means that your protections like hiding builtin functions from the Python scope don't work. If an attacker gets access to a C function giving access to the hidden builtin, the game is over. [....]
Non-Python core developer, non-expert-specifically-in-computer-security here, so won't take up much room on this list. I know enough about almost everything in Computer Science to know just how ignorant I am about almost everything in Computer Science. But I would not use for security purposes a Python sandbox that was not formally verified to be correct and unbreakable. Of course in order for this to be possible, there first has to be a formal semantics for Python. Has anybody made a formal semantics for Python? If not, then this project is missing a pretty important pre-requisite. Isaac Morland CSCF Web Guru DC 2619, x36650 WWW Software Specialist

On Mon, Apr 11, 2016 at 9:04 PM, Isaac Morland <ijmorlan@uwaterloo.ca> wrote:
But I would not use for security purposes a Python sandbox that was not formally verified to be correct and unbreakable. Of course in order for this to be possible, there first has to be a formal semantics for Python. Has anybody made a formal semantics for Python? If not, then this project is missing a pretty important pre-requisite.
Formal semantics for the language? Yes; most of docs.python.org is about the language, independently of any particular implementation. (There are odd notes here and there about "CPython implementation detail" and such, and there are some entire modules that are specifically stated as being implementation-specific, but they're a tiny proportion.) You can also read through the PEPs, which (again, for the most part) deal with language-level concerns ahead of implementation details. However, even with that information, it's virtually impossible to formally verify that the sandbox is unbreakable. A Python-in-Python sandbox is almost guaranteed to leak information across the boundary, and when information is leaked, it's extremely hard to prove that privilege escalation is impossible. ChrisA

On Mon, Apr 11, 2016 at 11:40:05AM +0200, Victor Stinner wrote:
2016-04-10 18:43 GMT+02:00 Jon Ribbens <jon+python-dev@unequivocal.co.uk>:
That's the opposite of my approach though - I'm starting small and adding things, not starting with everything and removing stuff. Even if what we end up with is an extremely restricted subset of Python, there are still cases where that could be a useful tool to have.
Your design relies on the assumption that CPython is only pure Python.
No it doesn't. Obviously I know CPython is written in C - the clue is in the name. I'm not sure what you mean here.
It means that your protections like hiding builtin functions from the Python scope don't work. If an attacker gets access to a C function giving access to the hidden builtin, the game is over.
The former is only true if you assume the latter is possible. Is there any reason to believe it is?
It's hard to list all the features of the C code which are indirectly accessible from the Python scope. Some examples: warnings and tracebacks. These features killed the pysandbox project because they open files on the filesystem directly; it's not possible to control these features from the Python scope.
I think what you're referring to is when they show context for errors, for which they try and find the source code lines to display by identifying the filename, and you can subvert that process by changing __file__ and/or __name__. If so, you can't do that within my experiment because you're not allowed to access either of those names.
Another example which exposes a vulnerability of your sandbox: str.format() gets object attributes directly, without the getattr() builtin function, so it's possible to escape your sandbox. Example: "{0.__class__}".format(obj) shows the type of an object.
Yes, I'd thought of that. However getting access to a string which contains the name or a representation of an object is not at all the same thing as getting access to the object itself.
Think also about the new f-string which allows arbitrary Python code: f"{code}".
Obviously I can't speak to features of future versions of Python. I'd have to see the ast generated by an f-string to know if they pose a problem or not, but I suspect they would compile to expression nodes and hence be caught by the existing checks.
However on the other hand, nobody has tried before to do what I am doing (static code analysis),
You're wrong.
Zope Security ("RestrictedPython") has a similar design. Analyzing AST is a common design to build a sanbox. But it's not safe.
Ah, I hadn't seen that one. Yes, they are doing something similar (but also much more complex!) I don't know why you say this is a "common design" though, that one is the only one that appears to use it.
What I see is that you asked people to break your sandbox, and less than 1 hour later, a first vulnerability was found (exec called with two parameters). A few hours later, a second vulnerability was found (async generator and cr_frame).
The former was just a stupid bug, it says nothing about the viability of the methodology. The latter was a new feature in a Python version later than I have ever used, and again does not imply anything much about the viability. I think now I've blocked the names of frame object attributes it wouldn't be a vulnerability any more anyway.
By the way, are you sure that you fixed the vulnerability? You blacklisted "cb_frame", not cr_frame ;-)
Ah, thanks. As above, I think this doesn't actually make any difference, but I've updated the code regardless.
You should look closer: pysandbox is very close to your project.
I've just looked through it all again, and I don't understand why you are saying that. It's nothing like my experiment. It's trying to alter the global Python environment so that arbitrary code can be executed, whereas I am not even trying to allow execution of arbitrary code and am not altering the global environment.

On 11 April 2016 at 15:46, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
It's trying to alter the global Python environment so that arbitrary code can be executed, whereas I am not even trying to allow execution of arbitrary code and am not altering the global environment.
However, it's not at all clear (to me at least) what you *are* trying to do. You're limiting the subset of Python that people can use, understood. And you're trying to ensure that people can't do "bad things". Again, understood. But what subset are you actually allowing, and what things are you trying to protect against? (For example, I can't calculate sin(1.2) using the math module - why is that not allowed? It's just as safe as using the built-in exponential operator, and indeed I could write a sin() function in pure Python, although it would be too slow to be useful, unlike math.sin...)

It feels at the moment as if I'm playing a game where I don't know the rules, and every time I think I scored a point, the rules are changed to retroactively disallow it.

Paul

On Mon, Apr 11, 2016 at 04:04:21PM +0100, Paul Moore wrote:
However, it's not at all clear (to me at least) what you *are* trying to do.
I'm trying to see to what extent we can use ast node inspection to remedy the failures of prior attempts at Python sandboxing. Is there *any* extent to which Python can be sandboxed, or is even trying to use it as a calculator function unfixably insecure?
You're limiting the subset of Python that people can use, understood. And you're trying to ensure that people can't do "bad things". Again, understood. But what subset are you actually allowing, and what things are you trying to protect against? (For example, I can't calculate sin(1.2) using the math module - why is that not allowed?
It wasn't allowed in the earlier version because I wasn't allowing import at all, because this is just an experiment. As it happens, I added 'import' yesterday so yes you can use math.sin.
It feels at the moment as if I'm playing a game where I don't know the rules, and every time I think I scored a point, the rules are changed to retroactively disallow it.
The challenge is to show some code that will escape from the sandbox, in a way that is not trivially fixable with a tiny patch, or in a way that demonstrates that such a large number of tiny patches would be required as to be unworkable.

On Tue, Apr 12, 2016 at 2:53 AM, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
On Mon, Apr 11, 2016 at 04:04:21PM +0100, Paul Moore wrote:
However, it's not at all clear (to me at least) what you *are* trying to do.
I'm trying to see to what extent we can use ast node inspection to remedy the failures of prior attempts at Python sandboxing. Is there *any* extent to which Python can be sandboxed, or is even trying to use it as a calculator function unfixably insecure?
It all depends on how much functionality you want. If all you need is a numeric expression evaluator, that's not too hard - disallow all forms of attribute access, etc, and just have simple numbers and operators. That's pretty useful, and safe.

Alternatively, go completely the other way. Let people run whatever code they like... in an environment where it can't hurt anyone else. That's what PyPyJS does - don't bother looking for security holes in it, because all you're doing is attacking your own computer.

The hard part comes when you want to allow *some*, but not all, interaction with the outside world. When I was looking into this kind of sandboxing (although it was Python-in-C++ rather than Python-in-Python), it was to allow untrusted users to control certain parts of server-side execution. The result was dismal, because it's fundamentally impossible to allow the level of control I wanted without allowing a level of control I didn't want.

So before you can ask whether Python is unfixably insecure, you first have to decide what the minimum level of functionality is that you'll accept. Do you need basic arithmetic plus trigonometric functions? Easy enough - disallow all attribute access and imports, and populate builtins with "from math import *". Need them to be able to assign variables and define functions? That's gonna be harder.

ChrisA
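A sketch of the "trivially simple but safe" end of that spectrum - an arithmetic-only evaluator that whitelists AST node types rather than blacklisting anything (the helper name is an assumption for the example):

import ast
import operator

OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_arith(expr):
    # Illustrative sketch: anything not explicitly handled is rejected.
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Num):           # numeric literal (ast.Constant in 3.8+)
            return node.n
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError("disallowed syntax: " + type(node).__name__)
    return ev(ast.parse(expr, mode="eval"))

print(safe_arith("2 ** 10 - 3 * 4"))   # 1012
# safe_arith("__import__('os')") -> ValueError: disallowed syntax: Call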

On Tue, Apr 12, 2016 at 03:02:54AM +1000, Chris Angelico wrote:
On Tue, Apr 12, 2016 at 2:53 AM, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
On Mon, Apr 11, 2016 at 04:04:21PM +0100, Paul Moore wrote:
However, it's not at all clear (to me at least) what you *are* trying to do.
I'm trying to see to what extent we can use ast node inspection to remedy the failures of prior attempts at Python sandboxing. Is there *any* extent to which Python can be sandboxed, or is even trying to use it as a calculator function unfixably insecure?
It all depends on how much functionality you want. If all you need is a numeric expression evaluator, that's not too hard - disallow all forms of attribute access, etc, and just have simple numbers and operators. That's pretty useful, and safe.
By "calculator" I didn't necessarily mean to imply numeric-only, sorry if I was unclear. Also perhaps I should have said "non-trivial", inasmuch as if we restrict it that far then it would quite possibly be simpler and quicker just to write the expression evaluator from scratch and not use the Python interpreter at all.
Alternatively, go completely the other way. Let people run whatever code they like... in an environment where it can't hurt anyone else. That's what PyPyJS does - don't bother looking for security holes in it, because all you're doing is attacking your own computer.
That's a very specific use case though: running client-side in the user's browser.
So before you can ask whether Python is unfixably insecure, you first have to decide what the minimum level of functionality is that you'll accept. Do you need basic arithmetic plus trignometric functions? Easy enough - disallow all attribute access and imports, and populate builtins with "from math import *". Need them to be able to assign variables and define functions? That's gonna be harder.
I think calling functions and accessing variables and attributes is likely a minimum. Defining functions would be useful, and of course defining classes would be another useful step further.

On Tue, Apr 12, 2016 at 8:43 AM, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
On Tue, Apr 12, 2016 at 03:02:54AM +1000, Chris Angelico wrote:
It all depends on how much functionality you want. If all you need is a numeric expression evaluator, that's not too hard - disallow all forms of attribute access, etc, and just have simple numbers and operators. That's pretty useful, and safe.
By "calculator" I didn't necessarily mean to imply numeric-only, sorry if I was unclear. Also perhaps I should have said "non-trivial", inasmuch as if we restrict it that far then it would quite possibly be simpler and quicker just to write the expression evaluator from scratch and not use the Python interpreter at all.
I'm aware you wanted more. My point is that it's not hard to secure the trivially simple, and it doesn't have to be entirely useless. But every bit of additional power brings with it additional risk. ChrisA

On 11 April 2016 at 17:53, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
You're limiting the subset of Python that people can use, understood. And you're trying to ensure that people can't do "bad things". Again, understood. But what subset are you actually allowing, and what things are you trying to protect against? (For example, I can't calculate sin(1.2) using the math module - why is that not allowed?
It wasn't allowed in the earlier version because I wasn't allowing import at all, because this is just an experiment. As it happens, I added 'import' yesterday so yes you can use math.sin.
Well, I'll ask the obvious question, then. In allowing "import" did you allow "import ctypes"? If so, then I win :-) Or did you explicitly whitelist certain modules? And if so, which ones are they, and did I succeed if I manage to import a module you hadn't whitelisted?
It feels at the moment as if I'm playing a game where I don't know the rules, and every time I think I scored a point, the rules are changed to retroactively disallow it.
The challenge is to show some code that will escape from the sandbox, in a way that is not trivially fixable with a tiny patch, or in a way that demonstrates that such a large number of tiny patches would be required as to be unworkable.
But I'm still not clear when I count as "outside the sandbox", given that I don't know what the rules of what is allowed *in* the sandbox are... Paul

On Tue, Apr 12, 2016 at 6:17 PM, Paul Moore <p.f.moore@gmail.com> wrote:
Well, I'll ask the obvious question, then. In allowing "import" did you allow "import ctypes"? If so, then I win :-) Or did you explicitly whitelist certain modules? And if so, which ones are they, and did I succeed if I manage to import a module you hadn't whitelisted?
The module whitelist is given at the top of the source code:

_SAFE_MODULES = frozenset((
    "base64", "binascii", "bisect", "calendar", "cmath", "crypt",
    "datetime", "decimal", "enum", "errno", "fractions", "functools",
    "hashlib", "hmac", "ipaddress", "itertools", "math", "numbers",
    "queue", "re", "statistics", "textwrap", "unicodedata",
    "urllib.parse",
))

And yes, you win if you get another module. Interestingly, you're allowed to import urllib.parse, but not urllib itself; but "import urllib.parse" makes urllib available - and, since modules inside modules are blacklisted, "urllib.parse" doesn't exist (AttributeError).

You can access the decimal module, and call decimal.getcontext(). This returns the same default context object that the "outer" Python uses; consequently, this sandboxing technique MUST NOT be used in any program that, now or ever in the future, uses the decimal module (or at least its default context; but I'm not sure how you'd be absolutely sure you never EVER use the default context).

Even more curiously, you can "import fractions", but you don't get fractions.Fraction - though you *do* get fractions.Decimal. And importing enum gives you EnumMeta, but metaclasses seem to be broken, and you can't get enum.Enum.

The sandbox code assumes that an attacker cannot create files in the current directory.

rosuav@sikorsky:~/tmp/unsafe$ echo 'import sys; real_module = lambda mod: sys.modules[mod]' >hashlib.py
rosuav@sikorsky:~/tmp/unsafe$ ./unsafe.py -i
Python 3.6.0a0 (default:78b84ae0b745+, Apr  6 2016, 03:43:18)
[GCC 5.3.1 20160323] on linux
Type "help", "copyright", "credits" or "license" for more information.
(SafeInteractiveConsole)
>>> import hashlib
>>> hashlib.real_module("sys")
<module 'sys' (built-in)>
Setting LC_ALL and then working with calendar.LocaleTextCalendar() causes locale files to be read. I'm not sure if you can turn that into an exploit, but the attack surface depends on the installed locales on the system. This is still a massive game of whack-a-mole. ChrisA

On Tue, Apr 12, 2016 at 06:57:37PM +1000, Chris Angelico wrote:
And yes, you win if you get another module. Interestingly, you're allowed to import urllib.parse, but not urllib itself; but "import urllib.parse" makes urllib available - and, since modules inside modules are blacklisted, "urllib.parse" doesn't exist (AttributeError).
Yes, this is issue #3 on github. I'd need to spend a few minutes thinking about how to make importing of submodules work out properly.
You can access the decimal module, and call decimal.getcontext(). This returns the same default context object that the "outer" Python uses;
OK, decimal goes ;-)
Even more curiously, you can "import fractions", but you don't get fractions.Fraction - though you *do* get fractions.Decimal.
That seems to be because Fraction inherits from numbers.Number, which has a metaclass, so type(Fraction) is abc.ABCMeta not 'type'. That's obviously not a security hole and may well be fixable.
The sandbox code assumes that an attacker cannot create files in the current directory.
If the attacker can create such files then the system is already compromised even if you're not using any sandboxing system, because you won't be able to trust any normal imports from your own code.
Setting LC_ALL and then working with calendar.LocaleTextCalendar() causes locale files to be read.
I don't think that has any obvious relevance. Doing "import enum" causes "enum.py" to be read too, and that isn't a security hole.
This is still a massive game of whack-a-mole.
No, it still isn't. If the names blacklist had to keep being extended then you would be right, but that hasn't happened so far. Whitelists by definition contain only a small, limited number of potential moles. The only thing you found above that even remotely approaches an exploit is the decimal.getcontext() thing, and even that I don't think you could use to do any code execution.

On Tue, 12 Apr 2016, Jon Ribbens wrote:
This is still a massive game of whack-a-mole.
No, it still isn't. If the names blacklist had to keep being extended then you would be right, but that hasn't happened so far. Whitelists by definition contain only a small, limited number of potential moles.
The only thing you found above that even remotely approaches an exploit is the decimal.getcontext() thing, and even that I don't think you could use to do any code execution.
"I don't think"? Where's the formal proof? Without a proof, this is indeed just a game of whack-a-mole. I don't "think" Python is a suitable foundation for a sandboxing system intended for security purposes, but my "think" won't lead to security holes whereas yours will. So, I would respectfully suggest that unless you increase the rigour of your effort substantially, it is not worthwhile. Python is great for lots of applications already - there is no need to force it into unsuitable problem domains. Isaac Morland CSCF Web Guru DC 2619, x36650 WWW Software Specialist

On Tue, Apr 12, 2016 at 06:21:04AM -0400, Isaac Morland wrote:
On Tue, 12 Apr 2016, Jon Ribbens wrote:
This is still a massive game of whack-a-mole.
No, it still isn't. If the names blacklist had to keep being extended then you would be right, but that hasn't happened so far. Whitelists by definition contain only a small, limited number of potential moles.
The only thing you found above that even remotely approaches an exploit is the decimal.getcontext() thing, and even that I don't think you could use to do any code execution.
"I don't think"?
Where's the formal proof?
I disallowed the module completely, that's the proof.
Without a proof, this is indeed just a game of whack-a-mole.
Almost no computer programs are ever "formally proved" to be secure. None of those that run the global Internet are. I don't see why it makes any sense to demand that my experiment be held to a massively higher standard than the rest of the code everyone relies on every day.

On Tue, Apr 12, 2016 at 1:14 PM, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
On Tue, Apr 12, 2016 at 06:21:04AM -0400, Isaac Morland wrote:
On Tue, 12 Apr 2016, Jon Ribbens wrote:
This is still a massive game of whack-a-mole.
No, it still isn't. If the names blacklist had to keep being extended then you would be right, but that hasn't happened so far. Whitelists by definition contain only a small, limited number of potential moles.
The only thing you found above that even remotely approaches an exploit is the decimal.getcontext() thing, and even that I don't think you could use to do any code execution.
"I don't think"?
Where's the formal proof?
I disallowed the module completely, that's the proof.
Without a proof, this is indeed just a game of whack-a-mole.
Almost no computer programs are ever "formally proved" to be secure. None of those that run the global Internet are. I don't see why it makes any sense to demand that my experiment be held to a massively higher standard than the rest of the code everyone relies on every day.
Jon, let me reiterate. You asked people to break it (that's the title of the thread) and they did so almost immediately. Then you patched the thing and asked them to break it again, and they did. Now, the faulty assumption here is that this procedure, repeated enough times, will produce a secure environment - this is not how security works. You need to be secure against people who will spend more than 5 minutes and who are not on this list or reading this incredibly long email chain. You can't do that just by asking on the mailing list and whacking all the examples.

As others pointed out, this particular approach (with maybe different details) has been tried again and again and again, and the result has been the same - you end up with either a completely unusable Python (the Python that can't run anything is trivially secure) or you end up with something that's insecure.

I suggest you look instead at something like the PyPy sandbox, which systematically replaces all external calls with a call to a proxy. Because PyPy is written in RPython, you can do that - the amount of code that needs reviewing is relatively small, a couple of pages of code. The code you need to review in order to be even remotely secure here is much larger - it's the amount of C code you can call from your Python, with or without knowing that it can happen.

Cheers,
fijal

On Tue, Apr 12, 2016 at 01:38:09PM +0200, Maciej Fijalkowski wrote:
Jon, let me reiterate. You asked people to break it (that's the title of the thread) and they did so almost immediately. Then you patched the thing and asked them to break it again and they did. Now the faulty assumption here is that this procedure, repeated enough times will produce a secure environment - this is not how security works,
That is not an accurate summary of what has happened so far, nor am I making that assumption. You are misunderstanding the purpose of the experiment - I am not sure how, as I have tried to be quite clear. The question is: with a minimal (or empty) set of builtins, and a restriction on ast.Name and ast.Attribute nodes, can exec/eval be made 'safe' so they cannot execute code outside the sandbox? The answer appears to be "yes", if the restriction is "^f?_". (If you additionally inject external objects into the namespace then they need to be proxied and mro() prevented.)
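(For concreteness, a minimal sketch of the kind of AST check being described - hypothetical code, not the actual unsafe.py source:)

---
import ast
import re

# Sketch of the static restriction described above: reject any ast.Name
# or ast.Attribute whose identifier matches "^f?_", i.e. starts with "_"
# or "f_" (the latter covering frame attributes such as f_globals).
FORBIDDEN = re.compile(r"^f?_")

def check_names(source):
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and FORBIDDEN.match(node.id):
            raise ValueError("forbidden name: " + node.id)
        if isinstance(node, ast.Attribute) and FORBIDDEN.match(node.attr):
            raise ValueError("forbidden attribute: " + node.attr)
    return tree

# check_names("().__class__")  # raises ValueError before any execution
---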
You can't do that just by asking on the mailing list and whacking all the examples.
If anyone had managed to find any more examples of holes in the original featureset after the first couple then I would agree with you, but they haven't.
As others pointed out, this particular approach (with maybe different details) has been tried again and again and again
This simply isn't true either. As far as I can see, only RestrictedPython has tried anything remotely similar, and to the best of my ability to determine, that project is not considered a failure.

2016-04-12 14:18 GMT+02:00 Jon Ribbens <jon+python-dev@unequivocal.co.uk>:
The question is: with a minimal (or empty) set of builtins, and a restriction on ast.Name and ast.Attribute nodes, can exec/eval be made 'safe' so they cannot execute code outside the sandbox.
According to multiple exploits listed in this thread, no, it's not possible.
If anyone had managed to find any more examples of holes in the original featureset after the first couple then I would agree with you, but they haven't.
See my latest exploit using functools.update_wrapper() + A.__setattr__() ;-)
As others pointed out, this particular approach (with maybe different details) has been tried again and again and again
This simply isn't true either. As far as I can see, only RestrictedPython has tried anything remotely similar, and to the best of my ability to determine, that project is not considered a failure.
IMHO nobody seriously audited RestrictedPython. That doesn't mean that it's secure. When it was created, security was less of a concern than it is nowadays.

Victor

2016-04-12 13:38 GMT+02:00 Maciej Fijalkowski <fijall@gmail.com>:
(...) you end up with either a completely unusable python (the python that can't run anything is trivially secure)
Yeah, that's the obvious question: what's the purpose of such a very limited Python subset - for example, something limited to int with a few operators (+ - * /)? That's also why I gave up on pysandbox. It became impossible to execute anything more complex than a hello world.

By the way, I noticed that enum.Enum and enum.EnumMeta don't work in your sandbox.

Victor

On Tue, Apr 12, 2016 at 8:06 PM, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
On Tue, Apr 12, 2016 at 06:57:37PM +1000, Chris Angelico wrote:
The sandbox code assumes that an attacker cannot create files in the current directory.
If the attacker can create such files then the system is already compromised even if you're not using any sandboxing system, because you won't be able to trust any normal imports from your own code.
Just confirming that, yeah. Though you could protect against it somewhat by pre-importing everything that can legally be imported; that way, at least the attack surface ceases once untrusted code starts executing. Consider it a privilege escalation attack; you can move from "create file in current directory" to "remote code execution" simply by creating hashlib.py and then importing it.
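(A minimal sketch of the pre-importing mitigation Chris describes - the module list here is hypothetical; the point is just that sys.modules is populated before untrusted code runs, so a hashlib.py later dropped into the current directory is never consulted:)

---
import sys

# Hypothetical whitelist; not the actual unsafe.py module list.
ALLOWED_MODULES = ("math", "cmath", "datetime", "re")

def preimport_whitelist():
    # Importing up front caches each module in sys.modules, so the
    # normal import machinery never searches the filesystem for these
    # names again once untrusted code is executing.
    for name in ALLOWED_MODULES:
        if name not in sys.modules:
            __import__(name)
---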
Setting LC_ALL and then working with calendar.LocaleTextCalendar() causes locale files to be read.
I don't think that has any obvious relevance. Doing "import enum" causes "enum.py" to be read too, and that isn't a security hole.
I mean the system locale files, not just locale.py itself. If nothing else, it's a means of discovering info about the system. I don't know what you can get by figuring out what locales are installed, but it's another concern to think about.
This is still a massive game of whack-a-mole.
No, it still isn't. If the names blacklist had to keep being extended then you would be right, but that hasn't happened so far. Whitelists by definition contain only a small, limited number of potential moles.
The only thing you found above that even remotely approaches an exploit is the decimal.getcontext() thing, and even that I don't think you could use to do any code execution.
decimal.getcontext is a simple and obvious example of a way that global mutable objects can be accessed across the boundary. There is no way to mathematically prove that there are no more, so it's still a matter of blacklisting. I still think you need to work out a "minimum viable set" and set down some concrete rules: if any feature in this set has to be blacklisted in order to achieve security, the experiment has failed. ChrisA
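(The getcontext() point is easy to demonstrate - a sketch of the cross-boundary mutation being described:)

---
import decimal

# decimal.getcontext() returns the thread's *shared* current context,
# so "sandboxed" code mutating it also changes the host's arithmetic.
decimal.getcontext().prec = 1                    # attacker's code
print(decimal.Decimal(1) / decimal.Decimal(3))   # host now sees 0.3
---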

On Tue, Apr 12, 2016 at 08:27:14PM +1000, Chris Angelico wrote:
On Tue, Apr 12, 2016 at 8:06 PM, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
No, it still isn't. If the names blacklist had to keep being extended then you would be right, but that hasn't happened so far. Whitelists by definition contain only a small, limited number of potential moles.
The only thing you found above that even remotely approaches an exploit is the decimal.getcontext() thing, and even that I don't think you could use to do any code execution.
decimal.getcontext is a simple and obvious example of a way that global mutable objects can be accessed across the boundary. There is no way to mathematically prove that there are no more, so it's still a matter of blacklisting.
No, it's a matter of reducing the whitelist. I must admit that I don't understand in what way this is not already clear. Look:
>>> len(unsafe._SAFE_MODULES)
23
I could "mathematically prove" that there are no more security holes in that list by reducing its length to zero. There are still plenty of circumstances in which the experiment would be a useful tool even with no modules allowed to be imported.
I still think you need to work out a "minimum viable set" and set down some concrete rules: if any feature in this set has to be blacklisted in order to achieve security, the experiment has failed.
The "minimum viable set" in my view would be: no builtins at all, only allowing eval() not exec(), and disallowing yield [from], lambdas and generator expressions.

2016-04-12 13:10 GMT+02:00 Jon Ribbens <jon+python-dev@unequivocal.co.uk>:
No, it's a matter of reducing the whitelist. I must admit that I don't understand in what way this is not already clear. Look:
>>> len(unsafe._SAFE_MODULES)
23
You don't understand that even if the visible "Python scope", "Python namespace", or call it what you want (the code that is accessible from your sandbox) looks very tiny, the real effective code is HUGE. For example, you give full access to the str type, which is made of 20K lines of C code:

haypo@smithers$ wc -l Objects/unicodeobject.c Objects/unicodectype.c Objects/stringlib/*h
   15670 Objects/unicodeobject.c
     297 Objects/unicodectype.c
      29 Objects/stringlib/asciilib.h
     827 Objects/stringlib/codecs.h
      27 Objects/stringlib/count.h
     109 Objects/stringlib/ctype.h
      25 Objects/stringlib/eq.h
     250 Objects/stringlib/fastsearch.h
     201 Objects/stringlib/find.h
     133 Objects/stringlib/find_max_char.h
     140 Objects/stringlib/join.h
     180 Objects/stringlib/localeutil.h
     116 Objects/stringlib/partition.h
      53 Objects/stringlib/replace.h
     390 Objects/stringlib/split.h
      28 Objects/stringlib/stringdefs.h
     266 Objects/stringlib/transmogrify.h
      30 Objects/stringlib/ucs1lib.h
      29 Objects/stringlib/ucs2lib.h
      29 Objects/stringlib/ucs4lib.h
      11 Objects/stringlib/undef.h
      32 Objects/stringlib/unicodedefs.h
    1284 Objects/stringlib/unicode_format.h
   20156 total

Did you review carefully *all* these lines? If a single C line gives access to the real Python namespace, the game is over. In a few minutes, I found "{0.__class__}".format(obj), which is not a full escape of the sandbox, but it's just to give one example. With more time, I'm sure that a line can be found in the str type to escape your sandbox.
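(The format-string probe is worth spelling out - str.format walks attribute paths written inside the replacement field, so a plain string literal reaches __class__ even with obj.attr syntax statically banned:)

---
class Obj:
    pass

# No ast.Attribute node appears in this source: the attribute access
# happens inside str.format's C implementation at run time.
print("{0.__class__}".format(Obj()))   # <class '__main__.Obj'>
---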
I could "mathematically prove" that there are no more security holes in that list by reducing its length to zero.
You only see a very tiny portion of the real attack surface.
The "minimum viable set" in my view would be: no builtins at all, only allowing eval() not exec(), and disallowing yield [from], lambdas and generator expressions.
IMHO it's a waste of time to try to reduce the great Python with batteries included to a simple calculator to compute 1+2. You will never be able to fix all the holes; there are too many holes in your sandbox. It's very easy to implement your own calculator in pure Python, from the parser to the code to compute the operators. If you write the whole code yourself, it's much easier to control what is allowed and put limits. For example, with your own code, you can put limits on the maximum number, whereas your sandbox will kill your CPU and memory if you try 2**(2**100) (no builtin function required for this "exploit").

Victor
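(A sketch of the kind of hand-rolled evaluator being suggested - the limits are hypothetical, and ast is used only as a parser, never calling eval or exec:)

---
import ast

MAX_MAGNITUDE = 10 ** 100   # hypothetical cap on any intermediate value
MAX_EXPONENT = 1000         # blocks e.g. 2**(2**100) before it runs

OPS = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
    ast.Div: lambda a, b: a / b,
    ast.Pow: lambda a, b: a ** b,
}

def calc(expr):
    # Parse only; the tree is walked by our own code, never exec'd.
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Num):   # ast.Constant on modern Pythons
            return node.n
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left, right = ev(node.left), ev(node.right)
            if isinstance(node.op, ast.Pow) and abs(right) > MAX_EXPONENT:
                raise ValueError("exponent too large")
            result = OPS[type(node.op)](left, right)
            if abs(result) > MAX_MAGNITUDE:
                raise ValueError("result too large")
            return result
        raise ValueError("disallowed syntax")
    return ev(ast.parse(expr, mode="eval"))

# calc("1 + 2 * 3") == 7; calc("2**(2**100)") raises ValueError
---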

On Tue, Apr 12, 2016 at 02:05:06PM +0200, Victor Stinner wrote:
2016-04-12 13:10 GMT+02:00 Jon Ribbens <jon+python-dev@unequivocal.co.uk>:
No, it's a matter of reducing the whitelist. I must admit that I don't understand in what way this is not already clear. Look:
>>> len(unsafe._SAFE_MODULES)
23
You don't understand that even if the visible "Python scope", "Python namespace", or call it what you want (the code that is accessible from your sandbox) looks very tiny, the real effective code is HUGE.
You are mistaken, I do understand that.
In a few minutes, I found "{0.__class__}".format(obj) which is not a full escape of the sandbox, but it's just to give one example.
It's something I'd already thought of, and it's not an escape at all.
I could "mathematically prove" that there are no more security holes in that list by reducing its length to zero.
You only see a very tiny portion of the real attack surface.
You've misunderstood my comment - I was saying that the security holes from imported modules can be easily eliminated. That doesn't say anything about security holes not from imported modules, of course.
The "minimum viable set" in my view would be: no builtins at all, only allowing eval() not exec(), and disallowing yield [from], lambdas and generator expressions.
IMHO it's a waste of time to try to reduce the great Python with battery included to a simple calculator to compute 1+2.
And in my opinion it isn't. There are plenty of use cases for such a thing. Take a look at this for example: https://developer.blender.org/D1862
It's very easy to implement your own calculator in pure Python, from the parser to the code to compute the operators. If you write yourself the whole code, it's much easier to control what is allowed and put limits. For example, with your own code, you can put limits on the maximum number, whereas your sandbox will kill your CPU and memory if you try 2**(2**100) (no builtin function required for this "exploit").
Yes, I'd already thought of that too, although if you allow functions and methods to be called (which they are, in my minimal viable set suggestion above) then I think perhaps you've not actually bought yourself very much with all that work.

I haven't been following this thread in detail, so perhaps I have missed something, but I have a question... On Tue, Apr 12, 2016 at 02:05:06PM +0200, Victor Stinner wrote:
You don't understand that even if the visible "Python scope", "Python namespace", or call it what you want (the code that is accessible from your sandbox) looks very tiny, the real effective code is HUGE. For example, you give full access to the str type, which is made of 20K lines of C code:

haypo@smithers$ wc -l Objects/unicodeobject.c Objects/unicodectype.c Objects/stringlib/*h
   15670 Objects/unicodeobject.c
   [...]
    1284 Objects/stringlib/unicode_format.h
   20156 total
Did you review carefully *all* these lines? If a single C line gives access to the real Python namespace, the game is over.
I don't follow this logic. Jon's sandbox doesn't provide an interface to calling arbitrary lines of C code from Python. It is limited to only a restricted set of Python operations. So sticking to string methods for the sake of discussion, it doesn't matter if (let's say) str.upper has access to the real Python namespace. There is no API for str.upper to return that namespace. It only returns a new string.

So where is the error in the following reasoning? There are 44 string methods, excluding those that start with an underscore. So if Jon audits those 44 methods, and determines which ones return (let's say) strings and which give access to namespaces, then he can block the ones which give access to namespaces and allow the ones which return strings.

To give a concrete example... suppose that the C locale library is unsafe. Further, let's suppose that the str.isdigit method calls code from the C locale library, to determine whether or not the string is made up of locale-specific digits. How does this make str.isdigit (potentially) unsafe? Regardless of what happens inside the method, it still returns either True or False and nothing else. There's no str.isdigit API to access the locale library.

I can think of one possible threat. Suppose that the locale library has a bug, so that calling "aardvark".isdigit seg faults, potentially executing arbitrary C code, but at the very least crashing the application. Is that the sort of attack you're concerned by?
In a few minutes, I found "{0.__class__}".format(obj) which is not a full escape of the sandbox, but it's just to give one example. With more time, I'm sure that a line can be found in the str type to escape your sandbox.
Maybe so. And then Jon will fix that vulnerability. And somebody will find a new one. And he'll fix that too, or decide that it is too hard to fix and give up. That's how security works. Even software designed for security can have exploitable bugs:

http://securityvulns.com/news/FreeBSD/jail/chdir.html

It seems unfair to me to hold Jon to a higher standard than we hold people like Apple, or the Linux kernel devs.

I fully accept and respect your personal opinion, based on your experience, that Jon's tactic is doomed to failure. But if he needs to learn this for himself, just as you had to learn it for yourself (otherwise you wouldn't have started your own sandbox project), I can respect that too. Progress depends on the unreasonable person who thinks they can overturn the conventional wisdom.

You're telling Jon not to bother trying to sandbox CPython, he should use PyPy's sandbox instead. But if the PyPy people had believed the conventional wisdom that you can't sandbox Python, they wouldn't have a sandbox either.

Even if the only thing we learn from Jon's experiment is a new set of tricks for breaking out of the sandbox, that's still interesting, if not useful. And maybe he'll find some combination of whitelist and OS-level jail that together makes a practical sandbox. And if not, well, it's his own time he is wasting.
IMHO it's a waste of time to try to reduce the great Python with battery included to a simple calculator to compute 1+2.
Completely agree. But hopefully the whitelist won't be that restrictive, and will allow subtraction and multiplication as well :-)

-- Steve

On Tue, Apr 12, 2016 at 11:12 PM, Steven D'Aprano <steve@pearwood.info> wrote:
To give a concrete example... suppose that the C locale library is unsafe. Further, let's suppose that the str.isdigit method calls code from the C locale library, to determine whether or not the string is made up of locale-specific digits. How does this make str.isdigit (potentially) unsafe? Regardless of what happens inside the method, it still returns either True or False and nothing else. There's no str.isdigit API to access the locale library.
I can think of one possible threat. Suppose that the locale library has a bug, so that calling "aardvark".isdigit seg faults, potentially executing arbitrary C code, but at the very least crashing the application. Is that the sort of attack you're concerned by?
That is a potentially significant attack vector, as it depends on a lot of external-to-Python information (the current locale, for instance; and we've seen exploits that involve remotely setting environment variables, which could include LC_ALL). However, you're right that it isn't the concern here.

There is one other thing to worry about, and that's anything where the "inner" system can affect or influence the "outer" system. With the str type, that's unlikely (since strings are immutable), but I raised the potential concern of the regex cache, as there's a chance someone could attack that. The mere presence of decimal.getcontext() resulted in the whole module getting dropped from the whitelist.

If you want complete isolation of one and the other, that's easy: have no communication whatsoever. But then there's no point in having them both execute in the same interpreter. You may as well create a chroot and run Python inside that, have it serialize the result to JSON and write it to stdout, which you can then retrieve. That would pretty much solve the problem. (And in fact, if I were to do-over the project where I wanted Python sandboxing, that's probably what I'd do.)

ChrisA

On Tue, Apr 12, 2016 at 11:12:27PM +1000, Steven D'Aprano wrote:
I can think of one possible threat. Suppose that the locale library has a bug, so that calling "aardvark".isdigit seg faults, potentially executing arbitrary C code, but at the very least crashing the application. Is that the sort of attack you're concerned by?
This thread already covered the need to address SEGV at length. For a truly evil user, almost any kind of crash is an opportunity to take control of the system, and a security solution ignoring this is no security solution at all.
Maybe so. And then Jon will fix that vulnerability. And somebody will find a new one. And he'll fix that too, or decide that it is too hard to fix and give up.
That's how security works. Even software designed for security can have exploitable bugs:
It seems unfair to me to hold Jon to a higher standard than we hold people like Apple, or the Linux kernel devs.
I don't believe that's what is happening here. In the OS analogy, Jon is generating busywork trying to secure an environment similar to Windows 3.1 that was simply never designed with e.g. memory protection in mind to begin with, and there is no evidence after numerous attempts spanning many years by multiple people that such an environment can be secured meaningfully while still remaining generally useful.
I fully accept and respect your personal opinion, based on your experience, that Jon's tactic is doomed to failure. But if he needs to learn this for himself, just as you had to learn it for yourself (otherwise you wouldn't have started your own sandbox project), I can respect that too. Progress depends on the unreasonable person who thinks they can overturn the conventional wisdom.
I'd deeply prefer it if this turned into an investigation or patchset making CPython work nicely with seccomp, sandbox(7), pledge(2) or whatever capability minimization mechanisms exist on Windows; they are all mechanisms to make it much safer for random code to be executing on your system, designed by folk who at all times expressly had security in mind. But that's not what's happening; instead a dead horse is being flogged over a hundred messages in our inboxes, and IMHO it is excruciating to watch.
Even if the only thing we learn from Jon's experiment is a new set of tricks for breaking out of the sandbox, that's still interesting, if not useful.
Don't forget the worst case: a fundamentally broken security module heavily marketed to the naive using claims the core team couldn't break it. David

On Tue, Apr 12, 2016 at 01:40:57PM +0000, David Wilson wrote:
On Tue, Apr 12, 2016 at 11:12:27PM +1000, Steven D'Aprano wrote:
I can think of one possible threat. Suppose that the locale library has a bug, so that calling "aardvark".isdigit seg faults, potentially executing arbitrary C code, but at the very least crashing the application. Is that the sort of attack you're concerned by?
This thread already covered the need to address SEGV at length. For a truly evil user, almost any kind of crash is an opportunity to take control of the system, and a security solution ignoring this is no security solution at all.
Indeed.
But that's not what's happening, instead a dead horse is being flogged over a hundred messages in our inboxes and IMHO it is excruciating to watch.
I don't think that is true at all, and personally I have found this thread very interesting. I apologise if others have not.
Even if the only thing we learn from Jon's experiment is a new set of tricks for breaking out of the sandbox, that's still interesting, if not useful.
Don't forget the worst case: a fundamentally broken security module heavily marketed to the naive using claims the core team couldn't break it.
I should point out that my module is called "unsafe.py", is titled an "experiment", and prominently states in the README: "Do not use this code for any purpose in the real world." I will not be putting it up as an installable package, and as already stated it was never my intention to suggest that it or anything like it be included in the stdlib. I will however leave it on github for anyone who wants to have a go at breaking into it in the future.

On Tue, Apr 12, 2016 at 9:10 PM, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
On Tue, Apr 12, 2016 at 08:27:14PM +1000, Chris Angelico wrote:
decimal.getcontext is a simple and obvious example of a way that global mutable objects can be accessed across the boundary. There is no way to mathematically prove that there are no more, so it's still a matter of blacklisting.
No, it's a matter of reducing the whitelist. I must admit that I don't understand in what way this is not already clear. Look:
>>> len(unsafe._SAFE_MODULES)
23
I could "mathematically prove" that there are no more security holes in that list by reducing its length to zero. There are still plenty of circumstances in which the experiment would be a useful tool even with no modules allowed to be imported.
Yes, you just removed decimal because of getcontext. What about the next module with that kind of issue? Or what about the next non-underscore attribute on a core type that can cause you grief (like how async functions leak stack frames)?
I still think you need to work out a "minimum viable set" and set down some concrete rules: if any feature in this set has to be blacklisted in order to achieve security, the experiment has failed.
The "minimum viable set" in my view would be: no builtins at all, only allowing eval() not exec(), and disallowing yield [from], lambdas and generator expressions.
Then start with that. Don't give ANYTHING else. Otherwise you're still playing with the blacklist.

But at that point, you pretty much have something that can't be recognized as Python. You may as well start from a completely different basis and design your own expression evaluator, maybe making use of parse-to-AST, but not actually eval'ing the source code. That's how fundamental this issue is - to dodge the security problems, you get to the point where you've dodged all of what makes Python Python.

ChrisA

You seem to be defining a (restricted subset of an existing) language; which will need version strings and ABI tags for compatibility purposes:

* Build tags (for Python variants):
  * https://www.python.org/dev/peps/pep-0425/ (Python tag, ABI tag, Platform tag)
  * https://www.python.org/dev/peps/pep-0513/ (manylinux1)
  * https://www.python.org/dev/peps/pep-3149/ (.so file tags)
* RestrictedPython does not have ABI tags

An Android CPython build discussion about just exposing an extra attribute in the platform module (the Android build also ships without some modules IIRC):

* https://mail.python.org/pipermail/python-dev/2014-August/135606.html
* https://mail.python.org/pipermail/python-dev/2014-August/thread.html#135640

On 11 April 2016 at 15:46, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
It's trying to alter the global Python environment so that arbitrary code can be executed, whereas I am not even trying to allow execution of arbitrary code and am not altering the global environment.
However, it's not at all clear (to me at least) what you *are* trying to do. You're limiting the subset of Python that people can use, understood. And you're trying to ensure that people can't do "bad things". Again, understood. But what subset are you actually allowing, and what things are you trying to protect against?

(For example, I can't calculate sin(1.2) using the math module - why is that not allowed? It's just as safe as using the built-in exponential operator, and indeed I could write a sin() function in pure Python, although it would be too slow to be useful, unlike math.sin...)

It feels at the moment as if I'm playing a game where I don't know the rules, and every time I think I scored a point, the rules are changed to retroactively disallow it.

Paul

On Apr 11 2016, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
What I see is that you asked to break your sandbox, and less than 1 hour later, a first vulnerability was found (exec called with two parameters). A few hours later, a second vulnerability was found (async generator and cr_frame).
The former was just a stupid bug, it says nothing about the viability of the methodology. The latter was a new feature in a Python version later than I have ever used, and again does not imply anything much about the viability.
It implies that new versions of Python may break your sandbox. That doesn't sound like a viable long-term solution.
I think now I've blocked the names of frame object attributes it wouldn't be a vulnerability any more anyway.
It seems like you're playing whack-a-mole.

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«

On Mon, Apr 11, 2016 at 08:35:11AM -0700, Nikolaus Rath wrote:
On Apr 11 2016, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
What I see is that you asked to break your sandbox, and less than 1 hour later, a first vulnerability was found (exec called with two parameters). A few hours later, a second vulnerability was found (async generator and cr_frame).
The former was just a stupid bug, it says nothing about the viability of the methodology. The latter was a new feature in a Python version later than I have ever used, and again does not imply anything much about the viability.
It implies that new versions of Python may break your sandbox. That doesn't sound like a viable long-term solution.
That is obviously always going to be true of major new versions with major new features, no matter what language we're talking about or what method is being used to sandbox - unless the sandboxing were to be built in to the language itself, which I have deliberately not suggested. But having said that, I already pointed out in the message you're responding to that with the method I'm using now, coroutines would not have been an issue even if I hadn't specifically fixed them.
I think now I've blocked the names of frame object attributes it wouldn't be a vulnerability any more anyway.
It seems like you're playing whack-a-mole.
Well, no, quite the opposite in fact. If that was true then I would have given up already as the method having been proved useless. So far it looks like blocking "_*" and the frame object attributes appears to be sufficient.

Jon Ribbens wrote:
So far it looks like blocking "_*" and the frame object attributes appears to be sufficient.
Even if your sandbox as it currently exists is secure, it's only an extremely restricted subset. You seem to be assuming that if your technique works so far, then it can be extended to cover a larger subset, but I don't think that's certain.

One problem that's been raised is how to prevent untrusted code from monkeypatching imported modules. Possibly that could be addressed by giving the untrusted code a copy of the module, but I'm not entirely sure -- accidentally importing two copies of the same source file is a well-known source of bugs, after all.

A related, but more difficult problem is that if we allow the untrusted code to import any pure-Python classes, it will be able to monkeypatch them. So it seems like it will need its own copy of those classes as well -- and having two copies of the same class around is *another* well known source of bugs.

-- Greg
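(A minimal sketch of the "give the untrusted code a copy" idea under discussion - hypothetical code, and deliberately shallow: it copies only top-level public names into a fresh module object, which is roughly where the two-copies bugs Greg mentions come from:)

---
import types

def copy_module(mod, allowed=None):
    # A fresh module object: monkeypatching the copy cannot affect the
    # host's imported original, but objects *reachable* from the copied
    # names are still shared unless proxied or copied too.
    clone = types.ModuleType(mod.__name__)
    for name in dir(mod):
        if name.startswith("_"):
            continue
        if allowed is not None and name not in allowed:
            continue
        setattr(clone, name, getattr(mod, name))
    return clone

# import math; safe_math = copy_module(math, allowed={"sin", "cos", "pi"})
---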

On Mon, Apr 11, 2016 at 8:08 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Jon Ribbens wrote:
So far it looks like blocking "_*" and the frame object attributes appears to be sufficient.
Even if your sandbox as it currently exists is secure, it's only an extremely restricted subset. You seem to be assuming that if your technique works so far, then it can be extended to cover a larger subset, but I don't think that's certain.
How would you test that?
One problem that's been raised is how to prevent untrusted code from monkeypatching imported modules. Possibly that could be addressed by giving the untrusted code a copy of the module, but I'm not entirely sure -- accidentally importing two copies of the same source file is a well-known source of bugs, after all.
https://en.wikipedia.org/wiki/Monkey_patch#Pitfalls

* https://pypi.python.org/pypi?%3Aaction=search&term=monkeypatch&submit=search
* https://pypi.python.org/pypi/apparmor_monkeys
* http://eventlet.net/doc/patching.html#monkeypatching-the-standard-library
* http://www.gevent.org/gevent.monkey.html
* https://docs.python.org/3/library/asyncio-sync.html#locks
* https://docs.python.org/2/library/threading.html#lock-objects
* https://docs.python.org/2/library/sets.html?highlight=immutable#sets.Immutab...
* http://doc.pypy.org/en/latest/stm.html#locks - "Infinite recursion just segfaults for now."
* https://github.com/tobgu/pyrsistent #justfoundthis
  - https://github.com/tobgu/pyrsistent#invariants
  - https://github.com/tobgu/pyrsistent#freeze-and-thaw - freeze, thaw
* define a @property (and no @propname.setter)
  - https://docs.python.org/2/howto/descriptor.html#properties
  - https://docs.python.org/2/library/functions.html#property
A related, but more difficult problem is that if we allow the untrusted code to import any pure-Python classes, it will be able to monkeypatch them. So it seems like it will need its own copy of those classes as well --
* https://docs.python.org/3/library/importlib.html#importlib.__import__
and having two copies of the same class around is *another* well known source of bugs.
One way to reduce the likelihood of this is to bundle all dependencies into a self-contained PEX ZIP package and specify entry points.

* http://legacy.python.org/dev/peps/pep-0441/
* https://pex.readthedocs.org/en/stable/buildingpex.html#specifying-entry-poin...
* https://pex.readthedocs.org/en/stable/buildingpex.html#tailoring-pex-executi...
-- Greg

On Tue, Apr 12, 2016 at 01:08:36PM +1200, Greg Ewing wrote:
Jon Ribbens wrote:
So far it looks like blocking "_*" and the frame object attributes appears to be sufficient.
Even if your sandbox as it currently exists is secure, it's only an extremely restricted subset.
I'm not sure what you think the restrictions are, but yes a highly restricted Python that was secure would be very useful sometimes.
You seem to be assuming that if your technique works so far, then it can be extended to cover a larger subset, but I don't think that's certain.
No, I'm not assuming that.
One problem that's been raised is how to prevent untrusted code from monkeypatching imported modules. Possibly that could be addressed by giving the untrusted code a copy of the module,
Yes, that's what it does.
but I'm not entirely sure -- accidentally importing two copies of the same source file is a well-known source of bugs, after all.
I'm not sure what you mean by that.
A related, but more difficult problem is that if we allow the untrusted code to import any pure-Python classes, it will be able to monkeypatch them. So it seems like it will need its own copy of those classes as well
Yes, that's also what it does.
-- and having two copies of the same class around is *another* well known source of bugs.
I'm not sure what you mean by that either.

On Sat, Apr 9, 2016 at 12:18 AM, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
Anyway the code is at https://github.com/jribbens/unsafe It requires Python 3.4 or later (it could probably be made to work on Python 2.7 as well, but it would need some changes).
Rather annoying point: Your interactive mode allows no editing keys (readline etc), and also doesn't have underscore for "last result", as that's a forbidden name. :( Makes tinkering fiddly. ChrisA

On Tue, Apr 12, 2016 at 06:28:34PM +1000, Chris Angelico wrote:
On Sat, Apr 9, 2016 at 12:18 AM, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
Anyway the code is at https://github.com/jribbens/unsafe It requires Python 3.4 or later (it could probably be made to work on Python 2.7 as well, but it would need some changes).
Rather annoying point: Your interactive mode allows no editing keys (readline etc), and also doesn't have underscore for "last result", as that's a forbidden name. :( Makes tinkering fiddly.
It's just a subclass of the stdlib class code.InteractiveConsole, which seems not to offer those features unfortunately.

2016-04-08 16:18 GMT+02:00 Jon Ribbens <jon+python-dev@unequivocal.co.uk>:
I've made another attempt at Python sandboxing, which does something which I've not seen tried before - using the 'ast' module to do static analysis of the untrusted code before it's executed, to prevent most of the sneaky tricks that have been used to break out of past attempts at sandboxes.
Right, it blocks the most trivial attacks against sandboxes. But you have only fixed a few holes; there is still a wide range of holes through which to escape your sandbox. I read your code and the code of CPython. I found many issues.

Your sandbox runs untrusted code in a new namespace. The game is to get access to the outer namespace, the real Python namespace - for example, the namespace of the unsafe module. Your bet is that blocking access to "_" variables, using a whitelist of modules and a few other protections is enough to block access to the real namespace. The problem is that Python provides a very wide range of tools for introspection.

I expected to find a hole using the C code, but in fact, it was much simpler than that. Your "safe import" hides real functions with a proxy. Ok. But the code of modules is still run in the real namespace, whereas I expected modules to run in the untrusted (restricted) namespace. The game is now to find a way to retrieve content from the real namespace using any function exposed in modules.

I found functools.update_wrapper(). I was very surprised because this function calls getattr() and setattr(), whereas your sandbox replaces these builtin functions. In fact, the "safe" getattr and setattr are only installed in the untrusted namespace, and as I wrote, the modules run in the real Python namespace.
I would be very interested to see if anyone can manage to break it.
So here you have:

---
import functools

# any proxy function from unsafe.py
import base64
src = base64.main

# hack to get any attribute of an object
def getattr(obj, attr):
    secret = None
    class A:
        def __setattr__(self, key, value):
            nonlocal secret
            if key == attr:
                secret = value
    dst = A()
    functools.update_wrapper(dst, src, assigned=(attr,), updated=())
    return secret

builtins = getattr(base64.main, "__globals__")["__builtins__"]
fn = "/tmp/owned"
with builtins.open(fn, "w") as f:
    f.write("game over!\n")
---

The exploit is based on two things:

* update_wrapper() is used to get the secret attribute using the real getattr() function
* update_wrapper() + A.__setattr__ are used to pass the secret from the real namespace to the untrusted namespace
Bugs which are trivially fixable are of course welcomed, but the real question is: is this approach basically sound, or is it fundamentally unworkable?
You can block functools.update_wrapper(), or even the whole functools module. But it will not fix the root cause: modules must run in the untrusted namespace.

In pysandbox, I have code to ensure that all modules run in the untrusted namespace: see CleanupBuiltins in sandbox/builtins.py. But it was not enough; many vulnerabilities were found even with all my protections. I'm sure that many others will find other ways to escape your sandbox with enough time. It's a matter of time, not a matter of whitelists.

As I wrote in my long explanation of why pysandbox is broken by design, writing a sandbox inside CPython doesn't work. In fact, what you want to restrict is access to limited resources like CPU and memory, and to block access to the filesystem. This is the job of the operating system, and external sandboxes help to block access to the filesystem.

Victor

2016-04-12 14:16 GMT+02:00 Victor Stinner <victor.stinner@gmail.com>:
I read your code and the code of CPython. I found many issues. (...) The exploit is based on two things:
* update_wrapper() is used to get the secret attribute using the real getattr() function
* update_wrapper() + A.__setattr__ are used to pass the secret from the real namespace to the untrusted namespace
Oh, I forgot to mention another vulnerability: you block access to attributes by replacing getattr and by analyzing the AST. Ok, but one more time, it's not enough. If you get access to obj.__dict__, you will likely get access to any attribute using obj_dict[attr] instead of obj.attr. I wrote pysandbox because I liked Tav's idea of *removing* sensitive dictionary keys of sensitive types like functions, frames and code objects. Again, it was not enough. Victor
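(The __dict__ point in one example - once any function hands back an object's __dict__, attribute-name filtering is bypassed by plain subscripting:)

---
class Victim:
    def __init__(self):
        self.token = "secret"

v = Victim()
d = vars(v)        # stands in for *any* call that leaks obj.__dict__
print(d["token"])  # "secret" - no underscore attribute access needed
---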

On Tue, Apr 12, 2016 at 02:31:19PM +0200, Victor Stinner wrote:
Oh, I forgot to mention another vulnerability: you block access to attributes by replacing getattr and by analyzing the AST. Ok, but one more time, it's not enough. If you get access to obj.__dict__, you will likely get access to any attribute using obj_dict[attr] instead of obj.attr.
That's not a vulnerability, and it's something I already explicitly mentioned - if you can get a function to return an object's __dict__ then you win. The question is: can you do that?

On Tue, Apr 12, 2016 at 10:42 PM, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
On Tue, Apr 12, 2016 at 02:31:19PM +0200, Victor Stinner wrote:
Oh, I forgot to mention another vulnerability: you block access to attributes by replacing getattr and by analyzing the AST. Ok, but one more time, it's not enough. If you get access to obj.__dict__, you will likely get access to any attribute using obj_dict[attr] instead of obj.attr.
That's not a vulnerability, and it's something I already explicitly mentioned - if you can get a function to return an object's __dict__ then you win. The question is: can you do that?
The question is, rather: Can you prove that we cannot? ChrisA

On Tue, Apr 12, 2016 at 10:45:06PM +1000, Chris Angelico wrote:
On Tue, Apr 12, 2016 at 10:42 PM, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
That's not a vulnerability, and it's something I already explicitly mentioned - if you can get a function to return an object's __dict__ then you win. The question is: can you do that?
The question is, rather: Can you prove that we cannot?
I refer you to the answer given previously. Can you prove you cannot write code to escape JavaScript sandboxes? No? Then why have you not disabled JavaScript in your browser?

On Tue, Apr 12, 2016 at 10:49 PM, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
On Tue, Apr 12, 2016 at 10:45:06PM +1000, Chris Angelico wrote:
On Tue, Apr 12, 2016 at 10:42 PM, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
That's not a vulnerability, and it's something I already explicitly mentioned - if you can get a function to return an object's __dict__ then you win. The question is: can you do that?
The question is, rather: Can you prove that we cannot?
I refer you to the answer given previously. Can you prove you cannot write code to escape JavaScript sandboxes? No? Then why have you not disabled JavaScript in your browser?
I personally cannot, any more than I can prove that SSL is secure or that my Linux+Apache system doesn't allow remote code execution [1]. I trust other people to, and then make a value judgement: is it worth breaking all the web sites that depend on it? (And sometimes the answer is "yes".)

One of the key differences with scripts in web browsers is that there *is* no "outer environment" to access. Remember what I said about the difference between Python-in-Python sandboxing and, say, Lua-in-Python? One tiny exploit in Python-in-Python and you suddenly gain access to the entire outer environment, and it's game over. One tiny exploit in Lua-in-Python and you have whatever that exploit gave you, nothing more.

In fact, if you're prepared to forfeit almost all of Python's power to achieve security, you probably should look into embedding a JavaScript or Lua engine in your Python code. You'll get a comparable expression evaluator, and most people won't be able to tell the difference. You've already cut the set of modules down to just cmath, datetime, math, and re; I suspect re is next on the chopping block (it has a global cache - if the outer system uses a regular expression more than once, it would potentially be possible to mess with it in the cache, and then next time it gets used, the injected code gets run), and datetime might not be that far behind. And if they do go, all you have left is a scientific calculator. You can implement that in any language you like.

ChrisA

[1] And if anyone mentions PHP, I will set him to work on the hardest PHP problem I know of - no, not securing it. I mean convincing end users that it's not necessary. Securing it is trivial by comparison.

On Tue, Apr 12, 2016 at 11:03:11PM +1000, Chris Angelico wrote:
One of the key differences with scripts in web browsers is that there *is* no "outer environment" to access.
If you think that then I think you considerably misunderstand how modern browsers work.
Remember what I said about the difference between Python-in-Python sandboxing and, say, Lua-in-Python? One tiny exploit in Python-in-Python and you suddenly gain access to the entire outer environment, and it's game over. One tiny exploit in Lua-in-Python and you have whatever that exploit gave you, nothing more.
Are you imagining the Lua-in-Python as being completely isolated from the Python namespace then?
In fact, if you're prepared to forfeit almost all of Python's power to achieve security, you probably should look into embedding a JavaScript or Lua engine in your Python code.
Yes, I have in fact already done this (JavaScript using SpiderMonkey). It allows the JavaScript to access Python objects and methods directly from JavaScript so it doesn't actually help, but I think I could put limits on that (e.g. making things read-only) and unlike most of this Python stuff, that could be made a solid rule with no clever ways around it.
I suspect re is next on the chopping block (it has a global cache - if the outer system uses a regular expression more than once, it would potentially be possible to mess with it in the cache, and then next time it gets used, the injected code gets run),
All you could do would be to give misleading results from the regular expression methods, but yes that is a good point. I regret that I added the import stuff at all now - it has just been a distraction from my original point.
[1] And if anyone mentions PHP, I will set him to work on the hardest PHP problem I know of - no, not securing it. I mean convincing end users that it's not necessary. Securing it is trivial by comparison.
Fortunately I have managed to exclude PHP completely these days from any system I have anything to do with!

On Tue, Apr 12, 2016 at 02:16:57PM +0200, Victor Stinner wrote:
I read your code and the code of CPython. I found many issues.
Thanks for your efforts.
Your "safe import" hides real functions with a proxy. Ok. But the code of modules is still run in the real namespace,
Yes, that was the intention.
I found functools.update_wrapper(). I was very surprised because this function calls getattr() and setattr(), whereas your sandbox replaces these builtin functions.
Good point. It seems it was almost certainly foolish of me to add 'import' back in in response to people's comments while my original concept was still being discussed.
So here you have: --- import functools
Thanks, that was pretty clever. I've of course fixed it by reducing the list of imports (a lot, since I hadn't really audited them at all). But you make a good point.
participants (20)

- Arthur Darcet
- Chris Angelico
- David Wilson
- Greg Ewing
- Isaac Morland
- Jon Ribbens
- Jonathan Goble
- Maciej Fijalkowski
- Marcin Kościelnicki
- Nick Coghlan
- Nikolaus Rath
- Oleg Broytman
- Oscar Benjamin
- Paul Moore
- Robert Collins
- Serhiy Storchaka
- Steven D'Aprano
- Tres Seaver
- Victor Stinner
- Wes Turner