new security doc using object-capabilities
After various people suggested object-capabilities, and after talking with Mark S. Miller of the E programming language and the people Mark works with at HP Labs (who have been giving talks on object-capabilities every week this month here at Google), I have decided to go with object-capabilities for securing interpreters. I have rewritten my design doc from scratch and deleted the old one. The new doc is named securing_python.txt and can be found through the svn web interface at http://svn.python.org/view/python/branches/bcannon-sandboxing/securing_python.txt?rev=50717&view=log. I have pretty much ignored any concrete API and gone with a more conceptual doc, to make sure the API does not get in the way of the core security model.

Using object-capabilities should make the implementation much cleaner. There is much less work directly on the interpreter, and more of it gets pushed up to extension modules. I also have the okay of my supervisor to use this approach in my dissertation, so this will get done.

Two things fall out of all of this which will make development much more modular and easier. First, the memory cap work just becomes a special build on its own; there is no need to tie it into the security work. So I will be cleaning up the bcannon-sandboxing branch code as it stands, and then either create a separate branch for the object-capabilities work, or create another branch for the memory cap stuff and shift the changes over there. I will most likely do the former so as to not lose the history on the checkins.

I also plan to rewrite the import machinery in pure Python. This will make the code much more maintainable and make creating proxies for the import machinery much easier. I will be doing that in a directory in the sandbox initially, since it needs to work from what Python has now (and possibly some new extension module code) before it can be integrated into the interpreter directly. Anyone who wants to help with that can. I already have some preliminary notes on the whole thing and I think it will be reasonably doable.

Anyway, there you go. Here's hoping I have thought this all through properly. =) -Brett
On Wed, 19 Jul 2006, Brett Cannon wrote:
I have decided to go with object-capabilities for securing interpreters. I have rewritten my design doc from scratch and deleted the old one. The new doc is named securing_python.txt and can be found through the svn web interface at http://svn.python.org/view/python/branches/bcannon-sandboxing/securing_python.txt?rev=50717&view=log.
This is amazing news!! I'm going off to read your document right now. -- ?!ng
Brett Cannon wrote:
After various people suggested object-capabilities, and after talking with Mark S. Miller of the E programming language and the people Mark works with at HP Labs (who have been giving talks on object-capabilities every week this month here at Google), I have decided to go with object-capabilities for securing interpreters. I have rewritten my design doc from scratch and deleted the old one. The new doc is named securing_python.txt and can be found through the svn web interface at http://svn.python.org/view/python/branches/bcannon-sandboxing/securing_python.txt?rev=50717&view=log. I have pretty much ignored any concrete API and gone with a more conceptual doc, to make sure the API does not get in the way of the core security model.
This may not be relevant or possible, in which case I apologise, but the .NET model of creating application domains is extremely useful. It allows you to assign domains and run code within those domains. This means, for example, you can create a plugin system and run the plugins in a secure domain. I realise that this was the intent of the original rexec module, and your proposed new design (which is very exciting) overcomes the difficulties in that approach. The only approach using the new system would be interprocess communication (?) with a trusted interpreter communicating with an untrusted one. Would the communication layer need to be implemented as a C extension, or will a standard Python API be possible? Hmmm.... maybe I should read your doc. :-) Michael Foord http://www.voidspace.org.uk/python/index.shtml
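A standard Python API for the communication layer Michael asks about is at least plausible. As a purely illustrative sketch (using modern `subprocess` conventions, not anything from Brett's branch; the child script and message format are invented), a trusted parent interpreter could run untrusted code in a separate child process and exchange JSON messages with it over pipes:

```python
import json
import subprocess
import sys

# Hypothetical stand-in for a sandboxed interpreter: the child only ever
# sees the data the trusted parent chooses to send it over stdin.
UNTRUSTED_CHILD = """
import json, sys
request = json.loads(sys.stdin.readline())
print(json.dumps({"result": request["x"] * 2}))
"""

# The trusted parent spawns the child and talks to it over pipes.
proc = subprocess.run(
    [sys.executable, "-c", UNTRUSTED_CHILD],
    input=json.dumps({"x": 21}) + "\n",
    capture_output=True,
    text=True,
)
reply = json.loads(proc.stdout)
```

The process boundary here does the isolation work the OS already knows how to enforce; the Python-level API is just serialization and pipe plumbing, so no C extension is strictly required for the communication layer itself.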
Using object-capabilities should make the implementation much cleaner. There is much less work directly on the interpreter and more of it gets pushed up to extension modules. I also have the okay of my supervisor to use this approach in my dissertation so this will get done.
Two things do fall out of all of this which will make development much more modular and easier. First, the memory cap work just becomes a special build on its own; no need to tie into the security work. So I will be cleaning up the bcannon-sandboxing branch code as it stands, and then either create a separate branch for the object-capabilities work, or create another branch for the memory cap stuff and shift the changes over there. I will most likely do the former so as to not lose the history on the checkins.
I also plan to rewrite the import machinery in pure Python. This will make the code much more maintainable and make creating proxies for the import machinery much easier. I will be doing that in a directory in the sandbox initially since it needs to work from what Python has now (and possibly some new extension module code) before it can be integrated into the interpreter directly. Anyone who wants to help with that can. I already have some preliminary notes on the whole thing and I think it will be reasonably doable.
Anyway, there you go. Here is to hoping I have thought this all through properly. =)
-Brett ------------------------------------------------------------------------
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.u...
Michael Foord wrote:
Brett Cannon wrote:
After various people suggested object-capabilities, and after talking with Mark S. Miller of the E programming language and the people Mark works with at HP Labs (who have been giving talks on object-capabilities every week this month here at Google), I have decided to go with object-capabilities for securing interpreters. I have rewritten my design doc from scratch and deleted the old one. The new doc is named securing_python.txt and can be found through the svn web interface at http://svn.python.org/view/python/branches/bcannon-sandboxing/securing_python.txt?rev=50717&view=log. I have pretty much ignored any concrete API and gone with a more conceptual doc, to make sure the API does not get in the way of the core security model.
This may not be relevant or possible, in which case I apologise, but the .NET model of creating application domains is extremely useful. It allows you to assign domains and run code within those domains. This means, for example, you can create a plugin system and run the plugins in a secure domain.
I realise that this was the intent of the original rexec module, and your proposed new design (which is very exciting) overcomes the difficulties in that approach. The only approach using the new system would be interprocess communication (?) with a trusted interpreter communicating with an untrusted one. Would the communication layer need to be implemented as a C extension, or will a standard Python API be possible? Hmmm.... maybe I should read your doc. :-)
Ok, started to read the doc - and realise it specifically addresses these issues. My apologies :-) Michael http://www.voidspace.org.uk/python/index.shtml
Michael Foord http://www.voidspace.org.uk/python/index.shtml
Using object-capabilities should make the implementation much cleaner. There is much less work directly on the interpreter and more of it gets pushed up to extension modules. I also have the okay of my supervisor to use this approach in my dissertation so this will get done.
Two things do fall out of all of this which will make development much more modular and easier. First, the memory cap work just becomes a special build on its own; no need to tie into the security work. So I will be cleaning up the bcannon-sandboxing branch code as it stands, and then either create a separate branch for the object-capabilities work, or create another branch for the memory cap stuff and shift the changes over there. I will most likely do the former so as to not lose the history on the checkins.
I also plan to rewrite the import machinery in pure Python. This will make the code much more maintainable and make creating proxies for the import machinery much easier. I will be doing that in a directory in the sandbox initially since it needs to work from what Python has now (and possibly some new extension module code) before it can be integrated into the interpreter directly. Anyone who wants to help with that can. I already have some preliminary notes on the whole thing and I think it will be reasonably doable.
Anyway, there you go. Here is to hoping I have thought this all through properly. =)
-Brett ------------------------------------------------------------------------
That's great. I just read your draft and I have a few comments, but first let me say that I liked the idea of borrowing concepts from E. I crossed E's path at the beginning of this year and found it a pot of really nice ideas (promises and capabilities). Here are my comments about the draft:

- it's not really clear to me what the "powerbox" is. I think I got the concept of "super process" but maybe it needs to be clarified? It becomes clear in the "threat model" paragraph
- I hope no Rubystas will read the "Problem of No Private Namespace" section because they have private/protected keywords to enforce this stuff :-) Writing proxies in C will slow down the dev process (although it will maybe speed up performance) but in the far future someone will come up with an alternative closer to the Python level
- Can you write down a simple example of what you mean by "changing something of the built-in objects"? (in "Problem of mutable shared state")
- What about the performance issues of the capabilities model overall?
- I know what you meant to say but the paragraph about pythonicness and the security model seems a little "fuzzy" to me. What are the boundaries of the allowed changes for the security stuff?
- You don't say anything about networking and networked resources in the list for the standard sandboxed interpreter
- Suppose we have a .py module. Based on your security model we can import it, right? When imported it generates a .pyc file. The second time we import it, what happens? Is the .pyc ignored? Is import not allowed at all? We can't rely on the name of file.pyc, because an attacker who knows file.py is secure and that the second import is done against file.pyc can replace the "secure" file.pyc with an insecure implementation and do some kind of harm to the sandbox
- About "Filesystem information": does the sandboxed interpreter need to know all that information about file paths, files and so on? Can't we reset those attributes to something arbitrary?
- About the sys module: I think the best way is to have a purged fake sys module with only the stuff you need. pypy has the concept of faked modules too (although for a different reason)
- About networking: what do you think about E's model of really safe networking, protected remotable objects and safe RPC? Is that model applicable to Python in some way? We can't use E's model as a whole (asking people to generate a safe key and send it by email is unfeasible)
- is the protected memory model some kind of memory monitor system?

I think that's all for the draft. I wrote these comments during the reading of the document. Hope some of these help -- Lawrence http://www.oluyede.org/blog
On 7/20/06, Lawrence Oluyede <l.oluyede@gmail.com> wrote:
That's great. I just read your draft but I have little comments to do but before let me say that I liked the idea to borrow concepts from E. I've crossed the E's path in the beginning of this year and I found it a pot of really nice ideas (for promises and capabilities). Here are my comments about the draft:
- it's not really clear to me what the "powerbox" is. I think I got the concept of "super process" but maybe it's to be clarified, isn't it? It become clear in the "threat model" paragraph
The powerbox is the thing that gives your security domains their initial abilities. The OS gives the process its abilities, but it does not directly work with the interpreter. Since the process does, though, it is considered the powerbox and farms out the abilities it has been given by the OS. I have tried to clarify the definition at the start of the doc.

- I hope no Rubystas will read the "Problem of No Private Namespace" section because they have private/protected keywords to enforce this stuff :-) Writing proxies in C will slow down the dev process (although it will maybe speed up performance) but in the far future someone will come up with an alternative closer to the Python level

Maybe. As I said in the doc, any changes must be Pythonic, and adding private namespaces right now wouldn't be without much more thought and work. And if Ruby ends up with this security model but more thoroughly, more power to them. Their language is different in the right ways to support it. As for coding in C, them's the breaks. I plan on adding stuff to the stdlib for the common case. I might eventually think of a good, generic proxy object that could be used, but as of right now I am not worrying about that since it would be icing on the cake.

- Can you write down a simple example of what you mean by "changing something of the built-in objects"? (in "Problem of mutable shared state")

Done.

- What about the performance issues of the capabilities model overall?

Should be faster than an IBAC model since certain calls will not need to check the identity of the caller every time. But I am not worrying about performance, I am worrying about correctness, so I did not try to make any performance claims.

- I know what you meant to say but the paragraph about pythonicness and the security model seems a little "fuzzy" to me. What are the boundaries of the allowed changes for the security stuff?

Being "pythonic" is a fuzzy term in itself, and Guido is the only person who can make definitive claims about what is and is not Pythonic. As I have said, this doc was mostly written with python-dev in mind since they are the ones I have to convince to let this into the core, and they all know the term. But I have tacked on a sentence about what the term means.

- You don't say anything about networking and networked resources in the list for the standard sandboxed interpreter

Nope. Have not started worrying about that yet. Just trying to get the basic model laid out.

- Suppose we have a .py module. Based on your security model we can import it, right? When imported it generates a .pyc file. The second time we import it, what happens? Is the .pyc ignored? Is import not allowed at all? We can't rely on the name of file.pyc, because an attacker who knows file.py is secure and that the second import is done against file.pyc can replace the "secure" file.pyc with an insecure implementation and do some kind of harm to the sandbox

It will be ignored. But I am hoping that through rewriting the import machinery, more control over generating .pyc files can be had (see Skip Montanaro's PEP on this; I forget the number). This is why exact details were left out of the implementation details. I just wanted people to understand the approach to everything, not the concrete details of how it will be coded up.

- About "Filesystem information": does the sandboxed interpreter need to know all that information about file paths, files and so on? Can't we reset those attributes to something arbitrary?

That is the point. It is not that the sandbox needs to know it; it's that it needs to be hidden from the sandbox.

- About the sys module: I think the best way is to have a purged fake sys module with only the stuff you need. pypy has the concept of faked modules too (although for a different reason)

OK.

- About networking: what do you think about E's model of really safe networking, protected remotable objects and safe RPC? Is that model applicable to Python in some way? We can't use E's model as a whole (asking people to generate a safe key and send it by email is unfeasible)

I have not looked at it. I am also not trying to build an RPC system *and* a security model for Python. That is just too much work right now.

- is the protected memory model some kind of memory monitor system?

Basically. It just keeps a size_t on the memory cap and another on memory usage, and when memory is requested it makes sure that it won't go over the cap. And when memory is freed, the usage goes down. It's very rough (hard to account for padding bits, etc. in C structs), but it should be good enough to prevent a program from hitting 800 MB when you really just wanted it to have 5 MB.

- I think that's all for the draft. I wrote these comments during the reading of the document. Hope some of these help

Thanks, Lawrence. -Brett
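The cap-and-usage bookkeeping Brett describes for the memory monitor can be modelled in a few lines. This is a toy Python sketch with invented names (the real accounting would live in C around the interpreter's allocator, where padding and allocator overhead make the counts approximate, exactly as Brett cautions):

```python
class MemoryCap:
    """Toy model of the two counters: a fixed cap and a running
    total of bytes handed out, checked on every request."""

    def __init__(self, cap_bytes):
        self.cap = cap_bytes   # maximum bytes the interpreter may use
        self.used = 0          # bytes currently accounted for

    def allocate(self, nbytes):
        # Refuse the request rather than let usage cross the cap.
        if self.used + nbytes > self.cap:
            raise MemoryError("allocation would exceed cap of %d bytes"
                              % self.cap)
        self.used += nbytes

    def free(self, nbytes):
        # Freed memory lowers the usage count again.
        self.used = max(0, self.used - nbytes)
```

The check is deliberately coarse: it cannot stop a single allocation below the cap from being wasteful, but it does guarantee the running total never exceeds the configured limit, which is all the "5 MB, not 800 MB" goal requires.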
Should be faster than an IBAC model since certain calls will not need to check the identity of the caller every time.
But I am not worrying about performance, I am worrying about correctness, so I did not try to make any performance claims.
Got that.
Nope. Have not started worrying about that yet. Just trying to get the basic model laid out.
Ok sorry to have bothered
That is the point. It is not that the sandbox needs to know it, its that it needs to be hidden from the sandbox.
So I think that's a "simple" step during the importing step.
I have not looked at it. I am also not trying to build an RPC system *and* a security model for Python. That is just too much work right now.
Ok sorry :-)
Thanks, Lawrence.
Thank you! -- Lawrence http://www.oluyede.org/blog
"Lawrence Oluyede" <l.oluyede@gmail.com> wrote in message news:9eebf5740607200117r4d4613e2i91665ea211bab46@mail.gmail.com...
- I know what you meant to say but the paragraph about pythonicness and the security model seems a little "fuzzy" to me.
I agree that this paragraph is weak and recommend that it be rewritten. In particular, I think the 'pythonic*' words should go, especially if you expect this document to be read by anyone other than dedicated pythonistas. I would start with something like: "It is my goal that my thesis work be incorporated in some future version of the Python distribution. This has two constraints. First, changes to the core must not slow down normal operation. Second, visible changes must not violate the spirit and style of Python that make it a distinctive language."

This alludes to the fact that your proposal discusses two highly overlapping yet separate projects: writing a thesis that gains you a PhD degree, and producing an accepted patch set that gives Python a useful security capability it does not now have. They have to be thought of as somewhat separate because you have two sets of 'overseers' and approvers: your thesis advisor and committee for the first, and Guido and other Python developers for the second.

I think your thesis should currently be your first priority. Your current paragraph implied to me that you would not follow a promising line of research if you could not see how to make it 'pythonic'. If I were on your thesis committee, I think that would bother me ;-).

In any case, I wish you the best with a double project that is obviously not a 'gimme'. Terry Jan Reedy (PhD, though not on any thesis committees)
Brett Cannon wrote:
The new doc is named securing_python.txt and can be found through the svn web interface at
http://svn.python.org/view/python/branches/bcannon-sandboxing/securing_python.txt?rev=50717&view=log.
How do you plan to handle CPU hogs? Stuff like the execution of a gigantic integer multiplication. This recipe for safe_eval: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/496746 which is otherwise very cute, does not handle this case either: it tries to catch and interrupt long-running operations through a secondary thread, but fails on a single long operation because the GIL is not released and the alarm thread does not get its chance to run. -- Giovanni Bajo
On 7/20/06, Giovanni Bajo <rasky@develer.com> wrote:
Brett Cannon wrote:
The new doc is named securing_python.txt and can be found through the svn web interface at
http://svn.python.org/view/python/branches/bcannon-sandboxing/securing_python.txt?rev=50717&view=log .
How do you plan to handle CPU-hogs? Stuff like execution of a gigantic integer multiplication.
I don't. =) Protecting the CPU is damn hard to do in any portable fashion. And even getting it to work on an OS whose details you do know probably leads to an interrupt implementation, and that doesn't sound fun. -Brett
Brett Cannon wrote:
http://svn.python.org/view/python/branches/bcannon-sandboxing/securing_python.txt?rev=50717&view=log
.
How do you plan to handle CPU-hogs? Stuff like execution of a gigantic integer multiplication.
I don't. =) Protecting the CPU is damn hard to do in any portable fashion. And even getting it to work on an OS whose details you do know probably leads to an interrupt implementation, and that doesn't sound fun.
I think the trick used by the safe_eval recipe (a separate thread which interrupts the script through thread.interrupt_main()) shows that, in most cases, it's possible to make sure that an embedded script does not take too long to execute. Do you agree that this use case ("allow me to time out an embedded script") would be a very good start in the right direction? Now, I wonder, in a restricted execution environment such as that depicted in your document, how many different ways are there to make the Python interpreter enter a long calculation loop which does not release the GIL? I can think of bignum*bignum, bignum**bignum or similar mathematical operations, but there are really only a few. If we could make those release the GIL (or poll some kind of watchdog used to abort them, pretty much like they normally poll CTRL+C), then the same trick used by the recipe could be used. -- Giovanni Bajo
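The recipe's trick can be sketched in a few lines. This is a hedged illustration, not the recipe itself: `run_with_timeout` is an invented name, the code uses the Python 3 spelling `_thread` (the 2006-era module was called `thread`), and, as Giovanni notes, it only works while the watched code keeps passing through the interpreter's bytecode loop; a single long GIL-holding C operation is never interrupted:

```python
import threading
import _thread

def run_with_timeout(fn, seconds):
    """Run fn() in the main thread; a watchdog thread interrupts it
    with KeyboardInterrupt if it runs longer than `seconds`."""
    timer = threading.Timer(seconds, _thread.interrupt_main)
    timer.start()
    try:
        return fn()
    except KeyboardInterrupt:
        raise TimeoutError("script exceeded %.1f seconds" % seconds)
    finally:
        # Always disarm the watchdog so a late interrupt cannot fire.
        timer.cancel()
```

A pure-Python busy loop is stopped, because `interrupt_main` sets a pending exception that the main thread's eval loop checks between bytecodes; a single `bignum ** bignum` is not, which is exactly the hole Giovanni is asking about.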
On 7/20/06, Giovanni Bajo <rasky@develer.com> wrote:
Brett Cannon wrote:
http://svn.python.org/view/python/branches/bcannon-sandboxing/securing_python.txt?rev=50717&view=log
.
How do you plan to handle CPU-hogs? Stuff like execution of a gigantic integer multiplication.
I don't. =) Protecting the CPU is damn hard to do in any portable fashion. And even getting it to work on an OS whose details you do know probably leads to an interrupt implementation, and that doesn't sound fun.
I think the trick used by the safe_eval recipe (a separate thread which interrupts the script through thread.interrupt_main()) shows that, in most cases, it's possible to make sure that an embedded script does not take too long to execute. Do you agree that this usage case ("allow me to timeout an embedded script") is something which would be a very good start in the right direction?
Probably. I just don't feel like worrying about it right now. =)

Now, I wonder, in a restricted execution environment such as that depicted in your document, how many different ways are there to make the Python interpreter enter a long calculation loop which does not release the GIL? I can think of bignum*bignum, bignum**bignum or similar mathematical operations, but there are really only a few. If we could make those release the GIL (or poll some kind of watchdog used to abort them, pretty much like they normally poll CTRL+C), then the same trick used by the recipe could be used.

Well, any work that does most of its calculation within C code and that does not touch base with the interpreter on a semi-regular basis would need to release the GIL. -Brett
For code objects, their construction is already commonly written as "compile(source)". For type objects, the constructor doesn't let you do anything you can't already do with a class statement. It doesn't need securing.

For rewriting import.c in Python, the PEP 302 compliant import system API in pkgutil would be a good starting point.

Your doc also asks about the imp.get_suffixes() list, and wonders where to set it from Python. As far as I am aware, you can't. get_suffixes() is built from _PyImport_FileTab, which is a C array. A switch statement is used to get from the file table entries to the appropriate handler functions.

Quoting from the suggestions I put to the Py3k list:

Use smarter data structures
---------------------------
Currently, the individual handlers to load a fully identified module are exposed to Python code in a way that reflects the C-style data structures used in the current implementation. Simply switching to more powerful data structures for the file type handlers (i.e. use a PyTuple for filedescr values, a PyList for _PyImport_FileTab, and a PyDict instead of a switch statement to go from filedescr values to module loading/initialisation functions) and manipulating them all as normal Python objects could make the code in import.c much easier to follow.

Extensible file type handling
-----------------------------
If the file type handlers are stored in normal Python data structures as described above, it becomes feasible to make the import system extensible to different file types as well as to different file locations. This could be handled on a per-package basis, e.g. via a __file_types__ special attribute in packages.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org
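Nick's dict-instead-of-switch suggestion might look something like the following in pure Python. This is a sketch with invented names (the loader functions are placeholders; the real table would point at the actual handlers behind import.c), but it shows the shape of the change:

```python
# Stand-ins for the per-file-type loaders (invented for illustration).
def load_source(name, path):
    return ("source", name, path)

def load_compiled(name, path):
    return ("compiled", name, path)

# A dict replaces the C switch statement: suffix -> (open mode, handler),
# loosely mirroring the (suffix, mode, type) triples of imp.get_suffixes().
FILE_HANDLERS = {
    ".py": ("U", load_source),
    ".pyc": ("rb", load_compiled),
}

def handler_for(filename):
    """Look up the loader for a filename by its suffix."""
    for suffix, (mode, handler) in FILE_HANDLERS.items():
        if filename.endswith(suffix):
            return handler
    raise ImportError("no handler for %r" % filename)
```

Because the table is an ordinary dict, restricting or extending the recognised file types, whether for Brett's sandboxing (delete an entry) or Nick's per-package __file_types__ idea (add one), becomes a plain mutation of a Python object rather than a change to a C array.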
On 7/20/06, Nick Coghlan <ncoghlan@gmail.com> wrote:
For code objects, their construction is already commonly written as "compile(source)".
Right, but some people like to construct directly from bytecode.

For type objects, the constructor doesn't let you do anything you can't already do with a class statement. It doesn't need securing.

I figured as much, but when I was making the list I was not sure and didn't want to stop my writing momentum to check.

For rewriting import.c in Python, the PEP 302 compliant import system API in pkgutil would be a good starting point.

Yep. I plan on looking at all of the various modules in the stdlib that assist with importing, the package PEP (I think there is one), and PEP 302.

Your doc also asks about the imp.get_suffixes() list, and wonders where to set it from Python. As far as I am aware, you can't. get_suffixes() is built from _PyImport_FileTab, which is a C array. A switch statement is used to get from the file table entries to the appropriate handler functions.

Ah, OK.

Quoting from the suggestions I put to the Py3k list:

Use smarter data structures
---------------------------
Currently, the individual handlers to load a fully identified module are exposed to Python code in a way that reflects the C-style data structures used in the current implementation. Simply switching to more powerful data structures for the file type handlers (i.e. use a PyTuple for filedescr values, a PyList for _PyImport_FileTab, and a PyDict instead of a switch statement to go from filedescr values to module loading/initialisation functions) and manipulating them all as normal Python objects could make the code in import.c much easier to follow.

Yep. I just kind of glanced at the rest of your suggestions, Nick, since I assumed a lot of it would change (or could be changed) if import were redone in as much Python as possible.

Extensible file type handling
-----------------------------
If the file type handlers are stored in normal Python data structures as described above, it becomes feasible to make the import system extensible to different file types as well as to different file locations.

Yep. Although I am more interested in restricting than broadening the file types.

This could be handled on a per-package basis, e.g. via a __file_types__ special attribute in packages.

Maybe. I don't want to get into introducing new abilities to start, though. -Brett
Brett Cannon wrote:
Extensible file type handling ----------------------------- If the file type handlers are stored in normal Python data structures as described above, it becomes feasible to make the import system extensible to different file types as well as to different file locations.
Yep. Although I am more interested in restricting than broadening the file types.
Either way you'd be mutating the list of recognised file types :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org
Hi Brett, On Wed, Jul 19, 2006 at 03:35:45PM -0700, Brett Cannon wrote:
I also plan to rewrite the import machinery in pure Python.
http://codespeak.net/svn/pypy/dist/pypy/module/__builtin__/importing.py A bientot, Armin
On 7/22/06, Armin Rigo <arigo@tunes.org> wrote:
Hi Brett,
On Wed, Jul 19, 2006 at 03:35:45PM -0700, Brett Cannon wrote:
I also plan to rewrite the import machinery in pure Python.
http://codespeak.net/svn/pypy/dist/pypy/module/__builtin__/importing.py
Thanks for the link, Armin. Since you guys don't have the import restrictions the CPython version would have, and have different coding needs for RPython, I obviously can't just do a blind copy. But I will definitely take a look as I develop. Maybe you guys can even help to lower the duplication if it makes sense for you. BTW, do you guys happen to have extra tests for import? -Brett
Hi Brett, On Sat, Jul 22, 2006 at 10:33:19AM -0700, Brett Cannon wrote:
Thanks for the link, Armin. Since you guys don't have the import restrictions the CPython version would have and just have different coding needs for RPython obviously I can't just do a blind copy. But I will definitely take a look as I develop. Maybe you guys can even help to lower the duplication if it makes sense for you.
Yes, it should be possible to abstract the common logic in some way, using some kind of interface for all OS inspection and 'sys.modules' manipulations.
BTW, do you guys happen to have extra tests for import?
Yes, there is http://codespeak.net/svn/pypy/dist/pypy/module/__builtin__/test/test_import.... which will also need a bit of rewriting, but that should be straightforward. A bientot, Armin
Re-hi, On Wed, Jul 19, 2006 at 03:35:45PM -0700, Brett Cannon wrote:
http://svn.python.org/view/python/branches/bcannon-sandboxing/securing_python.txt?rev=50717&view=log.
I'm not sure I understand what you propose to fix holes like constructors and __subclasses__: it seems that you want to remove them altogether (and e.g. make factory functions instead). That would completely break all programs, right? I mean, there is no way such changes would go into mainstream CPython. Or do you propose to maintain a CPython branch manually for the foreseeable future? (From experience this is a bad idea...) A bientot, Armin
On 7/22/06, Armin Rigo <arigo@tunes.org> wrote:
Re-hi,
On Wed, Jul 19, 2006 at 03:35:45PM -0700, Brett Cannon wrote:
http://svn.python.org/view/python/branches/bcannon-sandboxing/securing_python.txt?rev=50717&view=log .
I'm not sure I understand what you propose to fix holes like constructors and __subclasses__: it seems that you want to remove them altogether (and e.g. make factory functions instead). That would completely break all programs, right?
Not altogether, just the constructors on select types that are considered dangerous from a security standpoint. The breakage won't be horrible, but it will be there for advanced Python code. I will try to make the wording clearer when I get back to work on Tuesday.
I mean, there is no way such changes would go into mainstream CPython.
If this has to wait until Py3k then so be it.
Or do you propose to maintain a CPython branch manually for the foreseeable future? (From experience this is a bad idea...)
Yeah, not my idea of fun either, but since this is a long term project, I will at least need to for the foreseeable future. -Brett
Armin Rigo wrote:
Re-hi,
On Wed, Jul 19, 2006 at 03:35:45PM -0700, Brett Cannon wrote:
http://svn.python.org/view/python/branches/bcannon-sandboxing/securing_python.txt?rev=50717&view=log.
I'm not sure I understand what you propose to fix holes like constructors and __subclasses__: it seems that you want to remove them altogether (and e.g. make factory functions instead). That would completely break all programs, right? I mean, there is no way such changes would go into mainstream CPython.
If I understand correctly, the proposal is that any incompatible changes to the language would apply only in "sandboxed" interpreters. So there is no reason why support for these couldn't go into the main branch. Of course we want to minimize the changes that will need to be made to programs and libraries to make them work in a sandboxed interpreter, but not at the expense of security. Some incompatible changes will be necessary. -- David Hopwood <david.nospam.hopwood@blueyonder.co.uk>
Hi David, hi Brett, On Sun, Jul 23, 2006 at 02:18:48AM +0100, David Hopwood wrote:
If I understand correctly, the proposal is that any incompatible changes to the language would apply only in "sandboxed" interpreters. So there is no reason why support for these couldn't go into the main branch.
That's what I originally thought too, but Brett writes:

Implementation Details
========================

An important point to keep in mind when reading about the implementation details for the security model is that these are general changes and are not special to any type of interpreter, sandboxed or otherwise. That means if a change to a built-in type is suggested and it does not involve a proxy, that change is meant Python-wide for *all* interpreters.

So that's why I'm starting to worry that Brett is proposing to change the regular Python language too. However, Brett, you also say somewhere else that backward compatibility is not an issue. So I'm a bit confused actually...

Also, I hate to sound self-centered, but I should point out somewhere that PyPy was started by people who no longer wanted to maintain a fork of CPython, and preferred to work on building CPython-like variants automatically. Many of the security features you list would be quite easier to implement and maintain in PyPy than CPython -- also from a security perspective: it is easier to be sure that some protection is complete, and remains complete over time, if it is systematically generated instead of hand-patched in a dozen places.

A bientot, Armin
On 7/23/06, Armin Rigo <arigo@tunes.org> wrote:
Hi David, hi Brett,
On Sun, Jul 23, 2006 at 02:18:48AM +0100, David Hopwood wrote:
If I understand correctly, the proposal is that any incompatible changes to the language would apply only in "sandboxed" interpreters. So there is no reason why support for these couldn't go into the main branch.
That's what I originally thought too, but Brett writes:
Implementation Details
========================
An important point to keep in mind when reading about the implementation details for the security model is that these are general changes and are not special to any type of interpreter, sandboxed or otherwise. That means if a change to a built-in type is suggested and it does not involve a proxy, that change is meant Python-wide for *all* interpreters.
So that's why I'm starting to worry that Brett is proposing to change the regular Python language too.
Yes, I am proposing changing some constructors and methods on some built-in types for the regular language. That's it. No new keywords or major semantic changes and such. If I make changes just for sandboxed interpreters it changes the general approach of the security model by then requiring an identity check to see if the interpreter is sandboxed or not.

However, Brett, you also say somewhere else that backward compatibility is not an issue. So I'm a bit confused actually...
Since this is my Ph.D. dissertation first and foremost, I am not going to tie my hands in such a way that I have to make too much of a compromise in order for this to work. I obviously don't want to change the feel of Python, but if I have to remove the constructor for code objects to prevent evil bytecode, or __subclasses__() from object to prevent poking around stuff, then so be it. For this project, security trumps backwards-compatibility when keeping the latter would make the former impossible. I will obviously try to minimize it, but something that works at such a basic level of the language is just going to require some changes for it to work.

Also, I hate to sound self-centered, but I should point out somewhere that PyPy was started by people who no longer wanted to maintain a fork of CPython, and preferred to work on building CPython-like variants automatically. Many of the security features you list would be quite easier to implement and maintain in PyPy than CPython -- also from a security perspective: it is easier to be sure that some protection is complete, and remains complete over time, if it is systematically generated instead of hand-patched in a dozen places.
It doesn't sound self-centered. =) Problem is that my knowledge base is obviously all in CPython so my startup costs are much lower than if I tried this in PyPy. Plus there is the point of embedding this into Firefox (possibly) eventually. Does PyPy support embedding yet at the C level? -Brett
Brett Cannon wrote:
On 7/23/06, Armin Rigo <arigo@tunes.org> wrote:
Hi David, hi Brett,
On Sun, Jul 23, 2006 at 02:18:48AM +0100, David Hopwood wrote:
If I understand correctly, the proposal is that any incompatible changes to the language would apply only in "sandboxed" interpreters. So there is no reason why support for these couldn't go into the main branch.
That's what I originally thought too, but Brett writes:
Implementation Details
========================
An important point to keep in mind when reading about the implementation details for the security model is that these are general changes and are not special to any type of interpreter, sandboxed or otherwise. That means if a change to a built-in type is suggested and it does not involve a proxy, that change is meant Python-wide for *all* interpreters.
So that's why I'm starting to worry that Brett is proposing to change the regular Python language too.
Yes, I am proposing changing some constructors and methods on some built-in types for the regular languages. That's it. No new keywords or major semantic changes and such. If I make changes just for sandboxed interpreters it changes the general approach of the security model by then requiring an identity check to see if the interpreter is sandboxed or not.
I assume that the extent of incompatible changes would be limited as much as possible. So the only checks would be in operations that are directly affected by whatever incompatible changes are made. The performance and complexity costs of this are likely to be small -- or at least should not be assumed to be large before having hammered out a more detailed design. Suppose, for the sake of argument, that we introduced private methods and attributes. If an attribute in an existing standard library class was changed to be private, then code depending on it would break. But if there were a notion of a "compatibility private" attribute that acts as private only in a sandboxed interpreter, then no code running in an unprotected interpreter would break. -- David Hopwood <david.nospam.hopwood@blueyonder.co.uk>
Brett Cannon wrote:
On 7/23/06, *Armin Rigo* <arigo@tunes.org <mailto:arigo@tunes.org>> wrote: Also, I hate to sound self-centered, but I should point out somewhere that PyPy was started by people who no longer wanted to maintain a fork of CPython, and preferred to work on building CPython-like variants automatically. Many of the security features you list would be quite easier to implement and maintain in PyPy than CPython -- also from a security perspective: it is easier to be sure that some protection is complete, and remains complete over time, if it is systematically generated instead of hand-patched in a dozen places.
It doesn't sound self-centered. =) Problem is that my knowledge base is obviously all in CPython so my startup costs are much lower than if I tried this in PyPy. Plus there is the point of embedding this into Firefox (possibly) eventually. Does PyPy support embedding yet at the C level?
Another rationale for basing the work on CPython is that it should be possible to implement the resulting security model regardless of the implementation language used for the interpreter core (C/Python, Java/Python, C#/Python, RPython/Python). If you can figure out how to do it in C, it should be feasible to do it in the others. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org
Armin Rigo wrote:
I'm not sure I understand what you propose to fix holes like constructors and __subclasses__: it seems that you want to remove them altogether (and e.g. make factory functions instead). That would completely break all programs, right? I mean, there is no way such changes would go into mainstream CPython.
How much code is actually out there that uses __subclasses__? It seems like a fairly esoteric corner of the language to me. In any case, I think this approach should certainly be tried, and if it works out, considered for Py3k. -- Greg
Hi Brett, Here are some comments on your proposal. Sorry this took so long. I apologize if any of these comments are out of date (but also look forward to your answers to some of the questions, as they'll help me understand some more of the details of your proposal). Thanks!
Introduction
///////////////////////////////////////

[...] Throughout this document several terms are going to be used. A "sandboxed interpreter" is one where the built-in namespace is not the same as that of an interpreter whose built-ins were unaltered, which is called an "unprotected interpreter".
Is this a definition or an implementation choice? As in, are you defining "sandboxed" to mean "with altered built-ins" or just "restricted in some way", and does the above mean to imply that altering the built-ins is what triggers other kinds of restrictions (as it did in Python's old restricted execution mode)?
A "bare interpreter" is one where the built-in namespace has been stripped down to the bare minimum needed to run any form of basic Python program. This means that all atomic types (i.e., syntactically supported types), ``object``, and the exceptions provided by the ``exceptions`` module are considered part of the built-in namespace. There have also been no imports executed in the interpreter.
Is a "bare interpreter" just one example of a sandboxed interpreter, or are all sandboxed interpreters in your design initially bare (i.e. "sandboxed" = "bare" + zero or more granted authorities)?
The "security domain" is the boundary at which security is cared about. For this discussion, it is the interpreter.
It might be clearer to say (if i understand correctly) "Each interpreter is a separate security domain." Many interpreters can run within a single operating system process, right? Could you say a bit about what sort of concurrency model you have in mind? How would this interact (if at all) with use of the existing threading functionality?
The "powerbox" is the thing that possesses the ultimate power in the system. In our case it is the Python process.
This could also be the application process, right?
Rationale
///////////////////////////////////////

[...] For instance, think of an application that supports a plug-in system with Python as the language used for writing plug-ins. You do not want to have to examine every plug-in you download to make sure that it does not alter your filesystem if you can help it. With a proper security model and implementation in place this hindrance of having to examine all code you execute should be alleviated.
I'm glad to have this use case set out early in the document, so the reader can keep it in mind as an example while reading about the model.
Approaches to Security
///////////////////////////////////////
There are essentially two types of security: who-I-am (permissions-based) security and what-I-have (authority-based) security.
As Mark Miller mentioned in another message, your descriptions of "who-I-am" security and "what-I-have" security make sense, but they don't correspond to "permission" vs. "authority". They correspond to "identity-based" vs. "authority-based" security.
Difficulties in Python for Object-Capabilities
//////////////////////////////////////////////

[...] Three key requirements for providing a proper perimeter defence are private namespaces, immutable shared state across domains, and unforgeable references.
Nice summary.
Problem of No Private Namespace
===============================

[...] The Python language has no such thing as a private namespace.
Don't local scopes count as private namespaces? It seems clear that they aren't designed with the intention of being exposed, unlike other namespaces in Python.
It also makes providing security at the object level using object-capabilities non-existent in pure Python code.
I don't think this is necessarily the case. No Python code i've ever seen expects to be able to invade the local scopes of other functions, so you could use them as private namespaces. There are two ways i've seen to invade local scopes:

(a) Use gc.get_referents to get back from a cell object to its contents.

(b) Compare the cell object to another cell object, thereby causing __eq__ to be invoked to compare the contents of the cells.

So you could protect local scopes by prohibiting these or by simply turning off access to func_closure. It's clear that hardly any code depends on these introspection features, so it would be reasonable to turn them off in a sandboxed interpreter. (It seems you would have to turn off some introspection features anyway in order to have reliable import guards.)
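Route (a) above can be sketched in a few lines. This is an illustrative snippet only; it uses the Python 3 spelling ``__closure__`` for what this thread (Python 2 era) calls ``func_closure``, and the helper names are made up:

```python
import gc

def make_secret_holder():
    secret = 41  # a local meant to be "private" to this closure
    def peek():
        return secret
    return peek

peek = make_secret_holder()

# (a) From the function, reach its closure cell, then use
# gc.get_referents to get back from the cell to its contents.
cell = peek.__closure__[0]
leaked = gc.get_referents(cell)[0]  # recovers the "private" value
```

This is exactly the kind of introspection that would have to be turned off for local scopes to serve as private namespaces.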
Problem of Mutable Shared State
===============================

[...] Regardless, sharing of state that can be influenced by another interpreter is not safe for object-capabilities.
Yup.
Threat Model
///////////////////////////////////////
Good to see this specified here. I like the way you've broken this down.
* An interpreter cannot gain abilties the Python process possesses without explicitly being given those abilities.
It would be good to enumerate which abilities you're referring to in this item. For example, a bare interpreter should be able to allocate memory and call most of the built-in functions, but should not be able to open network connections.
* An interpreter cannot influence another interpreter directly at the Python level without explicitly allowing it.
You mean, without some other entity explicitly allowing it, right? What would that other entity be -- presumably the interpreter that spawned both of these sub-interpreters?
* An interpreter cannot use operating system resources without being explicitly given those resources.
Okay.
* A bare Python interpreter is always trusted.
What does "trusted" mean in the above?
* Python bytecode is always distrusted. * Pure Python source code is always safe on its own.
It would be helpful to clarify "safe" here. I assume by "safe" you mean that the Python source code can express whatever it wants, including potentially dangerous activities, but when run in a bare or sandboxed interpreter it cannot have harmful effects. But then in what sense does the "safety" have to do with the Python source code rather than the restrictions on the interpreter? Would it be correct to say: + We want to guarantee that Python source code cannot violate the restrictions in a restricted or bare interpreter. + We do not prevent arbitrary Python bytecode from violating these restrictions, and assume that it can.
+ Malicious abilities are derived from C extension modules, built-in modules, and unsafe types implemented in C, not from pure Python source.
By "malicious" do you just mean "anything that isn't accessible to a bare interpreter"?
* A sub-interpreter started by another interpreter does not inherit any state.
Do you envision a tree of interpreters and sub-interpreters? Can the levels of spawning get arbitrarily deep? If i am visualizing your model correctly, maybe it would be useful to introduce the term "parent", where each interpreter has as its parent either the Python process or another interpreter. Then you could say that each interpreter acquires authority only by explicit granting from its parent. Then i have another question: can an interpreter acquire authorities only when it is started, or can it acquire them while it is running, and how?
Implementation
///////////////////////////////////////
Guiding Principles
========================
To begin, the Python process garners all power as the powerbox. It is up to the process to initially hand out access to resources and abilities to interpreters. This might take the form of an interpreter with all abilities granted (i.e., a standard interpreter as launched when you execute Python), which then creates sub-interpreters with sandboxed abilities. Another alternative is only creating interpreters with sandboxed abilities (i.e., Python being embedded in an application that only uses sandboxed interpreters).
This sounds like part of your design to me. It might help to have this earlier in the document (maybe even with an example diagram of a tree of interpreters).
No security measure should ever have to ask who an interpreter is. This means that what abilities an interpreter has should not be stored at the interpreter level when the security can use a proxy to protect a resource. So while supporting a memory cap can have a per-interpreter setting that is checked (because access to the operating system's memory allocator is not supported at the program level), protecting files and imports should not have such a per-interpreter protection at such a low level (because those can have extension module proxies to provide the security).
It might be good to declare two categories of resources -- those protected by object hiding and those protected by a per-interpreter setting -- and make lists.
Backwards-compatibility will not be a hindrance upon the design or implementation of the security model. Because the security model will inherently remove resources and abilities that existing code expects, it is not reasonable to expect existing code to work in a sandboxed interpreter.
You might qualify the last statement a bit. For example, a Python implementation of a pure algorithm (e.g. string processing, data compression, etc.) would still work in a sandboxed interpreter.
Keeping Python "pythonic" is required for all design decisions.
As Lawrence Oluyede also mentioned, it would be helpful to say a little more about what "pythonic" means.
Restricting what is in the built-in namespace and safe-guarding the interpreter (which includes safe-guarding the built-in types) is where security will come from.
Sounds good.
Abilities of a Standard Sandboxed Interpreter
=============================================
[...]
* You cannot open any files directly.
* Importation
  + You can import any pure Python module.
  + You cannot import any Python bytecode module.
  + You cannot import any C extension module.
  + You cannot import any built-in module.
* You cannot find out any information about the operating system you are running on.
* Only safe built-ins are provided.
This looks reasonable. This is probably a good place to itemize exactly which built-ins are considered safe.
Imports
-------
A proxy for protecting imports will be provided. This is done by setting the ``__import__()`` function in the built-in namespace of the sandboxed interpreter to a proxied version of the function.
The planned proxy will take in a passed-in function to use for the import and a whitelist of C extension modules and built-in modules to allow importation of.
Presumably these are passed in to the proxy's constructor.
If an import would lead to loading an extension or built-in module, it is checked against the whitelist and allowed to be imported based on that list. No .pyc or .pyo files will be imported. All .py files will be imported.
I'm unclear about this. Is the whitelist a list of module names only, or of filenames with extensions? Does the normal path-searching process take place or can it be restricted in some way? Would it simplify the security analysis to have the whitelist be a dictionary that maps module names to absolute pathnames? If both the .py and .pyc are present, the normal import would find the .pyc file; would the import proxy reject such an import or ignore it and recompile the .py instead?
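A rough sketch of the proxied ``__import__()`` described above may make the questions concrete. The class name and exact policy here are illustrative, not from the document, and for simplicity this version checks *every* import against the whitelist rather than only extension and built-in modules:

```python
class ImportProxy:
    """Wraps a real import function with a module-name whitelist."""

    def __init__(self, real_import, whitelist):
        self._import = real_import
        self._whitelist = frozenset(whitelist)

    def __call__(self, name, *args, **kwargs):
        # Reject anything not explicitly allowed, then delegate.
        if name not in self._whitelist:
            raise ImportError("import of %r blocked by sandbox" % name)
        return self._import(name, *args, **kwargs)

# The sandboxed interpreter's built-in namespace would get this
# object bound as __import__ instead of the real function.
safe_import = ImportProxy(__import__, ["math"])
math = safe_import("math")          # allowed

blocked = ""
try:
    safe_import("os")               # not whitelisted
except ImportError as exc:
    blocked = str(exc)
```

Whether the whitelist holds bare module names or absolute paths, as Ping asks, changes only the key checked here.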
It must be warned that importing any C extension module is dangerous.
Right.
Implementing Import in Python
+++++++++++++++++++++++++++++
To help facilitate in the exposure of more of what importation requires (and thus make implementing a proxy easier), the import machinery should be rewritten in Python.
This seems like a good idea. Can you identify which minimum essential pieces of the import machinery have to be written in C?
Sanitizing Built-In Types
-------------------------

[...]

Constructors
++++++++++++
Almost all of Python's built-in types contain a constructor that allows code to create a new instance of a type as long as you have the type itself. Unfortunately this does not work in an object-capabilities system without either providing a proxy to the constructor or just turning it off.
The existence of the constructor isn't (by itself) the problem. The problem is that both of the following are true:

(a) From any object you can get its type object.

(b) Using any type object you can construct a new instance.

So, you can control this either by hiding the type object, separating the constructor from the type, or disabling the constructor.
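Chaining (a) and (b) is a one-liner, which is what makes the combination dangerous; applied to a ``file`` object, the same chain would reach the ``file`` constructor. A sketch with harmless types (the helper name is made up):

```python
def rebuild(obj):
    # (a) any object reveals its type object...
    cls = type(obj)
    # (b) ...and the type object doubles as a constructor.
    return cls()

empty_list = rebuild([1, 2, 3])  # list() -> []
empty_str = rebuild("abc")       # str()  -> ""
```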
Types whose constructors are considered dangerous are:
* ``file``
  + Will definitely use the ``open()`` built-in.
* code objects
* XXX sockets?
* XXX type?
* XXX
Looks good so far. Not sure i see what's dangerous about 'type'.
Filesystem Information
++++++++++++++++++++++
When running code in a sandboxed interpreter, POLA suggests that you do not want to expose information about your environment on top of protecting its use. This means that filesystem paths typically should not be exposed. Unfortunately, Python exposes file paths all over the place:
* Modules
  + ``__file__`` attribute
* Code objects
  + ``co_filename`` attribute
* Packages
  + ``__path__`` attribute
* XXX
XXX how to expose safely?
It seems that in most cases, a single Python object is associated with a single pathname. If that's true in general, one solution would be to provide an introspection function named 'getpath' or something similar that would get the path associated with any object. This function might go in a module containing all the introspection functions, so imports of that module could be easily restricted.
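Such a 'getpath' function might look roughly like this. This is entirely hypothetical (the function name comes from Ping's suggestion, and a real design would need an exhaustive attribute list, not just these two):

```python
def getpath(obj):
    """Return the filesystem path associated with obj, if any."""
    for attr in ("__file__", "co_filename"):
        path = getattr(obj, attr, None)
        if path is not None:
            return path
    raise TypeError("no path associated with %r" % (obj,))

import io
module_path = getpath(io)               # a module's __file__
code_path = getpath(getpath.__code__)   # a code object's co_filename
```

Putting this in one restrictable module means a sandboxed interpreter simply never sees any path at all.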
Mutable Shared State
++++++++++++++++++++
Because built-in types are shared between interpreters, they cannot expose any mutable shared state. Unfortunately, as it stands, some do. Below is a list of types that share some form of dangerous state, how they share it, and how to fix the problem:
* ``object``
  + ``__subclasses__()`` function
    - Remove the function; never seen used in real-world code.
* XXX
Okay, more to work out here. :)
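The leak through ``object.__subclasses__()`` is easy to demonstrate in a single interpreter (the class name here is made up; in the threat model, imagine it was defined by privileged code):

```python
class SupposedlyHidden:
    # Pretend this class was created by trusted code and its name
    # was never handed to the sandboxed code.
    secret = 42

# Code holding only the plain `object` built-in can still walk to
# it, because `object` tracks all of its direct subclasses:
found = [cls for cls in object.__subclasses__()
         if cls.__name__ == "SupposedlyHidden"]
```

Since built-in types are shared across interpreters, the same walk would cross interpreter boundaries, which is why the doc proposes removing the method.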
Perimeter Defences Between a Created Interpreter and Its Creator
----------------------------------------------------------------
The plan is to allow interpreters to instantiate sandboxed interpreters safely. By using the creating interpreter's abilities to provide abilities to the created interpreter, you make sure there is no escalation in abilities.
Good.
* ``__del__`` created in sandboxed interpreter but object is cleaned up in unprotected interpreter.
How do you envision the launching of a sandboxed interpreter to look? Could you sketch out some rough code examples? Were you thinking of something like:

    sys.spawn(code, dict)

    code: a string containing Python source code
    dict: the global namespace in which to run the code

If you allow the parent interpreter to pass mutable objects into the child interpreter, then the parent and child can already communicate via the object, so '__del__' is a moot issue. Do you want to prevent all communication between parent and child? It's not obvious to me why that would be necessary.
* Using frames to walk the frame stack back to another interpreter.
Could you just disable introspection of the frame stack?
Making the ``sys`` Module Safe
------------------------------

[...] This means that the ``sys`` module needs to have its safe information separated out from the unsafe settings.
Yes.
XXX separate modules, ``sys.settings`` and ``sys.info``, or strip ``sys`` to settings and put info somewhere else? Or provide a method that will create a faked sys module that has the safe values copied into it?
I think the last suggestion above would lead to confusion. The two groups should have two distinct names and it should be clear which attribute goes with which group.
Protecting I/O
++++++++++++++
The ``print`` keyword and the built-ins ``raw_input()`` and ``input()`` use the values stored in ``sys.stdout`` and ``sys.stdin``. By exposing these attributes to the creating interpreter, one can set them to safe objects, such as instances of ``StringIO``.
Sounds good.
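A minimal sketch of that arrangement, in the Python 3 spelling (where ``print`` is a function and ``StringIO`` lives in ``io``; the thread's Python 2 used the ``print`` statement and the ``StringIO`` module):

```python
import io
import sys

buf = io.StringIO()        # the safe object standing in for stdout
saved = sys.stdout
sys.stdout = buf           # what the creating interpreter would set
try:
    print("hello from the sandbox")  # sandboxed output lands in buf
finally:
    sys.stdout = saved     # creator restores the real stdout

captured = buf.getvalue()
```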
Safe Networking
---------------
XXX proxy on socket module, modify open() to be the constructor, etc.
Lots more to think about here. :)
Protecting Memory Usage
-----------------------
To protect memory, low-level hooks into Python's memory allocator are needed. By hooking into the C API for memory allocation and deallocation, a very rough running count of used memory can be kept. This can be used to prevent sandboxed interpreters from using so much memory that it impacts the overall performance of the system.
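The running count itself is simple bookkeeping; the hard part is wiring it into the C allocator. A Python-level sketch of just the bookkeeping (the class and method names are made up; the real version would live inside the PyMem_* hooks in C):

```python
class MemoryCap:
    """Rough running count of allocated bytes against a fixed cap."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.used = 0

    def allocate(self, nbytes):
        # Refuse the allocation if it would push us past the cap.
        if self.used + nbytes > self.limit:
            raise MemoryError("interpreter exceeded its memory cap")
        self.used += nbytes

    def release(self, nbytes):
        self.used = max(0, self.used - nbytes)

cap = MemoryCap(limit_bytes=100)
cap.allocate(60)
cap.allocate(30)
cap.release(50)        # 40 bytes now counted as in use
hit_cap = False
try:
    cap.allocate(80)   # 40 + 80 > 100, so this is refused
except MemoryError:
    hit_cap = True
```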
Preventing denial-of-service is in general quite difficult, but i applaud the attempt. I agree with your decision to separate this work from the rest of the security model. -- ?!ng
On 9/6/06, Ka-Ping Yee <python-dev@zesty.ca> wrote:
Hi Brett,
Here are some comments on your proposal. Sorry this took so long. I apologize if any of these comments are out of date (but also look forward to your answers to some of the questions, as they'll help me understand some more of the details of your proposal). Thanks!
I think they are slightly outdated. The latest version of the doc is in the bcannon-objcap branch and is named securing_python.txt ( http://svn.python.org/view/python/branches/bcannon-objcap/securing_python.tx... ).
Introduction
///////////////////////////////////////

[...] Throughout this document several terms are going to be used. A "sandboxed interpreter" is one where the built-in namespace is not the same as that of an interpreter whose built-ins were unaltered, which is called an "unprotected interpreter".
Is this a definition or an implementation choice? As in, are you defining "sandboxed" to mean "with altered built-ins" or just "restricted in some way", and does the above mean to imply that altering the built-ins is what triggers other kinds of restrictions (as it did in Python's old restricted execution mode)?
There is no "triggering" of other restrictions. This is an implementation choice. "Sandboxed" means "with altered built-ins".
A "bare interpreter" is one where the built-in namespace has been
stripped down to the bare minimum needed to run any form of basic Python program. This means that all atomic types (i.e., syntactically supported types), ``object``, and the exceptions provided by the ``exceptions`` module are considered part of the built-in namespace. There have also been no imports executed in the interpreter.
Is a "bare interpreter" just one example of a sandboxed interpreter, or are all sandboxed interpreters in your design initially bare (i.e. "sandboxed" = "bare" + zero or more granted authorities)?
You build up from a bare interpreter by adding in authorities (e.g., providing a wrapped version of open()) to reach the level of security you want.
The "security domain" is the boundary at which security is cared
about. For this discussion, it is the interpreter.
It might be clearer to say (if i understand correctly) "Each interpreter is a separate security domain."
Many interpreters can run within a single operating system process, right?
Yes.

Could you say a bit about what sort of concurrency model you have in mind?

None specifically. Each new interpreter automatically runs in its own Python thread, so they have essentially the same concurrency as using the 'thread' module.

How would this interact (if at all) with use of the existing threading functionality?

See above.
The "powerbox" is the thing that possesses the ultimate power in the
system. In our case it is the Python process.
This could also be the application process, right?
If Python is embedded, yes.
Rationale
///////////////////////////////////////

[...] For instance, think of an application that supports a plug-in system with Python as the language used for writing plug-ins. You do not want to have to examine every plug-in you download to make sure that it does not alter your filesystem if you can help it. With a proper security model and implementation in place this hindrance of having to examine all code you execute should be alleviated.
I'm glad to have this use case set out early in the document, so the reader can keep it in mind as an example while reading about the model.
Approaches to Security
///////////////////////////////////////
There are essentially two types of security: who-I-am (permissions-based) security and what-I-have (authority-based) security.
As Mark Miller mentioned in another message, your descriptions of "who-I-am" security and "what-I-have" security make sense, but they don't correspond to "permission" vs. "authority". They correspond to "identity-based" vs. "authority-based" security.
Right. This was fixed the day Mark and Alan Karp made the comment.
Difficulties in Python for Object-Capabilities
//////////////////////////////////////////////

[...] Three key requirements for providing a proper perimeter defence are private namespaces, immutable shared state across domains, and unforgeable references.
Nice summary.
Problem of No Private Namespace
===============================

[...] The Python language has no such thing as a private namespace.
Don't local scopes count as private namespaces? It seems clear that they aren't designed with the intention of being exposed, unlike other namespaces in Python.
Sort of. But you can still get access to them if you have an execution frame, and they are not persistent. Generators are even worse since they store their execution frame with the generator itself, completely exposing the local namespace.
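The generator leak Brett describes can be shown directly (this sketch uses the Python 3 spelling ``next(g)`` for the thread's ``g.next()``; the attribute names ``gi_frame`` and ``f_locals`` are the real CPython ones):

```python
def make_gen():
    secret = "not so private"   # local to the generator body
    yield "something"

g = make_gen()
next(g)   # advance into the body, so the frame is live and suspended

# The suspended execution frame travels with the generator object,
# so its entire local namespace is exposed to whoever holds g:
leaked = g.gi_frame.f_locals["secret"]
```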
It also makes providing security at the object level using
object-capabilities non-existent in pure Python code.
I don't think this is necessarily the case. No Python code I've ever seen expects to be able to invade the local scopes of other functions, so you could use them as private namespaces. There are two ways I've seen to invade local scopes:
(a) Use gc.get_referents to get back from a cell object to its contents.
(b) Compare the cell object to another cell object, thereby causing __eq__ to be invoked to compare the contents of the cells.
Or the execution frame, which is exposed directly on generators. But regardless, the comment was meant to apply to Python as it stands, not to claim that it couldn't possibly be tweaked somehow.

So you could protect local scopes by prohibiting these or by simply turning off access to func_closure. It's clear that hardly any code depends on these introspection features, so it would be reasonable to turn them off in a sandboxed interpreter. (It seems you would have to turn off some introspection features anyway in order to have reliable import guards.)
Maybe this can be changed in the future, but this is more than I need at the moment, so I am not going to go down that path right now. But I added a quick mention of this.
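For illustration, invasion route (a) from above can be demonstrated in a few lines. (This uses the modern ``__closure__`` spelling rather than ``func_closure``; the mechanism is the same.)

```python
import gc

def make_counter():
    secret = ["supposedly private"]
    def peek():
        return len(secret)
    return peek

peek = make_counter()
# The closed-over value hides in a cell object attached to the function...
cell = peek.__closure__[0]
# ...but gc.get_referents walks from the cell straight back to its contents.
leaked = gc.get_referents(cell)[0]
print(leaked)  # the "private" list, fully exposed
```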
Problem of Mutable Shared State
=============================== [...] Regardless, sharing of state that can be influenced by another interpreter is not safe for object-capabilities.
Yup.
Threat Model ///////////////////////////////////////
Good to see this specified here. I like the way you've broken this down.
The current version has more details per point than the one you read.
* An interpreter cannot gain abilities the Python process possesses
without explicitly being given those abilities.
It would be good to enumerate which abilities you're referring to in this item. For example, a bare interpreter should be able to allocate memory and call most of the built-in functions, but should not be able to open network connections.
* An interpreter cannot influence another interpreter directly at the Python level without explicitly allowing it.
You mean, without some other entity explicitly allowing it, right?
Yep.

What would that other entity be -- presumably the interpreter that spawned both of these sub-interpreters?

Sure. You could stick something in the built-in namespace of the sub-interpreter to use for communicating.
* An interpreter cannot use operating system resources without being
explicitly given those resources.
Okay.
* A bare Python interpreter is always trusted.
What does "trusted" mean in the above?
It means that if Python source code can execute within a bare interpreter it is considered safe code. This is covered in the new version of the doc.
* Python bytecode is always distrusted.
* Pure Python source code is always safe on its own.
It would be helpful to clarify "safe" here. I assume by "safe" you mean that the Python source code can express whatever it wants, including potentially dangerous activities, but when run in a bare or sandboxed interpreter it cannot have harmful effects. But then in what sense is the "safety" a property of the Python source code rather than of the restrictions on the interpreter?
Would it be correct to say: + We want to guarantee that Python source code cannot violate the restrictions in a restricted or bare interpreter. + We do not prevent arbitrary Python bytecode from violating these restrictions, and assume that it can.
+ Malicious abilities are derived from C extension modules,
built-in modules, and unsafe types implemented in C, not from pure Python source.
By "malicious" do you just mean "anything that isn't accessible to a bare interpreter"?
Anything that could harm the system or interpreter.
* A sub-interpreter started by another interpreter does not inherit
any state.
Do you envision a tree of interpreters and sub-interpreters? Can the levels of spawning get arbitrarily deep?
Yes and yes.

If I am visualizing your model correctly, maybe it would be useful to introduce the term "parent", where each interpreter has as its parent either the Python process or another interpreter. Then you could say that each interpreter acquires authority only by explicit granting from its parent.

You could, although there is no hierarchy at the implementation level. But it works in terms of who has a reference to whom and who gives each interpreter their authority.

Then I have another question: can an interpreter acquire authorities only when it is started, or can it acquire them while it is running, and how?

Well, whatever you want to do through the built-in namespace. So if you pass in a mutable object like a dict and add stuff to it on the fly, I don't see why you couldn't give new authorities on the fly.
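The mutable-dict idea can be modeled today with ``exec()`` standing in for a real sub-interpreter (the ``Interpreter`` API discussed in this thread was never finalized, so this is only an illustration of the granting mechanism):

```python
# A mutable dict shared between "parent" and "child" acts as the channel
# through which authority is granted, even after the child has started.
grants = {}
child_globals = {"__builtins__": {}, "grants": grants}

# The child starts with no logging authority:
exec("has_log = 'log' in grants", child_globals)
print(child_globals["has_log"])  # False

# The parent grants a new authority on the fly by mutating the dict:
messages = []
grants["log"] = messages.append
exec("grants['log']('hello from the sandbox')", child_globals)
print(messages)  # ['hello from the sandbox']
```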
Implementation
///////////////////////////////////////
Guiding Principles ========================
To begin, the Python process garners all power as the powerbox. It is up to the process to initially hand out access to resources and abilities to interpreters. This might take the form of an interpreter with all abilities granted (i.e., a standard interpreter as launched when you execute Python), which then creates sub-interpreters with sandboxed abilities. Another alternative is only creating interpreters with sandboxed abilities (i.e., Python being embedded in an application that only uses sandboxed interpreters).
This sounds like part of your design to me. It might help to have this earlier in the document (maybe even with an example diagram of a tree of interpreters).
Made Guiding Principles its own section and split off the bottom part of the section and put it under Implementation.
Security measures should never have to ask who an interpreter is.
This means that what abilities an interpreter has should not be stored at the interpreter level when the security can use a proxy to protect a resource. So while supporting a memory cap can rely on a per-interpreter setting that is checked (because access to the operating system's memory allocator is not supported at the program level), protecting files and imports should not have such per-interpreter protection at so low a level (because those can have extension module proxies to provide the security).
It might be good to declare two categories of resources -- those protected by object hiding and those protected by a per-interpreter setting -- and make lists.
That is rather unknown since I am constantly finding stuff that is global to the process compared to the interpreter, so making the list seems premature.
Backwards-compatibility will not be a hindrance upon the design or
implementation of the security model. Because the security model will inherently remove resources and abilities that existing code expects, it is not reasonable to expect existing code to work in a sandboxed interpreter.
You might qualify the last statement a bit. For example, a Python implementation of a pure algorithm (e.g. string processing, data compression, etc.) would still work in a sandboxed interpreter.
I tossed in "all" to clarify.
Keeping Python "pythonic" is required for all design decisions.
As Lawrence Oluyede also mentioned, it would be helpful to say a little more about what "pythonic" means.
Done in the current version.
Restricting what is in the built-in namespace and safe-guarding the interpreter (which includes safe-guarding the built-in types) is where security will come from.
Sounds good.
Abilities of a Standard Sandboxed Interpreter =============================================
[...]
* You cannot open any files directly.
* Importation
  + You can import any pure Python module.
  + You cannot import any Python bytecode module.
  + You cannot import any C extension module.
  + You cannot import any built-in module.
* You cannot find out any information about the operating system you are running on.
* Only safe built-ins are provided.
This looks reasonable. This is probably a good place to itemize exactly which built-ins are considered safe.
Imports -------
A proxy for protecting imports will be provided. This is done by setting the ``__import__()`` function in the built-in namespace of the sandboxed interpreter to a proxied version of the function.
The planned proxy will take in a passed-in function to use for the import and a whitelist of C extension modules and built-in modules to allow importation of.
Presumably these are passed in to the proxy's constructor.
Current plan is to expose the built-in namespace, imported modules, and sys module dict when creating an Interpreter instance.
If an import would lead to loading an extension or built-in module, it is checked against the whitelist and allowed to be imported based on that list. No .pyc or .pyo files will be imported. All .py files will be imported.
I'm unclear about this. Is the whitelist a list of module names only, or of filenames with extensions?
Have not decided, but probably module names.

Does the normal path-searching process take place or can it be restricted in some way?

Have not decided.

Would it simplify the security analysis to have the whitelist be a dictionary that maps module names to absolute pathnames?

Don't know. Protecting imports is the last thing I am going to implement since it is the trickiest.

If both the .py and .pyc are present, the normal import would find the .pyc file; would the import proxy reject such an import or ignore it and recompile the .py instead?

Something along those lines.
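To make the proxy idea concrete, here is a rough sketch under two assumptions: the whitelist holds bare module names, and the modern ``importlib`` machinery is available for inspecting where a module would come from. ``make_import_proxy`` is a name invented here, not the planned API.

```python
import importlib.util

def make_import_proxy(real_import, whitelist):
    """Wrap real_import so that C extension and built-in modules are
    only importable when named in the whitelist; pure Python passes."""
    def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
        spec = importlib.util.find_spec(name)
        if spec is None:
            raise ImportError("no module named %r" % name)
        origin = spec.origin or ""
        if origin == "built-in" or origin.endswith((".so", ".pyd")):
            # Extension and built-in modules go through the whitelist.
            if name not in whitelist:
                raise ImportError("%r is not whitelisted" % name)
        elif origin.endswith((".pyc", ".pyo")):
            # Bytecode-only modules are rejected outright.
            raise ImportError("bytecode module %r rejected" % name)
        return real_import(name, globals, locals, fromlist, level)
    return guarded_import
```

In a sandboxed interpreter the guarded function would then replace ``__import__()`` in the built-in namespace.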
It must be warned that importing any C extension module is dangerous.
Right.
Implementing Import in Python +++++++++++++++++++++++++++++
To help expose more of what importation requires (and thus make implementing a proxy easier), the import machinery should be rewritten in Python.
This seems like a good idea. Can you identify which minimum essential pieces of the import machinery have to be written in C?
Loading of C extensions, stat'ing files, reading files, etc. Pretty much anything that requires help from the OS.
Sanitizing Built-In Types
------------------------- [...] Constructors ++++++++++++
Almost all of Python's built-in types contain a constructor that allows code to create a new instance of a type as long as you have the type itself. Unfortunately this does not work in an object-capabilities system without either providing a proxy to the constructor or just turning it off.
The existence of the constructor isn't (by itself) the problem. The problem is that both of the following are true:
(a) From any object you can get its type object. (b) Using any type object you can construct a new instance.
So, you can control this either by hiding the type object, separating the constructor from the type, or disabling the constructor.
I separated the constructor or initializer (tp_new or tp_init) into a factory function.
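A pure-Python model of that separation, where the ability to construct lives in a factory capability rather than on the type itself. (The real fix happens in C via tp_new/tp_init; this sketch leans on the closure protections discussed earlier and can still be bypassed via ``__new__`` in plain Python, so it only illustrates the idea.)

```python
def make_guarded_type():
    """Return (type, factory): holding the type alone is not enough
    to build instances; only the factory capability can."""
    token = object()  # private proof of authority, held in the closure

    class Resource:
        def __init__(self, tok, payload):
            if tok is not token:
                raise TypeError("Resource cannot be constructed directly")
            self.payload = payload

    def factory(payload):
        # The factory is the capability; handing out Resource grants nothing.
        return Resource(token, payload)

    return Resource, factory

Resource, resource_factory = make_guarded_type()
r = resource_factory("a file handle, say")
print(r.payload)
```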
Types whose constructors are considered dangerous are:
* ``file``
  + Will definitely use the ``open()`` built-in.
* code objects
* XXX sockets?
* XXX type?
* XXX
Looks good so far. Not sure I see what's dangerous about 'type'.
That's why it has the question mark. =)
Filesystem Information
++++++++++++++++++++++
When running code in a sandboxed interpreter, POLA suggests that you do not want to expose information about your environment on top of protecting its use. This means that filesystem paths typically should not be exposed. Unfortunately, Python exposes file paths all over the place:
* Modules
  + ``__file__`` attribute
* Code objects
  + ``co_filename`` attribute
* Packages
  + ``__path__`` attribute
* XXX
XXX how to expose safely?
It seems that in most cases, a single Python object is associated with a single pathname. If that's true in general, one solution would be to provide an introspection function named 'getpath' or something similar that would get the path associated with any object. This function might go in a module containing all the introspection functions, so imports of that module could be easily restricted.
That is the current thinking.
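A minimal sketch of such a function ('getpath' was only a suggested name) covering the three cases listed above:

```python
import types

def getpath(obj):
    """Return the filesystem path(s) associated with obj, or None."""
    if isinstance(obj, types.ModuleType):
        # Packages expose a list of directories; plain modules a file.
        pkg_paths = getattr(obj, "__path__", None)
        if pkg_paths is not None:
            return list(pkg_paths)
        return getattr(obj, "__file__", None)
    if isinstance(obj, types.CodeType):
        return obj.co_filename
    return None
```

Shipping this in its own introspection module means a sandboxed interpreter simply never gets to import it.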
Mutable Shared State
++++++++++++++++++++
Because built-in types are shared between interpreters, they cannot expose any mutable shared state. Unfortunately, as it stands, some do. Below is a list of types that share some form of dangerous state, how they share it, and how to fix the problem:
* ``object``
  + ``__subclasses__()`` function
    - Remove the function; never seen used in real-world code.
* XXX
Okay, more to work out here. :)
Possibly. I might have to wait until I am much closer to being done to discover more places where mutable shared state is exposed in a bare interpreter, because I have not been able to think of any more.
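The ``__subclasses__()`` leak is easy to demonstrate: any code that can reach the shared ``object`` type can enumerate every class in the process.

```python
class Secret:
    """Imagine this was defined by a trusted interpreter."""

# Sandboxed code holding nothing but `object` can still walk to Secret:
leaked = [cls for cls in object.__subclasses__()
          if cls.__name__ == "Secret"]
print(leaked)  # Secret is reachable even though it was never handed out
```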
Perimeter Defences Between a Created Interpreter and Its Creator
----------------------------------------------------------------
The plan is to allow interpreters to instantiate sandboxed interpreters safely. By using the creating interpreter's abilities to provide abilities to the created interpreter, you make sure there is no escalation in abilities.
Good.
* ``__del__`` created in sandboxed interpreter but object is cleaned up in unprotected interpreter.
How do you envision the launching of a sandboxed interpreter to look? Could you sketch out some rough code examples?
    interp = interpreter.Interpreter()
    interp.builtins['open'] = wrapped_open()
    interp.sys_dict['path'] = []
    interp.exec("2 + 3")
Were you thinking of something like:

    sys.spawn(code, dict)

    code: a string containing Python source code
    dict: the global namespace in which to run the code
If you allow the parent interpreter to pass mutable objects into the child interpreter, then the parent and child can already communicate via the object, so '__del__' is a moot issue. Do you want to prevent all communication between parent and child? It's not obvious to me why that would be necessary.
No, I don't since there should be a secure way to allow that. The __del__ worry came up from Guido pointing out you might be able to screw with it. But if you pass in something implemented in C you should be okay.
* Using frames to walk the frame stack back to another interpreter.
Could you just disable introspection of the frame stack?
If you don't allow importing of 'sys' then yes, and that is planned. I just wanted to make sure I didn't forget this needs to be protected. I do need to check what a generator's frame exposes, though.
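For the record, a generator's frame exposes quite a lot; its local namespace is fully readable through it (modern ``next()`` spelling):

```python
def gen():
    secret = "supposedly private"
    yield

g = gen()
next(g)
# The generator carries its execution frame around with it,
# so its suspended locals are wide open:
print(g.gi_frame.f_locals["secret"])
```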
Making the ``sys`` Module Safe
------------------------------ [...] This means that the ``sys`` module needs to have its safe information separated out from the unsafe settings.
Yes.
XXX separate modules, ``sys.settings`` and ``sys.info``, or strip ``sys`` to settings and put info somewhere else? Or provide a method that will create a faked sys module that has the safe values copied into it?
I think the last suggestion above would lead to confusion. The two groups should have two distinct names and it should be clear which attribute goes with which group.
This is also more complicated by the fact that some things are for the entire process while others are per interpreter. Might have to separate things out even more.
Protecting I/O
++++++++++++++
The ``print`` keyword and the built-ins ``raw_input()`` and ``input()`` use the values stored in ``sys.stdout`` and ``sys.stdin``. By exposing these attributes to the creating interpreter, one can set them to safe objects, such as instances of ``StringIO``.
Sounds good.
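The redirection mechanism can be seen in-process today (using the modern ``io.StringIO``; in the design above, the assignment would instead happen through the creating interpreter's handle on the sandbox's ``sys`` dict):

```python
import io
import sys

fake_out = io.StringIO()
real_out = sys.stdout
sys.stdout = fake_out  # stand-in for setting the sandbox's sys.stdout
try:
    print("hello, sandbox")  # captured, never reaches the real terminal
finally:
    sys.stdout = real_out

print(repr(fake_out.getvalue()))  # 'hello, sandbox\n'
```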
Safe Networking ---------------
XXX proxy on socket module, modify open() to be the constructor, etc.
Lots more to think about here. :)
Oh yeah. =)
Protecting Memory Usage
-----------------------
To protect memory, low-level hooks into Python's memory allocator are needed. By hooking into the C API for memory allocation and deallocation, a very rough running count of used memory can be kept. This can be used to prevent sandboxed interpreters from using so much memory that it impacts the overall performance of the system.
Preventing denial-of-service is in general quite difficult, but I applaud the attempt. I agree with your decision to separate this work from the security model.
The memory tracking has a proof-of-concept done in the bcannon-sandboxing branch. Not perfect, but it does show how one could go about accounting for every byte of data in terms of what it is basically used for. -Brett
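A pure-Python model of the accounting idea (the real hooks live in C around the allocator API; this just shows the arithmetic of a running count checked against a cap):

```python
class MemoryAccountant:
    """Rough running count of bytes charged to one interpreter."""

    def __init__(self, cap_bytes):
        self.cap = cap_bytes
        self.used = 0

    def allocate(self, nbytes):
        # Refuse any allocation that would push usage past the cap.
        if self.used + nbytes > self.cap:
            raise MemoryError("interpreter exceeded its memory cap")
        self.used += nbytes

    def free(self, nbytes):
        self.used = max(0, self.used - nbytes)

acct = MemoryAccountant(cap_bytes=1024)
acct.allocate(800)
acct.free(200)
acct.allocate(400)       # fine: 600 + 400 <= 1024
try:
    acct.allocate(100)   # would push usage to 1100
except MemoryError as exc:
    print(exc)
```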
participants (10)
- Armin Rigo
- Brett Cannon
- David Hopwood
- Giovanni Bajo
- Greg Ewing
- Ka-Ping Yee
- Lawrence Oluyede
- Michael Foord
- Nick Coghlan
- Terry Reedy