new security doc using object-capabilities

After various people suggested object-capabilities, and after talking with Mark S. Miller of the E programming language and the people Mark works with at HP Labs (who have been giving talks every week during this month here at Google on object-capabilities), I have decided to go with object-capabilities for securing interpreters. I have rewritten my design doc from scratch and deleted the old one. The new doc is named securing_python.txt and can be found through the svn web interface at http://svn.python.org/view/python/branches/bcannon-sandboxing/securing_python.txt?rev=50717&view=log. I have pretty much ignored any concrete API and gone with a more conceptual doc to make sure the API does not get in the way of the core security model.

Using object-capabilities should make the implementation much cleaner. There is much less work directly on the interpreter and more of it gets pushed up to extension modules. I also have the okay of my supervisor to use this approach in my dissertation, so this will get done.

Two things do fall out of all of this which will make development much more modular and easier.

First, the memory cap work just becomes a special build on its own; no need to tie it into the security work. So I will be cleaning up the bcannon-sandboxing branch code as it stands, and then either create a separate branch for the object-capabilities work, or create another branch for the memory cap stuff and shift the changes over there. I will most likely do the former so as to not lose the history on the checkins.

Second, I plan to rewrite the import machinery in pure Python. This will make the code much more maintainable and make creating proxies for the import machinery much easier. I will be doing that in a directory in the sandbox initially since it needs to work from what Python has now (and possibly some new extension module code) before it can be integrated into the interpreter directly. Anyone who wants to help with that can. I already have some preliminary notes on the whole thing and I think it will be reasonably doable.

Anyway, there you go. Here is to hoping I have thought this all through properly. =)

-Brett

Brett Cannon wrote:
This may not be relevant or possible, in which case I apologise, but the .NET model of creating application domains is extremely useful. It allows you to assign domains and run code within those domains. This means, for example, you can create a plugin system and run the plugins in a secure domain. I realise that this was the intent of the original rexec module, and your proposed new design (which is very exciting) overcomes the difficulties in that approach.

The only approach using the new system would be interprocess communication (?) with a trusted interpreter communicating with an un-trusted one. Would the communication layer need to be implemented as a C extension, or will a standard Python API be possible?

Hmmm.... maybe I should read your doc. :-)

Michael Foord
http://www.voidspace.org.uk/python/index.shtml

Michael Foord wrote:
Ok, started to read the doc - and realise it specifically addresses these issues. My apologies :-) Michael http://www.voidspace.org.uk/python/index.shtml

That's great. I just read your draft and have a few comments, but first let me say that I like the idea of borrowing concepts from E. I crossed paths with E at the beginning of this year and found it a pot of really nice ideas (promises and capabilities, for instance).

Here are my comments about the draft:

- It's not really clear to me what the "powerbox" is. I think I got the concept of a "super process", but maybe it should be clarified? It becomes clear in the "threat model" paragraph.
- I hope no Rubyists will read the "Problem of No Private Namespace" section, because they have private/protected keywords to enforce this stuff :-) Writing proxies in C will slow down the dev process (although it may speed up performance), but in the far future someone will come up with an alternative closer to the Python level.
- Can you write down a simple example of what you mean by "changing something of the built-in objects"? (in "Problem of mutable shared state")
- What about the performance issues of the capabilities model overall?
- I know what you meant to say, but the paragraph about pythonicness and the security model seems a little "fuzzy" to me. What are the boundaries of the allowed changes for the security stuff?
- You don't say anything about networking and networked resources in the list of the standard sandboxed interpreter.
- Suppose we have a .py module. Based on your security model we can import it, right? When imported it generates a .pyc file. What happens the second time we import it? Is the .pyc ignored? Is import not allowed at all? We can't rely on the name of the .pyc file: an attacker who knows that file.py is secure, and that the second import is done against file.pyc, can replace the "secure" file.pyc with an insecure implementation and do some kind of harm to the sandbox.
- About "Filesystem information": does the sandboxed interpreter need to know all that information about file paths, files and so on? Can't we reset those attributes to something arbitrary?
- About the sys module: I think the best way is to have a purged fake sys module with only the stuff you need. pypy has the concept of faked modules too (although for a different reason).
- About networking: what do you think about E's model of really safe networking, protected remotable objects and safe RPC? Is that model applicable to Python in some way? We can't use E's model as a whole (asking people to generate a safe key and send it by email is unfeasible).
- Is the protected memory model some kind of memory monitor system?

I think that's all for the draft. I wrote these comments while reading the document. Hope some of these help.

-- Lawrence http://www.oluyede.org/blog

On 7/20/06, Lawrence Oluyede <l.oluyede@gmail.com> wrote:
The powerbox is the thing that gives your security domains their initial abilities. The OS gives the process its abilities, but it does not directly work with the interpreter. Since the process does, though, it is considered the powerbox and farms out abilities that it has been given by the OS. I have tried to clarify the definition at the start of the doc.

- I hope no Rubyists will read the "Problem of No Private Namespace"

Maybe. As I said in the doc, any changes must be Pythonic, and adding private namespaces right now wouldn't be without much more thought and work. And if Ruby ends up with this security model but more thoroughly, more power to them. Their language is different in the right ways to support it. As for coding in C, them's the breaks. I plan on adding stuff to the stdlib for the common case. I might eventually think of a good, generic proxy object that could be used, but as of right now I am not worrying about that since it would be icing on the cake.

- Can you write down a simple example of what you mean with "changing something of the built-in objects"? (in "Problem of mutable shared state")

Done.

- What about the performance issues of the capabilities model overall?

Should be faster than an IBAC (identity-based access control) model since certain calls will not need to check the identity of the caller every time. But I am not worrying about performance, I am worrying about correctness, so I did not try to make any performance claims.

- I know what you meant to say but the paragraph about pythonicness and the security model seems a little "fuzzy" to me. Which are the boundaries of the allowed changes for the security stuff?

Being "pythonic" is a fuzzy term in itself and Guido is the only person who can make definitive claims over what is and is not Pythonic. As I have said, this doc was mostly written with python-dev in mind since they are the ones I have to convince to let this into the core and they all know the term. But I have tacked in a sentence on what the term means.

- You don't say anything about networking and networked resources in the list of the standard sandboxed interpreter

Nope. Have not started worrying about that yet. Just trying to get the basic model laid out.

- Suppose we have a .py module. Based on your security model we can

It will be ignored. But I am hoping that through rewriting the import machinery more control over generating .pyc files can be had (see Skip Montanaro's PEP on this; I forget the number). This is why exact details were left out of the implementation details. I just wanted people to understand the approach to everything, not the concrete details of how it will be coded up.

- About "Filesystem information". Does the sandboxed interpreter need to know all that information about file paths, files and so on? Can't we reset those attributes to something arbitrary?

That is the point. It is not that the sandbox needs to know it, it's that it needs to be hidden from the sandbox.

- About sys module: I think the best way is to have a purged fake sys module with only the stuff you need. pypy has the concept of faked modules too (although for a different reason)

OK.

- About networking: what do you think about the E's model of really

I have not looked at it. I am also not trying to build an RPC system *and* a security model for Python. That is just too much work right now.

- is the protected memory model some kind of memory monitor system?

Basically. It just keeps a size_t on the memory cap and another on memory usage, and when memory is requested it makes sure that it won't go over the cap. And when memory is freed the usage goes down. It's very rough (hard to account for padding bits, etc. in C structs), but it should be good enough to prevent a program from hitting 800 MB when you really just wanted it to have 5 MB.

I think that's all for the draft. I wrote these comments during the reading of the document. Hope some of these help
Thanks, Lawrence. -Brett
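
The memory cap accounting described in the reply above (one counter for the cap, one for usage, a check on every request, a decrement on every free) can be sketched roughly as follows. This is only an illustrative model in Python: the names MemoryCap, MemoryCapExceeded, allocate and free are hypothetical, and the real work would live in the interpreter's C allocator, not in a Python class.

```python
class MemoryCapExceeded(MemoryError):
    """Raised when a request would push usage past the cap."""


class MemoryCap:
    """Toy model of the cap: two counters standing in for the two size_t values."""

    def __init__(self, cap_bytes):
        self.cap = cap_bytes   # stands in for the size_t holding the cap
        self.usage = 0         # stands in for the size_t holding current usage

    def allocate(self, nbytes):
        # Refuse the request if granting it would go over the cap.
        if self.usage + nbytes > self.cap:
            raise MemoryCapExceeded(
                "request for %d bytes exceeds cap of %d" % (nbytes, self.cap)
            )
        self.usage += nbytes

    def free(self, nbytes):
        # Freed memory lowers recorded usage; clamp at zero because the
        # accounting is admittedly rough.
        self.usage = max(0, self.usage - nbytes)
```

With a 5 MB cap, a 4 MB request succeeds and a further 2 MB request is refused, which is the "5 MB rather than 800 MB" behaviour mentioned above.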

Got that.
Nope. Have not started worrying about that yet. Just trying to get the basic model laid out.
Ok sorry to have bothered
That is the point. It is not that the sandbox needs to know it, it's that it needs to be hidden from the sandbox.
So I think that's a "simple" step during the importing step.
I have not looked at it. I am also not trying to build an RPC system *and* a security model for Python. That is just too much work right now.
Ok sorry :-)
Thanks, Lawrence.
Thank you! -- Lawrence http://www.oluyede.org/blog

"Lawrence Oluyede" <l.oluyede@gmail.com> wrote in message news:9eebf5740607200117r4d4613e2i91665ea211bab46@mail.gmail.com...
- I know what you meant to say but the paragraph about pythonicness and the security model seems a little "fuzzy" to me.
I agree that this paragraph is weak and recommend that it be rewritten. In particular, I think the 'pythonic*' words should go, especially if you expect this document to be read by anyone other than dedicated pythonistas. I would start with something like "It is my goal that my thesis work be incorporated in some future version of the Python distribution. This has two constraints. First, changes to the core must not slow down normal operation. Second, visible changes must not violate the spirit and style of Python that make it a distinctive language."

This alludes to the fact that your proposal discusses two highly overlapping yet separate projects: write a thesis that gains you a PhD degree; and produce an accepted patch set that gives Python a useful security capability it does not now have. They have to be thought of as somewhat separate because you have two sets of 'overseers' and approvers: your thesis advisor and committee for the first; and Guido and other Python developers for the second.

I think your thesis should currently be your first priority. Your current paragraph implied to me that you would not follow a promising line of research if you could not see how to make it 'pythonic'. If I were on your thesis committee, I think that would bother me ;-).

In any case, I wish you the best with a double project that is obviously not a 'gimme'.

Terry Jan Reedy (PhD, though not on any thesis committees)

Brett Cannon wrote:
How do you plan to handle CPU-hogs? Stuff like execution of a gigantic integer multiplication. This recipe for safe_eval: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/496746 which is otherwise very cute, does not handle this case as well: it tries to catch and interrupt long-running operations through a secondary thread, but fails on a single long operation because the GIL is not released and the alarm thread does not get its chance to run. -- Giovanni Bajo

Brett Cannon wrote:
I think the trick used by the safe_eval recipe (a separate thread which interrupts the script through thread.interrupt_main()) shows that, in most cases, it's possible to make sure that an embedded script does not take too long to execute. Do you agree that this use case ("allow me to timeout an embedded script") is something which would be a very good start in the right direction?

Now, I wonder, in a restricted execution environment such as that depicted in your document, how many different ways are there to make the Python interpreter enter a long calculation loop which does not release the GIL? I can think of bignum*bignum, bignum**bignum or similar mathematical operations, but there are really only a few. If we could make those release the GIL (or poll some kind of watchdog used to abort them, pretty much like they normally poll CTRL+C), then the same trick used by the recipe could be used.

-- Giovanni Bajo
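
The "poll some kind of watchdog" idea can be illustrated with a small sketch. Everything here is an assumption made for illustration: Watchdog, WatchdogExpired and guarded_pow are hypothetical names, the budget is counted in steps rather than seconds so the behaviour is deterministic, and in CPython the polling would have to happen inside the C implementation of the bignum operations rather than in Python code like this.

```python
class WatchdogExpired(Exception):
    """Raised when a computation has used up its step budget."""


class Watchdog:
    """Hypothetical step-budget watchdog that long operations poll."""

    def __init__(self, max_steps):
        self.remaining = max_steps

    def poll(self):
        # Called periodically by the long-running operation.
        if self.remaining <= 0:
            raise WatchdogExpired("step budget exhausted")
        self.remaining -= 1


def guarded_pow(base, exponent, watchdog):
    """Square-and-multiply exponentiation that polls the watchdog every round."""
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1
    while exponent:
        watchdog.poll()          # the abort point between rounds
        if exponent & 1:
            result *= base
        base *= base
        exponent >>= 1
    return result
```

Note that a single multiplication of two enormous integers inside one round would still run to completion here; the sketch only shows where the abort points would sit.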

On 7/20/06, Giovanni Bajo <rasky@develer.com> wrote:
Probably. I just don't feel like worrying about it right now. =)

Now, I wonder, in a restricted execution environment such as that depicted

Well, any work that does most of its calculation within C code and that does not touch base with the interpreter on a semi-regular basis would need to release the GIL.

-Brett

For code objects, their construction is already commonly written as "compile(source)".

For type objects, the constructor doesn't let you do anything you can't already do with a class statement. It doesn't need securing.

For rewriting import.c in Python, the PEP 302 compliant import system API in pkgutil would be a good starting point.

Your doc also asks about the imp.get_suffixes() list, and wonders where to set it from Python. As far as I am aware, you can't. get_suffixes() is built from _PyImport_FileTab, which is a C array. A switch statement is used to get from the file table entries to the appropriate handler functions.

Quoting from the suggestions I put to the Py3k list:

Use smarter data structures
---------------------------

Currently, the individual handlers to load a fully identified module are exposed to Python code in a way that reflects the C-style data structures used in the current implementation. Simply switching to more powerful data structures for the file type handlers (i.e. use a PyTuple for filedescr values, a PyList for _PyImport_FileTab, and a PyDict instead of a switch statement to go from filedescr values to module loading/initialisation functions) and manipulating them all as normal Python objects could make the code in import.c much easier to follow.

Extensible file type handling
-----------------------------

If the file type handlers are stored in normal Python data structures as described above, it becomes feasible to make the import system extensible to different file types as well as to different file locations. This could be handled on a per-package basis, e.g. via a __file_types__ special attribute in packages.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org
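
As a rough illustration of the "smarter data structures" suggestion, the sketch below keeps the file table as a list of tuples and replaces the switch statement with a dict lookup. All names are hypothetical stand-ins: the loader functions only return a tag instead of loading anything, and FILE_TAB merely mirrors the (suffix, mode, type) shape that imp.get_suffixes() reports.

```python
# Illustrative type codes for the three kinds of file handled here.
PY_SOURCE, PY_COMPILED, C_EXTENSION = 1, 2, 3


# Hypothetical stand-ins for the real loading/initialisation functions.
def load_source(name, path):
    return ("source", name, path)


def load_compiled(name, path):
    return ("compiled", name, path)


def load_extension(name, path):
    return ("extension", name, path)


# The file table as an ordinary list of (suffix, mode, type) tuples.
FILE_TAB = [
    (".py", "r", PY_SOURCE),
    (".pyc", "rb", PY_COMPILED),
    (".so", "rb", C_EXTENSION),
]

# The dict that takes the place of the switch statement.
HANDLERS = {
    PY_SOURCE: load_source,
    PY_COMPILED: load_compiled,
    C_EXTENSION: load_extension,
}

# Because the table is plain data, restricting file types is a list operation.
SANDBOX_FILE_TAB = [entry for entry in FILE_TAB if entry[2] == PY_SOURCE]


def load_module(name, path, file_tab=FILE_TAB, handlers=HANDLERS):
    """Dispatch on the file suffix through the table and the handler dict."""
    for suffix, _mode, file_type in file_tab:
        if path.endswith(suffix):
            return handlers[file_type](name, path)
    raise ImportError("no handler for %r" % path)
```

Passing SANDBOX_FILE_TAB as file_tab makes a .pyc or .so path raise ImportError, which is the restricting direction rather than the broadening one.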

On 7/20/06, Nick Coghlan <ncoghlan@gmail.com> wrote:
For code objects, their construction is already commonly written as "compile(source)".
Right, but some people like to construct directly from bytecode.

For type objects, the constructor doesn't let you do anything you can't already do with a class statement. It doesn't need securing.

I figured as much, but when I was making the list I was not sure and didn't want to stop my writing momentum to check.

For rewriting import.c in Python, the PEP 302 compliant import system API in pkgutil would be a good starting point.

Yep. Plan on looking at all of the various modules in the stdlib that assist with importing, the package PEP (I think there is one), and PEP 302.

Your doc also asks about the imp.get_suffixes() list, and wonder where to

Ah, OK.

Quoting from the suggestions I put to the Py3k list:

Yep. I just kind of glanced at the rest of your suggestions, Nick, since I assumed a lot of it would change (or could be changed) if import was redone in as much Python as possible.

Extensible file type handling

Yep. Although I am more interested in restricting than broadening the file types.

This could be handled on a per-package basis, e.g. via a __file_types__ special attribute in packages.

Maybe. I don't want to get into introducing new abilities to start, though.

-Brett

Brett Cannon wrote:
Either way you'd be mutating the list of recognised file types :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org

Hi Brett, On Wed, Jul 19, 2006 at 03:35:45PM -0700, Brett Cannon wrote:
I also plan to rewrite the import machinery in pure Python.
http://codespeak.net/svn/pypy/dist/pypy/module/__builtin__/importing.py A bientot, Armin

On 7/22/06, Armin Rigo <arigo@tunes.org> wrote:
Thanks for the link, Armin. Since you guys don't have the import restrictions the CPython version would have, and just have different coding needs for RPython, obviously I can't just do a blind copy. But I will definitely take a look as I develop. Maybe you guys can even help to lower the duplication if it makes sense for you. BTW, do you guys happen to have extra tests from import? -Brett

Hi Brett, On Sat, Jul 22, 2006 at 10:33:19AM -0700, Brett Cannon wrote:
Yes, it should be possible to abstract the common logic in some way, using some kind of interface for all OS inspection and 'sys.modules' manipulations.
BTW, do you guys happen to have extra tests from import?
Yes, there is http://codespeak.net/svn/pypy/dist/pypy/module/__builtin__/test/test_import.... which will also need a bit of rewriting, but that should be straightforward. A bientot, Armin

Re-hi, On Wed, Jul 19, 2006 at 03:35:45PM -0700, Brett Cannon wrote:
http://svn.python.org/view/python/branches/bcannon-sandboxing/securing_python.txt?rev=50717&view=log.
I'm not sure I understand what you propose to fix holes like constructors and __subclasses__: it seems that you want to remove them altogether (and e.g. make factory functions instead). That would completely break all programs, right? I mean, there is no way such changes would go into mainstream CPython. Or do you propose to maintain a CPython branch manually for the foreseeable future? (From experience this is a bad idea...) A bientot, Armin

On 7/22/06, Armin Rigo <arigo@tunes.org> wrote:
Not altogether, just constructors on select types that are considered dangerous from a security standpoint. The breakage won't be horrible, but it will be there for advanced Python code. I will try to make the wording more clear when I get back to work on Tuesday.
I mean, there is no way such changes would go into mainstream CPython.
If this has to wait until Py3k then so be it.
Yeah, not my idea of fun either, but since this is a long term project, I will at least need to for the foreseeable future. -Brett

Armin Rigo wrote:
If I understand correctly, the proposal is that any incompatible changes to the language would apply only in "sandboxed" interpreters. So there is no reason why support for these couldn't go into the main branch. Of course we want to minimize the changes that will need to be made to programs and libraries to make them work in a sandboxed interpreter, but not at the expense of security. Some incompatible changes will be necessary. -- David Hopwood <david.nospam.hopwood@blueyonder.co.uk>

Hi David, hi Brett, On Sun, Jul 23, 2006 at 02:18:48AM +0100, David Hopwood wrote:
That's what I originally thought too, but Brett writes:

Implementation Details
========================

An important point to keep in mind when reading about the implementation details for the security model is that these are general changes and are not special to any type of interpreter, sandboxed or otherwise. That means if a change to a built-in type is suggested and it does not involve a proxy, that change is meant Python-wide for *all* interpreters.

So that's why I'm starting to worry that Brett is proposing to change the regular Python language too. However, Brett, you also say somewhere else that backward compatibility is not an issue. So I'm a bit confused actually...

Also, I hate to sound self-centered, but I should point out somewhere that PyPy was started by people who no longer wanted to maintain a fork of CPython, and preferred to work on building CPython-like variants automatically. Many of the security features you list would be much easier to implement and maintain in PyPy than CPython -- also from a security perspective: it is easier to be sure that some protection is complete, and remains complete over time, if it is systematically generated instead of hand-patched in a dozen places.

A bientot, Armin

On 7/23/06, Armin Rigo <arigo@tunes.org> wrote:
Yes, I am proposing changing some constructors and methods on some built-in types for the regular language. That's it. No new keywords or major semantic changes and such. If I make changes just for sandboxed interpreters, it changes the general approach of the security model by then requiring an identity check to see if the interpreter is sandboxed or not.

However, Brett, you also say somewhere else that backward compatibility is not an issue. So I'm a bit confused actually...

Since this is my Ph.D. dissertation first and foremost, I am not going to tie my hands in such a way that I have to make too much of a compromise in order for this to work. I obviously don't want to change the feel of Python, but if I have to remove the constructor for code objects to prevent evil bytecode or __subclasses__() from object to prevent poking around stuff, then so be it. For this project, security trumps backwards-compatibility when keeping the latter would make the former impossible. I will obviously try to minimize it, but something that works at such a basic level of the language is just going to require some changes for it to work.

Also, I hate to sound self-centered, but I should point out somewhere

It doesn't sound self-centered. =) Problem is that my knowledge base is obviously all in CPython so my startup costs are much lower than if I tried this in PyPy. Plus there is the point of embedding this into Firefox (possibly) eventually. Does PyPy support embedding yet at the C level?

-Brett

Brett Cannon wrote:
I assume that the extent of incompatible changes would be limited as much as possible. So the only checks would be in operations that are directly affected by whatever incompatible changes are made. The performance and complexity costs of this are likely to be small -- or at least should not be assumed to be large before having hammered out a more detailed design. Suppose, for the sake of argument, that we introduced private methods and attributes. If an attribute in an existing standard library class was changed to be private, then code depending on it would break. But if there were a notion of a "compatibility private" attribute that acts as private only in a sandboxed interpreter, then no code running in an unprotected interpreter would break. -- David Hopwood <david.nospam.hopwood@blueyonder.co.uk>

Brett Cannon wrote:
Another rationale for basing the work on CPython is that it should be possible to implement the resulting security model regardless of the implementation language used for the interpreter core (C/Python, Java/Python, C#/Python, RPython/Python). If you can figure out how to do it in C, it should be feasible to do it in the others. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org

Hi Brett, Here are some comments on your proposal. Sorry this took so long. I apologize if any of these comments are out of date (but also look forward to your answers to some of the questions, as they'll help me understand some more of the details of your proposal). Thanks!
Is this a definition or an implementation choice? As in, are you defining "sandboxed" to mean "with altered built-ins" or just "restricted in some way", and does the above mean to imply that altering the built-ins is what triggers other kinds of restrictions (as it did in Python's old restricted execution mode)?
Is a "bare interpreter" just one example of a sandboxed interpreter, or are all sandboxed interpreters in your design initially bare (i.e. "sandboxed" = "bare" + zero or more granted authorities)?
The "security domain" is the boundary at which security is cared about. For this discussion, it is the interpreter.
It might be clearer to say (if i understand correctly) "Each interpreter is a separate security domain." Many interpreters can run within a single operating system process, right? Could you say a bit about what sort of concurrency model you have in mind? How would this interact (if at all) with use of the existing threading functionality?
The "powerbox" is the thing that possesses the ultimate power in the system. In our case it is the Python process.
This could also be the application process, right?
I'm glad to have this use case set out early in the document, so the reader can keep it in mind as an example while reading about the model.
As Mark Miller mentioned in another message, your descriptions of "who-I-am" security and "what-I-have" security make sense, but they don't correspond to "permission" vs. "authority". They correspond to "identity-based" vs. "authority-based" security.
Nice summary.
Don't local scopes count as private namespaces? It seems clear that they aren't designed with the intention of being exposed, unlike other namespaces in Python.
It also makes providing security at the object level using object-capabilities non-existent in pure Python code.
I don't think this is necessarily the case. No Python code i've ever seen expects to be able to invade the local scopes of other functions, so you could use them as private namespaces. There are two ways i've seen to invade local scopes:

(a) Use gc.get_referents to get back from a cell object to its contents.
(b) Compare the cell object to another cell object, thereby causing __eq__ to be invoked to compare the contents of the cells.

So you could protect local scopes by prohibiting these or by simply turning off access to func_closure. It's clear that hardly any code depends on these introspection features, so it would be reasonable to turn them off in a sandboxed interpreter. (It seems you would have to turn off some introspection features anyway in order to have reliable import guards.)
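
A small sketch of the point about local scopes, written for current Python 3, where func_closure is spelled __closure__ and cell objects also expose a cell_contents attribute. The names make_counter and peek_closure are hypothetical and exist only for this illustration.

```python
import gc


def make_counter():
    count = 0  # lives only in this local scope

    def increment():
        nonlocal count
        count += 1
        return count

    return increment


def peek_closure(func):
    """Show how introspection reaches into the 'private' local scope."""
    names = func.__code__.co_freevars
    cells = func.__closure__ or ()
    return dict(zip(names, (cell.cell_contents for cell in cells)))


counter = make_counter()
counter()
counter()

# Nothing in the normal calling interface exposes 'count', yet the closure
# cells hand it over, as does gc.get_referents() applied to a cell (route (a)).
leaked_via_cells = peek_closure(counter)
leaked_via_gc = gc.get_referents(counter.__closure__[0])
```

If access to the closure cells and to gc were switched off in a sandboxed interpreter, increment would be the only way to touch count, which is the private-namespace behaviour argued for above.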
Yup.
Threat Model
///////////////////////////////////////
Good to see this specified here. I like the way you've broken this down.
* An interpreter cannot gain abilities the Python process possesses without explicitly being given those abilities.
It would be good to enumerate which abilities you're referring to in this item. For example, a bare interpreter should be able to allocate memory and call most of the built-in functions, but should not be able to open network connections.
* An interpreter cannot influence another interpreter directly at the Python level without explicitly allowing it.
You mean, without some other entity explicitly allowing it, right? What would that other entity be -- presumably the interpreter that spawned both of these sub-interpreters?
* An interpreter cannot use operating system resources without being explicitly given those resources.
Okay.
* A bare Python interpreter is always trusted.
What does "trusted" mean in the above?
* Python bytecode is always distrusted.
* Pure Python source code is always safe on its own.
It would be helpful to clarify "safe" here. I assume by "safe" you mean that the Python source code can express whatever it wants, including potentially dangerous activities, but when run in a bare or sandboxed interpreter it cannot have harmful effects. But then in what sense does the "safety" have to do with the Python source code rather than the restrictions on the interpreter? Would it be correct to say:

+ We want to guarantee that Python source code cannot violate the restrictions in a restricted or bare interpreter.
+ We do not prevent arbitrary Python bytecode from violating these restrictions, and assume that it can.
By "malicious" do you just mean "anything that isn't accessible to a bare interpreter"?
* A sub-interpreter started by another interpreter does not inherit any state.
Do you envision a tree of interpreters and sub-interpreters? Can the levels of spawning get arbitrarily deep? If i am visualizing your model correctly, maybe it would be useful to introduce the term "parent", where each interpreter has as its parent either the Python process or another interpreter. Then you could say that each interpreter acquires authority only by explicit granting from its parent. Then i have another question: can an interpreter acquire authorities only when it is started, or can it acquire them while it is running, and how?
This sounds like part of your design to me. It might help to have this earlier in the document (maybe even with an example diagram of a tree of interpreters).
It might be good to declare two categories of resources -- those protected by object hiding and those protected by a per-interpreter setting -- and make lists.
You might qualify the last statement a bit. For example, a Python implementation of a pure algorithm (e.g. string processing, data compression, etc.) would still work in a sandboxed interpreter.
Keeping Python "pythonic" is required for all design decisions.
As Lawrence Oluyede also mentioned, it would be helpful to say a little more about what "pythonic" means.
Sounds good.
Abilities of a Standard Sandboxed Interpreter
=============================================
[...]
This looks reasonable. This is probably a good place to itemize exactly which built-ins are considered safe.
Presumably these are passed in to the proxy's constructor.
I'm unclear about this. Is the whitelist a list of module names only, or of filenames with extensions? Does the normal path-searching process take place or can it be restricted in some way? Would it simplify the security analysis to have the whitelist be a dictionary that maps module names to absolute pathnames? If both the .py and .pyc are present, the normal import would find the .pyc file; would the import proxy reject such an import or ignore it and recompile the .py instead?
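
One way to picture the "dictionary that maps module names to absolute pathnames" variant is the sketch below. It is an assumption-laden illustration, not the import proxy from the design doc: import_from_whitelist is a hypothetical name, the function always recompiles the .py source and never looks at a .pyc, and on its own it is not a security boundary.

```python
import types


def import_from_whitelist(name, whitelist):
    """Load a module only if its name maps to a path in the whitelist dict."""
    if name not in whitelist:
        raise ImportError("module %r is not on the whitelist" % name)
    path = whitelist[name]
    with open(path, "r", encoding="utf-8") as source_file:
        source = source_file.read()
    # Compile from source every time, so a tampered .pyc is never consulted.
    code = compile(source, path, "exec")
    module = types.ModuleType(name)
    module.__file__ = path
    exec(code, module.__dict__)
    return module
```

Because the mapping is explicit, no path search takes place at all: a name either resolves to exactly one file or the import fails.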
It must be warned that importing any C extension module is dangerous.
Right.
This seems like a good idea. Can you identify which minimum essential pieces of the import machinery have to be written in C?
The existence of the constructor isn't (by itself) the problem. The problem is that both of the following are true:

(a) From any object you can get its type object.
(b) Using any type object you can construct a new instance.

So, you can control this either by hiding the type object, separating the constructor from the type, or disabling the constructor.
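
The combination of (a) and (b) can be shown in a few lines. Vault and open_public_vault are hypothetical names invented for this sketch; the point is only that handing out an instance also hands out its constructor.

```python
class Vault:
    """Hypothetical resource whose constructor accepts any contents."""

    def __init__(self, secret):
        self.secret = secret


def open_public_vault():
    # Intended to be the only way for untrusted code to obtain a Vault.
    return Vault("public data")


handed_out = open_public_vault()

# (a) any object leads to its type object ...
vault_type = type(handed_out)

# (b) ... and the type object is itself a constructor.
forged = vault_type("anything the caller likes")
```

Each of the three controls listed above breaks one link in this chain: hiding the type blocks step (a), while separating or disabling the constructor blocks step (b).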
Looks good so far. Not sure i see what's dangerous about 'type'.
It seems that in most cases, a single Python object is associated with a single pathname. If that's true in general, one solution would be to provide an introspection function named 'getpath' or something similar that would get the path associated with any object. This function might go in a module containing all the introspection functions, so imports of that module could be easily restricted.
Okay, more to work out here. :)
Good.
* ``__del__`` created in sandboxed interpreter but object is cleaned up in unprotected interpreter.
How do you envision the launching of a sandboxed interpreter to look? Could you sketch out some rough code examples? Were you thinking of something like:

    sys.spawn(code, dict)
        code: a string containing Python source code
        dict: the global namespace in which to run the code

If you allow the parent interpreter to pass mutable objects into the child interpreter, then the parent and child can already communicate via the object, so '__del__' is a moot issue. Do you want to prevent all communication between parent and child? It's not obvious to me why that would be necessary.
* Using frames to walk the frame stack back to another interpreter.
Could you just disable introspection of the frame stack?
Yes.
I think the last suggestion above would lead to confusion. The two groups should have two distinct names and it should be clear which attribute goes with which group.
Sounds good.
Lots more to think about here. :)
Preventing denial-of-service is in general quite difficult, but i applaud the attempt. I agree with your decision to separate this work from the rest of the security model. -- ?!ng

On 9/6/06, Ka-Ping Yee <python-dev@zesty.ca> wrote:
I think they are slightly outdated. The latest version of the doc is in the bcannon-objcap branch and is named securing_python.txt ( http://svn.python.org/view/python/branches/bcannon-objcap/securing_python.tx... ).
There is no "triggering" of other restrictions. This is an implementation choice. "Sandboxed" means "with altered built-ins".
You build up from a bare interpreter by adding in authorities (e.g., providing a wrapped version of open()) to reach the level of security you want.
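To show the shape of "adding in an authority", here is a minimal sketch of a wrapped open() that only reads files under one granted directory. The name `make_readonly_open` is hypothetical. Note that this is pure Python, and the thread itself points out that pure-Python closures can be inspected, so a real proxy would presumably be written in C; the sketch illustrates the idea, not a secure implementation.

```python
import os
import tempfile


def make_readonly_open(root):
    """Return an open()-like function limited to reading files under root."""
    root = os.path.realpath(root)

    def restricted_open(path, mode="r"):
        if mode not in ("r", "rb"):
            raise PermissionError("only read access is granted")
        full = os.path.realpath(os.path.join(root, path))
        # Refuse anything that resolves outside the granted directory.
        if os.path.commonpath([root, full]) != root:
            raise PermissionError("path escapes the granted directory")
        return open(full, mode)

    return restricted_open


# Usage: the powerbox grants read access to one directory only.
with tempfile.TemporaryDirectory() as granted_dir:
    with open(os.path.join(granted_dir, "note.txt"), "w") as f:
        f.write("hello")
    sandbox_open = make_readonly_open(granted_dir)
    with sandbox_open("note.txt") as f:
        content = f.read()
```

The sandboxed code would be given `sandbox_open` in place of the built-in open(), so the only files it can reach are the ones the wrapper allows.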
Many interpreters can run within a single operating system process, right?
Yes. Could you say a bit about what sort of concurrency model you
have in mind?
None specifically. Each new interpreter automatically runs in its own Python thread, so they have essentially the same concurrency as using the 'thread' module. How would this interact (if at all) with use of the
existing threading functionality?
See above.
If Python is embedded, yes.
Right. This was fixed the day Mark and Alan Karp made the comment.
Sort of. But you can still get access to them if you have an execution frame and they are not persistent. Generators are worse since they store their execution frame with the generator itself, completely exposing the local namespace.
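The generator case is easy to demonstrate in current CPython. The function and variable names below are made up for the example.

```python
def counter():
    secret = "local to the generator"
    yield 1
    yield 2


gen = counter()
next(gen)  # run to the first yield so the local variable is bound

# The suspended frame hangs off the generator object itself,
# so anyone holding the generator can read its locals.
leaked = gen.gi_frame.f_locals["secret"]
print(leaked)
```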
It also makes providing security at the object level using
object-capabilities non-existent in pure Python code.
I don't think this is necessarily the case. No Python code i've
Or the execution frame which is exposed directly on generators. But regardless, the comment was meant to apply to Python as it stands, not that it couldn't be possibly tweaked somehow. So you could protect local scopes by prohibiting these or by
Maybe this can be changed in the future, but this is more than I need at the moment, so I am not going to go down that path right now. But I added a quick mention of this.
The current version has more details per point than the one you read.
Yep. What would that other entity be -- presumably the interpreter that
spawned both of these sub-interpreters?
Sure. You could stick something in the built-in namespace of the sub-interpreter to use for communicating.
It means that if Python source code can execute within a bare interpreter it is considered safe code. This is covered in the new version of the doc.
Anything that could harm the system or interpreter.
Yes and yes. If i am visualizing your model correctly, maybe it would be useful to
You could, although there is no hierarchy at the implementation level. But it works in terms of who has a reference to whom and who gives each interpreter their authority. Then i have another question: can an interpreter acquire
authorities only when it is started, or can it acquire them while it is running, and how?
Well, whatever you want to do through the built-in namespace. So if you pass in a mutable object like a dict and add stuff to it on the fly, I don't see why you couldn't give new authorities on the fly.
Made Guiding Principles its own section and split off the bottom part of the section and put it under Implementation.
That is rather unknown since I am constantly finding stuff that is global to the process compared to the interpreter, so making the list seems premature.
I tossed in "all" to clarify.
Done in the current version.
Current plan is to expose the built-in namespace, imported modules, and sys module dict when creating an Interpreter instance.
Have not decided, but probably module name. Does the normal path-searching process
take place or can it be restricted in some way?
Have not decided. Would it simplify the
security analysis to have the whitelist be a dictionary that maps module names to absolute pathnames?
Don't know. Protecting imports is the last thing I am going to implement since it is the trickiest. If both the .py and .pyc are present, the normal import would find the
.pyc file; would the import proxy reject such an import or ignore it and recompile the .py instead?
Something along those lines.
Loading of C extensions, stating files, reading files, etc. Pretty much anything that requires help from the OS.
I separated the constructor or initializer (tp_new or tp_init) into a factory function.
That's why it has the question mark. =)
That is the current thinking.
Possibly. I might have to wait until I am much closer to being done to discover more places where mutable shared state is exposed in a bare interpreter, because I have not been able to think of any more.
Were you thinking of
No, I don't since there should be a secure way to allow that. The __del__ worry came up from Guido pointing out you might be able to screw with it. But if you pass in something implemented in C you should be okay.
* Using frames to walk the frame stack back to another interpreter.
Could you just disable introspection of the frame stack?
If you don't allow importing of 'sys' then yes, and that is planned. I just wanted to make sure I didn't forget this needs to be protected. I do need to check what a generator's frame exposes, though.
This is also more complicated by the fact that some things are for the entire process while others are per interpreter. Might have to separate things out even more.
Oh yeah. =)
The memory tracking has a proof-of-concept done in the bcannon-sandboxing branch. Not perfect, but it does show how one could go about accounting for every byte of data in terms of what it is basically used for. -Brett

Brett Cannon wrote:
This may not be relevant or possible, in which case I apologise, but the .NET model of creating application domains is extremely useful. It allows you to assign domains and run code within those domains. This means, for example, you can create a plugin system and run the plugins in a secure domain. I realise that this was the intent of the original rexec module, and your proposed new design (which is very exciting) overcomes the difficulties in that approach. The only approach using the new system would be interprocess communication (?) with a trusted interpreter communicating with an un-trusted one. Would the communication layer need to be implemented as a C extension, or will a standard Python API be possible ? Hmmm.... maybe I should read your doc. :-) Michael Foord http://www.voidspace.org.uk/python/index.shtml

Michael Foord wrote:
Ok, started to read the doc - and realise it specifically addresses these issues. My apologies :-) Michael http://www.voidspace.org.uk/python/index.shtml

That's great. I just read your draft and have a few comments, but first let me say that I liked the idea of borrowing concepts from E. I crossed paths with E at the beginning of this year and found it full of really nice ideas (promises and capabilities in particular). Here are my comments about the draft:

- It's not really clear to me what the "powerbox" is. I think I got the concept of a "super process", but maybe it should be clarified? It becomes clear in the "threat model" paragraph.
- I hope no Rubystas will read the "Problem of No Private Namespace" section because they have private/protected keywords to enforce this stuff :-) Writing proxies in C will slow down the dev process (although it may speed up performance), but in the far future someone will come up with an alternative closer to the Python level.
- Can you write down a simple example of what you mean with "changing something of the built-in objects"? (in "Problem of mutable shared state")
- What about the performance issues of the capabilities model overall?
- I know what you meant to say but the paragraph about pythonicness and the security model seems a little "fuzzy" to me. Which are the boundaries of the allowed changes for the security stuff?
- You don't say anything about networking and networked resources in the list of the standard sandboxed interpreter.
- Suppose we have a .py module. Based on your security model we can import it, right? When imported it generates a .pyc file. What happens the second time we import it? Is the .pyc ignored? Is the import not allowed at all? We can't rely on the name of file.pyc, because an attacker who knows that file.py is secure and that the second import is done against file.pyc can replace the "secure" file.pyc with an insecure implementation and harm the sandbox.
- About "Filesystem information". Does the sandboxed interpreter need to know all that information about file paths, files and so on? Can't we reset those attributes to something arbitrary?
- About sys module: I think the best way is to have a purged fake sys module with only the stuff you need. pypy has the concept of faked modules too (although for a different reason).
- About networking: what do you think about the E's model of really safe networking, protected remotable objects and safe RPC? Is that model applicable to Python in some way? We can't use E's model as a whole (asking people to generate a safe key and send it by email is unfeasible).
- Is the protected memory model some kind of memory monitor system?

I think that's all for the draft. I wrote these comments during the reading of the document. Hope some of these help -- Lawrence http://www.oluyede.org/blog

On 7/20/06, Lawrence Oluyede <l.oluyede@gmail.com> wrote:
The powerbox is the thing that gives your security domains their initial abilities. The OS gives the process its abilities, but it does not directly work with the interpreter. Since the process does, though, it is considered the powerbox and farms out abilities that it has been given by the OS. I have tried to clarify the definition at the start of the doc. - I hope no Rubystas will read the "Problem of No Private Namespace"
Maybe. As I said in the doc, any changes must be Pythonic and adding private namespaces right now wouldn't be without much more thought and work. And if Ruby ends up with this security model but more thoroughly, more power to them. Their language is different in the right ways to support it. As for coding in C, thems the breaks. I plan on adding stuff to the stdlib for the common case. I might eventually think of a good, generic proxy object that could be used, but as of right now I am not worrying about that since it would be icing on the cake. - Can you write down a simple example of what you mean with "changing
something of the built-in objects"? (in "Problem of mutable shared state")
Done. - What about the performance issues of the capabilities model overall? Should be faster than an IBAC model since certain calls will not need to check the identity of the caller every time. But I am not worrying about performance, I am worrying about correctness, so I did not try to make any performance claims. - I know what you meant to say but the paragraph about pythonicness
and the security model seems a little "fuzzy" to me. Which are the boundaries of the allowed changes for the security stuff?
Being "pythonic" is a fuzzy term in itself and Guido is the only person who can make definitive claims over what is and is not Pythonic. As I have said, this doc was mostly written with python-dev in mind since they are the ones I have to convince to let this into the core and they all know the term. But I have tacked in a sentence on what the term means. - You don't say anything about networking and networked resources in
the list of the standard sandboxed interpreter
Nope. Have not started worrying about that yet. Just trying to get the basic model laid out. - Suppose we have a .py module. Based on your security model we can
It will be ignored. But I am hoping that through rewriting the import machinery more control over generating .pyc files can be had (see Skip Montanaro's PEP on this; forget the number). This is why exact details were left out of the implementation details. I just wanted people to understand the approach to everything, not the concrete details of how it will be coded up. - About "Filesystem information". Does the sandboxed interpreter need
to know all that information about file paths, files and so on? Can't we reset those attributes to something arbitrary?
That is the point. It is not that the sandbox needs to know it, it's that it needs to be hidden from the sandbox. - About sys module: I think the best way is to have a purged fake sys
module with only the stuff you need. pypy has the concept of faked modules too (although for a different reason)
OK. - About networking: what do you think about the E's model of really
I have not looked at it. I am also not trying to build an RPC system *and* a security model for Python. That is just too much work right now. - is the protected memory model some kind of memory monitor system? Basically. It just keeps a size_t on the memory cap and another on memory usage, and when memory is requested it makes sure that it won't go over the cap. And when memory is freed the usage goes down. It's very rough (hard to account for padding bits, etc. in C structs), but it should be good enough to prevent a program from hitting 800 MB when you really just wanted it to have 5 MB. I think that's all for the draft. I wrote these comments during the
reading of the document.
Hope some of these help
Thanks, Lawrence. -Brett
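The memory-cap bookkeeping described in the reply above (one counter for the cap, one for usage, a check on every request) can be modelled in a few lines. This is a toy Python model with hypothetical names; the real work lives in C inside the allocator and tracks size_t values.

```python
class MemoryCap:
    """Toy model of the bookkeeping: one number for the cap, one for usage."""

    def __init__(self, cap_bytes):
        self.cap = cap_bytes
        self.usage = 0

    def allocate(self, nbytes):
        # Refuse the request if it would push usage over the cap.
        if self.usage + nbytes > self.cap:
            raise MemoryError("request of %d bytes exceeds the cap" % nbytes)
        self.usage += nbytes

    def free(self, nbytes):
        # Usage goes back down when memory is released.
        self.usage = max(0, self.usage - nbytes)


MB = 1024 * 1024
cap = MemoryCap(5 * MB)
cap.allocate(4 * MB)
try:
    cap.allocate(2 * MB)  # would reach 6 MB, over the 5 MB cap
except MemoryError as exc:
    print("refused:", exc)
```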

Got that.
Nope. Have not started worrying about that yet. Just trying to get the basic model laid out.
Ok sorry to have bothered
That is the point. It is not that the sandbox needs to know it, it's that it needs to be hidden from the sandbox.
So I think that's a "simple" step during the importing step.
I have not looked at it. I am also not trying to build an RPC system *and* a security model for Python. That is just too much work right now.
Ok sorry :-)
Thanks, Lawrence.
Thank you! -- Lawrence http://www.oluyede.org/blog

"Lawrence Oluyede" <l.oluyede@gmail.com> wrote in message news:9eebf5740607200117r4d4613e2i91665ea211bab46@mail.gmail.com...
- I know what you meant to say but the paragraph about pythonicness and the security model seems a little "fuzzy" to me.
I agree that this paragraph is weak and recommend that it be rewritten. In particular, I think the 'pythonic*' words should go, especially if you expect this document to be read by anyone other than dedicated pythonistas. I would start with something like "It is my goal that my thesis work be incorporated in some future version of the Python distribution. This has two constraints. First, changes to the core must not slow down normal operation. Second, visible changes must not violate the spirit and style of Python that make it a distinctive language." This alludes to the fact that your proposal discusses two highly overlapping yet separate projects: write a thesis that gains you a PhD degree; and produce an accepted patch set that gives Python a useful security capability it does not now have. They have to be thought of as somewhat separate because you have two sets of 'overseers' and approvers: your thesis advisor and committee for the first; and Guido and other Python developers for the second. I think your thesis should currently be your first priority. Your current paragraph implied to me that you would not follow a promising line of research if you could not see how to make it 'pythonic'. If I were on your thesis committee, I think that would bother me ;-). In any case, I wish you the best with a double project that is obviously not a 'gimme'. Terry Jan Reedy (PhD, though not on any thesis committees)

Brett Cannon wrote:
How do you plan to handle CPU-hogs? Stuff like execution of a gigantic integer multiplication. This recipe for safe_eval: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/496746 which is otherwise very cute, does not handle this case as well: it tries to catch and interrupt long-running operations through a secondary thread, but fails on a single long operation because the GIL is not released and the alarm thread does not get its chance to run. -- Giovanni Bajo

Brett Cannon wrote:
I think the trick used by the safe_eval recipe (a separate thread which interrupts the script through thread.interrupt_main()) shows that, in most cases, it's possible to make sure that an embedded script does not take too long to execute. Do you agree that this usage case ("allow me to timeout an embedded script") is something which would be a very good start in the right direction? Now, I wonder, in a restricted execution environment such as that depicted in your document, how many different ways are there to make the Python interpreter enter a long calculation loop which does not release the GIL? I can think of bignum*bignum, bignum**bignum or similar mathematical operations, but there are really only a few. If we could make those release the GIL (or poll some kind of watchdog used to abort them, pretty much like they normally poll CTRL+C), then the same trick used by the recipe could be used. -- Giovanni Bajo

On 7/20/06, Giovanni Bajo <rasky@develer.com> wrote:
Probably. I just don't feel like worrying about it right now. =) Now, I wonder, in a restricted execution environment such as that depicted
Well, any work that does most of its calculation within C code and that does not touch base with the interpreter on a semi-regular basis would need to release the GIL. -Brett

For code objects, their construction is already commonly written as "compile(source)".

For type objects, the constructor doesn't let you do anything you can't already do with a class statement. It doesn't need securing.

For rewriting import.c in Python, the PEP 302 compliant import system API in pkgutil would be a good starting point.

Your doc also asks about the imp.get_suffixes() list, and wonder where to set it from Python. As far as I am aware, you can't. get_suffixes() is built from _PyImport_FileTab, which is a C array. A switch statement is used to get from the file table entries to the appropriate handler functions.

Quoting from the suggestions I put to the Py3k list:

Use smarter data structures
---------------------------

Currently, the individual handlers to load a fully identified module are exposed to Python code in a way that reflects the C-style data structures used in the current implementation. Simply switching to more powerful data structures for the file type handlers (i.e. use a PyTuple for filedescr values, a PyList for _PyImport_FileTab, and a PyDict instead of a switch statement to go from filedescr values to module loading/initialisation functions) and manipulating them all as normal Python objects could make the code in import.c much easier to follow.

Extensible file type handling
-----------------------------

If the file type handlers are stored in normal Python data structures as described above, it becomes feasible to make the import system extensible to different file types as well as to different file locations. This could be handled on a per-package basis, e.g. via a __file_types__ special attribute in packages.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org
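Seen from the Python side, the "dict instead of a switch statement" suggestion might look roughly like the sketch below. The handler functions and the names `FILE_TYPE_HANDLERS`, `load_module_file` and `SOURCE_ONLY` are hypothetical placeholders, not part of import.c or any existing API.

```python
import os


def load_source(path):
    """Placeholder handler for .py files (hypothetical)."""
    return ("source", path)


def load_bytecode(path):
    """Placeholder handler for .pyc files (hypothetical)."""
    return ("bytecode", path)


# A plain dict takes the place of the C switch statement.
FILE_TYPE_HANDLERS = {
    ".py": load_source,
    ".pyc": load_bytecode,
}


def load_module_file(path, handlers=FILE_TYPE_HANDLERS):
    """Dispatch on the file suffix, refusing unknown file types."""
    suffix = os.path.splitext(path)[1]
    try:
        handler = handlers[suffix]
    except KeyError:
        raise ImportError("no handler for file type %r" % suffix)
    return handler(path)


# Restricting the recognised file types is just a smaller mapping.
SOURCE_ONLY = {".py": load_source}

print(load_module_file("spam.py"))
```

Once the table is an ordinary mapping, broadening it (new file types) and narrowing it (source only, for a sandbox) are the same kind of operation.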

On 7/20/06, Nick Coghlan <ncoghlan@gmail.com> wrote:
For code objects, their construction is already commonly written as "compile(source)".
Right, but some people like to construct directly from bytecode. For type objects, the constructor doesn't let you do anything you can't
already do with a class statement. It doesn't need securing.
I figured as much, but when I was making the list I was not sure and didn't want to stop my writing momentum to check. For rewriting import.c in Python, the PEP 302 compliant import system API in
pkgutil would be a good starting point.
Yep. Plan on looking at all of the various modules in the stdlib that assist with importing, package PEP (I think there is one), and PEP 302. Your doc also asks about the imp.get_suffixes() list, and wonder where to
Ah, OK. Quoting from the suggestions I put to the Py3k list:
Yep. I just kind of glanced at the rest of your suggestions, Nick, since I assumed a lot of it would change (or could be changed) if import was redone in as much Python as possible. Extensible file type handling
Yep. Although I am more interested in restricting than broadening the file types. This could be handled on a per-package basis, e.g. via a __file_types__
special attribute in packages.
Maybe. I don't want to get into introducing new abilities to start, though. -Brett

Brett Cannon wrote:
Either way you'd be mutating the list of recognised file types :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org

Hi Brett, On Wed, Jul 19, 2006 at 03:35:45PM -0700, Brett Cannon wrote:
I also plan to rewrite the import machinery in pure Python.
http://codespeak.net/svn/pypy/dist/pypy/module/__builtin__/importing.py A bientot, Armin

On 7/22/06, Armin Rigo <arigo@tunes.org> wrote:
Thanks for the link, Armin. Since you guys don't have the import restrictions the CPython version would have and just have different coding needs for RPython obviously I can't just do a blind copy. But I will definitely take a look as I develop. Maybe you guys can even help to lower the duplication if it makes sense for you. BTW, do you guys happen to have extra tests from import? -Brett

Hi Brett, On Sat, Jul 22, 2006 at 10:33:19AM -0700, Brett Cannon wrote:
Yes, it should be possible to abstract the common logic in some way, using some kind of interface for all OS inspection and 'sys.modules' manipulations.
BTW, do you guys happen to have extra tests from import?
Yes, there is http://codespeak.net/svn/pypy/dist/pypy/module/__builtin__/test/test_import.... which will also need a bit of rewriting, but that should be straightforward. A bientot, Armin

Re-hi, On Wed, Jul 19, 2006 at 03:35:45PM -0700, Brett Cannon wrote:
http://svn.python.org/view/python/branches/bcannon-sandboxing/securing_python.txt?rev=50717&view=log.
I'm not sure I understand what you propose to fix holes like constructors and __subclasses__: it seems that you want to remove them altogether (and e.g. make factory functions instead). That would completely break all programs, right? I mean, there is no way such changes would go into mainstream CPython. Or do you propose to maintain a CPython branch manually for the foreseeable future? (From experience this is a bad idea...) A bientot, Armin
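For readers unfamiliar with the `__subclasses__` hole mentioned above, a short sketch: starting from `object`, code can reach classes it was never given a reference to. The class name `Secret` and the helper `find_class` are invented for the example.

```python
class Secret:
    """Hypothetical class the untrusted code was never handed."""


def find_class(name):
    # object.__subclasses__() lists every direct subclass of object,
    # so a class can be found by name without any reference to it.
    for cls in object.__subclasses__():
        if cls.__name__ == name:
            return cls
    return None


found = find_class("Secret")
print(found)
```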

On 7/22/06, Armin Rigo <arigo@tunes.org> wrote:
Not altogether, just constructors on select types that are considered dangerous from a security standpoint. The breakage won't be horrible, but it will be there for advanced Python code. I will try to make the wording more clear when I get back to work on Tuesday.
I mean, there is no way such changes would go into mainstream CPython.
If this has to wait until Py3k then so be it.
Yeah, not my idea of fun either, but since this is a long term project, I will at least need to for the foreseeable future. -Brett

Armin Rigo wrote:
If I understand correctly, the proposal is that any incompatible changes to the language would apply only in "sandboxed" interpreters. So there is no reason why support for these couldn't go into the main branch. Of course we want to minimize the changes that will need to be made to programs and libraries to make them work in a sandboxed interpreter, but not at the expense of security. Some incompatible changes will be necessary. -- David Hopwood <david.nospam.hopwood@blueyonder.co.uk>

Hi David, hi Brett, On Sun, Jul 23, 2006 at 02:18:48AM +0100, David Hopwood wrote:
That's what I originally thought too, but Brett writes:

Implementation Details
========================

An important point to keep in mind when reading about the implementation details for the security model is that these are general changes and are not special to any type of interpreter, sandboxed or otherwise. That means if a change to a built-in type is suggested and it does not involve a proxy, that change is meant Python-wide for *all* interpreters.

So that's why I'm starting to worry that Brett is proposing to change the regular Python language too. However, Brett, you also say somewhere else that backward compatibility is not an issue. So I'm a bit confused actually... Also, I hate to sound self-centered, but I should point out somewhere that PyPy was started by people who no longer wanted to maintain a fork of CPython, and preferred to work on building CPython-like variants automatically. Many of the security features you list would be much easier to implement and maintain in PyPy than CPython -- also from a security perspective: it is easier to be sure that some protection is complete, and remains complete over time, if it is systematically generated instead of hand-patched in a dozen places. A bientot, Armin

On 7/23/06, Armin Rigo <arigo@tunes.org> wrote:
Yes, I am proposing changing some constructors and methods on some built-in types for the regular language. That's it. No new keywords or major semantic changes and such. If I make changes just for sandboxed interpreters it changes the general approach of the security model by then requiring an identity check to see if the interpreter is sandboxed or not. However, Brett, you also say somewhere
else that backward compatibility is not an issue. So I'm a bit confused actually...
Since this is my Ph.D. dissertation first and foremost, I am not going to tie my hands in such a way that I have to make too much of a compromise in order for this to work. I obviously don't want to change the feel of Python, but if I have to remove the constructor for code objects to prevent evil bytecode or __subclasses__() from object to prevent poking around stuff, then so be it. For this project, security is trumping backwards-compatibility when the latter is impossible in order to have the former. I will obviously try to minimize it, but something that works at such a basic level of the language is just going to require some changes for it to work. Also, I hate to sound self-centered, but I should point out somewhere
It doesn't sound self-centered. =) Problem is that my knowledge base is obviously all in CPython so my startup costs are much lower than if I tried this in PyPy. Plus there is the point of embedding this into Firefox (possibly) eventually. Does PyPy support embedding yet at the C level? -Brett

Brett Cannon wrote:
I assume that the extent of incompatible changes would be limited as much as possible. So the only checks would be in operations that are directly affected by whatever incompatible changes are made. The performance and complexity costs of this are likely to be small -- or at least should not be assumed to be large before having hammered out a more detailed design. Suppose, for the sake of argument, that we introduced private methods and attributes. If an attribute in an existing standard library class was changed to be private, then code depending on it would break. But if there were a notion of a "compatibility private" attribute that acts as private only in a sandboxed interpreter, then no code running in an unprotected interpreter would break. -- David Hopwood <david.nospam.hopwood@blueyonder.co.uk>

Brett Cannon wrote:
Another rationale for basing the work on CPython is that it should be possible to implement the resulting security model regardless of the implementation language used for the interpreter core (C/Python, Java/Python, C#/Python, RPython/Python). If you can figure out how to do it in C, it should be feasible to do it in the others. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org

Hi Brett, Here are some comments on your proposal. Sorry this took so long. I apologize if any of these comments are out of date (but also look forward to your answers to some of the questions, as they'll help me understand some more of the details of your proposal). Thanks!
Is this a definition or an implementation choice? As in, are you defining "sandboxed" to mean "with altered built-ins" or just "restricted in some way", and does the above mean to imply that altering the built-ins is what triggers other kinds of restrictions (as it did in Python's old restricted execution mode)?
Is a "bare interpreter" just one example of a sandboxed interpreter, or are all sandboxed interpreters in your design initially bare (i.e. "sandboxed" = "bare" + zero or more granted authorities)?
The "security domain" is the boundary at which security is cared about. For this discussion, it is the interpreter.
It might be clearer to say (if i understand correctly) "Each interpreter is a separate security domain." Many interpreters can run within a single operating system process, right? Could you say a bit about what sort of concurrency model you have in mind? How would this interact (if at all) with use of the existing threading functionality?
The "powerbox" is the thing that possesses the ultimate power in the system. In our case it is the Python process.
This could also be the application process, right?
I'm glad to have this use case set out early in the document, so the reader can keep it in mind as an example while reading about the model.
As Mark Miller mentioned in another message, your descriptions of "who-I-am" security and "what-I-have" security make sense, but they don't correspond to "permission" vs. "authority". They correspond to "identity-based" vs. "authority-based" security.
Nice summary.
Don't local scopes count as private namespaces? It seems clear that they aren't designed with the intention of being exposed, unlike other namespaces in Python.
It also makes providing security at the object level using object-capabilities non-existent in pure Python code.
I don't think this is necessarily the case. No Python code i've ever seen expects to be able to invade the local scopes of other functions, so you could use them as private namespaces. There are two ways i've seen to invade local scopes: (a) Use gc.get_referents to get back from a cell object to its contents. (b) Compare the cell object to another cell object, thereby causing __eq__ to be invoked to compare the contents of the cells. So you could protect local scopes by prohibiting these or by simply turning off access to func_closure. It's clear that hardly any code depends on these introspection features, so it would be reasonable to turn them off in a sandboxed interpreter. (It seems you would have to turn off some introspection features anyway in order to have reliable import guards.)
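Route (a) can be reproduced in a few lines. In current Python the attribute func_closure is spelled `__closure__`; the function names here are made up for the example.

```python
import gc


def make_counter():
    count = [0]  # meant to be private to the closure

    def increment():
        count[0] += 1
        return count[0]

    return increment


inc = make_counter()
inc()  # count is now [1]

# (a) Go from the function to its cell, then from the cell to its contents.
cell = inc.__closure__[0]
hidden = gc.get_referents(cell)[0]
hidden[0] = 100  # tamper with the "private" state

print(inc())  # → 101
```

Turning off access to the closure attribute, as suggested above, removes the first step of this walk.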
Yup.
Threat Model
///////////////////////////////////////
Good to see this specified here. I like the way you've broken this down.
* An interpreter cannot gain abilities the Python process possesses without explicitly being given those abilities.
It would be good to enumerate which abilities you're referring to in this item. For example, a bare interpreter should be able to allocate memory and call most of the built-in functions, but should not be able to open network connections.
* An interpreter cannot influence another interpreter directly at the Python level without explicitly allowing it.
You mean, without some other entity explicitly allowing it, right? What would that other entity be -- presumably the interpreter that spawned both of these sub-interpreters?
* An interpreter cannot use operating system resources without being explicitly given those resources.
Okay.
* A bare Python interpreter is always trusted.
What does "trusted" mean in the above?
* Python bytecode is always distrusted.
* Pure Python source code is always safe on its own.
It would be helpful to clarify "safe" here. I assume by "safe" you mean that the Python source code can express whatever it wants, including potentially dangerous activities, but when run in a bare or sandboxed interpreter it cannot have harmful effects. But then in what sense does the "safety" have to do with the Python source code rather than the restrictions on the interpreter? Would it be correct to say:
+ We want to guarantee that Python source code cannot violate the restrictions in a restricted or bare interpreter.
+ We do not prevent arbitrary Python bytecode from violating these restrictions, and assume that it can.
By "malicious" do you just mean "anything that isn't accessible to a bare interpreter"?
* A sub-interpreter started by another interpreter does not inherit any state.
Do you envision a tree of interpreters and sub-interpreters? Can the levels of spawning get arbitrarily deep? If i am visualizing your model correctly, maybe it would be useful to introduce the term "parent", where each interpreter has as its parent either the Python process or another interpreter. Then you could say that each interpreter acquires authority only by explicit granting from its parent. Then i have another question: can an interpreter acquire authorities only when it is started, or can it acquire them while it is running, and how?
This sounds like part of your design to me. It might help to have this earlier in the document (maybe even with an example diagram of a tree of interpreters).
It might be good to declare two categories of resources -- those protected by object hiding and those protected by a per-interpreter setting -- and make lists.
You might qualify the last statement a bit. For example, a Python implementation of a pure algorithm (e.g. string processing, data compression, etc.) would still work in a sandboxed interpreter.
Keeping Python "pythonic" is required for all design decisions.
As Lawrence Oluyede also mentioned, it would be helpful to say a little more about what "pythonic" means.
Sounds good.
Abilities of a Standard Sandboxed Interpreter
=============================================
[...]
This looks reasonable. This is probably a good place to itemize exactly which built-ins are considered safe.
Presumably these are passed in to the proxy's constructor.
I'm unclear about this. Is the whitelist a list of module names only, or of filenames with extensions? Does the normal path-searching process take place or can it be restricted in some way? Would it simplify the security analysis to have the whitelist be a dictionary that maps module names to absolute pathnames? If both the .py and .pyc are present, the normal import would find the .pyc file; would the import proxy reject such an import or ignore it and recompile the .py instead?
It must be warned that importing any C extension module is dangerous.
Right.
This seems like a good idea. Can you identify which minimum essential pieces of the import machinery have to be written in C?
The existence of the constructor isn't (by itself) the problem. The problem is that both of the following are true: (a) From any object you can get its type object. (b) Using any type object you can construct a new instance. So, you can control this either by hiding the type object, separating the constructor from the type, or disabling the constructor.
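A small sketch of points (a) and (b) together. It assumes CPython 3, where open(..., "rb", buffering=0) hands back a raw io.FileIO object whose type accepts a path; the file names are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    allowed_path = os.path.join(tmp, "allowed.txt")
    secret_path = os.path.join(tmp, "secret.txt")
    with open(allowed_path, "wb") as f:
        f.write(b"public")
    with open(secret_path, "wb") as f:
        f.write(b"top secret")

    # The only authority handed over is this one already-open file object.
    with open(allowed_path, "rb", buffering=0) as granted:
        file_type = type(granted)               # (a) object -> its type
        with file_type(secret_path, "r") as f:  # (b) type -> new instance
            leaked = f.read()
```

Any one of the three fixes listed above (hide the type, split the constructor off, or disable it) would break the second step.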
Looks good so far. Not sure i see what's dangerous about 'type'.
It seems that in most cases, a single Python object is associated with a single pathname. If that's true in general, one solution would be to provide an introspection function named 'getpath' or something similar that would get the path associated with any object. This function might go in a module containing all the introspection functions, so imports of that module could be easily restricted.
Okay, more to work out here. :)
Good.
* ``__del__`` created in sandboxed interpreter but object is cleaned up in unprotected interpreter.
How do you envision the launching of a sandboxed interpreter to look? Could you sketch out some rough code examples? Were you thinking of something like:

    sys.spawn(code, dict)

    code: a string containing Python source code
    dict: the global namespace in which to run the code

If you allow the parent interpreter to pass mutable objects into the child interpreter, then the parent and child can already communicate via the object, so '__del__' is a moot issue. Do you want to prevent all communication between parent and child? It's not obvious to me why that would be necessary.
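For readers who want something to poke at, here is a rough stand-in for the hypothetical sys.spawn(code, dict) call. It is only a sketch: spawn is an invented name, it uses plain exec() rather than a separate interpreter, and exec() with a trimmed __builtins__ is NOT a real security boundary. It does show the point about mutable objects acting as a communication channel.

```python
def spawn(code, namespace):
    """Hypothetical stand-in for sys.spawn(code, dict).

    Runs 'code' with 'namespace' as its globals. This is an illustration
    only; exec() does not provide real isolation.
    """
    env = dict(namespace)
    # Assumption: the child sees only the built-ins listed here.
    env.setdefault("__builtins__", {"len": len})
    exec(code, env)
    return env

# A mutable object passed in by the parent is shared with the child.
channel = []
spawn("channel.append(len('hello'))", {"channel": channel})
```

After the call, the parent can read whatever the child put into channel, with no need for '__del__' tricks.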
* Using frames to walk the frame stack back to another interpreter.
Could you just disable introspection of the frame stack?
Yes.
I think the last suggestion above would lead to confusion. The two groups should have two distinct names and it should be clear which attribute goes with which group.
Sounds good.
Lots more to think about here. :)
Preventing denial-of-service is in general quite difficult, but i applaud the attempt. I agree with your decision to separate this work from the rest of the security model. -- ?!ng

On 9/6/06, Ka-Ping Yee <python-dev@zesty.ca> wrote:
I think they are slightly outdated. The latest version of the doc is in the bcannon-objcap branch and is named securing_python.txt ( http://svn.python.org/view/python/branches/bcannon-objcap/securing_python.tx... ).
There is no "triggering" of other restrictions. This is an implementation choice. "Sandboxed" means "with altered built-ins".
You build up from a bare interpreter by adding in authorities (e.g., providing a wrapped version of open()) to reach the level of security you want.
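As an illustration of "adding in authorities", a wrapped open() might look roughly like the sketch below. The names make_restricted_open and restricted_open are made up, the policy (read-only, exact path whitelist) is an assumption, and in today's pure Python the wrapper can still be bypassed through introspection, so treat it as a picture of the idea rather than a working sandbox.

```python
import os
import tempfile

def make_restricted_open(allowed_paths):
    # Resolve the whitelist once so later comparisons use canonical paths.
    allowed = frozenset(os.path.realpath(p) for p in allowed_paths)

    def restricted_open(path, mode="r"):
        if mode not in ("r", "rb"):
            raise PermissionError("only read access is granted")
        if os.path.realpath(path) not in allowed:
            raise PermissionError("no authority over %r" % (path,))
        return open(path, mode)

    return restricted_open

with tempfile.TemporaryDirectory() as tmp:
    ok_path = os.path.join(tmp, "ok.txt")
    other_path = os.path.join(tmp, "other.txt")
    for p in (ok_path, other_path):
        with open(p, "w") as f:
            f.write("hello")

    safe_open = make_restricted_open([ok_path])
    with safe_open(ok_path) as f:
        contents = f.read()

    try:
        safe_open(other_path)
        denied = False
    except PermissionError:
        denied = True
```

The sandboxed code would be given safe_open in place of the real open(), and nothing else that reaches the file system.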
Many interpreters can run within a single operating system process, right?
Yes.
Could you say a bit about what sort of concurrency model you have in mind?
None specifically. Each new interpreter automatically runs in its own Python thread, so they have essentially the same concurrency as using the 'thread' module.
How would this interact (if at all) with use of the existing threading functionality?
See above.
If Python is embedded, yes.
Right. This was fixed the day Mark and Alan Karp made the comment.
Sort of. But you can still get access to them if you have an execution frame, and they are not persistent. Generators are worse since they store their execution frame with the generator itself, completely exposing the local namespace.
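A short sketch of the generator case, assuming CPython (gi_frame and f_locals are CPython attributes); the function and variable names are made up.

```python
def countdown(secret):
    token = secret.upper()   # a purely local variable
    yield 2
    yield 1

gen = countdown("s3cret")
next(gen)                    # run the generator up to the first yield

# Whoever holds the generator also holds its suspended frame.
exposed = dict(gen.gi_frame.f_locals)
```

Here exposed contains both the argument and the local token, even though countdown never hands either one out.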
It also makes providing security at the object level using object-capabilities non-existent in pure Python code.
I don't think this is necessarily the case. No Python code i've [...]
Or the execution frame, which is exposed directly on generators. But regardless, the comment was meant to apply to Python as it stands, not to say that it couldn't possibly be tweaked somehow.
So you could protect local scopes by prohibiting these or by [...]
Maybe this can be changed in the future, but this is more than I need at the moment, so I am not going to go down that path right now. But I added a quick mention of this.
The current version has more details per point than the one you read.
Yep.
What would that other entity be -- presumably the interpreter that spawned both of these sub-interpreters?
Sure. You could stick something in the built-in namespace of the sub-interpreter to use for communicating.
It means that if Python source code can execute within a bare interpreter it is considered safe code. This is covered in the new version of the doc.
Anything that could harm the system or interpreter.
Yes and yes.
If i am visualizing your model correctly, maybe it would be useful to [...]
You could, although there is no hierarchy at the implementation level. But it works in terms of who has a reference to whom and who gives each interpreter their authority.
Then i have another question: can an interpreter acquire authorities only when it is started, or can it acquire them while it is running, and how?
Well, whatever you want to do through the built-in namespace. So if you pass in a mutable object like a dict and add stuff to it on the fly, I don't see why you couldn't give new authorities on the fly.
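A sketch of granting authority on the fly through a mutable object placed in the built-in namespace. It uses exec() with a hand-built __builtins__ dict as a stand-in for a real sub-interpreter, which is an assumption made for illustration; the names granted, try_log, and log are invented.

```python
# Mutable object shared between the parent and the child code.
granted = {}
env = {"__builtins__": {"granted": granted}}

child_source = """
def try_log(message):
    log = granted.get("log")
    if log is None:
        return "no authority"
    return log(message)
"""
exec(child_source, env)
try_log = env["try_log"]

messages = []

def log(message):
    messages.append(message)
    return "logged"

before = try_log("first")   # the child has no logging authority yet
granted["log"] = log        # the parent grants it while the child is live
after = try_log("second")
```

The child's code did not change between the two calls; only the contents of the shared dict did.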
Made Guiding Principles its own section and split off the bottom part of the section and put it under Implementation.
That is rather unknown since I am constantly finding stuff that is global to the process rather than to the interpreter, so making the list seems premature.
I tossed in "all" to clarify.
Done in the current version.
Current plan is to expose the built-in namespace, imported modules, and sys module dict when creating an Interpreter instance.
Have not decided, but probably module name.
Does the normal path-searching process take place or can it be restricted in some way?
Have not decided.
Would it simplify the security analysis to have the whitelist be a dictionary that maps module names to absolute pathnames?
Don't know. Protecting imports is the last thing I am going to implement since it is the trickiest.
If both the .py and .pyc are present, the normal import would find the .pyc file; would the import proxy reject such an import or ignore it and recompile the .py instead?
Something along those lines.
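Since the whitelist design is still undecided, the following is only one possible shape for it: a guard keyed on module names, installed as __import__ in the built-ins seen by the restricted code. The name make_import_guard is invented, and exec() with a custom __builtins__ dict stands in for a real sandboxed interpreter.

```python
def make_import_guard(whitelist):
    allowed = frozenset(whitelist)

    def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
        # Assumption: the whitelist holds plain module names, not paths.
        if name not in allowed:
            raise ImportError("import of %r is not whitelisted" % (name,))
        return __import__(name, globals, locals, fromlist, level)

    return guarded_import

env = {"__builtins__": {"__import__": make_import_guard(["math"])}}

# A whitelisted import goes through to the real machinery.
exec("import math\nroot = math.sqrt(16)", env)

# Anything else is refused before any path searching happens.
try:
    exec("import os", env)
    blocked = False
except ImportError:
    blocked = True
```

This says nothing about the .py versus .pyc question or about restricting the path search; those would need hooks deeper in the import machinery.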
Loading of C extensions, stating files, reading files, etc. Pretty much anything that requires help from the OS.
I separated the constructor or initializer (tp_new or tp_init) into a factory function.
That's why it has the question mark. =)
That is the current thinking.
Possibly. I might have to wait until I am much closer to being done to discover more places where mutable shared state is exposed in a bare interpreter, because I have not been able to think of any more.
Were you thinking of [...]
No, I don't since there should be a secure way to allow that. The __del__ worry came up from Guido pointing out you might be able to screw with it. But if you pass in something implemented in C you should be okay.
* Using frames to walk the frame stack back to another interpreter.
Could you just disable introspection of the frame stack?
If you don't allow importing of 'sys' then yes, and that is planned. I just wanted to make sure I didn't forget this needs to be protected. I do need to check what a generator's frame exposes, though.
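For reference, this is the kind of frame walking being discussed, sketched with sys._getframe (a CPython-specific function); the function names and the password variable are made up.

```python
import sys

def untrusted():
    # Step one frame up the stack and copy the caller's local variables.
    caller = sys._getframe(1)
    return dict(caller.f_locals)

def trusted():
    password = "hunter2"
    return untrusted()

leaked = trusted()
```

Without access to 'sys' this particular route is closed, which matches the plan above.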
This is further complicated by the fact that some things are for the entire process while others are per interpreter. Might have to separate things out even more.
Oh yeah. =)
The memory tracking has a proof-of-concept done in the bcannon-sandboxing branch. Not perfect, but it does show how one could go about accounting for every byte of data in terms of what it is basically used for. -Brett
participants (10)
- Armin Rigo
- Brett Cannon
- David Hopwood
- Giovanni Bajo
- Greg Ewing
- Ka-Ping Yee
- Lawrence Oluyede
- Michael Foord
- Nick Coghlan
- Terry Reedy