<br><br><div><span class="gmail_quote">On 7/4/06, <b class="gmail_sendername">Ka-Ping Yee</b> <<a href="mailto:python-dev@zesty.ca">python-dev@zesty.ca</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Hi Brett,<br><br>Here are some comments on the description of the restricted execution<br>model that you posted.<br><br>> When referring to the state of an interpreter, it is either "trusted" or<br>> "untrusted". A trusted interpreter has no restrictions imposed upon any
<br>> resource. An untrusted interpreter has at least one, possibly more, resource<br>> with a restriction placed upon it.<br><br>In response to Guido's comment about confusing the words "trusted" and<br>
"untrusted", how about "empowered" and "restricted"?</blockquote><div><br>Maybe. I am really starting to lean towards trusted and sandboxed. <br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
> When the Interpreter Is Embedded<br>> ================================<br>><br>> Single Untrusted Interpreter<br>> ----------------------------<br>><br>> This use case is when an application embeds the interpreter and never has more
<br>> than one interpreter running.<br>><br>> The main security issue to watch out for is not having default abilities be<br>> provided to the interpreter by accident.<br><br>I'd rather rephrase this in the opposite direction. The onus shouldn't
<br>be on the application to hunt down each possible dangerous authority and<br>deactivate them all one by one. The main security issue is to let the<br>application choose which abilities it wants the restricted interpreter
<br>to have, and then ensure that the restricted interpreter gets only those<br>abilities.</blockquote><div><br>Right. I am thinking more of an implementation screw-up that somehow provides access to an object that has escalated rights.
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> Multiple Untrusted Interpreters<br>> -------------------------------<br>
><br>> When multiple interpreters, all untrusted at varying levels, need to be<br>> running within a single application. This is the key use case that this<br>> proposed design is targeted for.<br>><br>> On top of the security issues from a single untrusted interpreter,
<br>> there is one additional worry. Resources cannot end up being leaked<br>> into other interpreters where they are given escalated rights.<br><br>What is your model here for communication between interpreters? If two
<br>interpreters can communicate, any attempt to "prevent leakage" of<br>resources is meaningless. When you say "leaked into other interpreters"<br>are you talking about a Python object leaking or something else at a
<br>lower level?</blockquote><div><br>I am talking about Python objects.<br><br>As for communication, I was planning on something included directly in globals or some custom object to handle that. I have not been focusing on that aspect so far.
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Suppose for example that the application wants to embed two interpreters,<br>P and Q, and that the application wants P to be able to write files but
<br>Q to be restricted against writing files. When you say "leaked" above,<br>that suggests to me that you want to prevent something like<br><br> # code running in P<br> import spam<br> f = open('/home/doofus/.ssh/authorized_keys', 'a')
<br> spam.f = f<br><br> # code running in Q<br> import spam<br> spam.f.write('blargh')<br><br>The above example supposes that P and Q can communicate through a<br>shared module, spam, where they can pass Python objects.
</blockquote><div><br>Right. But Python modules are separate per interpreter; only C extension modules are in any way shared between interpreters. Still, sharing an open file like that is bad, which is why C extension modules must be whitelisted before they can be used.
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">But notice that even if you prevent them from passing Python objects<br>like open files, any form of communication is sufficient to leak
<br>resources:<br><br> # code running in P<br> def add_key(key):<br> f = open('/home/doofus/.ssh/authorized_keys', 'a')<br> f.write(key + '\n')<br> f.close()<br><br> import socket<br> s =
socket.socket()<br> s.bind(('', 6666))<br> s.listen(1)<br> ns, addr = s.accept()<br> add_key(ns.recv(100))<br><br><br> # code running in Q<br> import webbrowser<br> webbrowser.open('<a href="http://localhost:6666/zebra">
http://localhost:6666/zebra'</a>)<br><br>As long as P can listen for instructions from Q, it can give Q<br>the power to write to the filesystem.</blockquote><div><br>Right, which is why sockets and files are restricted and turned off by default. You have to give explicit permission to use either resource.
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> Filesystem<br>> ===================<br>><br>> The most obvious facet of a filesystem to protect is reading from it.
<br>> One does not want what is stored in ``/etc/passwd`` to get out. And<br>> one also does not want writing to the disk unless explicitly allowed<br>> for basically the same reason; if someone can write ``/etc/passwd``
<br>> then they can set the password for the root account.<br><br>There's a big difference between modifying (or erasing) an existing file<br>and writing a new file (e.g. for temporary storage). If i give you a<br>little filesystem of your own to play in, and it starts out empty, you
<br>can put whatever you want in it without violating my secrecy or the<br>integrity of my files.<br><br>I think you should be talking about this in terms of specifically<br>what abilities you want to be able to allow, based on examples of
<br>real-life applications.</blockquote><div><br>Fair enough. But since access is granted only to files that are listed explicitly, you can provide temporary storage by granting write access to a file that does not exist yet. If you don't want an existing file touched, you simply don't grant access to it.
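<br><br>As a rough sketch of that idea (the class and method names here are hypothetical, not part of any actual design, and a pure-Python check like this is only an illustration, not a security boundary), access could be granted path by path, so that a not-yet-existing scratch file is writable while everything else stays off limits:

```python
import os
import tempfile


class FileAllowlist:
    """Hypothetical per-path allowlist: each listed path maps to its allowed modes."""

    def __init__(self):
        self._allowed = {}

    def allow(self, path, modes):
        # Normalize the path so that "a/../b" spellings resolve to the same entry.
        self._allowed[os.path.realpath(path)] = set(modes)

    def open(self, path, mode="r"):
        # Anything not listed gets an empty set of modes, i.e. default deny.
        allowed = self._allowed.get(os.path.realpath(path), set())
        if mode not in allowed:
            raise PermissionError(f"{mode!r} access to {path!r} was not granted")
        return open(path, mode)


# Grant write access to a scratch file that does not exist yet.
scratch = os.path.join(tempfile.mkdtemp(), "scratch.txt")
policy = FileAllowlist()
policy.allow(scratch, {"w"})

with policy.open(scratch, "w") as f:
    f.write("temporary data")
```

Any path that was never passed to `allow()`, and any mode that was not listed for a path, raises `PermissionError`.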
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> Physical Resources<br>> ===================<br>><br>> Memory should be protected. It is a limited resource on the system
<br>> that can have an impact on other running programs if it is exhausted.<br>> Being able to restrict the use of memory would help alleviate issues<br>> from denial-of-service (DoS) attacks.<br><br>> Networking
<br>> ===================<br>><br>> Networking is somewhat like the filesystem in terms of wanting similar<br>> protections. You do not want to let untrusted code make tons of socket<br>> connections or accept them to do possibly nefarious things (
e.g., acting<br>> as a zombie).<br>><br>> You also want to prevent finding out information about the network you are<br>> connected to. This includes doing DNS resolution since that allows one<br>> to find out what addresses your intranet has or what subnets you use.
<br><br>Again, it's risky to describe only individual cases of things to<br>prevent. What networking abilities are safe or necessary for the<br>kinds of applications you have in mind? Start from nothing and<br>work up from there.
</blockquote><div><br>That's the plan. I am planning to go through socket function by function and explicitly allow access as warranted and block everything else. It is not going to be "let's block DNS and allow everything else". Sorry if that wasn't clear. This is mostly just to say "I plan on restricting this kind of stuff, here is an example".
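<br><br>To make the "allow explicitly, block everything else" shape concrete, here is a minimal sketch. The `AllowlistedModule` proxy is a hypothetical name, and the two allowed functions are chosen only for illustration. A pure-Python proxy like this is not a real security boundary, since the real enforcement would have to live in the interpreter:

```python
import socket


class AllowlistedModule:
    """Hypothetical proxy exposing only explicitly allowed names of a module."""

    def __init__(self, module, allowed):
        self._module = module
        self._allowed = frozenset(allowed)

    def __getattr__(self, name):
        # Called only for names not found on the proxy itself:
        # everything that is not on the allowlist is refused.
        if name.startswith("_") or name not in self._allowed:
            raise AttributeError(f"{name!r} is not allowed in this sandbox")
        return getattr(self._module, name)


# Only pure address-conversion helpers are allowed here; DNS lookups,
# connections, and everything else are blocked by default.
safe_socket = AllowlistedModule(socket, {"inet_aton", "inet_ntoa"})
packed = safe_socket.inet_aton("127.0.0.1")
```

With this setup, an access such as `safe_socket.gethostbyname` raises `AttributeError` because the name was never allowed.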
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> Interpreter<br>> ===================<br>><br>> One must make sure that the interpreter is not harmed in any way.
<br>> There are several ways to possibly do this. One is generating<br>> hostile bytecode. Another is some buffer overflow. In general any<br>> ability to crash the interpreter is unacceptable.<br><br>This is hard for me to understand. What exactly do you trust and
<br>not trust? It seems to me that crashing an interpreter is only a<br>problem if a single interpreter is running both trusted and untrusted<br>code -- then if the untrusted code crashes the interpreter, the<br>trusted code suffers.
<br><br>But there doesn't seem to be any such thing in your model. Each<br>interpreter is either trusted or untrusted. If the interpreter is<br>trusted, and the code running in it causes it to crash, i assume<br>you would consider that to be the code's "own fault", right?
<br>And if the interpreter is untrusted, and the code running in it<br>causes it to crash, then the code has only harmed itself.<br><br>It seems to me that we need only be concerned about crashing when<br>the crash of an embedded interpreter will bring down its host
<br>application, or there are multiple interpreters embedded at once<br>and one interpreter causes another interpreter to crash.</blockquote><div><br>Right. But being embedded, won't any segfault of an interpreter bring down the embedding application?
<br><br>But you are correct, I am only concerned with preventing a crash of a sandboxed interpreter.<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
> Resource Hiding<br>> =============================<br>[...]<br>> This can be viewed as a passive system for security.<br>[...]<br>> Resource Crippling<br>> =============================<br>> Another approach to security is to provide constant, proactive security
<br>> checking of rights to use a resource.<br><br>I think you have this backwards. Resource hiding is proactive:<br>before untrusted code has a chance to abuse anything, you decide<br>what you want to allow it to do. It defaults to no access, and
<br>only gets access to resources you have proactively decided to provide.</blockquote><div><br>I am using "proactive" as in constantly checking the security model.<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Resource crippling is the opposite: it begins by giving carte blanche<br>to the untrusted code, then you run around trying to plug holes<br>by stopping everything you don't want. This is a lot more work,<br>and it is also much more dangerous. If you forget to plug even
<br>one hole, you're hosed.</blockquote><div><br>Yeah, I know, which is why I am only bothering with 'file' and 'socket'.<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Back to what you wrote about resource hiding:<br><br>> This can be viewed as a passive system for security. Once a resource<br>> has been given to code there are no more checks to make sure the<br>> security model is being violated.
<br><br>This last sentence doesn't make any sense. If you decided to give<br>the resource, how is using the resource a violation? Either you<br>want to enable the resource or you don't. If you want to enable<br>it, give it; if you don't, don't give it. As a criticism of the
<br>resource hiding approach, it's a red herring -- there's no way<br>to interpret this sentence that doesn't make it also an<br>unfalsifiable criticism of any possible security model.</blockquote><div><br>Yeah, I figured that out after I wrote this.
<br><br>> The most common implementation of resource hiding is capabilities.<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> In this type of system a resource's reference acts as a ticket that
<br>> represents the right to use the resource. Once code has a reference<br>> it is considered to have full use of that resource it represents and<br>> no further security checks are performed.<br><br>Same thing. What "further security checks" are we worried about?
<br>Would it check to see whether we've authorized the interpreter to<br>have access to the resource ... which we already know to be true?<br><br>> To allow customizable restrictions one can pass references to wrappers of
<br>> resources. This allows one to provide custom security to resources instead of<br>> requiring an all-or-nothing approach.<br><br>The ability to customize security restrictions is an important<br>advantage of the resource hiding approach, since resource crippling
<br>requires that the architect of the security model anticipate every<br>possible security restriction that future programmers might need.<br><br>Using resource crippling is analogous to removing "def" from the
<br>language and requiring Python programmers to only use functions<br>that are provided in the built-in modules instead of writing their<br>own functions.<br><br>> To use an analogy, imagine you are providing security for your home.
<br>> With capabilities, security came from not having any way to know<br>> where your house is without being told where it was; a reference<br>> to its location. You might be able to ask a guard (e.g., Java's<br>
> ClassLoader) for a map, but if they refuse there is no way for you<br>> to guess its location without being told. But once you knew where<br>> it was, you had complete use of the house.<br><br>This analogy is only fair if you compare it to the same analogy for
<br>the resource crippling approach. Resource crippling doesn't get you<br>any finer-grained control either! The comparison story is:<br><br> With resource crippling, security comes from having a guard<br> at the door to your house. When a Python interpreter comes
<br> up to the door, the guard checks to see if the interpreter<br> has permission to enter the house, and if it does, then it<br> gets complete use of the house.<br><br>Why is the granularity of control described as the whole house
<br>in the resource-hiding story, but as each door in the house in<br>the resource-crippling story?</blockquote><div><br>Because, as you said above, if you want someone to have the resource (the house, or in more concrete terms, a 'file') you just give it to them. If you cripple it, though, you might provide a 'file' object but restrict how many bytes are written.
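<br><br>A minimal sketch of such a byte limit, using hypothetical names and an in-memory buffer standing in for a real file:

```python
import io


class ByteLimitedWriter:
    """Hypothetical wrapper that refuses writes once a byte quota is used up."""

    def __init__(self, raw, max_bytes):
        self._raw = raw
        self._remaining = max_bytes

    def write(self, data):
        # Refuse the whole write if it would exceed the remaining quota.
        if len(data) > self._remaining:
            raise PermissionError("write quota exceeded")
        self._remaining -= len(data)
        return self._raw.write(data)


buffer = io.BytesIO()
limited = ByteLimitedWriter(buffer, max_bytes=8)
limited.write(b"12345")  # accepted: 5 of the 8 allowed bytes are now used
```

A further `limited.write(b"6789")` would raise `PermissionError`, since only 3 bytes of quota remain.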
<br><br>But I also realize that resource hiding handles this by providing a wrapper that provides the protection.<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
> And that complete access is an issue with a capability system.<br>> If someone played a little loose with a reference for a resource<br>> then you run the risk of it getting out.<br><br>Could you be more specific about what you mean by "it getting out"?
</blockquote><div><br>Out of a trusted interpreter and ending up in a sandboxed interpreter somehow.<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
If you mean getting from a trusted interpreter to an untrusted<br>interpreter -- then how is a resource going to travel between<br>interpreters?</blockquote><div><br>Beats me, but I am always scared of Armin and Samuele. =)
<br></div><br>It seems that your criticism is that resource crippling amounts to "plug holes as needed but if you foul up you are screwed", while resource hiding is more "fix the fundamental issues and just don't present access to resources you don't want to give access to (or wrap accordingly)". In general I agree with this assessment. But I also realize that Python was not designed with security in mind, and there seem to be new ways to get access to 'file'. If I felt confident that I could find and hide 'file' as needed, I would go that route immediately. But I don't think I can (and Armin has said this as well).
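<br><br>To illustrate why hiding a type is hard, here is a sketch of one well-known introspection route. It is written for Python 3, where `io.FileIO` plays roughly the role that the built-in 'file' type plays in this discussion (that mapping is an assumption of the example), and the helper name is hypothetical. No name such as `open` or `io` is needed to reach the class:

```python
def reachable_classes(root=object):
    """Collect every class reachable from `root` by following __subclasses__()."""
    seen = {}
    stack = [root]
    while stack:
        cls = stack.pop()
        if id(cls) in seen:
            continue
        seen[id(cls)] = cls
        # type.__subclasses__(cls) also works when cls is `type` itself.
        stack.extend(type.__subclasses__(cls))
    return list(seen.values())


# Look the class up purely by name, starting from nothing but `object`.
hits = [cls for cls in reachable_classes() if cls.__name__ == "FileIO"]
```

This is only one route; the point is that every such route would have to be found and closed for hiding to work.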
<br><br>If you think you can help figure out every place a reference to 'file' can be found through the standard interpreter, then fine, let's go that way. I just don't have faith this can be done effectively.<br><br>-Brett
<br><br><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Or if not, then are you thinking of a situation in which one<br>piece of code is trusted with the resource, but another piece of
<br>code is not, and both are running in the same interpreter?<br></blockquote><div> </div><br></div>