It's apparent that I didn't explain capabilities clearly enough. Also I misunderstood something about rexec in general and ZipFile in particular. Once we succeed at understanding each other, I'll then inquire whether you agree with my Big Word Proofs.

(I, Zooko, wrote lines prepended with "> > ".)

Guido wrote:
So in the "separate policy language" way of life, access to the ZipFile class gives you the ability to open files anywhere in the filesystem. The ZipFile class therefore has the "dangerous" flag set, and when you run code that you think might misuse this feature, you set the "can't use dangerous things" flag on that code.
But that's not how rexec works. In the rexec world, the zipfile module has no special privileges; when it is imported by untrusted code, it is reloaded from disk as if it were untrusted itself. The zipfile.ZipFile class is a client of "open", an implementation of which is provided to the untrusted code by the trusted code.
<Zooko reads the zipfile module docs.> How is the implementation of "open" provided by the trusted code to the untrusted code? Is it possible to provide a different "open" implementation to different "instances" of the zipfile module? (I think not, as there is no such thing as "a different instance of a module", but perhaps you could have two rexec "workspaces" each of which has a zipfile module with a different "open"?)
In this scheme, there are no flags, and when you run code that you think might misuse this feature, you simply don't give that code a reference to the ZipFile class. (Also, we have to arrange that it can't acquire a reference by "import zipfile".)
The rexec world solves this very nicely IMO. Can't the capability world do it the same way? The only difference might be that 'open' would have to be a capability.
I don't understand exactly how rexec works yet, but so far it sounds like capabilities.

Here's a two sentence definition of capabilities:

Authority originates in C code (in the interpreter or C extension modules), and is passed from thing to thing. A given thing "X" -- an instance of ZipFile, for example -- has the authority to use a given authority -- to invoke the real open(), for example -- if and only if some thing "Y" previously held both the "open()" authority and the "authority to extend authorities to X" authority, and chose to extend the "open()" authority to X.

That rule could be enforced with the rexec system, right?

Here is a graphical representation of this rule. (Taken from [1].)

http://www.erights.org/elib/capability/ode/images/fundamental.gif

In the diagram, the authority is "Carol", the thing that started with the authority is "Alice", and Alice is in the process of extending to Bob the authority to use Carol. This act -- the extending of authority from Alice to Bob -- is the only way that Bob can gain authority, and it can only happen if Alice has both the authority to use Carol and the authority to extend authorities to Bob.

Those two sentences above (and equivalently the graph) completely define capabilities, in the abstract. They don't say how they are implemented. A particular implementation that I find deeply appealing is to make "has a reference to 'open'" be the determiner of whether a thing has the authority to use "open", and to make "has a reference to X" be the determiner of whether a thing has the authority to extend authorities to X. That's "unifying designation with authority", and that's what the E language does.
But I think "this code can't use ZipFile" is the wrong thing to say. You should only have to say "this code can't write files" (or something more specific).
I agree. I incorrectly inferred from previous messages that the current problem under discussion was allowing or denying access to the ZipFile class. But whatever resource we wish to control access to, these same techniques will apply.
In a system where designation is not unified with authority, you tell this untrusted code "I want you to do this action X.", and then you also have to go update the policy specification to say that the code in question is allowed to do the action X.
Sorry, you've lost me here. Which part is the "designation" (new word for me) and which part is the "authority"?
Sorry. First let me point out that the issue of unifying designation with authority is separable from "the capability access control rule" described above. The two have good synergy, but aren't identical.

By "designation" I meant "naming". For example... Let's see, I think I'll go back to my toy tictactoe example from [2].

In the tictactoe example, you have to specify which wxWindow the tictactoe game object should draw into. This is "designation" -- you pass a reference, which designates which specific window you are talking about. If you use the principle of unifying designation and authority, then this same act -- passing a reference to this particular wxWindows object -- conveys both the identification of which window to draw into and the authority to draw into it.

    # access control system with unified designation and authority
    game = TicTacToeGame()
    game.display(wxPython.wxWindow())

If you have separate designation and authority, then the same code has to look something like this:

    # access control system with separate designation and authority
    game = TicTacToeGame()
    window = wxPython.wxWindow()
    def policy(subject, resource, operation):
        if (subject is game) and (resource is window) and \
           (operation == "invoke methods of"):
            return True
        return False
    rexec.register_policy_hook(policy)
    game.display(window)

This is what I call "say it twice if you really mean it".

Hm. Reviewing the rexec docs, I begin to suspect that the "access control system with unified designation and authority" *is* how Python does access control in restricted mode, and that rexec itself is just to manage module import and certain dangerous builtins.
It really sounds to me like at least one of our fundamental (?) differences is the autonomicity of code units. I think of code (at least Python code) as a passive set of instructions that has no inherent authority but derives authority from the built-ins passed to it; you seem to describe code as having inherent authority.
I definitely don't intend for code to have inherent authority (other than the Trusted Code Base -- the interpreter -- which can't help but have it). The "things" in my two-sentence definition (the white circles in the diagram) are "computational things that can have state and behavior". (This includes Python objects, closures, stack frames, etc... In another context I would call them "objects", but Python uses the word "object" for something more specific -- an instance of a class.)
This would be effectively the "virtualization" of access control. I regard it as a kind of holy Grail for internet computing.
How practical is this dream? How useful?
Let's revisit the issue once we understand one another's access control schemes. ;-)

Regards,

Zooko

[1] http://www.erights.org/elib/capability/ode/overview.html
[2] http://mail.python.org/pipermail/python-dev/2003-March/033938.html

http://zooko.com/
^-- under re-construction: some new stuff, some broken links
[Zooko]
It's apparent that I didn't explain capabilities clearly enough. Also I misunderstood something about rexec in general and ZipFile in particular. Once we succeed at understanding each other, I'll then inquire whether you agree with my Big Word Proofs.
It's apparent that you don't understand rexec enough; I'll try to explain.
(I, Zooko, wrote lines prepended with "> > ".)
Guido wrote:
So in the "separate policy language" way of life, access to the ZipFile class gives you the ability to open files anywhere in the filesystem. The ZipFile class therefore has the "dangerous" flag set, and when you run code that you think might misuse this feature, you set the "can't use dangerous things" flag on that code.
But that's not how rexec works. In the rexec world, the zipfile module has no special privileges; when it is imported by untrusted code, it is reloaded from disk as if it were untrusted itself. The zipfile.ZipFile class is a client of "open", an implementation of which is provided to the untrusted code by the trusted code.
<Zooko reads the zipfile module docs.>
How is the implementation of "open" provided by the trusted code to the untrusted code? Is it possible to provide a different "open" implementation to different "instances" of the zipfile module? (I think not, as there is no such thing as "a different instance of a module", but perhaps you could have two rexec "workspaces" each of which has a zipfile module with a different "open"?)
To the contrary, it is very easy to provide code with a different version of open(). E.g.:

    # this executes as trusted code
    def my_open(...):
        "open() variant that only allows reading"
    my_builtins = {"len": len, "open": my_open, "range": range, ...}
    namespace = {"__builtins__": my_builtins}
    exec "..." in namespace

The final exec executes the untrusted code string "..." in a restricted environment where the built-in 'open' refers to my_open. Because import statements are also treated this way (they call the builtin function __import__), the same applies for import.

IOW, namespace["__builtins__"] acts as the set of "root capabilities" given to the untrusted code.
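The mechanism described above, and the earlier question about two "workspaces" each seeing a different open(), can be sketched in Python 3 syntax (the thread itself uses the Python 2 `exec ... in` statement). This is only an illustration of how the `__builtins__` entry of a namespace decides what the name `open` resolves to; it is not a hardened sandbox. The names `make_readonly_open`, `denying_open`, `run_in_workspace`, and `UNTRUSTED_SOURCE` are hypothetical and do not come from rexec.

```python
import builtins


def make_readonly_open(real_open=builtins.open):
    """Return an open() variant that refuses every mode except reading."""
    def readonly_open(path, mode="r", *args, **kwargs):
        # Only the characters r, b and t may appear in the mode string.
        if set(mode) - set("rbt"):
            raise PermissionError("only read access is granted, got mode %r" % mode)
        return real_open(path, mode, *args, **kwargs)
    return readonly_open


def denying_open(*args, **kwargs):
    """An open() variant that grants nothing at all."""
    raise PermissionError("this workspace was not given open()")


def run_in_workspace(source, root_capabilities):
    """Execute source with root_capabilities as its only built-in names."""
    namespace = {"__builtins__": dict(root_capabilities)}
    exec(source, namespace)
    return namespace


# Stand-in for the untrusted code: it just uses whatever 'open' it was given.
UNTRUSTED_SOURCE = '''
def read_first_line(path):
    with open(path) as f:
        return f.readline()

def try_write(path):
    return open(path, "w")
'''
```

Running the same source twice, once with `{"open": make_readonly_open()}` and once with `{"open": denying_open}`, yields two namespaces whose functions behave differently even though the code is identical, which is the sense in which the built-ins act as the "root capabilities".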
In this scheme, there are no flags, and when you run code that you think might misuse this feature, you simply don't give that code a reference to the ZipFile class. (Also, we have to arrange that it can't acquire a reference by "import zipfile".)
The rexec world solves this very nicely IMO. Can't the capability world do it the same way? The only difference might be that 'open' would have to be a capability.
I don't understand exactly how rexec works yet, but so far it sounds like capabilities.
Yes. That may be why the demand for capabilities has been met with resistance: to quote the French in "Monty Python and the Holy Grail", "we already got one!" :-)
Here's a two sentence definition of capabilities:
I've heard too many of these. They are all too abstract.
Authority originates in C code (in the interpreter or C extension modules), and is passed from thing to thing.
This part I like.
A given thing "X" -- an instance of ZipFile, for example -- has the authority to use a given authority -- to invoke the real open(), for example -- if and only if some thing "Y" previously held both the "open()" authority and the "authority to extend authorities to X" authority, and chose to extend the "open()" authority to X.
But the instance of ZipFile is not really a protection domain. Methods on the instance may have different authority.
That rule could be enforced with the rexec system, right?
Yes, except that there are currently design bugs (starting in Python 2.2) that open holes; see Samuele Pedroni's posts here.
Here is a graphical representation of this rule. (Taken from [1].)
http://www.erights.org/elib/capability/ode/images/fundamental.gif
In the diagram, the authority is "Carol", the thing that started with the authority is "Alice", and Alice is in the process of extending to Bob the authority to use Carol. This act -- the extending of authority from Alice to Bob -- is the only way that Bob can gain authority, and it can only happen if Alice has both the authority to use Carol and the authority to extend authorities to Bob.
Sure. The question is, what exactly are Alice, Bob and Carol? I claim that they are not specific class instances but they are each a "workspace" as I tried to explain before. A workspace is more or less the contents of a particular "sys.modules" dictionary.
Those two sentences above (and equivalently the graph) completely define capabilities, in the abstract. They don't say how they are implemented. A particular implementation that I find deeply appealing is to make "has a reference to 'open'" be the determiner of whether a thing has the authority to use "open", and to make "has a reference to X" be the determiner of whether a thing has the authority to extend authorities to X. That's "unifying designation with authority", and that's what the E language does.
Yes. And then "has a reference to 'open'" is bootstrapped by sticking (some variant of) 'open' in the __builtin__ module of a particular "workspace". (Note that workspace is a term I'm inventing here, you won't find it in the Python literature.)
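As a rough illustration of the Alice/Bob/Carol rule, here is a minimal sketch that follows Zooko's reading, in which each party is an ordinary object (Guido argues above that the parties are better seen as workspaces). The classes and method names are hypothetical, and plain Python does not enforce the rule; the sketch only shows the discipline of gaining authority solely by being handed a reference.

```python
class Carol:
    """The resource: holding a reference to it is the authority to use it."""

    def use(self, who):
        return "carol used by " + who


class Bob:
    """Starts with no authority; it can only use what is handed to it."""

    def __init__(self):
        self._carol = None

    def receive(self, carol):
        # Being passed the reference is the only way Bob gains the authority.
        self._carol = carol

    def work(self):
        if self._carol is None:
            raise PermissionError("Bob holds no reference to Carol")
        return self._carol.use("bob")


class Alice:
    """Holds references to both Carol and Bob, so she can introduce them."""

    def __init__(self, carol, bob):
        self._carol = carol
        self._bob = bob

    def introduce(self):
        # Alice needs both references to extend the authority to Bob.
        self._bob.receive(self._carol)
```

Before `Alice.introduce()` is called, `Bob.work()` fails because Bob has nothing that designates Carol; afterwards the same call succeeds, with no separate policy table consulted.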
But I think "this code can't use ZipFile" is the wrong thing to say. You should only have to say "this code can't write files" (or something more specific).
I agree. I incorrectly inferred from previous messages that the current problem under discussion was allowing or denying access to the ZipFile class. But whatever resource we wish to control access to, these same techniques will apply.
In a system where designation is not unified with authority, you tell this untrusted code "I want you to do this action X.", and then you also have to go update the policy specification to say that the code in question is allowed to do the action X.
Sorry, you've lost me here. Which part is the "designation" (new word for me) and which part is the "authority"?
Sorry. First let me point out that the issue of unifying designation with authority is separable from "the capability access control rule" described above. The two have good synergy, but aren't identical.
By "designation" I meant "naming". For example... Let's see, I think I'll go back to my toy tictactoe example from [2].
In the tictactoe example, you have to specify which wxWindow the tictactoe game object should draw into. This is "designation" -- you pass a reference, which designates which specific window you are talking about. If you use the principle of unifying designation and authority, then this same act -- passing a reference to this particular wxWindows object -- conveys both the identification of which window to draw into and the authority to draw into it.
    # access control system with unified designation and authority
    game = TicTacToeGame()
    game.display(wxPython.wxWindow())
If you have separate designation and authority, then the same code has to look something like this:
    # access control system with separate designation and authority
    game = TicTacToeGame()
    window = wxPython.wxWindow()
    def policy(subject, resource, operation):
        if (subject is game) and (resource is window) and \
           (operation == "invoke methods of"):
            return True
        return False
    rexec.register_policy_hook(policy)
    game.display(window)
This is what I call "say it twice if you really mean it".
Hm. Reviewing the rexec docs, I begin to suspect that the "access control system with unified designation and authority" *is* how Python does access control in restricted mode, and that rexec itself is just to manage module import and certain dangerous builtins.
Yes.
It really sounds to me like at least one of our fundamental (?) differences is the autonomicity of code units. I think of code (at least Python code) as a passive set of instructions that has no inherent authority but derives authority from the built-ins passed to it; you seem to describe code as having inherent authority.
I definitely don't intend for code to have inherent authority (other than the Trusted Code Base -- the interpreter -- which can't help but have it). The "things" in my two-sentence definition (the white circles in the diagram) are "computational things that can have state and behavior". (This includes Python objects, closures, stack frames, etc... In another context I would call them "objects", but Python uses the word "object" for something more specific -- an instance of a class.)
This would be effectively the "virtualization" of access control. I regard it as a kind of holy Grail for internet computing.
How practical is this dream? How useful?
Let's revisit the issue once we understand one another's access control schemes. ;-)
Regards,
Zooko
[1] http://www.erights.org/elib/capability/ode/overview.html
[2] http://mail.python.org/pipermail/python-dev/2003-March/033938.html
I propose to continue this in a week; I'm leaving for Python UK right now and expect to have scarce connectivity there if at all. Back Sunday night.

--Guido van Rossum (home page: http://www.python.org/~guido/)
[Guido van Rossum]
I propose to continue this in a week; I'm leaving for Python UK right now and expect to have scarce connectivity there if at all. Back Sunday night.
Which means it will get summarized in *three* separate summaries. This thread will never die!!! I am going to become a capabilities expert whether I want to or not. =)

-Brett
Guido van Rossum wrote:
How is the implementation of "open" provided by the trusted code to the untrusted code? Is it possible to provide a different "open" implementation to different "instances" of the zipfile module? (I think not, as there is no such thing as "a different instance of a module", but perhaps you could have two rexec "workspaces" each of which has a zipfile module with a different "open"?)
To the contrary, it is very easy to provide code with a different version of open(). E.g.:
    # this executes as trusted code
    def my_open(...):
        "open() variant that only allows reading"
    my_builtins = {"len": len, "open": my_open, "range": range, ...}
    namespace = {"__builtins__": my_builtins}
    exec "..." in namespace
That's fair enough, but why is it better for the "protection domain" to be an invoked "workspace" instead of an object? Think of it from a software engineering point of view: you're proposing that the right way to manage security is to override more-or-less global variables. Zooko is proposing that you pass the capabilities each method needs to that method, i.e. standard structured programming.

Let's say that untrusted code wants access to the socket module. The surrounding code wants to tame it to prevent socket connections to certain IP addresses. I think that in the rexec model, the surrounding application would have to go in and poke "safe" versions of the constructor into the module. Or it would have to disallow access to the module altogether and provide an object that tamed the module appropriately. The first approach is kind of error prone. The second approach requires the untrusted code to use a model of programming that is very different from "standard Python."

If we imagined a Python with capabilities built in deeply, the socket module would be designed to be tamed. By default it would have no authority at all except that which is passed in. The authority to contact the outside world would be separate from all of the other useful stuff in the socket module and socket class.

I'm not necessarily advocating this kind of a change to the Python library...

Paul Prescod
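The "pass the authority in" approach to the socket example might look roughly like the following sketch. It assumes the trusted code builds a restricted connect function and hands it to the untrusted code as an ordinary argument. The names `make_tamed_connect` and `fetch_banner` are hypothetical; `socket.create_connection` and the `ipaddress` module are standard library, and the `connect` parameter exists only so the underlying authority can be swapped out.

```python
import ipaddress
import socket


def make_tamed_connect(blocked_networks, connect=socket.create_connection):
    """Return a connect function that refuses addresses in blocked_networks."""
    blocked = [ipaddress.ip_network(net) for net in blocked_networks]

    def tamed_connect(ip, port):
        # Accept literal IP addresses only; a hostname raises ValueError here.
        address = ipaddress.ip_address(ip)
        if any(address in net for net in blocked):
            raise PermissionError("connection to %s is not permitted" % ip)
        return connect((ip, port))

    return tamed_connect


def fetch_banner(connect, ip, port):
    """Stand-in for untrusted code: it can reach only what `connect` allows."""
    conn = connect(ip, port)
    try:
        return conn.recv(64)
    finally:
        conn.close()
```

The trusted side would call something like `fetch_banner(make_tamed_connect(["10.0.0.0/8"]), ip, port)`. The untrusted function never imports socket itself, so the only network authority it holds is the one it was passed, provided that other routes to the socket module are closed off, which this sketch does not attempt.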
participants (4):
- Brett Cannon
- Guido van Rossum
- Paul Prescod
- Zooko