[Python-Dev] Capabilities

Mon, 31 Mar 2003 17:22:41 -0500

It's apparent that I didn't explain capabilities clearly enough.  Also 
I misunderstood something about rexec in general and ZipFile in particular.  
Once we succeed at understanding each other, I'll then inquire whether you agree 
with my Big Word Proofs.

(I, Zooko, wrote lines prepended with "> > ".)

Guido wrote:
>
> > So in the "separate policy language" way of life, access to the
> > ZipFile class gives you the ability to open files anywhere in the
> > filesystem.  The ZipFile class therefore has the "dangerous" flag
> > set, and when you run code that you think might misuse this feature,
> > you set the "can't use dangerous things" flag on that code.
> 
> But that's not how rexec works.  In the rexec world, the zipfile
> module has no special privileges; when it is imported by untrusted
> code, it is reloaded from disk as if it were untrusted itself.  The
> zipfile.ZipFile class is a client of "open", an implementation of
> which is provided to the untrusted code by the trusted code.

<Zooko reads the zipfile module docs.>

How is the implementation of "open" provided by the trusted code to the 
untrusted code?  Is it possible to provide a different "open" implementation to 
different "instances" of the zipfile module?  (I think not, as there is no such 
thing as "a different instance of a module", but perhaps you could have two 
rexec "workspaces" each of which has a zipfile module with a different "open"?)
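
To make the "two workspaces" idea concrete, here is a minimal sketch in plain 
Python.  It is not rexec and it is not a secure sandbox.  It only shows that the 
same source can be executed in two namespaces that each see a different "open".  
MODULE_SOURCE, load_workspace, and the two open_* functions are hypothetical 
names invented for this illustration.

```python
# Hypothetical stand-in for a module's source; not the real zipfile module.
MODULE_SOURCE = "def fetch(name):\n    return open(name)\n"


def load_workspace(source, open_impl):
    """Execute source in a fresh namespace whose only builtin is open_impl."""
    namespace = {"__builtins__": {"open": open_impl}}
    exec(source, namespace)
    return namespace


# Two toy "open" implementations; they return tagged strings, not files.
def open_for_workspace_a(name):
    return "A:" + name


def open_for_workspace_b(name):
    return "B:" + name


ws_a = load_workspace(MODULE_SOURCE, open_for_workspace_a)
ws_b = load_workspace(MODULE_SOURCE, open_for_workspace_b)

# The same function source resolves "open" differently in each workspace.
result_a = ws_a["fetch"]("data.zip")
result_b = ws_b["fetch"]("data.zip")
```

Each workspace holds its own function objects built from the same source, which 
is about as close as this sketch gets to "a different instance of a module".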

> > In this scheme, there are no flags, and when you run code
> > that you think might misuse this feature, you simply don't give that
> > code a reference to the ZipFile class.  (Also, we have to arrange
> > that it can't acquire a reference by "import zipfile".)
> 
> The rexec world solves this very nicely IMO.  Can't the capability
> world do it the same way?  The only difference might be that 'open'
> would have to be a capability.

I don't understand exactly how rexec works yet, but so far it sounds like 
capabilities.

Here's a two-sentence definition of capabilities:

Authority originates in C code (in the interpreter or C extension modules), and 
is passed from thing to thing.  A given thing "X" -- an instance of ZipFile, for 
example -- may use a given authority -- the real open(), for example -- if and 
only if some thing "Y" previously held both the "open()" authority and the 
"authority to extend authorities to X" authority, and chose to extend the 
"open()" authority to X.  

That rule could be enforced with the rexec system, right?
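
As a rough model of the rule (not of rexec, and not of any real enforcement 
mechanism), the two conditions can be written as two checks.  The Thing class, 
its "held" set, and the extend() method are hypothetical names for this sketch.  
Holding a reference to another Thing stands in for "the authority to extend 
authorities to" it.

```python
class Thing:
    """A computational thing that holds a set of authorities (toy model)."""

    def __init__(self, name):
        self.name = name
        self.held = set()  # authorities, including other Things, it may use

    def extend(self, authority, recipient):
        # The giver must hold the authority itself ...
        if authority not in self.held:
            raise PermissionError(f"{self.name} does not hold that authority")
        # ... and must also hold the authority to extend authorities
        # to the recipient.
        if recipient not in self.held:
            raise PermissionError(f"{self.name} cannot reach {recipient.name}")
        recipient.held.add(authority)


alice, bob, carol = Thing("Alice"), Thing("Bob"), Thing("Carol")
mallory = Thing("Mallory")

# Alice starts out holding both Carol and the ability to talk to Bob.
alice.held.update({bob, carol})
alice.extend(carol, bob)  # allowed: both conditions hold

try:
    mallory.extend(carol, bob)  # refused: Mallory holds neither
    mallory_succeeded = True
except PermissionError:
    mallory_succeeded = False
```

In this model there is no other method that adds to a Thing's "held" set, which 
is the "only way that Bob can gain authority" part of the rule.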

Here is a graphical representation of this rule.  (Taken from [1].)

http://www.erights.org/elib/capability/ode/images/fundamental.gif

In the diagram, the authority is "Carol", the thing that started with the 
authority is "Alice", and Alice is in the process of extending to Bob the 
authority to use Carol.  This act -- the extending of authority from Alice to 
Bob -- is the only way that Bob can gain authority, and it can only happen if 
Alice has both the authority to use Carol and the authority to extend 
authorities to Bob.

Those two sentences above (and equivalently the graph) completely define 
capabilities, in the abstract.  They don't say how they are implemented.  A 
particular implementation that I find deeply appealing is to make "has a 
reference to 'open'" be the determiner of whether a thing has the authority to 
use "open", and to make "has a reference to X" be the determiner of whether a 
thing has the authority to extend authorities to X.  That's "unifying 
designation with authority", and that's what the E language does.
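
A small sketch of "has a reference" as the determiner, assuming a hypothetical 
make_readonly_opener() helper and a hypothetical Reader client.  The client is 
handed a narrowed "open" and nothing else.  Ordinary Python does not enforce 
this (the real open() is still reachable through builtins), so this shows the 
programming style, not a working confinement mechanism.

```python
import os
import tempfile


def make_readonly_opener(root):
    """Return an open-like function limited to reading files under root."""
    root = os.path.realpath(root)

    def restricted_open(path, mode="r"):
        if mode not in ("r", "rb"):
            raise PermissionError("this reference only permits reading")
        full = os.path.realpath(os.path.join(root, path))
        if os.path.commonpath([root, full]) != root:
            raise PermissionError("path escapes the granted directory")
        return open(full, mode)

    return restricted_open


class Reader:
    """Hypothetical client: it uses only the opener it was handed."""

    def __init__(self, opener):
        self._open = opener

    def read_text(self, name):
        with self._open(name) as f:
            return f.read()


with tempfile.TemporaryDirectory() as granted_dir:
    with open(os.path.join(granted_dir, "note.txt"), "w") as f:
        f.write("hello")
    # Passing the opener both names the resource and grants its use.
    reader = Reader(make_readonly_opener(granted_dir))
    text = reader.read_text("note.txt")
```

There is no policy table here.  Whoever constructs the Reader decides what it 
can touch by choosing which opener to pass.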

> But I think "this code can't use ZipFile" is the wrong thing to say.
> You should only have to say "this code can't write files" (or
> something more specific).

I agree.  I incorrectly inferred from previous messages that the current problem 
under discussion was allowing or denying access to the ZipFile class.  But 
whatever resource we wish to control access to, these same techniques will 
apply.

> > In a system where designation is not unified with authority, you
> > tell this untrusted code "I want you to do this action X.", and then
> > you also have to go update the policy specification to say that the
> > code in question is allowed to do the action X.
> 
> Sorry, you've lost me here.  Which part is the "designation" (new word
> for me) and which part is the "authority"?

Sorry.  First let me point out that the issue of unifying designation with 
authority is separable from "the capability access control rule" described 
above.  The two have good synergy, but aren't identical.

By "designation" I meant "naming".  For example...  Let's see, I think I'll go 
back to my toy tictactoe example from [2].

In the tictactoe example, you have to specify which wxWindow the tictactoe game 
object should draw into.  This is "designation" -- you pass a reference, which 
designates which specific window you are talking about.  If you use the 
principle of unifying designation and authority, then this same act -- passing a 
reference to this particular wxWindow object -- conveys both the identification 
of which window to draw into and the authority to draw into it.

# access control system with unified designation and authority
game = TicTacToeGame()
game.display(wxPython.wxWindow())

If you have separate designation and authority, then the same code has to look 
something like this:

# access control system with separate designation and authority
# (rexec.register_policy_hook is hypothetical; rexec has no such function)
game = TicTacToeGame()
window = wxPython.wxWindow()
def policy(subject, resource, operation):
    # grant only this one (subject, resource, operation) combination
    return ((subject is game) and (resource is window) and
            (operation == "invoke methods of"))
rexec.register_policy_hook(policy)
game.display(window)

This is what I call "say it twice if you really mean it".
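
The two styles can be put side by side in a self-contained sketch with stub 
classes in place of wxPython and rexec.  Window, Game, PolicyGame, WINDOWS, 
POLICY, and draw_on are all hypothetical names invented for the comparison.

```python
class Window:
    """Stub window that records what was drawn on it."""

    def __init__(self):
        self.drawn = []

    def draw(self, shape):
        self.drawn.append(shape)


# Unified designation and authority: the reference is the permission.
class Game:
    def display(self, window):
        window.draw("board")


window = Window()
game = Game()
game.display(window)

# Separate designation and authority: windows are named by string,
# and a checkpoint consults a policy table before every operation.
WINDOWS = {"main": Window()}
POLICY = set()  # entries are (subject, window name, operation)


def draw_on(subject, window_name, shape):
    if (subject, window_name, "draw") not in POLICY:
        raise PermissionError("policy does not allow this draw")
    WINDOWS[window_name].draw(shape)


class PolicyGame:
    def display(self, window_name):
        draw_on(self, window_name, "board")


policy_game = PolicyGame()
POLICY.add((policy_game, "main", "draw"))  # say it once ...
policy_game.display("main")                # ... and say it again
```

Leaving out the POLICY.add() line makes display() fail, even though the caller 
has already said which window it meant.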

Hm.  Reviewing the rexec docs, I begin to suspect that the "access control 
system with unified designation and authority" *is* how Python does access 
control in restricted mode, and that rexec itself is just to manage module 
import and certain dangerous builtins.

> It really sounds to me like at least one of our fundamental (?)
> differences is the autonomicity of code units.  I think of code (at
> least Python code) as a passive set of instructions that has no
> inherent authority but derives authority from the built-ins passed to
> it; you seem to describe code as having inherent authority.

I definitely don't intend for code to have inherent authority (other than the 
Trusted Code Base -- the interpreter -- which can't help but have it).  Each 
"thing" in my two-sentence definition (a white circle in the diagram) is a 
"computational thing that can have state and behavior".  (This includes Python 
objects, closures, stack frames, etc...  In another context I would call them 
"objects", but Python uses the word "object" for something more specific -- an 
instance of a class.)

> > This would be effectively the "virtualization" of access control.  I
> > regard it as a kind of holy Grail for internet computing.
> 
> How practical is this dream?  How useful?

Let's revisit the issue once we understand one another's access control schemes.
;-)

Regards,

Zooko

[1] http://www.erights.org/elib/capability/ode/overview.html
[2] http://mail.python.org/pipermail/python-dev/2003-March/033938.html

http://zooko.com/
         ^-- under re-construction: some new stuff, some broken links