[Python-Dev] Capabilities

Guido van Rossum guido@python.org
Mon, 31 Mar 2003 17:43:09 -0500


[Zooko]
> It's apparent that I didn't explain capabilities clearly enough.
> Also I misunderstood something about rexec in general and ZipFile in
> particular.  Once we succeed at understanding each other, I'll then
> inquire whether you agree with my Big Word Proofs.

It's apparent that you don't understand rexec well enough; I'll try
to explain.

> (I, Zooko, wrote lines prepended with "> > ".)
> 
>  Guido wrote:
> >
> > > So in the "separate policy language" way of life, access to the
> > > ZipFile class gives you the ability to open files anywhere in the
> > > filesystem.  The ZipFile class therefore has the "dangerous" flag
> > > set, and when you run code that you think might misuse this feature,
> > > you set the "can't use dangerous things" flag on that code.
> > 
> > But that's not how rexec works.  In the rexec world, the zipfile
> > module has no special privileges; when it is imported by untrusted
> > code, it is reloaded from disk as if it were untrusted itself.  The
> > zipfile.ZipFile class is a client of "open", an implementation of
> > which is provided to the untrusted code by the trusted code.
> 
> <Zooko reads the zipfile module docs.>
> 
> How is the implementation of "open" provided by the trusted code to
> the untrusted code?  Is it possible to provide a different "open"
> implementation to different "instances" of the zipfile module?  (I
> think not, as there is no such thing as "a different instance of a
> module", but perhaps you could have two rexec "workspaces" each of
> which has a zipfile module with a different "open"?)

On the contrary, it is very easy to provide code with a different
version of open().  E.g.:

  # this executes as trusted code
  def my_open(...):
    "open() variant that only allows reading"
  my_builtins = {"len": len, "open": my_open, "range": range, ...}
  namespace = {"__builtins__": my_builtins}
  exec "..." in namespace

The final exec executes the untrusted code string "..." in a
restricted environment where the built-in 'open' refers to my_open.
Because import statements are also treated this way (they call the
builtin function __import__), the same applies for import.  IOW,
namespace["__builtins__"] acts as the set of "root capabilities" given
to the untrusted code.
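
For readers who want to run the idea, here is a minimal sketch of the
same technique in Python 3 syntax, where exec is a function rather
than a statement.  The body of my_open, the helper run_untrusted, and
the file name "scratch.txt" are assumptions made for illustration;
they are not taken from rexec.  Restricting __builtins__ this way does
not by itself produce a safe sandbox.

```python
import builtins

def my_open(path, mode="r", *args, **kwargs):
    "open() variant that only allows reading"
    # Assumed policy: refuse any mode that can create or modify a file.
    if set(mode) & set("wax+"):
        raise PermissionError("mode %r is not allowed here" % (mode,))
    return builtins.open(path, mode, *args, **kwargs)

# The "root capabilities" handed to the untrusted code.
my_builtins = {"len": len, "open": my_open, "range": range}
namespace = {"__builtins__": my_builtins}

def run_untrusted(source):
    """Hypothetical helper: run source with only my_builtins available."""
    try:
        exec(source, dict(namespace))
    except (PermissionError, ImportError, NameError) as exc:
        return type(exc).__name__
    return "ok"

# Writing is refused by my_open before any file is touched.
write_outcome = run_untrusted('open("scratch.txt", "w")')
# No __import__ was provided, so the import statement fails.
import_outcome = run_untrusted("import os")
# Code that only needs the provided built-ins runs normally.
plain_outcome = run_untrusted("total = len(range(3))")
```

The import statement fails here only because my_builtins contains no
__import__ entry; supplying one is how the trusted code would decide
which modules the untrusted code may load.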

> > > In this scheme, there are no flags, and when you run code that
> > > you think might misuse this feature, you simply don't give that
> > > code a reference to the ZipFile class.  (Also, we have to
> > > arrange that it can't acquire a reference by "import zipfile".)
> > 
> > The rexec world solves this very nicely IMO.  Can't the capability
> > world do it the same way?  The only difference might be that
> > 'open' would have to be a capability.
> 
> I don't understand exactly how rexec works yet, but so far it sounds
> like capabilities.

Yes.  That may be why the demand for capabilities has been met with
resistance: to quote the French in "Monty Python and the Holy Grail",
"we already got one!" :-)

> Here's a two sentence definition of capabilities:

I've heard too many of these.  They are all too abstract.

> Authority originates in C code (in the interpreter or C extension
> modules), and is passed from thing to thing.

This part I like.

> A given thing "X" -- an instance of ZipFile, for example -- has the
> authority to use a given authority -- to invoke the real open(), for
> example -- if and only if some thing "Y" previously held both the
> "open()" authority and the "authority to extend authorities to X"
> authority, and chose to extend the "open()" authority to X.

But the instance of ZipFile is not really a protection domain.
Methods on the instance may have different authority.

> That rule could be enforced with the rexec system, right?

Yes, except that there are currently design bugs (starting in Python
2.2) that open holes; see Samuele Pedroni's posts here.

> Here is a graphical representation of this rule.  (Taken from [1].)
> 
> http://www.erights.org/elib/capability/ode/images/fundamental.gif
> 
> In the diagram, the authority is "Carol", the thing that started
> with the authority is "Alice", and Alice is in the process of
> extending to Bob the authority to use Carol.  This act -- the
> extending of authority from Alice to Bob -- is the only way that Bob
> can gain authority, and it can only happen if Alice has both the
> authority to use Carol and the authority to extend authorities to
> Bob.

Sure.  The question is, what exactly are Alice, Bob and Carol?  I
claim that they are not specific class instances but they are each a
"workspace" as I tried to explain before.  A workspace is more or less
the contents of a particular "sys.modules" dictionary.
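
A rough sketch of what such a workspace could look like, in Python 3
syntax.  The Workspace class, its attribute names, and the in-memory
SOURCES table are all hypothetical and stand in for rexec's actual
machinery; the point is only that each workspace keeps its own module
table and its own root built-ins, so the same module source ends up
using a different "open" in each one.

```python
import types

class Workspace:
    """Hypothetical workspace: a private module table plus root built-ins."""

    def __init__(self, sources, root_builtins):
        self.sources = sources            # module name -> source text
        self.modules = {}                 # plays the role of sys.modules
        self.builtins = dict(root_builtins)
        self.builtins["__import__"] = self._import

    def _import(self, name, globals=None, locals=None, fromlist=(), level=0):
        # Only plain top-level module names are handled in this sketch.
        if name in self.modules:
            return self.modules[name]
        if name not in self.sources:
            raise ImportError("module %r is not available here" % (name,))
        module = types.ModuleType(name)
        # The module is loaded afresh and sees only this workspace's built-ins.
        module.__dict__["__builtins__"] = self.builtins
        self.modules[name] = module
        exec(self.sources[name], module.__dict__)
        return module

    def run(self, source):
        namespace = {"__builtins__": self.builtins}
        exec(source, namespace)
        return namespace

# One module source, shared by both workspaces (stand-in for zipfile).
SOURCES = {"reader": "def fetch(name):\n    return open(name)\n"}

def open_for_alice(name):
    return "alice:" + name

def open_for_bob(name):
    return "bob:" + name

alice = Workspace(SOURCES, {"open": open_for_alice})
bob = Workspace(SOURCES, {"open": open_for_bob})

code = "import reader\nresult = reader.fetch('data.zip')"
alice_result = alice.run(code)["result"]
bob_result = bob.run(code)["result"]
```

Each workspace ends up with its own "reader" module object, and each
copy resolves "open" through the built-ins it was loaded with.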

> Those two sentences above (and equivalently the graph) completely
> define capabilities, in the abstract.  They don't say how they are
> implemented.  A particular implementation that I find deeply
> appealing is to make "has a reference to 'open'" be the determiner
> of whether a thing has the authority to use "open", and to make "has
> a reference to X" be the determiner of whether a thing has the
> authority to extend authorities to X.  That's "unifying designation
> with authority", and that's what the E language does.

Yes.  And then "has a reference to 'open'" is bootstrapped by sticking
(some variant of) 'open' in the __builtin__ module of a particular
"workspace".  (Note that "workspace" is a term I'm inventing here; you
won't find it in the Python literature.)

> > But I think "this code can't use ZipFile" is the wrong thing to
> > say.  You should only have to say "this code can't write files"
> > (or something more specific).
> 
> I agree.  I incorrectly inferred from previous messages that the
> current problem under discussion was allowing or denying access to
> the ZipFile class.  But whatever resource we wish to control access
> to, these same techniques will apply.
> 
> > > In a system where designation is not unified with authority, you
> > > tell this untrusted code "I want you to do this action X.", and
> > > then you also have to go update the policy specification to say
> > > that the code in question is allowed to do the action X.
> > 
> > Sorry, you've lost me here.  Which part is the "designation" (new
> > word for me) and which part is the "authority"?
> 
> Sorry.  First let me point out that the issue of unifying
> designation with authority is separable from "the capability access
> control rule" described above.  The two have good synergy, but
> aren't identical.
> 
> By "designation" I meant "naming".  For example...  Let's see, I
> think I'll go back to my toy tictactoe example from [2].
> 
> In the tictactoe example, you have to specify which wxWindow the
> tictactoe game object should draw into.  This is "designation" --
> you pass a reference, which designates which specific window you are
> talking about.  If you use the principle of unifying designation and
> authority, then this same act -- passing a reference to this
> particular wxWindow object -- conveys both the identification of
> which window to draw into and the authority to draw into it.
> 
> # access control system with unified designation and authority
> game = TicTacToeGame()
> game.display(wxPython.wxWindow())
> 
> If you have separate designation and authority, then the same code
> has to look something like this:
> 
> # access control system with separate designation and authority
> game = TicTacToeGame()
> window = wxPython.wxWindow()
> def policy(subject, resource, operation):
>     if (subject is game) and (resource is window) and \
>        (operation == "invoke methods of"):
>         return True
>     return False
> rexec.register_policy_hook(policy)
> game.display(window)
> 
> This is what I call "say it twice if you really mean it".
> 
> Hm.  Reviewing the rexec docs, I begin to suspect that the "access
> control system with unified designation and authority" *is* how
> Python does access control in restricted mode, and that rexec itself
> is just to manage module import and certain dangerous builtins.

Yes.
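
A self-contained sketch of the unified style, in Python 3 syntax.  The
Window class below is a hypothetical stand-in for a GUI window (it is
not wxPython), and TicTacToeGame is reduced to a single method; both
are assumptions made so the example runs without a GUI toolkit.

```python
class Window:
    """Stand-in for a GUI window; records what was drawn into it."""

    def __init__(self, title):
        self.title = title
        self.drawn = []

    def draw(self, shape):
        self.drawn.append(shape)


class TicTacToeGame:
    def display(self, window):
        # The reference both names the window and authorizes drawing
        # into it; no separate policy is consulted.
        window.draw("board")


game_window = Window("game")
other_window = Window("mail")

game = TicTacToeGame()
game.display(game_window)   # designation and authority in one step
```

The game was never handed other_window, so nothing in display() can
reach it; withholding the reference is the whole access decision.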

> > It really sounds to me like at least one of our fundamental (?)
> > differences is the autonomy of code units.  I think of code
> > (at least Python code) as a passive set of instructions that has
> > no inherent authority but derives authority from the built-ins
> > passed to it; you seem to describe code as having inherent
> > authority.
> 
> I definitely don't intend for code to have inherent authority (other
> than the Trusted Code Base -- the interpreter -- which can't help
> but have it).  The "things" in my two-sentence definition (the
> white circles in the diagram) are "computational things that can have
> state and behavior".  (This includes Python objects, closures, stack
> frames, etc...  In another context I would call them "objects", but
> Python uses the word "object" for something more specific -- an
> instance of a class.)
> 
> > > This would be effectively the "virtualization" of access control.  I
> > > regard it as a kind of holy Grail for internet computing.
> > 
> > How practical is this dream?  How useful?
> 
> Let's revisit the issue once we understand one another's access
> control schemes.
> ;-)
> 
> Regards,
> 
> Zooko
> 
> [1] http://www.erights.org/elib/capability/ode/overview.html
> [2] http://mail.python.org/pipermail/python-dev/2003-March/033938.html

I propose to continue this in a week; I'm leaving for Python UK right
now and expect to have scarce connectivity there if at all.  Back
Sunday night.

--Guido van Rossum (home page: http://www.python.org/~guido/)