[Python-Dev] Capabilities
Guido van Rossum
guido@python.org
Mon, 31 Mar 2003 17:43:09 -0500
[Zooko]
> It's apparent that I didn't explain capabilities clearly enough.
> Also I misunderstood something about rexec in general and ZipFile in
> particular. Once we succeed at understanding each other, I'll then
> inquire whether you agree with my Big Word Proofs.
It's apparent that you don't understand rexec well enough; I'll try
to explain.
> (I, Zooko, wrote lines prepended with "> > ".)
>
> Guido wrote:
> >
> > > So in the "separate policy language" way of life, access to the
> > > ZipFile class gives you the ability to open files anywhere in the
> > > filesystem. The ZipFile class therefore has the "dangerous" flag
> > > set, and when you run code that you think might misuse this feature,
> > > you set the "can't use dangerous things" flag on that code.
> >
> > But that's not how rexec works. In the rexec world, the zipfile
> > module has no special privileges; when it is imported by untrusted
> > code, it is reloaded from disk as if it were untrusted itself. The
> > zipfile.ZipFile class is a client of "open", an implementation of
> > which is provided to the untrusted code by the trusted code.
>
> <Zooko reads the zipfile module docs.>
>
> How is the implementation of "open" provided by the trusted code to
> the untrusted code? Is it possible to provide a different "open"
> implementation to different "instances" of the zipfile module? (I
> think not, as there is no such thing as "a different instance of a
> module", but perhaps you could have two rexec "workspaces" each of
> which has a zipfile module with a different "open"?)
To the contrary, it is very easy to provide code with a different
version of open(). E.g.:
    # this executes as trusted code
    def my_open(...):
        "open() variant that only allows reading"
    my_builtins = {"len": len, "open": my_open, "range": range, ...}
    namespace = {"__builtins__": my_builtins}
    exec "..." in namespace
The final exec executes the untrusted code string "..." in a
restricted environment where the built-in 'open' refers to my_open.
Because import statements are also treated this way (they call the
builtin function __import__), the same applies for import. IOW,
namespace["__builtins__"] acts as the set of "root capabilities" given
to the untrusted code.
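
For readers who want to run the idea, here is a minimal sketch in
present-day Python 3 syntax, where exec is a function rather than a
statement. The names read_only_open, restricted_builtins and
untrusted_source are made up for illustration, and the mode check is
an assumed stand-in for whatever policy the trusted code wants. This
only shows the name-lookup mechanism; it is not a secure sandbox, and
the rexec module itself no longer exists in Python 3.

```python
import builtins

def read_only_open(path, mode="r", *args, **kwargs):
    """open() variant that only allows reading (illustrative policy)."""
    if any(flag in mode for flag in "wax+"):
        raise PermissionError("write access denied: %r" % (path,))
    return builtins.open(path, mode, *args, **kwargs)

# The "root capabilities" handed to the untrusted code (hypothetical set).
restricted_builtins = {"len": len, "range": range, "open": read_only_open}
namespace = {"__builtins__": restricted_builtins}

# Stand-in for the untrusted code string.
untrusted_source = 'n = len(range(5))\nhandle = open("scratch.txt", "w")\n'

try:
    exec(untrusted_source, namespace)
    write_denied = False
except PermissionError:
    # Raised by read_only_open before any file is touched.
    write_denied = True
```

The first statement of the untrusted string runs normally; the second
one reaches read_only_open, not the real open, because that is what
the name resolves to inside this namespace.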
> > > In this scheme, there are no flags, and when you run code that
> > > you think might misuse this feature, you simply don't give that
> > > code a reference to the ZipFile class. (Also, we have to
> > > arrange that it can't acquire a reference by "import zipfile".)
> >
> > The rexec world solves this very nicely IMO. Can't the capability
> > world do it the same way? The only difference might be that
> > 'open' would have to be a capability.
>
> I don't understand exactly how rexec works yet, but so far it sounds
> like capabilities.
Yes. That may be why the demand for capabilities has been met with
resistance: to quote the French in "Monty Python and the Holy Grail",
"we already got one!" :-)
> Here's a two sentence definition of capabilities:
I've heard too many of these. They are all too abstract.
> Authority originates in C code (in the interpreter or C extension
> modules), and is passed from thing to thing.
This part I like.
> A given thing "X" -- an instance of ZipFile, for example -- has the
> authority to use a given authority -- to invoke the real open(), for
> example -- if and only if some thing "Y" previously held both the
> "open()" authority and the "authority to extend authorities to X"
> authority, and chose to extend the "open()" authority to X.
But the instance of ZipFile is not really a protection domain.
Methods on the instance may have different authority.
> That rule could be enforced with the rexec system, right?
Yes, except that there are currently design bugs (starting in Python
2.2) that open holes; see Samuele Pedroni's posts here.
> Here is a graphical representation of this rule. (Taken from [1].)
>
> http://www.erights.org/elib/capability/ode/images/fundamental.gif
>
> In the diagram, the authority is "Carol", the thing that started
> with the authority is "Alice", and Alice is in the process of
> extending to Bob the authority to use Carol. This act -- the
> extending of authority from Alice to Bob -- is the only way that Bob
> can gain authority, and it can only happen if Alice has both the
> authority to use Carol and the authority to extend authorities to
> Bob.
Sure. The question is, what exactly are Alice, Bob and Carol? I
claim that they are not specific class instances but they are each a
"workspace" as I tried to explain before. A workspace is more or less
the contents of a particular "sys.modules" dictionary.
> Those two sentences above (and equivalently the graph) completely
> define capabilities, in the abstract. They don't say how they are
> implemented. A particular implementation that I find deeply
> appealing is to make "has a reference to 'open'" be the determiner
> of whether a thing has the authority to use "open", and to make "has
> a reference to X" be the determiner of whether a thing has the
> authority to extend authorities to X. That's "unifying designation
> with authority", and that's what the E language does.
Yes. And then "has a reference to 'open'" is bootstrapped by sticking
(some variant of) 'open' in the __builtin__ module of a particular
"workspace". (Note that workspace is a term I'm inventing here, you
won't find it in the Python literature.)
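
As a rough illustration of two "workspaces" that load the same source
but see a different 'open', here is a sketch in Python 3 syntax. The
helper make_workspace and the two open variants are hypothetical
names, and a plain dict stands in for a workspace; rexec's real
machinery (import hooks, a private sys.modules) is not modelled here.

```python
import io

def make_workspace(open_impl):
    """Build a namespace whose built-in 'open' is open_impl (assumed helper)."""
    return {"__builtins__": {"open": open_impl, "len": len}}

def deny_open(*args, **kwargs):
    """open() variant that grants no file access at all."""
    raise PermissionError("no file access in this workspace")

def canned_open(path, mode="r"):
    """open() variant that serves fixed in-memory content."""
    return io.StringIO("hello")

# The same "module" source is loaded into both workspaces.
module_source = '''
def read_length(path):
    return len(open(path).read())
'''

workspace_a = make_workspace(canned_open)
workspace_b = make_workspace(deny_open)
exec(module_source, workspace_a)
exec(module_source, workspace_b)
```

Calling workspace_a["read_length"] uses canned_open, while the
identical function in workspace_b raises PermissionError, because each
function looks up 'open' through the built-ins of the namespace it was
defined in.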
> > But I think "this code can't use ZipFile" is the wrong thing to
> > say. You should only have to say "this code can't write files"
> > (or something more specific).
>
> I agree. I incorrectly inferred from previous messages that the
> current problem under discussion was allowing or denying access to
> the ZipFile class. But whatever resource we wish to control access
> to, these same techniques will apply.
>
> > > In a system where designation is not unified with authority, you
> > > tell this untrusted code "I want you to do this action X.", and
> > > then you also have to go update the policy specification to say
> > > that the code in question is allowed to do the action X.
> >
> > Sorry, you've lost me here. Which part is the "designation" (new
> > word for me) and which part is the "authority"?
>
> Sorry. First let me point out that the issue of unifying
> designation with authority is separable from "the capability access
> control rule" described above. The two have good synergy, but
> aren't identical.
>
> By "designation" I meant "naming". For example... Let's see, I
> think I'll go back to my toy tictactoe example from [2].
>
> In the tictactoe example, you have to specify which wxWindow the
> tictactoe game object should draw into. This is "designation" --
> you pass a reference, which designates which specific window you are
> talking about. If you use the principle of unifying designation and
> authority, then this same act -- passing a reference to this
> particular wxWindows object -- conveys both the identification of
> which window to draw into and the authority to draw into it.
>
> # access control system with unified designation and authority
> game = TicTacToeGame()
> game.display(wxPython.wxWindow())
>
> If you have separate designation and authority, then the same code
> has to look something like this:
>
> # access control system with separate designation and authority
> game = TicTacToeGame()
> window = wxPython.wxWindow()
> def policy(subject, resource, operation):
>     if (subject is game) and (resource is window) and \
>        (operation == "invoke methods of"):
>         return True
>     return False
> rexec.register_policy_hook(policy)
> game.display(window)
>
> This is what I call "say it twice if you really mean it".
>
> Hm. Reviewing the rexec docs, I begin to suspect that the "access
> control system with unified designation and authority" *is* how
> Python does access control in restricted mode, and that rexec itself
> is just to manage module import and certain dangerous builtins.
Yes.
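
A self-contained sketch of the unified style, using plain Python
objects: the Window and TicTacToeGame classes below are hypothetical
stand-ins (not wxPython, and not Zooko's actual example code). The
game holds no window of its own, so the only window it can draw into
is the one passed to display(). In unrestricted Python nothing stops
code from reaching other objects by introspection, so this shows the
pattern only, not its enforcement.

```python
class Window:
    """Hypothetical stand-in for a GUI window; records draw calls."""
    def __init__(self, name):
        self.name = name
        self.marks = []

    def draw(self, mark, cell):
        self.marks.append((mark, cell))


class TicTacToeGame:
    """Keeps only its own moves; draws wherever it is told to."""
    def __init__(self):
        self.moves = []

    def play(self, mark, cell):
        self.moves.append((mark, cell))

    def display(self, window):
        # The reference both names the window and permits drawing into it.
        for mark, cell in self.moves:
            window.draw(mark, cell)


game_window = Window("game")
other_window = Window("other")

game = TicTacToeGame()
game.play("X", 4)
game.play("O", 0)
game.display(game_window)  # other_window is never handed to the game
```

After this runs, game_window has recorded both moves and other_window
has recorded none, with no separate policy table involved.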
> > It really sounds to me like at least one of our fundamental (?)
> > differences is the autonomicity of code units. I think of code
> > (at least Python code) as a passive set of instructions that has
> > no inherent authority but derives authority from the built-ins
> > passed to it; you seem to describe code as having inherent
> > authority.
>
> I definitely don't intend for code to have inherent authority (other
> than the Trusted Code Base -- the interpreter -- which can't help
> but have it). The "things" in my two-sentence definition (the
> white circles in the diagram) are "computational things that can have
> state and behavior". (This includes Python objects, closures, stack
> frames, etc... In another context I would call them "objects", but
> Python uses the word "object" for something more specific -- an
> instance of a class.)
>
> > > This would be effectively the "virtualization" of access control. I
> > > regard it as a kind of holy Grail for internet computing.
> >
> > How practical is this dream? How useful?
>
> Let's revisit the issue once we understand one another's access
> control schemes.
> ;-)
>
> Regards,
>
> Zooko
>
> [1] http://www.erights.org/elib/capability/ode/overview.html
> [2] http://mail.python.org/pipermail/python-dev/2003-March/033938.html
I propose to continue this in a week; I'm leaving for Python UK right
now and expect to have scarce connectivity there if at all. Back
Sunday night.
--Guido van Rossum (home page: http://www.python.org/~guido/)