Re: [Python-Dev] Capabilities
Guido wrote:
I understand how class ZipFile could exercise authority in a rexec-based world, if the zipfile module was trusted code. But I thought that a capability view of the world doesn't distinguish between trusted and untrusted code. I guess I need to understand better what kind of "barriers" the capability way of life *does* use.
I think you are on track with regard to the deeper question you are grappling with. Almost all dangerous things come ultimately from C code. (I can think of one danger that can come from pure Python code: it can provide an illicit communications channel between other objects.)
So in the "separate policy language" way of life, access to the ZipFile class gives you the ability to open files anywhere in the filesystem. The ZipFile class therefore has the "dangerous" flag set, and when you run code that you think might misuse this feature, you set the "can't use dangerous things" flag on that code.
In the capability way of life, it is still the case that access to the ZipFile class gives you the ability to open files anywhere in the system! (That is: I'm assuming for now that we implement capabilities without re-writing every dangerous class in the Library.) In this scheme, there are no flags, and when you run code that you think might misuse this feature, you simply don't give that code a reference to the ZipFile class. (Also, we have to arrange that it can't acquire a reference by "import zipfile".)
So far the two approaches have the same effect, and the difference, for better or for worse, is that the policy of "this code can't use ZipFile" is encoded in Python reference-management code in the latter and encoded in a pair of flags in the former.
Now, we might want to allow certain code to use something else dangerous (such as the socket module) while simultaneously disallowing it from using ZipFile. As we add N more dangerous modules, and M more objects of untrusted code that we want to control, we have an N*M access control matrix to configure which code can use which modules. (In an access control matrix, rows are "subjects" -- things that can exercise authority and columns are "resources" -- things that might require authority when used.)
In a system where designation is not unified with authority, you tell this untrusted code "I want you to do this action X.", and then you also have to go update the policy specification to say that the code in question is allowed to do the action X.
This "say it twice if you really mean it" overhead puts a practical limit on how fine-grained your policies can be, and it adds a source of accidents that lead to security holes.
So now with a large or fine-grained access control matrix, we see the "unify designation and authority" maxim really shines, and really matches well with the Zen of Python.
But there is still another advantage that capabilities offer over other access control systems. With normal access control (and an extremely diligent and patient programmer and user) you can in theory achieve the Principle of Least Privilege -- that the untrusted code runs with the minimal set of authorities necessary to do its job. However, this is implemented by creating a new "principal" -- a new row in the access control matrix, setting the access control bits in each element of that row, and preventing any other code from setting the bits in that row.
Now, observe that only maximally trusted code -- with "root" authority -- is allowed to make these kinds of updates to the access control matrix. This means that all code is divided into two kinds: the kind that can impose Least-Privilege on code that it invokes (this code has root authority), and the kind that can be constrained by Least-Privilege when it is invoked (this code doesn't).
With capabilities there is no such distinction. All code can be constrained to have access to only the privileges that it requires, and at the same time all code can constrain other code that it invokes.
This feature, which I call "Higher-Order Principle of Least Privilege" [*] enables new applications.
For example, using first-order Least-Privilege a web browser which runs cap-Python "caplets" could extend selective privileges to the caplets, such as permission to read a certain file, while withholding others, such as permission to write to that file, or permission to send the contents of the file to a remote computer.
In addition, if cap-Python supports Higher-Order Least-Privilege, those caplets could themselves use other caplets ("web services"?) without unnecessarily exposing their privileges to those sub-caplets.
One could imagine, for example, a web browser written in cap-Python, which runs inside the first web browser (e.g. Mozilla with a cap-Python plug-in), and uses cap-Python caplets to extend its (the cap-Python web browser's) functionality. If people already had the cap-Python plug-in installed in their local Mozilla, then simply visiting the "cap-python-browser.com" site would be sufficient to launch the cap-Python web browser.
Of course, this could lead straight to a fully functional desktop, making good on Marc Andreessen's old threat to turn the browser into the operating system and the operating system into the device driver.
This would be effectively the "virtualization" of access control. I regard it as a kind of holy Grail for internet computing.
Regards,
Zooko
[*] I call it that because it is the application of the Principle of Least Privilege to the implementation of the Principle of Least Privilege. One should be able to impose least-privilege constraints on the code one uses without requiring full root privileges oneself!
http://zooko.com/ ^-- under re-construction: some new stuff, some broken links
Guido wrote:
I understand how class ZipFile could exercise authority in a rexec-based world, if the zipfile module was trusted code. But I thought that a capability view of the world doesn't distinguish between trusted and untrusted code. I guess I need to understand better what kind of "barriers" the capability way of life *does* use.
[Zooko]
I think you are on track with regard to the deeper question you are grappling with. Almost all dangerous things come ultimately from C code. (I can think of one danger that can come from pure Python code: it can provide an illicit communications channel between other objects.)
So in the "separate policy language" way of life, access to the ZipFile class gives you the ability to open files anywhere in the filesystem. The ZipFile class therefore has the "dangerous" flag set, and when you run code that you think might misuse this feature, you set the "can't use dangerous things" flag on that code.
But that's not how rexec works. In the rexec world, the zipfile module has no special privileges; when it is imported by untrusted code, it is reloaded from disk as if it were untrusted itself. The zipfile.ZipFile class is a client of "open", an implementation of which is provided to the untrusted code by the trusted code. This implementation does access checking (according to a separate policy language, indeed). So importing Python modules is always safe for untrusted code, because the imported Python code derives its authority from whatever the untrusted user already has. (It's different for C extension modules of course.)
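The arrangement described here, where trusted code hands the untrusted code its own access-checking implementation of "open", can be sketched roughly as follows. This is a minimal illustration, not the rexec module itself: the names `make_checked_open` and `run_untrusted` are hypothetical, the policy (read-only access below one directory) is an assumption chosen for brevity, and swapping out `__builtins__` like this is not a real security boundary in current CPython.

```python
import builtins
import os


def _is_inside(root, path):
    """Return True if path lies within the directory root."""
    try:
        return os.path.commonpath([root, path]) == root
    except ValueError:
        # Raised e.g. for paths on different drives on Windows.
        return False


def make_checked_open(allowed_dir):
    """Build an open() replacement that only reads files under allowed_dir.

    Hypothetical helper: the policy check lives in this trusted wrapper,
    not in the code that ends up calling it.
    """
    root = os.path.realpath(allowed_dir)

    def checked_open(path, mode="r", *args, **kwargs):
        if any(flag in mode for flag in "wax+"):
            raise PermissionError("write access denied by policy")
        real = os.path.realpath(path)
        if not _is_inside(root, real):
            raise PermissionError("path outside the allowed directory")
        return builtins.open(real, mode, *args, **kwargs)

    return checked_open


def run_untrusted(source, allowed_dir):
    """Execute source in a namespace whose only 'open' is the checked one."""
    namespace = {"__builtins__": {"open": make_checked_open(allowed_dir)}}
    exec(source, namespace)
    return namespace
```

Any pure-Python module loaded into such a namespace would call the same checked `open`, which is the sense in which it derives its authority from whatever the untrusted caller already has.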
In the capability way of life, it is still the case that access to the ZipFile class gives you the ability to open files anywhere in the system! (That is: I'm assuming for now that we implement capabilities without re-writing every dangerous class in the Library.) In this scheme, there are no flags, and when you run code that you think might misuse this feature, you simply don't give that code a reference to the ZipFile class. (Also, we have to arrange that it can't acquire a reference by "import zipfile".)
The rexec world solves this very nicely IMO. Can't the capability world do it the same way? The only difference might be that 'open' would have to be a capability.
So far the two approaches have the same effect, and the difference, for better or for worse, is that the policy of "this code can't use ZipFile" is encoded in Python reference-management code in the latter and encoded in a pair of flags in the former.
But I think "this code can't use ZipFile" is the wrong thing to say. You should only have to say "this code can't write files" (or something more specific).
Now, we might want to allow certain code to use something else dangerous (such as the socket module) while simultaneously disallowing it from using ZipFile. As we add N more dangerous modules, and M more objects of untrusted code that we want to control, we have an N*M access control matrix to configure which code can use which modules. (In an access control matrix, rows are "subjects" -- things that can exercise authority and columns are "resources" -- things that might require authority when used.)
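To make the N*M bookkeeping concrete, here is a small sketch of an access control matrix as a dictionary keyed by (subject, resource). The subject and resource names are made up for illustration, and deny-by-default is an assumption rather than something the thread specifies.

```python
from itertools import product

# M pieces of untrusted code (rows, the "subjects"); names are hypothetical.
subjects = ["caplet_a", "caplet_b", "caplet_c"]
# N dangerous modules (columns, the "resources").
resources = ["zipfile", "socket", "os"]

# Every one of the N*M cells has to be configured; start with deny-all.
matrix = {cell: False for cell in product(subjects, resources)}

# Allow one subject to use sockets while still denying it zipfile.
matrix[("caplet_a", "socket")] = True


def may_use(subject, resource):
    """Look up a single cell of the matrix, denying anything unlisted."""
    return matrix.get((subject, resource), False)
```

Adding one more module or one more piece of untrusted code adds a whole column or row of cells to review, which is the maintenance cost the paragraph above is pointing at.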
In the rexec world, modules and classes don't have separate privileges -- the privileges are held by a larger concept, which we might call a "workspace". The rexec world allows many workspaces with different privileges -- but no communication between them.
In a system where designation is not unified with authority, you tell this untrusted code "I want you to do this action X.", and then you also have to go update the policy specification to say that the code in question is allowed to do the action X.
Sorry, you've lost me here. Which part is the "designation" (new word for me) and which part is the "authority"?
This "say it twice if you really mean it" overhead puts a practical limit on how fine-grained your policies can be, and it adds a source of accidents that lead to security holes.
So now with a large or fine-grained access control matrix, we see the "unify designation and authority" maxim really shines, and really matches well with the Zen of Python.
Sorry, this is too abstract for me to see (yet). You are sounding a bit like a used-car salesman here, or "Proof by using Big Words". :-)
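One way to read the two terms: "designation" is saying which thing you mean (for instance a path name), and "authority" is being permitted to use it. A rough sketch using the standard `zipfile` module, which accepts either a path or an already-open file object; the two function names are hypothetical and exist only to show the contrast.

```python
import io
import zipfile


def list_members_by_name(path):
    # 'path' only designates the archive. Opening it relies on the ambient
    # authority of the process to open any file it can reach, so a separate
    # policy has to say whether this code may do that.
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def list_members_by_object(fileobj):
    # 'fileobj' both designates the archive and carries the authority to
    # read it; nothing else needs to be configured for this call.
    with zipfile.ZipFile(fileobj) as archive:
        return archive.namelist()


# Build a small archive in memory and hand over just that one object.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hi")
buffer.seek(0)
names = list_members_by_object(buffer)
```

In the second form, the caller says "do X to this" once, by passing the object, instead of naming the file and then separately granting permission for it.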
But there is still another advantage that capabilities offer over other access control systems. With normal access control (and an extremely diligent and patient programmer and user) you can in theory achieve the Principle of Least Privilege -- that the untrusted code runs with the minimal set of authorities necessary to do its job. However, this is implemented by creating a new "principal" -- a new row in the access control matrix, setting the access control bits in each element of that row, and preventing any other code from setting the bits in that row.
Now, observe that only maximally trusted code -- with "root" authority -- is allowed to make these kinds of updates to the access control matrix. This means that all code is divided into two kinds: the kind that can impose Least-Privilege on code that it invokes (this code has root authority), and the kind that can be constrained by Least-Privilege when it is invoked (this code doesn't).
In the rexec world, it is possible for a restricted workspace (at least in theory -- the rexec module may not be directly usable but something similar could) to create another workspace and selectively pass privileges into that workspace.
With capabilities there is no such distinction. All code can be constrained to have access to only the privileges that it requires, and at the same time all code can constrain other code that it invokes.
This feature, which I call "Higher-Order Principle of Least Privilege" [*] enables new applications.
Sorry, more "Big Words". :-)
For example, using first-order Least-Privilege a web browser which runs cap-Python "caplets" could extend selective privileges to the caplets, such as permission to read a certain file, while withholding others, such as permission to write to that file, or permission to send the contents of the file to a remote computer.
In addition, if cap-Python supports Higher-Order Least-Privilege, those caplets could themselves use other caplets ("web services"?) without unnecessarily exposing their privileges to those sub-caplets.
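The caplet and sub-caplet scenario might look roughly like this. It is only a sketch under stated assumptions: `ReadOnly`, `caplet` and `sub_caplet` are hypothetical names, and plain Python does not enforce the encapsulation (the wrapped object is still reachable through `_fileobj`), which is part of what a cap-Python would have to provide.

```python
import io


class ReadOnly:
    """Attenuated capability: forwards read() and nothing else."""

    def __init__(self, fileobj):
        self._fileobj = fileobj

    def read(self, size=-1):
        return self._fileobj.read(size)


def sub_caplet(source):
    # Receives only what its caller chose to pass on.
    return len(source.read())


def caplet(document):
    # The caplet needs no special "root" status to constrain code it
    # invokes: it narrows its own read/write authority before delegating.
    return sub_caplet(ReadOnly(document))


document = io.StringIO("secret notes")   # caller grants read/write access
count = caplet(document)
```

The same function is both constrained by its caller (it only ever sees `document`) and able to constrain its own callee, which is the "higher-order" part.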
It really sounds to me like at least one of our fundamental (?) differences is the autonomicity of code units. I think of code (at least Python code) as a passive set of instructions that has no inherent authority but derives authority from the built-ins passed to it; you seem to describe code as having inherent authority.
One could imagine, for example, a web browser written in cap-Python, which runs inside the first web browser (e.g. Mozilla with a cap-Python plug-in), and uses cap-Python caplets to extend its (the cap-Python web browser's) functionality. If people already had the cap-Python plug-in installed in their local Mozilla, then simply visiting the "cap-python-browser.com" site would be sufficient to launch the cap-Python web browser.
Of course, this could lead straight to a fully functional desktop, making good on Marc Andreessen's old threat to turn the browser into the operating system and the operating system into the device driver.
This would be effectively the "virtualization" of access control. I regard it as a kind of holy Grail for internet computing.
How practical is this dream? How useful?
Regards,
Zooko
[*] I call it that because it is the application of the Principle of Least Privilege to the implementation of the Principle of Least Privilege. One should be able to impose least-privilege constraints on the code one uses without requiring full root privileges oneself!
http://zooko.com/ ^-- under re-construction: some new stuff, some broken links
--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido wrote:
But that's not how rexec works.
It seems to me that the restricted execution mechanism (is there a shorter term for this? calling it rexec is a misnomer, as has been pointed out -- let's call it the REM for now) really is a kind of capability system.
The REM works by closing off a bunch of loopholes and then controlling which builtins a piece of code has access to. That code can then pass them on to other code or withhold them. Sounds a lot like capabilities, doesn't it?
So the hypothesised "capability python" would be rather like having REM permanently in effect...
Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
"A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc."
Zooko wrote:
In the capability way of life, it is still the case that access to the ZipFile class gives you the ability to open files anywhere in the system! (That is: I'm assuming for now that we implement capabilities without re-writing every dangerous class in the Library.) In this scheme, there are no flags, and when you run code that you think might misuse this feature, you simply don't give that code a reference to the ZipFile class. (Also, we have to arrange that it can't acquire a reference by "import zipfile".)
It would probably be helpful to explain what you (or, at least, I) would do if you (I) were writing from scratch, rather than "taming" the existing libraries. In this case, ZipFile would require a file capability to be passed to it at construction time, and so would become non-dangerous, which is, I think, where Guido is coming from.
The risk only occurs because we want to not rewrite the whole library, just to wrap it, and it's important to understand that this isn't really the "proper" way to do it (though, of course, the ZipFile class is not unlike any of the other non-capability things we'd have to wrap anyway, given a non-capability OS underneath; it just happens to be one that _can_ be rewritten if we want to rewrite it).
Cheers,
Ben.
--
http://www.apache-ssl.org/ben.html http://www.thebunker.net/
"There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff
participants (4)
- Ben Laurie
- Greg Ewing
- Guido van Rossum
- Zooko