
On Thu, Mar 19, 2009 at 4:12 PM, Mark Seaborn <mrs@mythic-beasts.com> wrote:
Guido van Rossum <guido@python.org> wrote:
On Thu, Mar 12, 2009 at 1:24 PM, Mark Seaborn <mrs@mythic-beasts.com> wrote:
Suppose we have an object x with a private attribute, "_field", defined by a class Foo:
class Foo(object):
    def __init__(self): self._field = "secret"
x = Foo()
Can you add some principals to this example? Who wrote the Foo class definition? Does CapPython have access to the source code for Foo? To the class object?
OK, suppose we have two principals, Alice and Bob. Alice receives a string from Bob. Alice instantiates the string using CapPython's safe_eval() function, getting back a module object that contains a function object. Alice passes the function an object x. Alice's intention is that the function should not be able to get hold of the contents of x._field, no matter what string Bob supplies.
To make this more concrete, this is what Alice executes, with source_from_bob defined in a string literal for the sake of example:
source_from_bob = """
class C:
    def f(self):
        return self._field
def entry_point(x):
    C.f(x) # potentially gets the secret object in Python 3.0
"""

import safeeval

secret = object()

class Foo(object):
    def __init__(self):
        self._field = secret

x = Foo()
module = safeeval.safe_eval(source_from_bob, safeeval.Environment())
module.entry_point(x)
In this example, Bob's code is not given access to the class object Foo. Furthermore, Bob should not be able to get access to the class Foo from the instance x. The type() builtin is not considered to be safe in CapPython so it is not included in the default environment.
Bob's code is not given access to the source code for class Foo. But even if Bob is aware of Alice's source code, it should not affect whether Bob can get hold of the secret object.
OK, I think I understand all this, except I don't have much of an idea of what subset of the language Bob is allowed to use.
By the way, you can try out the example by getting the code from the Bazaar repository:
    bzr branch http://bazaar.launchpad.net/%7Emrs/cappython/trunk cappython
If you don't mind I will try to avoid downloading your source a little longer.
However, in Python 3.0, the CapPython code can do this:
class C(object):
    def f(self): return self._field
C.f(x) # returns "secret"
Whereas in Python 2.x, C.f(x) would raise a TypeError, because C.f is not being called on an instance of C.
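The difference can be reproduced in plain Python 3 without CapPython. The snippet below is a minimal sketch that reuses the Foo and C definitions from this thread; the name "leaked" is only illustrative.

```python
# Python 3 only: a function looked up on a class is a plain function,
# so it accepts any object as its first argument.
class Foo(object):
    def __init__(self):
        self._field = "secret"

class C(object):
    def f(self):
        return self._field

x = Foo()

# x is not an instance of C, yet this call succeeds in Python 3.
# In Python 2.x the same call raises TypeError, because C.f is an
# unbound method that checks the type of its first argument.
leaked = C.f(x)
```

In Python 2.x the type check on unbound methods is what kept a verifier rule of "private attributes only through self" sound; in Python 3 that check is gone.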
In Python 2.x I could write
class C(Foo):
    def f(self): return self._field
In the example above, Bob's code is not given access to Foo, so Bob cannot do this. But you are right, if Bob's code were passed Foo as well as x, Bob could do this.
Suppose Alice wanted to give Bob access to class Foo, perhaps so that Bob could create derived classes. It is still possible for Alice to do that safely, if Alice defines Foo differently. Alice can pass the secret object to Foo's constructor instead of having the class definition get its reference to the secret object from an enclosing scope:
class Foo(object):
    def __init__(self, arg): self._field = arg

secret = object()
x = Foo(secret)
module = safeeval.safe_eval(source_from_bob, safeeval.Environment())
module.entry_point(x, Foo)
Bob can create his own objects derived from Foo, but cannot use his access to Foo to break encapsulation of instance x. Foo is now authorityless, in the sense that it does not capture "secret" from its enclosing environment, unlike the previous definition.
or alternatively
class C(x.__class__):
    <same f as before>
The verifier would reject x.__class__, so this is not possible.
Guido said, "I don't understand where the function object f gets its magic powers".
The answer is that function definitions directly inside class statements are treated specially by the verifier.
Hm, this sounds like a major change in language semantics, and if I were Sun I'd sue you for using the name "Python" in your product. :-)
Damn, the makers of Typed Lambda Calculus had better watch out for legal action from the makers of Lambda Calculus(tm) too... :-) Is it really a major change in semantics if it's just a subset? ;-)
Well yes. The empty subset is also a subset. :-) More seriously, IIUC you are disallowing all use of attribute names starting with underscores, which not only invalidates most Python code in practical use (though you might not care about that) but also disallows the use of many features that are considered part of the language, such as access to __dict__ and many other introspective attributes.
To some extent the verifier's check of only accessing private attributes through self is just checking a coding style that I already follow when writing Python code (except sometimes for writing test cases).
You might wish this to be true, but for most Python programmers, it isn't. Introspection is a commonly-used part of the language (probably more so than in Java). So is the use of attribute names starting with a single underscore outside the class tree, e.g. by "friend" functions.
Of course some of the verifier's checks, such as only allowing attribute assignments through self, are a lot more draconian than coding style checks.
That also sounds like a rather serious hindrance to writing Python as most people think of it.
If you wrote the same function definition at the top level:
def f(var): return var._field # rejected
the attribute access would be rejected by the verifier, because "var" is not a self variable, and private attributes may only be accessed through self variables.
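To illustrate the rule just described, here is a rough sketch of such a check written with the standard ast module. It is not CapPython's actual verifier: the names PrivateAccessChecker and check are hypothetical, and the sketch makes simplifying assumptions (only the first argument of a function written directly inside a class body counts as a self variable; nested functions and lambdas get no self variable; rebinding of self is not tracked).

```python
import ast

class PrivateAccessChecker(ast.NodeVisitor):
    """Hypothetical sketch: record private attribute accesses that do
    not go through a self variable."""

    def __init__(self):
        self.errors = []      # list of (line number, attribute name)
        self.self_var = None  # name of the current self variable, if any

    def visit_ClassDef(self, node):
        saved, self.self_var = self.self_var, None
        for child in ast.iter_child_nodes(node):
            # Only functions written directly in the class body are methods.
            if isinstance(child, ast.FunctionDef) and child.args.args:
                self._check_function(child, child.args.args[0].arg)
            else:
                self.visit(child)
        self.self_var = saved

    def visit_FunctionDef(self, node):
        # Reached for functions that are not directly inside a class body.
        self._check_function(node, None)

    def _check_function(self, node, self_var):
        saved, self.self_var = self.self_var, self_var
        self.generic_visit(node)
        self.self_var = saved

    def visit_Attribute(self, node):
        via_self = (isinstance(node.value, ast.Name)
                    and node.value.id == self.self_var)
        if node.attr.startswith("_") and not via_self:
            self.errors.append((node.lineno, node.attr))
        self.generic_visit(node)

def check(source):
    """Return the list of rejected private attribute accesses."""
    checker = PrivateAccessChecker()
    checker.visit(ast.parse(source))
    return checker.errors
```

With this sketch, check("def f(var):\n    return var._field\n") reports the access, while the same body inside a class statement with "self" as first argument is accepted. A base class expression such as x.__class__ is also reported, since it is a private attribute access outside any method.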
I renamed the variable in the example,
What do you mean by this?
I just mean that I applied alpha conversion.
BTW that's a new term for me. :-)
def f(self): return self._field
is equivalent to
def f(var): return var._field
This equivalence is good.
Whether these function definitions are accepted by the verifier depends on their context.
But this isn't. Are you saying that the verifier accepts the use of self._foo in a method? That would make the scenario of potentially passing a class defined by Alice into Bob's code much harder to verify -- now suddenly Alice has to know about a lot of things before she can be sure that she doesn't leave open a backdoor for Bob.
Do you also catch things like
g = getattr
s = 'field'.replace('f', '_f')
print g(x, s)
?
The default environment doesn't provide the real getattr() function. It provides a wrapped version that rejects private attribute names.
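A wrapper of that kind might look roughly like the following. This is a sketch only, not the code CapPython ships: the name safe_getattr, the choice of AttributeError, and the strict type check on the name are all assumptions made for illustration.

```python
def safe_getattr(obj, name, *default):
    """Hypothetical stand-in for getattr() that refuses private names."""
    # Require an exact str so that a str subclass with an overridden
    # startswith() cannot be used to slip past the check.
    if type(name) is not str or name.startswith("_"):
        raise AttributeError("private attribute access rejected: %r" % (name,))
    return getattr(obj, name, *default)

class Foo(object):
    def __init__(self):
        self._field = "secret"
        self.public = "ok"

x = Foo()
s = 'field'.replace('f', '_f')  # builds the name '_field' at run time
```

Because the check happens at call time on the actual string, it does not matter that the name '_field' never appears literally in the source: safe_getattr(x, s) is rejected, while safe_getattr(x, "public") behaves like the builtin.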
Do you have a web page describing the precise list of limitations you apply in your "subset" of Python? Does it support import of some form?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)