On Sun, Mar 29, 2009 at 4:57 PM, Mark Seaborn email@example.com wrote:
As a side note, it is interesting to compare CapPython to ECMAScript 3.1's strict mode, which, as I understand it, changes the semantics of ECMAScript's attribute access such that doing X.A when X does not have an attribute A raises an exception rather than returning undefined.
Fortunately CapPython does not have to make this kind of semantic change.
Well of course it makes a much more severe semantic change by declaring illegal all use of attribute names starting with underscore.
I guess if you wanted to override getattr on a per-module basis you could give each module a separate __builtins__.
However, CPython's restricted execution mode (which Tav is proposing to resurrect) does change the semantics of attribute access.
It does not change the general semantics of attribute access -- it only takes away a small set of *specific* attributes (e.g. __code__ and func_code) from a small set of *specific* object types (e.g. function objects). This is because every object has the ability to override getting attributes (via __getattribute__ in Python, or tp_getattro in C).
It's not yet clear to me how this works, and how it applies to the getattr function. I suspect it involves looking up the stack.
No, it does not look at the stack. It looks at the globals, which contain a special magic entry __builtins__ (with an 's') which is the dict where built-in functions are looked up. When this dict is the same object as the *default* built-in dict (which is __builtin__.__dict__ where __builtin__ -- without 's' -- is the module defining the built-in functions), it gives you supervisor privileges; if it is any other object, it disallows access to those specific attributes I referred to above.
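The name-lookup half of this mechanism can be observed from plain Python. What follows is a minimal sketch, assuming a current Python 3 interpreter, where the restricted execution mode itself no longer exists. `run_with_builtins` is a hypothetical helper, and swapping out `__builtins__` on its own is not a security boundary; the sketch only shows that built-in names are resolved through the `__builtins__` entry of the globals.

```python
def run_with_builtins(source, allowed):
    """Execute source with `allowed` as the dict used for built-in lookups."""
    env = {"__builtins__": allowed}
    exec(source, env)
    return env

# Only len is visible as a built-in inside the executed code.
env = run_with_builtins("n = len('abc')", {"len": len})
print(env["n"])

# Any other built-in name simply fails to resolve.
try:
    run_with_builtins("f = open", {"len": len})
except NameError as exc:
    print("blocked:", exc)
```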
I really recommend that you study the CPython implementation. Without understanding it you don't stand a chance of creating a secure subset.
The getattr() function and the x.y notation both invoke the same implementation (PyObject_GetAttr()). This in turn defers to the tp_getattro slot of the object x. And if the object is implemented in Python, this in turn defers to the object's __getattribute__ method. Then object.__getattribute__ defines the default lookup code, which searches into the object's __dict__ if there is one, then in the class's __dict__ and walking the MRO, and finally (just before raising AttributeError) calls the __getattr__ hook if it exists (don't confuse the latter with __getattribute__).
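The lookup order described above can be traced with a small class. This is an illustrative sketch only; `Traced` and the `calls` list are made-up names, and it is written for a current Python 3 interpreter.

```python
calls = []

class Traced(object):
    def __init__(self):
        self.present = "instance value"

    def __getattribute__(self, name):
        # Runs for every attribute read, whether spelled x.y or getattr(x, "y").
        calls.append(("__getattribute__", name))
        return object.__getattribute__(self, name)

    def __getattr__(self, name):
        # Runs only after the default lookup has raised AttributeError.
        calls.append(("__getattr__", name))
        return "fallback"

x = Traced()
print(x.present, getattr(x, "present"))
print(x.missing)
print(calls)
```

Both spellings of the read go through `__getattribute__`; `__getattr__` appears in the trace only for the attribute that the default lookup could not find.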
Guido van Rossum firstname.lastname@example.org wrote:
More seriously, IIUC you are disallowing all use of attribute names starting with underscores, which not only invalidates most Python code in practical use (though you might not care about that) but also disallows the use of many features that are considered part of the language, such as access to __dict__ and many other introspective attributes.
This is true. I'm not claiming that a lot of Python code will pass the verifier. It might not accept all idiomatic code; I'm just claiming that code using encapsulated objects under CapPython can still be idiomatic.
For some definition of idiomatic. There are a lot of well-known Python idioms involving attribute names starting with underscore.
(I hate to question your Python proficiency, but I do have to wonder -- how much Python have you written in your life? Where did you learn Python?)
We could probably allow reading self.__dict__ safely in CapPython.
Though that's not enough -- peeking in other.__dict__ is also somewhat common.
The term "introspection" covers a lot of language features. Some are OK in an object-capability language and some are not.
Agreed. And many introspection features aren't that important or commonly used. But some others are, and this includes using __dict__ and __class__.
For example, some might consider dir() to be an introspective feature, and this function is fine if suitably wrapped.
You'd have to look at the C implementation to see what it might do though.
x.__class__.__name__ is a common idiom. Although we can't allow x.__class__ on its own, we could provide a get_class_name function and rewrite "x.__class__.__name__" to "get_class_name(x)".
"type(x) is C" is another common idiom.
Though in most cases isinstance(x, C) is preferred.
Again, CapPython doesn't provide type() but it can provide a type_is() function:

    def type_is(x, t):
        return type(x) is t
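Taken together, the two helpers proposed in this exchange might look like the sketch below. It assumes they live in trusted code that is itself allowed to call type(), and it uses type(x).__name__ as a stand-in for x.__class__.__name__, which is the same thing for ordinary new-style classes. The classes `C` and `D` are only there for illustration.

```python
def get_class_name(x):
    # Stand-in for the idiom x.__class__.__name__.
    return type(x).__name__

def type_is(x, t):
    # Stand-in for the idiom "type(x) is C"; an exact match, no subclasses.
    return type(x) is t

class C(object):
    pass

class D(C):
    pass

print(get_class_name(D()))
print(type_is(D(), C), isinstance(D(), C))
```

The last line shows the difference from isinstance(): a subclass instance fails the exact type check but passes the isinstance() check.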
And slowly we slide down the path of writing less and less idiomatic Python...
The "locals" builtin is not something CapPython can allow in general. Any function that can look up the stack in this way is potentially dangerous. But it might be OK to allow "locals()", i.e. the case where "locals" is called as a function and not used as a first class value. I would prefer not to have to do that though.
Using locals() isn't that idiomatic anyway, so this is probably fine. It's mostly used by beginners who are still exploring the extreme end of the language's dynamism. :-)
To some extent the verifier's check of only accessing private attributes through self is just checking a coding style that I already follow when writing Python code (except sometimes for writing test cases).
You might wish this to be true, but for most Python programmers, it isn't. Introspection is a commonly-used part of the language (probably more so than in Java). So is the use of attribute names starting with a single underscore outside the class tree, e.g. by "friend" functions.
The friend function pattern is an example of something that CapPython could support, with some extra notation in order to make it explicit. It is a case of what is known as rights amplification in capability systems.
Here's an example of how I envisage it would work in CapPython:
    class C(object):
        def _get_foo(self):
            return self._foo
    _get_foo = C._get_foo
Although C._get_foo would normally be rejected, the verifier would allow reading C._get_foo immediately after the class definition as a special case. The resulting _get_foo function would only be able to operate on instances of C (assuming the presence of unbound methods in the language).
I'm not sure how useful this is -- friends aren't necessarily in the same module as the class, otherwise they might as well be declared as static methods.
Of course some of the verifier's checks, such as only allowing attribute assignments through self, are a lot more draconian than coding style checks.
That also sounds like a rather serious hindrance to writing Python as most people think of it.
Attribute assignment is something that we could handle by rewriting. For example,
x.y = z
could be rewritten to
x's class definition would have to declare that attribute y is assignable. The problem with attribute assignment in Python as it stands is that it is opt-out. Attributes can be made read-only (by using "property" or defining __setattr__), but this is not the default.
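To make the opt-in versus opt-out distinction concrete, here is a runtime approximation using __setattr__. It is not the rewriting scheme proposed for CapPython; the names `OptInAssignment`, `assignable` and `Point` are invented for the example, and in plain Python anyone can still bypass the check by calling object.__setattr__ directly.

```python
class OptInAssignment(object):
    """Hypothetical base class: assignment fails unless the name is declared."""
    assignable = frozenset()

    def __setattr__(self, name, value):
        if name not in type(self).assignable:
            raise AttributeError("attribute %r is not assignable" % name)
        object.__setattr__(self, name, value)

class Point(OptInAssignment):
    assignable = frozenset(["y"])  # only y is declared assignable

    def __init__(self, x, y):
        # The constructor sets initial state without going through the check.
        object.__setattr__(self, "x", x)
        object.__setattr__(self, "y", y)

p = Point(1, 2)
p.y = 5
try:
    p.x = 10
except AttributeError as exc:
    print("rejected:", exc)
print(p.x, p.y)
```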
This will encourage people to write "Java in Python" which is an unfortunately common anti-pattern.
Whether these function definitions are accepted by the verifier depends on their context.
But this isn't.
Are you saying that the verifier accepts the use of self._foo in a method?
That would make the scenario of potentially passing a class defined by Alice into Bob's code much harder to verify -- now suddenly Alice has to know about a lot of things before she can be sure that she doesn't leave open a backdoor for Bob.
In most cases Alice would not want Bob to extend classes that she has defined, so she would not give Bob access to the unwrapped class objects. She would just give Bob the constructor.
Or perhaps, better, a factory function, right?
If Alice wants to be sure that she does that, she can add a decorator to all her class definitions:
    def constructor_only(klass):
        def wrapper(*args, **kwargs):
            return klass(*args, **kwargs)
        return wrapper

    @constructor_only
    class C(object):
        ...
Clever. It does mean that even the class body of C cannot refer to C-the-class, which prevents certain idioms (mostly involving updating class variables -- perhaps not all that common).
(However, this assumes that class decorators are available, and CapPython does not support Python 2.6 yet.)
Well you can always do this manually:
    class C(object):
        ...
    C = constructor_only(C)
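A runnable version of the decorator shows what a recipient of C can and cannot do with it. This is a sketch for a current Python 3 interpreter; the `value` attribute is illustrative. In plain Python the class is still reachable through type(obj) or the wrapper's closure, so the hiding only holds if the environment blocks those routes, as CapPython aims to.

```python
def constructor_only(klass):
    def wrapper(*args, **kwargs):
        return klass(*args, **kwargs)
    return wrapper

@constructor_only
class C(object):
    def __init__(self, value):
        self.value = value

obj = C(3)                    # construction still works
print(obj.value)
print(isinstance(C, type))    # C is now a plain function, not a class

# Subclassing through the wrapper is no longer possible.
try:
    class D(C):
        pass
except TypeError as exc:
    print("cannot subclass:", exc)
```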
The default environment doesn't provide the real getattr() function. It provides a wrapped version that rejects private attribute names.
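A wrapped getattr() of this kind might look roughly like the following. This is a sketch, assuming that "private" means any name starting with an underscore; the exact rules are whatever the CapPython verifier and builtins define. `safe_getattr`, `_MISSING` and `Account` are hypothetical names.

```python
_MISSING = object()

def safe_getattr(obj, name, default=_MISSING):
    # Require an exact str so a subclass cannot lie via an overridden startswith().
    if type(name) is not str or name.startswith("_"):
        raise AttributeError("private attribute %r is not accessible" % (name,))
    if default is _MISSING:
        return getattr(obj, name)
    return getattr(obj, name, default)

class Account(object):
    def __init__(self):
        self.owner = "alice"
        self._balance = 100

acct = Account()
print(safe_getattr(acct, "owner"))
print(safe_getattr(acct, "nickname", None))
try:
    safe_getattr(acct, "_balance")
except AttributeError as exc:
    print("rejected:", exc)
```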
Do you have a web page describing the precise list of limitations you apply in your "subset" of Python?
I started some wiki pages to explain the verifier rules and which builtins are allowed, blocked or wrapped:
http://plash.beasts.org/wiki/CapPython/VerifierRules
http://plash.beasts.org/wiki/CapPython/Builtins
I hope that will make things clearer.
Ok, I'll try to remember to look there before responding next time.
Does it support import of some form?
Yes, it supports import: http://lackingrhoticity.blogspot.com/2008/09/dealing-with-modules-and-builti...
The safeeval module allows callers to provide their own __import__ function when evalling code.
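The general idea of a caller-supplied __import__ can be illustrated with plain exec(). This does not use or reproduce the safeeval API; `restricted_import`, `eval_with_import` and the `ALLOWED_MODULES` whitelist are assumptions made for the sketch. It also makes no security claim: an imported module exposes everything it contains.

```python
import builtins

ALLOWED_MODULES = frozenset(["math"])  # hypothetical whitelist

def restricted_import(name, globals=None, locals=None, fromlist=(), level=0):
    # Called by the import statement in code whose builtins contain it.
    if name not in ALLOWED_MODULES:
        raise ImportError("import of %r is not permitted" % name)
    return builtins.__import__(name, globals, locals, fromlist, level)

def eval_with_import(source, import_func):
    """Execute source with import_func as its only built-in."""
    env = {"__builtins__": {"__import__": import_func}}
    exec(source, env)
    return env

env = eval_with_import("import math\nroot = math.sqrt(16)", restricted_import)
print(env["root"])

try:
    eval_with_import("import os", restricted_import)
except ImportError as exc:
    print("rejected:", exc)
```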
Ok. Have you done a security contest like Tav did yet? Implementing import correctly *and* safely is fiendishly difficult.