Hi.
[Mark Hammond]
> The point isn't about my suffering as such. The point is more that
> python-dev owns a tiny amount of the code out there, and I don't believe we
> should put Python's users through this.
>
> Sure - I would be happy to "upgrade" all the win32all code, no problem. I
> am also happy to live in the bleeding edge and take some pain that will
> cause.
>
> The issue is simply the user base, and giving Python a reputation of not
> being able to painlessly upgrade even dot revisions.
I agree with all this.
[As I imagined, explicit syntax did not catch on and would require a
lot of discussion.]
[GvR]
> > Another way is to use special rules
> > (similar to those for class defs), e.g. having
> >
> > <frag>
> > y=3
> > def f():
> >     exec "y=2"
> >     def g():
> >         return y
> >     return g()
> >
> > print f()
> > </frag>
> >
> > # prints 3.
> >
> > Is that confusing for users? Maybe they will more naturally expect 2
> > as the outcome (given nested scopes).
>
> This seems the best compromise to me. It will lead to the least
> broken code, because this is the behavior that we had before nested
> scopes! It is also quite easy to implement given the current
> implementation, I believe.
>
> Maybe we could introduce a warning rather than an error for this
> situation though, because even if this behavior is clearly documented,
> it will still be confusing to some, so it is better if we outlaw it in
> some future version.
>
Yes, this can be easy to implement, but more confusing situations can arise:
<frag>
y=3
def f():
    y=9
    exec "y=2"
    def g():
        return y
    return y,g()
print f()
</frag>
What should this print? The situation does not lead to a canonical solution
the way class-def scopes do.
or
<frag>
def f():
    from foo import *
    def g():
        return y
    return g()
print f()
</frag>
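[With explicit syntax the ambiguity would disappear; even today one can
spell the intent explicitly by giving exec its own namespace and binding
at definition time -- a sketch:]
<frag>
y=3
def f():
    d = {}
    exec "y=2" in d        # exec gets its own namespace
    def g(y=d['y']):       # explicit binding, at definition time
        return y
    return g()
print f()                  # prints 2, unambiguously
</frag>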
[Mark Hammond]
> > This probably won't be a very popular suggestion, but how about pulling
> > nested scopes (I assume they are at the root of the problem)
> > until this can be solved cleanly?
>
> Agreed. While I think nested scopes are kinda cool, I have lived without
> them, and really without missing them, for years. At the moment the cure
> appears worse than the symptoms in at least a few cases. If nothing else,
> it compromises the elegant simplicity of Python that drew me here in the
> first place!
>
> Assuming that people really _do_ want this feature, IMO the bar should be
> raised so there are _zero_ backward compatibility issues.
I won't say anything about pulling nested scopes (I don't think my opinion
can change things in this respect),
but I must insist that without explicit syntax, IMO, raising the bar
has too high an implementation cost (both performance and complexity) or creates
confusion.
[Andrew Kuchling]
> >Assuming that people really _do_ want this feature, IMO the bar should be
> >raised so there are _zero_ backward compatibility issues.
>
> Even at the cost of additional implementation complexity? At the cost
> of having to learn "scopes are nested, unless you do these two things
> in which case they're not"?
>
> Let's not waffle. If nested scopes are worth doing, they're worth
> breaking code. Either leave exec and from..import illegal, or back
> out nested scopes, or think of some better solution, but let's not
> introduce complicated backward compatibility hacks.
IMO breaking code would be OK if we issue warnings today and implement
nested scopes issuing errors tomorrow. But this is simply a statement
about principles and the impression raised.
IMO import * in an inner scope should end up being an error;
I'm not sure about 'exec'.
We will need a final BDFL statement.
regards, Samuele Pedroni.
Guido van Rossum <guido(a)python.org> writes:
>Tuples are for heterogeneous data, lists are for homogeneous data.
>
Only if you include *both* null cases:
- tuple of type( i ) == type( i+1 )
- list of PyObject
Homo-/heterogeneity is orthogonal to the primary benefits of lists
(mutability) and of tuples (fixed order/length).
Else why can you do list( (1, "two", 3.0) ) and tuple( [x, y, z] ) ?
>Tuples are *not* read-only lists.
>
It just happens that "tuple( sequence )" is the easiest & most obvious (and
thus right?) way to spell "immutable sequence".
Stop reading whenever you're convinced. ;-) (not about mutability,
but about homo/heterogeneity)
There are three (mostly) independent characteristics of tuples (in most-to-least-important order, by frequency of use, IMO):
- fixed order/fixed length - used in function argument/return tuples and all uses as a "struct"
- heterogeneity allowed but not required - used in many function argument tuples and many "struct" tuples
- immutability - implies fixed order and fixed length, and used occasionally for specific needs
The important characteristics of lists are also independent of each other (again in IMO order):
- mutability of length & content - used for dynamically building collections
- heterogeneity allowed but not required - used occasionally for specific needs
It turns out that fixed-length sequences are often useful for
heterogeneous data, and that most sequences that require mutability are
homogeneous.
Examples from the standard library (found by grep '= (' and grep '= \[' ):

# homogeneous tuple - homogeneity, fixed order, and fixed length are all required
# CVS says Guido wrote/imported this. ;-)
whrandom.py: self._seed = (x or 1, y or 1, z or 1)

# homogeneous tuple - homogeneity is required - all entries must be 'types'
# suitable for passing to 'isinstance( A, typesTuple )', which (needlessly?)
# requires a tuple to avoid possibly recursive general sequences
types.py: StringTypes = (StringType, UnicodeType)

# heterogeneous list of values of all basic types (we need to be able to
# copy all types of values)
# this could be a tuple, but neither immutability, nor fixed length, nor
# fixed order are needed, so it makes more sense as a list
# CVS blames Guido here, too, in version 1.1. ;-)
copy.py: l = [None, 1, 2L, 3.14, 'xyzzy', (1, 2L), [3.14, 'abc'],
              {'abc': 'ABC'}, (), [], {}]
Other homogeneous tuples (may benefit from mutability, but require fixed length/order):
- 3D coordinates
- RGB color
- binary tree node (child, next)
Other heterogeneous lists (homogeneous lists of base-class instances blah-blah-blah):
- files AND directories to traverse (strings? "File" objects?)
- emails AND faxes AND voicemails AND tasks in your Inbox (items?)
- mail AND newsgroup accounts (accounts?)
- return values OR exceptions from a list of test cases and test suites (PyObjects? introduce an artificial base class?)
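To make the two defaults concrete, here is a small sketch (all names are
invented for illustration):

    # a "struct": fixed length, fixed order, heterogeneity allowed -- a tuple
    employee = ('Guido', 4127, 3.14)          # (name, id, factor)

    # a growing collection: mutable length, homogeneous -- a list
    inbox = []
    inbox.append('mail from Tim')
    inbox.append('mail from Barry')

    # neither property is enforced, hence the conversions both ways:
    frozen = tuple(inbox)     # immutable snapshot of a homogeneous sequence
    fields = list(employee)   # mutable copy of a heterogeneous record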
Must-be-stubborn-if-you-got-this-far-ly y'rs ;-)
kb
Guido writes:
> IMO, xrange() must die.
Glad to hear it. I always found range() vs xrange() a wart.
But if you had it to do over, how would you do it?
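One obvious candidate, assuming generators pan out (a sketch, not a
proposal -- 'lazy_range' is an invented name):

    from __future__ import generators

    def lazy_range(start, stop=None, step=1):
        # lazy like xrange, general like range
        if stop is None:
            start, stop = 0, start
        i = start
        while (step > 0 and i < stop) or (step < 0 and i > stop):
            yield i
            i = i + step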
-- Michael Chermside
We seem to have added tzset() gimmicks to CVS Python.
test_time now fails on Windows, simply because time.tzset raises
AttributeError there.
Now Windows does support tzset(), but not TZ values of the form
test_time.test_tzset() is testing, like
environ['TZ'] = 'US/Eastern'
and
environ['TZ'] = 'Australia/Melbourne'
The rub here is that I haven't found *any* tzset man pages on the web that
claim TZ accepts such values (other than to silently ignore them because
they're not in a recognized format). The POSIX defn is typical:
http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap08.html
and search down for TZ. There's no way to read that as supporting the
values we're testing.
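For contrast, a TZ value in the form the spec does bless (a sketch;
'EST5EDT' and the transition rules are picked arbitrarily):

    import os, time

    os.environ['TZ'] = 'EST5EDT,M4.1.0,M10.5.0'  # POSIX std/offset/dst form
    if hasattr(time, 'tzset'):                   # absent on Windows -- the rub
        time.tzset()
        print time.tzname                        # ('EST', 'EDT')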
Anyone have a clue?
not-all-pits-should-be-dived-into-ly y'rs - tim
[Moving a discussion about capabilities to where it arguably belongs]
[Ben Laurie]
> The point about capabilities is that mere possession of a capability is
> all that is required to exercise it. If you start adding security
> checkers to them, then you don't have capabilities anymore. But the
> point is somewhat deeper than that - given capabilities, you can
> implement proxies without requiring any more infrastructure - you can
> also implement security schemes that don't really correspond to any kind
> of security checking at all (ok, you can probably find some convoluted
> way to achieve the same effect, but I'll bet it comes down to having
> tokens that correspond to proxies, and security checkers that allow you
> to proceed if you have the appropriate token - in other words,
> capabilities, but very hard to use).
>
> So, it seems to me, it's simpler and more powerful to start with
> capabilities and build proxies on top of them (or whatever alternate
> scheme you want to build).
>
> Once more, my apologies for not just getting straight to the point.
>
> BTW, if you would like to explain why you don't think bound methods are
> the way to go on python-dev, I'd love to hear it.
It seems to be a matter of convenience. Often objects have many
methods to which you want to provide access as a group. E.g. I might
have a service configuration registry object. The object behaves
roughly like a dictionary. A certain user may be given read-only
access to the registry. Using capabilities, I would have to hand her
a bunch of capabilities for various methods: __getitem__, has_key,
get, keys, items, values, and many more. Using proxies I can simply
give her a read-only proxy for the object. So proxies are more
powerful.
Before you start saying that we should use capabilities as the more
fundamental mechanism and build proxies on top of that: as you point
out, we already have an equivalent more fundamental mechanism, bound
methods, which is equivalent to capabilities. It's just that raw
capabilities aren't very usable, so one way or another we've got to
build something on top of that.
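For instance, a read-only proxy need be no more than this sketch (the
method list is invented for the example):

    class ReadOnlyProxy:
        _allowed = ('__getitem__', 'has_key', 'get', 'keys', 'items', 'values')
        def __init__(self, obj):
            self._obj = obj
        def __getattr__(self, name):
            if name in self._allowed:
                return getattr(self._obj, name)
            raise AttributeError, name   # in particular, no __setitem__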
--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido wrote:
>
> I understand how class ZipFile could exercise authority in a
> rexec-based world, if the zipfile module was trusted code. But I
> thought that a capability view of the world doesn't distinguish
> between trusted and untrusted code. I guess I need to understand
> better what kind of "barriers" the capability way of life *does* use.
I think you are on track with regard to the deeper question you are grappling
with. Almost all dangerous things come ultimately from C code. (I can think of
one danger that can come from pure Python code: it can provide an illicit
communications channel between other objects.)
So in the "separate policy language" way of life, access to the ZipFile class
gives you the ability to open files anywhere in the filesystem. The ZipFile
class therefore has the "dangerous" flag set, and when you run code that you
think might misuse this feature, you set the "can't use dangerous things" flag
on that code.
In the capability way of life, it is still the case that access to the ZipFile
class gives you the ability to open files anywhere in the system! (That is: I'm
assuming for now that we implement capabilities without re-writing every
dangerous class in the Library.) In this scheme, there are no flags, and when
you run code that you think might misuse this feature, you simply don't give
that code a reference to the ZipFile class. (Also, we have to arrange that it
can't acquire a reference by "import zipfile".)
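(In today's Python the withholding can be sketched with exec-in-namespace
-- illustrative only, rexec does considerably more:

    untrusted_source = "print len('abc')"   # stands in for the plugin's code
    namespace = {'__builtins__': {'len': len}}
    # no open, no __import__, hence no way to reach zipfile:
    exec untrusted_source in namespace
)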
So far the two approaches have the same effect, and the difference, for better
or for worse, is that the policy of "this code can't use ZipFile" is encoded in
Python reference-management code in the latter and encoded in a pair of flags in
the former.
Now, we might want to allow certain code to use something else dangerous (such
as the socket module) while simultaneously disallowing it from using ZipFile.
As we add N more dangerous modules, and M more objects of untrusted code that we
want to control, we have an N*M access control matrix to configure which code
can use which modules. (In an access control matrix, rows are "subjects" --
things that can exercise authority and columns are "resources" -- things that
might require authority when used.)
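(As a toy illustration -- the subjects and resources are invented:

    # N subjects x M resources, each cell a yes/no bit
    matrix = {
        ('plugin_a', 'zipfile'): 0,  ('plugin_a', 'socket'): 1,
        ('plugin_b', 'zipfile'): 1,  ('plugin_b', 'socket'): 0,
    }

    def may_use(subject, resource):
        return matrix.get((subject, resource), 0)

Every new subject adds a row; every new resource adds a column.)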
In a system where designation is not unified with authority, you tell this
untrusted code "I want you to do this action X.", and then you also have to go
update the policy specification to say that the code in question is allowed to
do the action X. This "say it twice if you really mean it" overhead puts a
practical limit on how fine-grained your policies can be, and it adds a source
of accidents that lead to security holes.
So now, with a large or fine-grained access control matrix, we see that the
"unify designation and authority" maxim really shines, and really matches
well with the Zen of Python.
But there is still another advantage that capabilities offer over other access
control systems. With normal access control (and an extremely diligent and
patient programmer and user) you can in theory achieve the Principle of Least
Privilege -- that the untrusted code runs with the minimal set of authorities
necessary to do its job. However, this is implemented by creating a new
"principal" -- a new row in the access control matrix, setting the access
control bits in each element of that row, and preventing any other code from
setting the bits in that row.
Now, observe that only maximally trusted code -- with "root" authority -- is
allowed to make these kinds of updates to the access control matrix. This means
that all code is divided into two kinds: the kind that can impose
Least-Privilege on code that it invokes (this code has root authority), and the
kind that can be constrained by Least-Privilege when it is invoked (this code
doesn't).
With capabilities there is no such distinction. All code can be constrained to
have access to only the privileges that it requires, and at the same time all
code can constrain other code that it invokes.
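Concretely, the constraining can be as cheap as attenuating a capability
before passing it on (a sketch; 'run_untrusted' is an invented stand-in
for invoking other code, and data.txt is assumed to exist):

    class ReadCap:
        # passes on read authority only; no write method exists to leak
        def __init__(self, f):
            self._read = f.read
        def read(self, n=-1):
            return self._read(n)

    f = open('data.txt', 'r+')    # we hold the full capability
    run_untrusted(ReadCap(f))     # the invoked code never sees f itself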
This feature, which I call the "Higher-Order Principle of Least Privilege" [*],
enables new applications.
For example, using first-order Least-Privilege a web browser which runs
cap-Python "caplets" could extend selective privileges to the caplets, such as
permission to read a certain file, while withholding others, such as permission
to write to that file, or permission to send the contents of the file to a
remote computer.
In addition, if cap-Python supports Higher-Order Least-Privilege, those caplets
could themselves use other caplets ("web services"?) without unnecessarily
exposing their privileges to those sub-caplets.
One could imagine, for example, a web browser written in cap-Python, which runs
inside the first web browser (e.g. Mozilla with a cap-Python plug-in), and uses
cap-Python caplets to extend its (the cap-Python web browser's) functionality.
If people already had the cap-Python plug-in installed in their local Mozilla,
then simply visiting the "cap-python-browser.com" site would be sufficient to
launch the cap-Python web browser.
Of course, this could lead straight to a fully functional desktop, making good
on Marc Andreessen's old threat to turn the browser into the operating system and
the operating system into the device driver.
This would be effectively the "virtualization" of access control. I regard it
as a kind of holy Grail for internet computing.
Regards,
Zooko
[*] I call it that because it is the application of the Principle of Least
Privilege to the implementation of the Principle of Least Privilege. One should
be able to impose least-privilege constraints on the code one uses without
requiring full root privileges oneself!
http://zooko.com/
^-- under re-construction: some new stuff, some broken links
Guido van Rossum <guido(a)python.org> writes:
>> > I think you could subclass the metaclass, override __new__, and delete
>> > the bogus __getstate__ from the type's __dict__. Then you'll get the
>> > default pickling behavior which ignores slots; that should work just
>> > fine in your case. :-)
>>
>> Ooh, that's sneaky! But I can't quite see how it works. The error
>> message I quoted at the top about __getstate__ happens when you try to
>> pickle an instance of the class. If I delete __getstate__ during
>> __new__, it won't be there for pickle to find when I try to do the
>> pickling. What will keep it from inducing the same error?
>
> Just try it. There are many ways to customize pickling, and if
> __getstate__ doesn't exist, pickling is done differently.
Since this doesn't work:
>>> d = type('d', (object,), { '__slots__' : ['foo'] } )
>>> pickle.dumps(d())
I'm still baffled as to why this works:
>>> class mc(type):
...     def __new__(self, *args):
...         x = type.__new__(self, *args)
...         del args[2]['__getstate__']
...         return x
...
>>> c = mc('c', (object,), { '__slots__' : ['foo'], '__getstate__' : lambda self: tuple() } )
>>> pickle.dumps(c())
'ccopy_reg\n_reconstructor\np0\n(c__main__\nc\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n.'
especially since:
>>> dir(d) == dir(c)
1
I don't see the logic in the source for object.__reduce__(), so where
is it? OK, I see it in typeobject.c. But now:
>>> c.__getstate__
<unbound method c.<lambda>>
OK, this seems to indicate that my attempt to remove __getstate__ from
the class __dict__ was a failure. That explains why pickling c works,
but not why you suggested that I remove __getstate__ inside of
__new__. Did you mean for me to do something different?
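The only other reading I can come up with is to strip it from the type
itself rather than from the dict -- an untested guess:

>>> class mc2(type):
...     def __new__(meta, name, bases, d):
...         x = type.__new__(meta, name, bases, d)
...         if '__getstate__' in d:
...             delattr(x, '__getstate__')  # remove it from the new type
...         return x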
I note that c's __slots__ aren't pickled at all, which I guess was the
point of the __getstate__ requirement:
>>> x = c()
>>> x.foo = 1
>>> pickle.dumps(x) == pickle.dumps(c())
1
Fortunately, in our case the __slots__ are empty so it doesn't matter.
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
It's apparent that I didn't explain capabilities clearly enough. Also
I misunderstood something about rexec in general and ZipFile in particular.
Once we succeed at understanding each other, I'll then inquire whether you agree
with my Big Word Proofs.
(I, Zooko, wrote lines prepended with "> > ".)
Guido wrote:
>
> > So in the "separate policy language" way of life, access to the
> > ZipFile class gives you the ability to open files anywhere in the
> > filesystem. The ZipFile class therefore has the "dangerous" flag
> > set, and when you run code that you think might misuse this feature,
> > you set the "can't use dangerous things" flag on that code.
>
> But that's not how rexec works. In the rexec world, the zipfile
> module has no special privileges; when it is imported by untrusted
> code, it is reloaded from disk as if it were untrusted itself. The
> zipfile.ZipFile class is a client of "open", an implementation of
> which is provided to the untrusted code by the trusted code.
<Zooko reads the zipfile module docs.>
How is the implementation of "open" provided by the trusted code to the
untrusted code? Is it possible to provide a different "open" implementation to
different "instances" of the zipfile module? (I think not, as there is no such
thing as "a different instance of a module", but perhaps you could have two
rexec "workspaces" each of which has a zipfile module with a different "open"?)
> > In this scheme, there are no flags, and when you run code
> > that you think might misuse this feature, you simply don't give that
> > code a reference to the ZipFile class. (Also, we have to arrange
> > that it can't acquire a reference by "import zipfile".)
>
> The rexec world solves this very nicely IMO. Can't the capability
> world do it the same way? The only difference might be that 'open'
> would have to be a capability.
I don't understand exactly how rexec works yet, but so far it sounds like
capabilities.
Here's a two-sentence definition of capabilities:
Authority originates in C code (in the interpreter or C extension modules), and
is passed from thing to thing. A given thing "X" -- an instance of ZipFile, for
example -- has the authority to use a given authority -- to invoke the real
open(), for example -- if and only if some thing "Y" previously held both the
"open()" authority and the "authority to extend authorities to X" authority, and
chose to extend the "open()" authority to X.
That rule could be enforced with the rexec system, right?
Here is a graphical representation of this rule. (Taken from [1].)
http://www.erights.org/elib/capability/ode/images/fundamental.gif
In the diagram, the authority is "Carol", the thing that started with the
authority is "Alice", and Alice is in the process of extending to Bob the
authority to use Carol. This act -- the extending of authority from Alice to
Bob -- is the only way that Bob can gain authority, and it can only happen if
Alice has both the authority to use Carol and the authority to extend
authorities to Bob.
Those two sentences above (and equivalently the graph) completely define
capabilities, in the abstract. They don't say how they are implemented. A
particular implementation that I find deeply appealing is to make "has a
reference to 'open'" be the determiner of whether a thing has the authority to
use "open", and to make "has a reference to X" be the determiner of whether a
thing has the authority to extend authorities to X. That's "unifying
designation with authority", and that's what the E language does.
> But I think "this code can't use ZipFile" is the wrong thing to say.
> You should only have to say "this code can't write files" (or
> something more specific).
I agree. I incorrectly inferred from previous messages that the current problem
under discussion was allowing or denying access to the ZipFile class. But
whatever resource we wish to control access to, these same techniques will
apply.
> > In a system where designation is not unified with authority, you
> > tell this untrusted code "I want you to do this action X.", and then
> > you also have to go update the policy specification to say that the
> > code in question is allowed to do the action X.
>
> Sorry, you've lost me here. Which part is the "designation" (new word
> for me) and which part is the "authority"?
Sorry. First let me point out that the issue of unifying designation with
authority is separable from "the capability access control rule" described
above. The two have good synergy, but aren't identical.
By "designation" I meant "naming". For example... Let's see, I think I'll go
back to my toy tictactoe example from [2].
In the tictactoe example, you have to specify which wxWindow the tictactoe game
object should draw into. This is "designation" -- you pass a reference, which
designates which specific window you are talking about. If you use the
principle of unifying designation and authority, then this same act -- passing a
reference to this particular wxWindows object -- conveys both the identification
of which window to draw into and the authority to draw into it.
# access control system with unified designation and authority
game = TicTacToeGame()
game.display(wxPython.wxWindow())
If you have separate designation and authority, then the same code has to look
something like this:
# access control system with separate designation and authority
game = TicTacToeGame()
window = wxPython.wxWindow()
def policy(subject, resource, operation):
    if (subject is game) and (resource is window) and \
       (operation == "invoke methods of"):
        return True
    return False
rexec.register_policy_hook(policy)
game.display(window)
This is what I call "say it twice if you really mean it".
Hm. Reviewing the rexec docs, I begin to suspect that the "access control
system with unified designation and authority" *is* how Python does access
control in restricted mode, and that rexec itself is just there to manage
module import and certain dangerous builtins.
> It really sounds to me like at least one of our fundamental (?)
> differences is the autonomicity of code units. I think of code (at
> least Python code) as a passive set of instructions that has no
> inherent authority but derives authority from the built-ins passed to
> it; you seem to describe code as having inherent authority.
I definitely don't intend for code to have inherent authority (other than the
Trusted Code Base -- the interpreter -- which can't help but have it). The
word "thing" in my two-sentence definition (a white circle in the diagram)
refers to "computational things that can have state and behavior". (This
includes Python objects, closures, stack frames, etc. In another context I
would call them "objects", but Python uses the word "object" for something
more specific -- an instance of a class.)
> > This would be effectively the "virtualization" of access control. I
> > regard it as a kind of holy Grail for internet computing.
>
> How practical is this dream? How useful?
Let's revisit the issue once we understand one another's access control schemes.
;-)
Regards,
Zooko
[1] http://www.erights.org/elib/capability/ode/overview.html
[2] http://mail.python.org/pipermail/python-dev/2003-March/033938.html
http://zooko.com/
^-- under re-construction: some new stuff, some broken links
Since the introduction of the iconv codec there have been numerous
bug reports related to the codec and the lack of cross-platform
support for it (ranging from the codec not compiling, to the codec
not supporting standard names for common encodings, to core dumps
in the linking phase).
I'd like to question whether the codec is really ready for prime
time yet. Right now it causes people more trouble than it does
good.
Some examples:
https://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=…
https://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=…
https://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=…
https://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=…
The problem doesn't seem to be related to the code implementation
itself, but rather the varying quality of iconv implementations
out there.
OTOH, without some field testing the codec will never get into
shape for prime time, so perhaps it would be better to enable it
only via a configure option, or to make a failure to compile
the codec as painless as possible.
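The configure-option route might be no more than a guard in setup.py
(a sketch; the PY_ENABLE_ICONV switch is invented):

    # in setup.py's detect_modules() -- illustrative only; 'exts' is
    # setup.py's list of Extension objects to build
    import os
    if os.environ.get('PY_ENABLE_ICONV'):
        exts.append(Extension('_iconv_codec', ['_iconv_codec.c'],
                              libraries=['iconv']))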
--
Marc-Andre Lemburg
eGenix.com
Professional Python Software directly from the Source (#1, Mar 31 2003)
>>> Python/Zope Products & Consulting ... http://www.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
Python UK 2003, Oxford: 1 day left
EuroPython 2003, Charleroi, Belgium: 85 days left