[Types-sig] Re: feedback: PyDL RFC 0.4

Paul Prescod paul@prescod.net
Mon, 03 Jan 2000 23:39:45 -0500


Thanks for your feedback. I will need a lot more before we are done with
this thing!

Greg Stein wrote:
> 
> Wouldn't these be called "abstract interfaces" or "parameterized
> interfaces"? That seems to be a more standard terminology.

I don't like "parameterized interface" because Sequence(Int) *is*
parameterized. I need to distinguish Sequence(_X) from Sequence(Int).

Abstract is a little better, but we aren't dealing with abstract classes
as they are known in C++ or Java.

> > Typedefs allow us to give names to complete or incomplete interfaces
> > described by interface expressions. Typedefs are an interface
> > expression re-use mechanism.
> 
> typedefs are also used to assign names to things like "Int or String".
> 
> I don't see "Int" as an interface (even though it probably is in *theory*,
> it doesn't seem that way in layman usage).

It makes the spec much easier to read and write if we think of them
uniformly as interfaces. Else we must constantly refer to "interfaces
and thingees like Int and String."

> And I still don't understand the need to specify that *two* files exist.
> Why are we going to look for two files? Isn't one sufficient?

One is where you put your hand-written declarations. The other is where
the interpreter dumps the declarations that it extracts from the Python
file. That way you can use all inline declarations, a separate file or
*both* with no danger of having your hard work overwritten.

> In the above example, we have three interface objects. One is available
> via the name "foo1" in the module-level namespace. One is available as
> "Bar.foo2" (via the class' namespace, and the class is in the module
> namespace). The third, foo2, is only available within the function Baz().

We're making a static type checking system. I don't see what runtime
definition of interfaces in a function scope buys other than confusion.
If we need to have interface declarations in random contexts then we
should differentiate compile-time available ones with a "decl" keyword
prefix.

> I do not believe there is a need to place the interfaces into a distinct
> namespace. I'd be happy to hear one (besides forward-refs, which can be
> handled by an incomplete interface definition).

A static type checking system exists to precede and constrain dynamism,
not to expand it.

> What does "builtin" mean? That these interfaces are magically predefined
> somewhere and available anywhere?

Yes.

> Note: you probably want to remove the plural from "Modules", "Methods",
> and *Methods.

Fixed.

> What is the "Null" interface? Is that supposed to be None's interface? I
> don't believe that we need a name for None's interface, do we? And why
> introduce a name like "Null"? That doesn't seem very descriptive of the
> interface; something like NoneInterface might be better.

Okay, I'll just use None.

> > Certain interfaces may have only one implementation. These "primitive"
> > types are Int, Long, Float, String, UnboundMethods, BoundMethods,
> > Module, Function and Null. Over time this list may get shorter as the
> > Python implementation is generalized to work mostly by interfaces.
> 
> I don't understand what you're saying here. This paragraph doesn't seem to
> be relevant.

It is crucial to the distinction between implementations and interfaces.
Certain types do not have such a distinction so you cannot just
implement the right attributes and expect it to "work". You cannot make
a new class that Python treats as an Integer. You cannot make a new
class that MFC treats as a window handle. Here's what I say in my
current working version:

> Sometimes there exists code that is only compatible with a single
> implementation of an interface. This is the case when the object's
> actual bit-pattern is more important than its interface. Examples
> include integers, window handles, C pointers and so forth. For this
> reason, every class is considered also an interface. Only instances of
> the class and its subclasses (if any) conform to the interface. These
> are called "implementation specific interfaces."
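
As a rough illustration in present-day Python (not PyDL syntax), the
sketch below contrasts the two kinds of interface. SupportsLen, Deck,
FakeInt, MyInt and conforms_to_int are all hypothetical names invented
for this example; only int and the typing module are real.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsLen(Protocol):
    """Hypothetical ordinary interface: anything with __len__ conforms."""

    def __len__(self) -> int: ...


class Deck:
    """Unrelated class that conforms just by having the right method."""

    def __len__(self) -> int:
        return 52


class FakeInt:
    """Number-like, but not an int instance or subclass."""

    def __add__(self, other):
        return 0


class MyInt(int):
    """A subclass of int still conforms to the int interface."""


def conforms_to_int(obj) -> bool:
    # Implementation specific interface: only instances of the class
    # and of its subclasses conform, whatever attributes obj offers.
    return isinstance(obj, int)
```

Deck conforms to SupportsLen by implementing the right attribute, but
no amount of attribute-matching makes FakeInt acceptable where an int
is required.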

> > Note: The Python interface graph may not always be a tree. For
> > instance there might someday be a type that is both a mapping and a
> > sequence.
> 
> In the above statement, you're mixing up implementations (which can use
> disjoint interfaces) with the interface hierarchy. Or by "type" are you
> referring to a new interface which combines a couple interfaces?
> 
> Note that I think it is quite valid to state that interfaces must always
> be a tree, although I don't see any reason to avoid multiple-inheritance.

So if we allow multiple inheritance then they will not always be a tree,
right?

In my working draft, "Class" is a sub-interface of both "Interface" and
"Callable".

> >...
> > Interface expression language:
> > ==============================
> 
> These are normally called "type declarators". I would suggest using
> standard terminology here.

We aren't dealing with types. We are dealing with interfaces. And we
aren't dealing with declarators, but with expressions. These expressions
can be used in contexts other than type declarations.

"Neither Holy, nor Roman nor an Empire" - Voltaire

> Just use the "dotted_name" construct here -- that is well-defined by the
> Python grammar already. It also provides for things like
> "os.path.SomeInterface".

The construct is fine for the grammar but it doesn't describe the
semantics.

> Note that interfaces do *not* have to occur in a PyDL module. Leave the
> spec open for a combined syntax -- we shouldn't be required to declare all
> interfaces in separate files.

Interfaces in a Python file are automatically extracted and are thus
available in a PyDL module.

> In other words, the lengths do not have to be equal. A precondition is
> that all union typedecls must be "flattened" to remove other unions. The
> resulting, flattened list must then follow your equivalency algorithm.

Okay, I'll see what I can do about it.
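
A minimal sketch of the flattening precondition in plain Python. The
Union class is a hypothetical stand-in for a PyDL union such as
"Int or String", and comparing the flattened members as sets is an
assumption; the spec's actual equivalency algorithm may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Union:
    """Hypothetical stand-in for a union typedecl."""

    members: tuple


def flatten(decl):
    """Return the set of non-union members reachable from decl."""
    if isinstance(decl, Union):
        flat = set()
        for member in decl.members:
            flat |= flatten(member)  # nested unions are absorbed
        return frozenset(flat)
    return frozenset([decl])


def equivalent(a, b):
    """Assumed rule: equal once both sides are flattened."""
    return flatten(a) == flatten(b)


nested = Union(("Int", Union(("String", "Float"))))
flat = Union(("Float", "Int", "String"))
```

Here nested has two members and flat has three, yet they compare as
equivalent, which is the point about the lengths not having to match.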

> > 3. parameterize an interface:
> >
> > Array( Int, 50 )
> > Array( length=50, elements=Int )
> >
> > Note that the arguments can be either interface expressions or simple
> > Python expressions. A "simple" Python expression is an expression that
> > does not involve a function call or variable reference.
> 
> I disagree with the notion of expressions for the parameter values. I
> think our only form of parameterization is with typedecl objects. The type
> checker is only going to be dealing with type information -- expression
> values as part of an interface don't make sense at compile time.

Parameters would be made available to implementing classes at runtime. I
see a lot of virtue in numeric bounds, string prefixes and so forth:

typedecl colors as Enum(elements=["Red","Green","Blue"])
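
A sketch of how such a value parameter might reach the implementing
object at runtime. This Enum class is hypothetical (it is not the
standard library's enum module), and the use of __conforms__ as the
checking hook is an assumption borrowed from elsewhere in the draft.

```python
class Enum:
    """Hypothetical interface object parameterized by plain values."""

    def __init__(self, elements):
        # The parameter values are kept so the implementation can use them.
        self.elements = tuple(elements)

    def __conforms__(self, obj):
        # Runtime check only; a static checker could not verify this
        # for arbitrary values.
        return obj in self.elements


colors = Enum(elements=["Red", "Green", "Blue"])
```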

> I agree. The return type should also be optional. Note that we can't allow
> just a name (and no type), as that would be ambiguous with just a type
> name.

I like the explicitness of requiring a return type and I harbor hopes
that Python will one day distinguish between NO return type and
something that happens to be able to return None.

> > Note that at this point in time, every Python callable returns
> > something, even if it is None. The return value can be named,
> > merely as documentation:
> >
> > def( Arg1 as Int , ** as {String: Int}) -> ReturnCode as Int
> 
> Ack! ... no, I do not think we should allow names in there. Return values
> are never named and would never be used. Parameters actually have names,
> which the values are bound to. A return value name also introduces a minor
> problem in the grammar (is the name a name for the return value or a type
> name?).

How is the issue different in the return code versus in parameters?

I think that this is a very useful feature for IDEs and other
documentation and has zero cost.

> >...
> >  2. Basic attribute interface declarations:
> >
> > decl myint as Int                   # basic
> > decl intarr as Array( Int, 50 )     # parameterized
> > decl intarr2 as Array( size = 40, elements = Int ) # using keyword syntax
> 
> "as" does make sense in this context, but I'd use colons for consistency.

The inconsistency is very minor and I am somewhat uncomfortable with
appearing to begin a suite. I doubt that programmers would even notice
the inconsistency.

> > So this is allowed:
> >
> > class (_X,_Y) spam( A, B ):
> >     decl someInstanceMember as _X
> >     decl someOtherMember as Array( _X, 50 )
> >
> >     ....
> 
> You haven't introduced this syntax before. Is this a class definition? 

Er, yes, but I don't have that syntax in the language anymore. Just
change "class" to "interface".

> > These are NOT allowed:
> >
> > decl someModuleMember(_X) as Array( _X, 50 )
> 
> Reason: modules are not parameterizable.

No, the reason was stated before. Because *attributes* like
someModuleMember cannot be declared to need incomplete interfaces. Only
interfaces can be incomplete.

> However: I think modules should be able to conform to an interface. And
> since an interface can be parameterized, then this means that a module can
> be parameterized. This is analogous to parameterizing a class.
> 
> > class (_Y) spam( A, B ):
> >     decl someInstanceMember(_X) as Array( _X, 50 )
> >
> > Because that would allow you to create a "spam" without getting around
> > to saying what _X is for that spam's someInstanceMember. That would
> > disallow static type checking.
> 
> Agreed. The _X must occur in the class declaration statement.

No, that's another typo. Here's another example and it comes back to the
fact that attributes cannot be incomplete:

interface (_Y) spam( A, B ):
    decl someInstanceMember(_Y) as Array( _Y, 50 ) 

> >...
> > It is possible to allow _X to vary to some extent but still require it
> > to always be a Number:
> >
> > decl Add(_X as Number) as def( a as _X, b as _X )-> _X
> 
> Note that this implies the concept of hierarchy among the interfaces.

Yes, that was also implied by the graph that started with Any. That is
now explicit:

An interface may be derived from (or based upon) another interface
called the base interface using Python inheritance
syntax. Objects directly supporting a derived interface are said to 
indirectly support the base interface and its base interfaces 
all of the way up to the most basic interface, Any.
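
The "indirectly supports" rule can be sketched in plain Python as a
walk over base links. The Interface class, its ancestors and supports
methods, and the Sequence entry are hypothetical; Any, Number and Int
are the names used in this thread.

```python
class Interface:
    """Hypothetical interface object with zero or more base interfaces."""

    def __init__(self, name, *bases):
        self.name = name
        self.bases = bases

    def ancestors(self):
        """Return every interface reachable through the base links."""
        seen = []
        stack = list(self.bases)
        while stack:
            base = stack.pop()
            if base not in seen:
                seen.append(base)
                stack.extend(base.bases)
        return seen

    def supports(self, other):
        """True if this interface is other or is derived from it."""
        return other is self or other in self.ancestors()


Any = Interface("Any")
Number = Interface("Number", Any)
Int = Interface("Int", Number)
Sequence = Interface("Sequence", Any)
```

Because bases is a tuple, the same walk also works when an interface
has more than one base, so the graph need not be a tree.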

> Note that you will then have to define a
> rule for whether "decl x as Int" is the "same" as "decl x as Number". For
> conformance, is the first too specific, or is it just a more concrete form
> of the latter? (but still allowed)

Well, in general there is no problem specifying a base versus derived
interface. Your choice of specificity. The "Int" is a special case
because it is also an implementation specific interface derived from

> > The syntax for a class definition is identical to that for a function
> > with the keyword "def" replaced by "class".  What we are really
> > defining is the constructor. The signature of the created object can
> > be described in an interface declaration.
> 
> Ick. We don't need anything special for this. The constructor is given by
> the __init__ that occurs in the interface.
> 
> > decl TreeNode(_X) as class(
> >             a as _X,
> >             Right as TreeNode( _X ) or None,
> >             Left as TreeNode( _X ) or None )
> >                 -> ParentClasses, Interfaces
> 
> This would be:
> 
>   class (_X) TreeNode(ParentClasses):
>     __interfaces__ = Interfaces
>     def __init__(self, a: _X,
>                  Right: TreeNode(_X) or None,
>                  Left: TreeNode(_X) or None):
>       ...

I don't want to introduce a new kind of interface declaration for
classes. You should use ordinary interface declarations. There is no
need for a new kind of "class-y" interface declaration and it will
likely be abused so that more code is implementation specific than it
needs to be.

> If you're just trying to create the notion of a factory, then "def" is
> appropriate:
> 
>   decl TreeNode(_X): def(a: _X,
>                          Right: TreeNode(_X) or None,
>                          Left: TreeNode(_X) or None)    \
>                        -> (ParentClasses or Interfaces)

No, we need to differentiate functions from classes because classes can
be subclassed. Otherwise there is no difference. That's why all we do is
change the keyword.

> These should be assignments and use a unary operator. The operator is much
> more flexible:
> 
>   print_typedecl_object(typedef Int or String)
> 
> Can't do that with a typedef or decl *statement*.

You can't do it in one line, but you can do it. It is of debatable
utility anyhow. The vast majority of the time you want to introspect
interface objects that are *in use* not hard-coded ones. Introspection
is really a secondary consideration anyhow.

> Also note that your BoundedInt example is a *runtime* parameterization.
> The type checker can't do anything about:
> 
>   decl x: PositiveInt
>   x = -1

That's true. I don't see that as a problem.

> But we *can* check something like this:
> 
>   def foo(x: NegativeInt):
>     ...
>   decl y: PositiveInt
>   y = 5
>   foo(y)

I'm curious how you would see your type inferencer knowing whether to
infer

j=6

as "int", "PositiveInt" or "NegativeInt"

Anyhow, you still don't prevent:

decl y: NegativeInt
y=5
foo(y)

> But this latter case is more along the lines of naming a particular type
> of Int. The syntax could very well be something like:
> 
>   decl PositiveInt: subtype Int
>   decl NegativeInt: subtype Int

No need for the keyword "subtype". Two different int typedefs should
both be usable as ints (I think), but not as each other.
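
In plain Python the intended behavior can be approximated with two
int subclasses. This is only an analogy, not PyDL: the runtime
isinstance test in foo stands in for what a static checker would do
with "def foo(x: NegativeInt)".

```python
class PositiveInt(int):
    """Hypothetical named variety of int."""


class NegativeInt(int):
    """A second, unrelated named variety of int."""


def foo(x):
    # Stand-in for "def foo(x: NegativeInt)": other named varieties
    # of int are rejected, even though each is usable as an int.
    if not isinstance(x, NegativeInt):
        raise TypeError("expected NegativeInt, got " + type(x).__name__)
    return x + 1
```

Note that foo(NegativeInt(5)) is accepted: the name is checked, not the
value, which is the same gap as in the "decl y: NegativeInt" example
above.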

> The type-checker would know that PositiveInt is related somehow to Int
> (and it would have to issue warnings when mixed). 

Argh. More warnings. I do not view it within our purview to require
implementations to issue warnings. We define something as legal or as
illegal. Anything else is between the implementor and the user.

> It would also view
> PositiveInt and NegativeInt as different (thereby creating the capability
> for the warning in the foo(y) example above).
> 
> Anyhow... as I mentioned above, we should only be allowing typedecl
> parameters. We can't type-check value-based parameters.

Not at compile time, but we can provide them to the implementation
object which can check them at runtime.

> If you want to introduce a type name for a runtime type-enforcement (a
> valid concept! such as your PositiveInt thing), then we should allow full
> expressions and other kinds of fun in the parameter expressions (since the
> runtime type should be createable with anything it wants; we've already
> given up all hope on it). But then we get into problems trying to
> distinguish between a type declarator and an expression. For example:
> 
>   MyType = typedef ParamType(0, Int or String)
> 
> In this example, the first is an expression, but the second should be a
> type declarator. Figuring out which is which is tricky for the parser.

Well maybe we need to just make the expression syntax unifiable with
Python syntax. I am not comfortable saying that a type expression will
never occur unadorned in Python code or vice versa.

> I disagree that they will always be extracted into a separate PyDL file.
> As an optimization: sure, we could do this. Effectively like caching a
> module's bytecodes in a .pyc file. But I don't think you should codify
> that here.

It makes the specification easier to write and understand.

> >...
> >     "typesafe":
> >     ===========
> > In addition to decl and typedecl the keyword "typesafe" can be used to
> > indicate that a function or method uses types in such a way that each
> > operation can be checked at compile time and demonstrated not to call
> > any function or operation with the wrong types.
> 
> What about the problem of non-existence? How "safe" is "typesafe"? And how
> is this different from regular type checking?

It is how you *turn on* regular type checking at compile time.

> >...
> > An interface checker's job is to ensure that methods that claim to be
> > typesafe actually are. It must report and refuse to compile modules
> > that misuse the keyword and may not refuse to compile modules that do
> > not.
> 
> That last sentence is awkward. Can you rephrase/split/etc?

Yes.

> Class definitions also have the parameterization syntax change:
> 
>   class (_X) Foo(Super):
>     decl node: _X
>     ...

No, that was a bug in the spec. Classes are declared just like functions
except for the "class" keyword. They behave just like functions except
that they can be subclassed.

> Classes and modules should also have a syntax for specifying the
> interface(s) they conform to. 

I think that that is extracted from the class declaration's return type
automatically. We will have to invent something for modules. "moddecl"
or something.

> Don't you mean "list of string", or should you drop the brackets?

Thanks, fixed.

> > __conforms__ : def (obj: Any ) -> boolean
> 
> Just call it "conforms". There is no need to "hide" this method since the
> interface does not expose interface members as its *own* members.

It is __conforms__ for the same reason that __repr__, __cmp__ and
__init__ are hidden: it is actually invoked through the magic "isa"
syntax, not directly. I would be amenable to getting rid of the
underscores Python-wide, but in any case I want to be consistent.

> I think that you would want a version that just checks an object's
> __interfaces__ attribute (quick), and a different method that does an
> exhaustive check of the object's apparent interface against the specified
> interface.

I'll add that as an issue.
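
A sketch of the two checks under discussion, in plain Python. The isa
function stands in for the proposed "isa" syntax; the Interface class,
conforms_exhaustively and the Stack example are hypothetical, and
treating "apparent interface" as a list of required attribute names is
an assumption.

```python
class Interface:
    """Hypothetical interface object listing required attribute names."""

    def __init__(self, name, required):
        self.name = name
        self.required = tuple(required)

    def __conforms__(self, obj):
        # Quick check: trust the object's declared __interfaces__.
        return self in getattr(obj, "__interfaces__", ())

    def conforms_exhaustively(self, obj):
        # Slow check: inspect the object's apparent interface.
        return all(hasattr(obj, attr) for attr in self.required)


def isa(obj, interface):
    """Stand-in for the "isa" syntax, which would invoke __conforms__."""
    return interface.__conforms__(obj)


Stack = Interface("Stack", ["push", "pop"])


class Declared:
    __interfaces__ = (Stack,)

    def push(self, item): ...
    def pop(self): ...


class Undeclared:
    def push(self, item): ...
    def pop(self): ...
```

Undeclared passes only the exhaustive check, which is the case that
makes having both methods worth considering.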

> > Experimental syntax:
> > ====================
> >
> > There is a backwards compatible syntax for embedding declarations in a
> > Python 1.5x file:
> >
> > "decl","myint as Integer"
> 
> Just use a single string. The parse tree actually gets even uglier if you
> put that comma in there :-). We can pull the "decl" out just as easily if
> it is the first part of a "decl myint: Integer".

I don't want to get confused with docstrings.

> Why pull them out? Leave them in the file and use them there. No need to
> replicate the stuff to somewhere else (and then try to deal with the
> resulting synchronization issues).

Because we always pull them out to a distinguishable file name, there
are no more synchronization issues than with Python .pyc files.

> I'd say the temporary syntax could be another function call:
> 
>   assert_type(x, "Int or String")

Okay.
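
One possible shape for such a function, as a sketch only. The
KNOWN_INTERFACES table and the handling of "or" by simple string
splitting are assumptions; a real version would parse full interface
expressions.

```python
# Hypothetical mapping from PyDL interface names to Python types.
KNOWN_INTERFACES = {"Int": int, "String": str, "Float": float}


def assert_type(value, declaration):
    """Raise TypeError unless value matches one alternative."""
    names = [name.strip() for name in declaration.split(" or ")]
    allowed = tuple(KNOWN_INTERFACES[name] for name in names)
    if not isinstance(value, allowed):
        raise TypeError(f"{value!r} does not conform to {declaration!r}")
    return value
```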

> > The runtime should not allow an assignment or function call to violate
> > the declarations in the PyDL file. In an "optimized speed mode" those
> > checks would be disabled. In non-optimized mode, these assignments
> > would generate an IncompatibleAssignmentError.
> 
> This is a difficult requirement for the runtime. I would suggest moving
> this to a V2 requirement.

I don't see that our version numbers and Python's version numbers have
to coincide. If it takes two years for Python to live up to all of the
rules of our spec, so be it. 

> > The runtime should not allow a read from an unassigned attribute. It
> > should raise NotAssignedError if it detects this at runtime instead of
> > at compile time.
> 
> Huh? We already have a definition for this. It raises NameError or
> AttributeError. Please don't redefine this behavior.

It is a different error. NameErrors and AttributeErrors should be
eliminated through static type checking (where it is used religiously).
NotAssignedError is not always statically detectable. 

Implementing this is also easy since objects have access to their
interface objects.
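
A sketch of both runtime checks in plain Python. The two error names
come from the spec text; their base classes, the Checked class and the
__decls__ table (standing in for the object's interface object) are
assumptions made for the example.

```python
class IncompatibleAssignmentError(TypeError):
    """Name from the spec text; the base class is an assumption."""


class NotAssignedError(Exception):
    """Name from the spec text; kept distinct from AttributeError."""


class Checked:
    """Hypothetical base class enforcing per-attribute declarations."""

    __decls__ = {}  # attribute name -> required Python type

    def __setattr__(self, name, value):
        expected = self.__decls__.get(name)
        if expected is not None and not isinstance(value, expected):
            raise IncompatibleAssignmentError(
                f"{name} must be {expected.__name__}, "
                f"got {type(value).__name__}"
            )
        object.__setattr__(self, name, value)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so a declared name
        # reaching this point has not been assigned yet.
        if name in self.__decls__:
            raise NotAssignedError(f"{name} is declared but not assigned")
        raise AttributeError(name)


class Point(Checked):
    __decls__ = {"x": int, "y": int}
```

Reading an undeclared name still raises AttributeError, so the existing
behavior is left alone; only declared-but-unassigned attributes get the
new error.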

> >...
> >     Idea: The Undefined Object:
> >     ===========================
> 
> You haven't addressed any of my concerns with this object. Even though
> you've listed it under the "future" section, I think you're still going to
> have some serious [implementation] problems with this concept.

It is about as difficult to handle as pervasive assignment checks and
both will probably be part of a written-from-scratch Python.

 Paul Prescod