[Types-sig] A late entry

John Viega John@list.org
Thu, 16 Mar 2000 11:24:19 -0800


On Wed, Mar 15, 2000 at 09:26:28PM -0500, scott wrote:
> On Tue, Mar 14, 2000 at 05:54:52PM -0500, John Viega wrote:
> 
> neat paper.  any other references to throw at us?

Tons, more than you would want to read.  I think the most interesting
for you would be:

Jens Palsberg and Michael Schwartzbach.  Object-Oriented Type Systems.
John Wiley and Sons, 1994.  ISBN 0-471-94128-X.

You can also look at some more of the papers I've given students in
the past, some of which are downloadable from:
http://www.list.org/~viega/cs655/

In particular, the Unit 6 papers.  The Cardelli and Wegner
paper is worthwhile.  The Milner paper isn't there, and it's pretty
dense anyway.  You'll be interested in Day et al. (which is a
reasonable starting place for constrained genericity).

Out of the Unit 8 papers, I'd probably recommend only Agesen's beyond
what I've already given you.  Abadi and Cardelli's is really dense and
not all that interesting/applicable.  Castagna's is there to see if
students can find the big problems with the work...

> When we talk about adding runtime checks, it's a little unclear to me
> exactly what is meant.  Are you referring to additional runtime checks
> that the existing dynamic type system in python does not provide?  Is
> leveraging the existing dynamic type system in this way feasible for
> these sorts of checks?  If this is possible, it is one approach I'd
> prefer -- there's no performance hit, and nothing that isn't checked.
> The only drawback I can see to using the existing system as a fallback
> is that it would limit the degree of optimization that is available,
> but I believe that's OK.

Any time you add a dynamic type check that wasn't there before, there
is a performance hit.  I'm just saying that we should try hard to
minimize the number of dynamic checks, period.
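
To make the cost concrete, here's roughly what one added check
amounts to (plain Python; set_count is a made-up example):

def set_count(obj, n):
  # The inserted test runs on every single call; that's the hit.
  if type(n) is not type(0):
    raise TypeError("n must be an integer")
  obj.count = n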

> It seems like you are referring to inferencing both as a mechanism
> that can allow the user to denote fewer types and as a means of
> dealing with the mixing of unchecked code with checked code.

Honestly, without some type annotations, you're not likely to get very
far on the latter there.

> With regards to
> meeting the first goal, a very limited kind of inferencing is
> possible, where the first assignment to a variable from an expression
> of a given type has the same effect as declaring the variable as that
> type in the first place.  I think that the former goal of reducing the
> number of declarations the programmer must make is attainable with a
> mechanism like this, but the latter goal would require a real
> inferencing algorithm.

I think it's silly to do a half-assed job with an inference
algorithm... I'd rather not have one at all.  People don't want to
memorize a rule beyond "if the type is ambiguous you must declare it
explicitly".  Honestly, type inferencing has its problems too.  For
example, it can infer a more general type than the programmer
intended.  However, minimizing the effort of the programmer
definitely seems more pythonesque.
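
For instance (a made-up snippet), inference happily generalizes:

def double(x):
  return x + x

# Inference assigns double the most general type it can justify,
# roughly 'a -> 'a for any 'a supporting +.  If the author meant
# integers only, double("oops") still type-checks, and the very error
# the author wanted caught slips through.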

> > A couple of brief thoughts at this point... What should the type
> > checker do with inferred types?  Place them directly into the code?
> > Place them in comments or a doc string so that they have no effect but
> > you can see what the checker did and double-check its work?
> 
> maybe just print out inferred types if requested and the type checker
> is a separate program?

I dunno, I always thought it'd be cool if it spit out another copy of
the code that's fully annotated, improving documentation and keeping
the amount of work down.  I actually prefer to see all type
information, myself...
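
Something like this, say (borrowing the decl notation that shows up
further down; scale is a made-up function):

# What the programmer writes:
def scale(x, factor):
  return x * factor

# What the checker's annotated copy might emit:
decl scale :- ('a, integer) -> 'a
def scale(x, factor):
  return x * factor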


> > I think it would be best to be able to say, "the in type is the out
> > type, and who cares about the type beyond that".  Parametric
> > polymorphism to the rescue... the type could be <a>-><a>, where <a>
> > indicates that x can be of any type.  I think the <a> syntax could be
> > better (I prefer ML's, which types identity as 'a->'a, but I hear
> > there's some resistance to it in this community... I'm going to use 'a
> > from now on because it looks more natural to me).  One problem here is
> > that it probably isn't going to be possible to infer an accurate
> > principal polymorphic type for things in all cases (see above).  I'll
> > have to give the worst cases some thought.
> 
> The stab I have taken at writing a checker allows this kind of
> polymorphism for functions.  It uses ~ instead of '.  Using a prefix
> character is something I prefer as well, though I think there are much
> more important issues to ponder, so I get on with it :)

Use the tick, as it's widely accepted... assume the Emacs mode
problems can be fixed :)
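
With the tick, the identity example from your earlier message would
come out as something like:

decl identity :- 'a -> 'a
def identity(x):
  return x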

I take it you're not trying to infer principal types or anything
complex yet...

> > Another problem that falls out at this point is how to write the type
> > of x when we've seen x call "foo".  Do we look at all classes across
> > the application for ones with a foo method?  And do we care if that
> > solution precludes classes that dynamically add their "foo" method
> > from applying?  Ick.  I'd prefer to avoid the concrete, and infer
> > something like:
> > 
> > << foo: None->'a >>
> > 
> > Which would read: an object with a field foo of type function taking
> > no arguments and returning anything.
> 
> Do you think it's ok to require that the programmer declare something
> about x in this case?  If something like what you suggest is inferred
> there, it seems like a class of errors might slip through that we
> might not want to allow to slip.  


That's true with type inferencing, period.  I think if we're going to
have it, we should stick to regular rules instead of special casing
stuff like this.  I don't think that it's going to end up being a huge
source of problems anyway.
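
To be concrete about what slips through (a made-up snippet):

def poke(x):
  return x.foo()

# Inferred: x :- << foo: None->'a >>.  Any object with a foo() method
# is accepted, including one whose foo() was never meant for poke:
class Accident:
  def foo(self):
    return "not what poke's author had in mind"

poke(Accident())   # type-checks under the structural rule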

> 
> Constrained polymorphism seems to solve lots of more complex static
> type issues in clean ways.  Do you know of any references on
> implementations for constrained polymorphism?  I'd like to look more
> closely at how it's done elsewhere before taking a crack at it myself.

An okay place to start is with the Day et al. paper on the website I
gave you above.

> I like the idea of discouraging OR'ing of types as much as possible.
> There are two cases where I think it could come in very handy:  first,
> if the static type of None is considered a distinct type rather than
> something like a null type, then there are lots of things that return
> or produce type-of(None) OR something else.  Examples are dict.get and
> default arguments.   After wrestling with this for a while, I've come
> to think that the introduction of a null-type in a static type system
> is more manageable for the programmer.  Do you see other ways of
> dealing with that or how would you prefer that those things are
> handled?  The other case is recursive data types such as trees, where
> a node can contain either other nodes or leaves.

I disagree with you here.  

For your first case, the "None type": many languages have a void,
single-valued type without any need for OR-ing types.  Remember, a
type specifies the universe of possible values.  For all object types,
None is a valid value in that set of values.  The void type is the set
that contains only the value None.  That makes the void type a subtype
of every object type, and you get all the benefits of subtyping
polymorphism.  I don't see any problem of the sort you're talking
about here, at all.
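
Concretely, that means dict.get needs no OR type (a sketch; table and
SomeClass are made up):

decl x :- SomeClass
x = table.get("key")   # fine: None is already in SomeClass's value
                       # set, so no SomeClass-OR-void type is needed
if x is None:
    ...                # the miss case is an ordinary runtime test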

For your second case, modeling a tree where a node contains either
other nodes or leaves: there are far better ways to model the problem.
First, most trees don't have nodes without values, but let's ignore
that for a minute, and assume otherwise.  The natural way to model
this problem is with subtyping polymorphism, not with the OR-ing of
types:

class NodeBase:  # Theoretically abstract.
  def print_tree(self): pass

class NonLeafNode(NodeBase):
  left :- NodeBase
  right :- NodeBase
  def print_tree(self):
    self.left.print_tree()
    self.right.print_tree()

typedef printable << print() -> None >>
class LeafNode< T -> printable >(NodeBase):
  value :- T
  def print_tree(self):
    self.value.print()

"T -> printable" should read something like "any type T that is
printable" (constrained genericity).
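
Using it is then plain subtype-polymorphic dispatch (continuing the
sketch; Word is a made-up type satisfying printable):

class Word:
  def print(self): pass

leaf1 = LeafNode()             # a LeafNode<Word>
leaf1.value = Word()
leaf2 = LeafNode()
leaf2.value = Word()
tree = NonLeafNode()
tree.left = leaf1              # fine: LeafNode is a subtype of NodeBase
tree.right = leaf2
tree.print_tree()              # each node dispatches on its own type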

I still assert that OR-ing types should *not* be in a python type
system.  You're basically saying, "here are things that should require
a runtime cast, but we're going to completely ignore that statically
and dynamically".

When are dynamic checks necessary?  Generally, when you're trying to
do something that can be written as an assignment.  Since types are
essentially sets, the RHS has to be a subset of the LHS in order for
us to make the determination that an assignment will always yield an
object of a legal type.  If the LHS and RHS are disjoint (well, as
disjoint as object types get, given that None is the only shared
value), the assignment should simply be rejected.  If there is some
overlap short of containment, then a dynamic cast is required.  I'd
*really* like to see it be the case that the only time "foo" is in
the same set of values as 12 is when parametric polymorphism or the
any type is involved.
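
In those terms (a sketch; assume integer is a subtype of a broader
number type):

decl i :- integer
decl n :- number

n = i      # always safe: integer's value set is inside number's
i = n      # overlap without containment: requires a dynamic cast
i = "hi"   # disjoint (modulo None): reject outright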

I am aware that not allowing OR'd types makes things a bit harder for
legacy code that people want to change to use the type system (such as
the standard library).  The one place where it's a big issue is with
heterogeneous lists.  However, I think it really reduces the power of a
type system to allow a list to be typed (string|integer|file), etc.


> > 
> > decl z :- integer
> > typecase a:
> >   case (x :- integer, y :- integer) : z = x + y
> >   case (x :- integer,)              : z = x
> >   case x :- integer                 : z = x
> > 
> > That's not quite as bad.
> 
> With regards to typecasing possibly causing the language to slow down
> by bringing the static type information into runtime, do you think it
> would be reasonable to allow typecasing only on types that are easily
> expressible in terms of the existing dynamic type system?  It seems to
> me that this approach would save a lot of work, limit the runtime
> overhead, and discourage OR'ing all at once. 

Well, any time you have to dynamically cast there's going to be a
performance hit.  I'm not really worried about matching, though.  You
can do it fairly efficiently, plus it satisfies the principle of
localized cost... the feature costs the programmer nothing unless he
uses it.  Also, I think that our most important goal here is to design
the right type system for python, not to pick one that is very easy to
implement...



> 
> Have you read about expressing the above with "mytype"?  that is:
> 
> interface if_LinkedList:
> 	def add(self, l :- mytype):
> 		...
> 		
> class LinkedList(if_LinkedList):
> 	def add(self, l):
> 		...
> 
> class BiDirectionalLinkedList(LinkedList):
> 	...
> 
> the syntax is fairly simple, and 'mytype' just means the type of
> instances of the class that implements the method.

Well, first of all, let me say that covariance provides a much more
elegant solution to the problem.  Type variables are not as easy for
the average programmer to understand.  If type variables are done
right, then they'd basically duplicate language features like
genericity.  No one wants to have non-orthogonal constructs, so we'd
probably remove genericity, which would result in a type system that's
more difficult to explain and use, plus not nearly as well understood.

Another problem with using type variables to solve this problem is
that it requires the programmer to anticipate how people will want to
use their classes upfront.  If you don't happen to use a type variable
the first time you specify a parameter, derived classes cannot change
the variance of the parameters without going back and modifying the
original code.  Plus, you guys haven't talked about any type variables
except "mytype", which is not powerful enough to handle uses of
covariance where the argument to a method is of some type other than
the type on which the method is defined.
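
The classic animal/food example shows the case mytype can't express
(a sketch in the :- notation):

class Food: pass
class Grass(Food): pass

class Animal:
  def eat(self, f :- Food): pass

class Cow(Animal):
  # Covariance: the parameter type narrows along with the class.
  # mytype can't say this, because the argument's type (Grass) isn't
  # the type the method is defined on (Cow).
  def eat(self, f :- Grass): pass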

> > It should be possible to implement this solution, and even do so
> > incrementally.  There's another (better, less pessimistic) solution
> > called the global validity approach.  The problem with it, IIRC, is
> > that the algorithm basically assumes that type checking goes on in a
> > "closed world" environment, where you're checking the entire system at
> > once.  That probably isn't desirable.  I wonder if there haven't been
> > refinements to this algorithm that allow it to work incrementally.
> > Therefore, I'd definitely prefer covariance w/ a polymorphic catcall
> > rule, assuming that the catcall rule can actually work for Python.
> 
> One possible approach to covariance of method parameters is to check
> each method of class against all the possible different types of
> 'self'.  This is what "stick" does, and it finds exactly the cases
> where there are type errors.  It does require more checking than a
> general rule, and it does add complications to the problem of mixing
> checked modules and unchecked ones across class inheritance, but it
> does work.  I'd be interested in any feedback on this approach you
> have...

To get it right, you would essentially be doing the same thing that
the global validity approach does.  In particular, you have the exact
same problems in that a "closed world" assumption is required.
Incremental checking is far more useful, and I think that the
polymorphic catcall rule is simple enough (though not if you call it
"polymorphic catcall" when you report an error to the end user!).

John