[Types-sig] A late entry

scott scott@chronis.pobox.com
Thu, 16 Mar 2000 16:06:12 -0500


On Thu, Mar 16, 2000 at 11:24:19AM -0800, John Viega wrote:
> On Wed, Mar 15, 2000 at 09:26:28PM -0500, scott wrote:
> > On Tue, Mar 14, 2000 at 05:54:52PM -0500, John Viega wrote:
> > 
> > neat paper.  any other references to throw at us?
> 
> Tons, more than you would want to read.  I think the most interesting
> for you would be:
> 
> Jens Palsberg and Michael Schwartzbach.  Object-Oriented Type Systems.
> John Wiley and Sons, 1994.  ISBN 0-471-941288
> 
> You can also look at some more of the papers I've given students in
> the past, some of which are downloadable from:
> http://www.list.org/~viega/cs655/
> 
> In particular, Unit 6 papers.  The Cardelli and Wegner
> paper is worthwhile.  The Milner paper isn't there, and it's pretty
> dense anyway.  You'll be interested in Day et al. (which is a
> reasonable starting place for constrained genericity).
> 
> Out of the unit 8 papers, I'd probably recommend only Agesen's beyond
> what I've already given you.  Abadi and Cardelli's is really dense and
> not all that interesting/applicable.  Castagna's is there to see if
> students can find the big problems with the work... 


I guess I need to set aside some time to read :)

> 
> > When we talk about adding runtime checks, it's a little unclear to me
> > exactly what is meant.  Are you referring to additional runtime checks
> > that the existing dynamic type system in python does not provide?  Is
> > leveraging the existing dynamic type system in this way feasible for
> > these sorts of checks?  If this is possible, it is one approach I'd
> > prefer -- there's no performance hit, and nothing that isn't checked.
> > The only drawback I can see to using the existing system as a fallback
> > is that it would limit the degree of optimization that is available,
> > but I believe that's OK.
> 
> Any time you add a dynamic type check that wasn't there before, there
> is a performance hit.  I'm just saying that we should try hard to
> minimize the number of dynamic checks, period.

I don't think I was clear enough about what I'm saying re: dynamic
checks, so I'll try again.  I very much agree with minimizing runtime
checks, and I also agree that some are likely to be necessary.  Where
those checks are necessary, it seems to make sense to leverage
Python's existing type system to implement them, because that type
system is already in place, and Python objects as they exist in C or
Java or whatever would not have to carry around any additional
information.

For example, if we add a field to the list structure in C Python that
contains the set of types contained in the list, then every time a
del somelist[x] occurred, the extra information would have to be
updated by potentially checking the entire list.  If there were a way
to make runtime checks work reasonably without this kind of extra
weight, it seems worth pursuing to me.  One way that seems feasible
is to leverage the existing type system in Python.

For example, if we provided a hierarchical interface to the existing
type system, and that hierarchy were mirrored in the static system,
then dynamic checks or casts could be limited to those expressible in
the hierarchy based on python's existing type system (where you can
compare lists and tuples but not lists of ints and lists of strings).

I hope that clarifies the idea...
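To make the idea concrete, here is a minimal sketch (the helper name is hypothetical) of a dynamic check that leans only on Python's built-in type system, distinguishing lists from tuples but never inspecting element types:

```python
# Hypothetical helper: a dynamic check that uses only Python's
# built-in notion of type -- it can tell a list from a tuple, but it
# deliberately does not inspect element types, so no per-element
# bookkeeping has to be kept up to date at runtime.
def coarse_check(value, expected_type):
    if not isinstance(value, expected_type):
        raise TypeError("expected %s, got %s"
                        % (expected_type.__name__, type(value).__name__))
    return value

coarse_check([1, "a"], list)       # passes: only "is it a list?" is asked
try:
    coarse_check((1, 2), list)     # fails: a tuple is not a list
except TypeError:
    pass
```

The point of the sketch is that the check costs one `isinstance` call, with no extra per-object state to maintain.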

> 
> > It seems like you are referring to inferencing as a mechanism which can
> > both allow the user to denote fewer types and as a means of dealing
> > with the mixing of unchecked code with checked code.  
> 
> Honestly, without some type annotations, you're not likely to get very
> far on the latter there.

Of course :)  I didn't mean to imply otherwise at all.

> 
> > With regards to
> > meeting the first goal, a very limited kind of inferencing is
> > possible, where the first assignment to a variable from an expression
> > of a given type has the same effect as declaring the variable as that
> > type in the first place.  I think that the former goal of reducing the
> > number of declarations the programmer must make is attainable with a
> > mechanism like this, but the latter goal would require a real
> > inferencing algorithm.
> 
> I think it's silly to do a 1/2 assed job with an inference
> algorithm... I'd rather not have one at all.  People don't want to
> memorize a rule beyond "if the type is ambiguous you must declare it
> explicitly".  Honestly, type inferencing has its problems too.  For
> example, code can infer a more general type than intended, etc.
> However, minimizing the effort of the programmer definitely seems more
> pythonesque.

In my own experience, I've never seen a type inferencer that succeeded
at only requiring type annotations where they would otherwise be
ambiguous, and that includes ML.  I spent more time guessing what the
type inferencer considered ambiguous than anything else.  The rule of
thumb for where annotations are required would be easier for me to
follow if it didn't involve potentially vague notions like deciding
what is and is not ambiguous.  That's just me, though.

[...]
> 
> I take it you're not trying to infer principal types or anything
> complex yet...

No, deduction would be a more accurate term.
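A toy sketch of what such first-assignment deduction might look like inside a checker (all names here are hypothetical, not proposed syntax): the first assignment fixes the variable's type, and nothing beyond that is inferred.

```python
# Hypothetical one-pass "deduction": the first assignment to a
# variable declares its type; later assignments must match exactly.
def deduce(assignments):
    """assignments: list of (variable_name, value) pairs, in program order."""
    env = {}       # variable name -> deduced type
    errors = []
    for name, value in assignments:
        t = type(value)
        if name not in env:
            env[name] = t          # first assignment fixes the type
        elif env[name] is not t:   # no inference beyond that one rule
            errors.append("%s: expected %s, got %s"
                          % (name, env[name].__name__, t.__name__))
    return env, errors

env, errors = deduce([("x", 1), ("x", 2), ("x", "oops")])
# env["x"] is int; the string assignment is reported as an error
```

This is deliberately far weaker than real inference: it never has to solve constraints, only remember the first type it saw.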

> 
> > > Another problem that falls out at this point is how to write the type
> > > of x when we've seen x call "foo".  Do we look at all classes across
> > > the application for ones with a foo method?  And do we care if that
> > > solution precludes classes that dynamically add their "foo" method
> > > from applying?  Ick.  I'd prefer to avoid the concrete, and infer
> > > something like:
> > > 
> > > << foo: None->'a >>
> > > 
> > > Which would read: an object with a field foo of type function taking
> > > no arguments and returning anything.
> > 
> > Do you think it's ok to require that the programmer declare something
> > about x in this case?  If something like what you suggest is inferred
> > there, it seems like a class of errors might slip through that we
> > might not want to allow to slip.  
> 
> 
> That's true with type inferencing, period.  I think if we're going to
> have it, we should stick to regular rules instead of special casing
> stuff like this.  I don't think that it's going to end up being a huge
> source of problems anyway.

One of the things expressed earlier was the idea of leaving the
implementation of inferencing until after the type system and checker
were done.  Does that order of events seem reasonable to you?

> > I like the idea of discouraging OR'ing of types as much as possible.
> > There are two cases where I think it could come in very handy:  first,
> > if the static type of None is considered a distinct type rather than
> > something like a null type, then there are lots of things that return
> > or produce type-of(None) OR something else.  Examples are dict.get and
> > default arguments.   After wrestling with this for a while, I've come
> > to think that the introduction of a null-type in a static type system
> > is more manageable for the programmer.  Do you see other ways of
> > dealing with that or how would you prefer that those things are
> > handled?  The other case is recursive data types such as trees, where
> > a node can contain either other nodes or leaves.
> 
> I disagree with you here.  

The way I read what you say below, we're actually agreeing about
having a special type for the value None; it seems to work best to me
as a valid value in the set of values of any object type.  That's what
I meant by 'something like a null type' above.  By doing this, you
lose the ability of a type checker to distinguish when something
should be None and when it should not, but this approach makes lots
of things easier, both for the programmer and for the implementation
of a static type system.
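That trade-off can be seen in a small runtime sketch (the helper is hypothetical, purely to make the point concrete): if None inhabits every object type, a checked assignment must always let None through.

```python
# Sketch of a dynamic check under the rule "None is a valid value of
# every object type": None always passes, so this checker can no
# longer distinguish "may be None" from "never None" -- that
# distinction moves to runtime failures instead.
def check_assign(value, expected_type):
    if value is None:
        return value               # None inhabits every object type
    if not isinstance(value, expected_type):
        raise TypeError("expected %s, got %s"
                        % (expected_type.__name__, type(value).__name__))
    return value

check_assign({}, dict)     # accepted: a dict is a dict
check_assign(None, dict)   # also accepted, by the special rule
```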
> 
> For your first case, "None type": Many languages have void
> single-valued types without any need for OR-ing types.  Remember, a
> type specifies the universe of possible values.  For all object types,
> None is a valid value in that set of values.  The void type is the set
> that only contains the value None.  The void type is a subtype of
> all object types, and you get all the benefits of subtyping polymorphism.
> I don't see any problem of the sort you're talking about here, at all.
> 
> For your second case, modeling a tree where a node contains either
> other nodes or leaves: There are far better ways to model the problem.
> First, most trees don't have nodes without values, but let's ignore
> that for a minute, and assume otherwise.  The natural way to model
> this problem is with subtyping polymorphism, not with the OR-ing of
> types:
> 
> class NodeBase:  # Theoretically abstract.
>   def print_tree(self): pass
> 
> class NonLeafNode(NodeBase):
>   left :- NodeBase
>   right :- NodeBase
>   def print_tree(self):
>     self.left.print_tree()
>     self.right.print_tree()
> 
> typedef printable << print()-> None >>
> class LeafNode< T -> printable >(NodeBase):
>   value :- T
>   def print_tree(self):
>     self.value.print()
> 
> "T -> printable" Should read something like "any type T that is
> printable" (constrained genericity).

This seems like a good approach, and if None is treated specially as
above, then recursive types such as:

typedef IntTree (IntTree, int, IntTree)

aren't a problem either (at least in terms of the need for OR).

> 
> I still assert that OR-ing types should *not* be in a python type
> system.


> You're basically saying, "here are things that should require
> a runtime cast, but we're going to completely ignore that statically
> and dynamically".

Huh?  You mean you feel that OR-ing creates this situation?  In the
worst case, I agree.  I have also been searching for ways to eliminate
or at least reduce OR-ing myself.  It seems essentially bad for static
systems.

> 
> When are dynamic checks necessary?  Generally, you're trying to do
> something that can be written as an assignment.  Since types are
> essentially sets, the LHS has to be a subset of the RHS in order for
> us to make the determination that an assignment will always yield an
> object of a legal type.  

You mean RHS must be a subset of the LHS, right?

As in:

x :- numeric
y :- int
y = 1 
x = y  # this is ok, int is subset of numeric
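The same subset rule can be illustrated with a runnable sketch, using a toy class hierarchy in place of the proposed numeric/int types (names here are made up for the example):

```python
# Runtime analogue of the subset rule: an assignment is safe when the
# RHS value's type is a subtype of the LHS's declared type, i.e. the
# RHS set of values is a subset of the LHS set of values.
class Numeric:
    pass

class Int(Numeric):   # Int's values are a subset of Numeric's
    pass

def assign_ok(lhs_type, rhs_value):
    return isinstance(rhs_value, lhs_type)

assign_ok(Numeric, Int())      # True: int is a subset of numeric
assign_ok(Int, Numeric())      # False: a general Numeric need not be an Int
```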

> If the LHS and RHS are disjoint (well, if
> None is the only shared value w/ object types), that should never be
> possible.  If there is some overlap, then a dynamic cast is required.
> I'd *really* like to see it be the case that the only times that "foo"
> is in the same set of values as 12 is when parametric polymorphism is
> involved, and with the any type.

<chuckle> 
I don't think many will disagree with you there.
> 
> I am aware that not allowing OR'd types makes things a bit harder for
> legacy code that people want to change to use the type system (such as
> the standard library).  The one place where it's a big issue is with
> heterogeneous lists.  However, I think it really reduces the power of a
> type system to allow a list to be typed (string|integer|file), etc.
> 
> 
> > > 
> > > decl z :- integer
> > > typecase a:
> > >   case (x :- integer, y :- integer) : z = x + y
> > >   case (x :- integer,)              : z = x
> > >   case x :- integer                 : z = x
> > > 
> > > That's not quite as bad.
> > 
> > With regards to typecasing possibly causing the language to slow down
> > by bringing the static type information into runtime, do you think it
> > would be reasonable to allow typecasing only on types that are easily
> > expressible in terms of the existing dynamic type system?  It seems to
> > me that this approach would save a lot of work, limit the runtime
> > overhead, and discourage OR'ing all at once. 
> 
> Well, any time you have to dynamically cast there's going to be a
> performance hit.  I'm not really worried about matching, though.  You
> can do it fairly efficiently, plus it satisfies the principle of
> localized cost... the feature costs the programmer nothing unless he
> uses it.  

This is true so long as extra information isn't carried around and
kept up to date at run time in order to make matching more efficient. 
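For illustration, the typecase quoted above can be approximated with nothing beyond checks the existing dynamic type system already answers (a hypothetical sketch, not proposed syntax):

```python
# Rough runtime analogue of the quoted typecase, using only questions
# the existing dynamic type system can answer: tuple-ness, length,
# and int-ness.  No extra type information is carried at runtime.
def typecase_sum(a):
    if (isinstance(a, tuple) and len(a) == 2
            and isinstance(a[0], int) and isinstance(a[1], int)):
        x, y = a
        return x + y               # case (x :- integer, y :- integer)
    if isinstance(a, tuple) and len(a) == 1 and isinstance(a[0], int):
        return a[0]                # case (x :- integer,)
    if isinstance(a, int):
        return a                   # case x :- integer
    raise TypeError("no matching case")

typecase_sum((3, 4))   # 7
typecase_sum((5,))     # 5
typecase_sum(6)        # 6
```

The cost is paid only by code that actually uses the construct, which is the localized-cost point made above.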

> Also, I think that our most important goal here is to design
> the right type system for python, not to pick one that is very easy to
> implement...

Agreed.  There seems to be an initially steep learning curve to
designing type systems.  I know it's been that way for me.  At some
point in the near future I hope to have read and learned enough to
take another stab at implementing an optional static type system for
Python.

> 
> 
> 
> > 
> > Have you read about expressing the above with "mytype"?  that is:
> > 
> > interface if_LinkedList:
> > 	def add(self, l :- mytype):
> > 		...
> > 		
> > class LinkedList(if_LinkedList):
> > 	def add(self, l):
> > 		...
> > 
> > class BiDirectionalLinkedList(LinkedList):
> > 	...
> > 
> > the syntax is fairly simple, and 'mytype' just means the type of
> > instances of the class that implements the method.
> 
> Well, first of all, let me say that covariance provides a much more
> elegant solution to the problem.  Type variables are not as easy for
> the average programmer to understand.  If type variables are done
> right, then they'd basically duplicate language features like
> genericity.  No one wants to have non-orthogonal constructs, so we'd
> probably remove genericity, which would result in a type system that's
> more difficult to explain and use, plus not nearly as well understood.
> 
> Another problem with using type variables to solve this problem is
> that it requires the programmer to anticipate how people will want to
> use their classes upfront.  If you don't happen to use a type variable
> the first time you specify a parameter, derived classes cannot change
> the variance of the parameters without going back and modifying the
> original code.  Plus, you guys haven't talked about any type variables
> except "mytype", which is not powerful enough to handle uses of
> covariance where the argument to a method is of some type other than
> the type for which the method is a member.

time to take a closer look at constrained parametric polymorphism :)
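As a runtime illustration only (not the proposed static rule), the 'mytype' constraint quoted above can be mimicked with a type(self) check, which also makes the covariant tightening John describes visible:

```python
# Toy runtime rendering of "mytype": add() insists its argument is an
# instance of the receiver's own class, so the requirement tightens
# automatically in each subclass.
class LinkedList:
    def add(self, l):
        if not isinstance(l, type(self)):    # the "mytype" check
            raise TypeError("add() wants a %s" % type(self).__name__)
        # ... splice l into the list ...

class BiDirectionalLinkedList(LinkedList):
    pass

LinkedList().add(LinkedList())                             # fine
BiDirectionalLinkedList().add(BiDirectionalLinkedList())   # fine
try:
    BiDirectionalLinkedList().add(LinkedList())
except TypeError:
    pass   # the derived class now rejects plain LinkedLists
```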

> 
> > > It should be possible to implement this solution, and even do so
> > > incrementally.  There's another (better, less pessimistic) solution
> > > called the global validity approach.  The problem with it, IIRC, is
> > > that the algorithm basically assumes that type checking goes on in a
> > > "closed world" environment, where you're checking the entire system at
> > > once.  That probably isn't desirable.  I wonder if there haven't been
> > > refinements to this algorithm that allow it to work incrementally.
> > > Therefore, I'd definitely prefer covariance w/ a polymorphic catcall
> > > rule, assuming that the catcall rule can actually work for Python.
> > 
> > One possible approach to covariance of method parameters is to check
> > each method of a class against all the possible different types of
> > 'self'.  This is what "stick" does, and it finds exactly the cases
> > where there are type errors.  It does require more checking than a
> > general rule, and it does add complications to the problem of mixing
> > checked modules and unchecked ones across class inheritance, but it
> > does work.  I'd be interested in any feedback on this approach you
> > have...
> 
> To get it right, you would essentially be doing the same thing that
> the global validity approach does.  In particular, you have the exact
> same problems in that a "closed world" assumption is required.
> Incremental checking is far more useful, and I think that the
> polymorphic catcall rule is simple enough (though not if you call it
> "polymorphic catcall" when you report an error to the end user!).

I'll look into that more, as well as potential means for making the
global validity approach work more incrementally.

thanks,

scott