[Types-sig] A late entry

John Viega John@list.org
Thu, 16 Mar 2000 14:00:56 -0800


On Thu, Mar 16, 2000 at 04:06:12PM -0500, scott wrote:
> On Thu, Mar 16, 2000 at 11:24:19AM -0800, John Viega wrote:
> 
> I don't get the feeling I was clear enough about what I'm saying re:
> dynamic checks, so I'll try again.  Minimizing run time checks is
> something I very much agree with.  I also agree that some are likely
> to be necessary.  I think that where those checks are necessary, it
> seems to make sense to leverage Python's existing type system to
> implement them, because that type system is already in place, and
> Python objects (as they exist in C or Java or whatever) would not
> need to carry around any additional information.

I didn't disagree with this.  It seems rather obvious to try to avoid
duplicating work if it's not necessary.

> For example, if we add a field to the list structure in C Python that
> contains the set of types contained in the list, then every time a del
> somelist[x] occurred, the extra information would have to be updated by
> potentially checking the entire list.  If there were a way to make
> runtime checks work reasonably without this kind of extra weight, it
> seems worth pursuing to me.  One way that seems feasible is to
> leverage the existing type system in python.

Ack!  Set of types bad!  The only problem with disallowing sets of
types (OR-ing types) is that legacy code will either need to be
rewritten or use "any", where it might be nice to get a bit more
accurate than "any".  I don't think it's worth supporting the feature
for this reason alone, as it has way too many bad consequences, and
the good consequences aren't that good.
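
For what it's worth, here's a rough sketch of why that extra weight is
real.  This is a hypothetical Python-level stand-in for the C-level
bookkeeping described in the quote, purely illustrative:

    # Hypothetical sketch: a list that maintains the set of types of
    # its elements, as described in the quoted message.
    class TypeTrackingList:
        def __init__(self, items=None):
            self.items = list(items or [])
            self.types = {type(x) for x in self.items}

        def append(self, x):
            self.items.append(x)
            self.types.add(type(x))     # cheap on insertion

        def __delitem__(self, i):
            del self.items[i]
            # After a deletion there is no cheap way to know whether the
            # deleted element's type still occurs elsewhere, so the whole
            # list must be rescanned (the O(n) cost objected to above).
            self.types = {type(x) for x in self.items}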

> For example, if we provided a hierarchical interface to the existing
> type system, and that hierarchy were mirrored in the static system,
> then dynamic checks or casts could be limited to those expressible in
> the hierarchy based on python's existing type system (where you can
> compare lists and tuples but not lists of ints and lists of strings).

Oh, I don't think I agree with you here.  I think it's fine to
leverage off the existing type system where possible, but you need to
be able to add dynamic checks any time you can determine statically
that something might produce a type error at runtime.  If the dynamic
checks needed can't be expressed in Python's dynamic type system
currently, then the dynamic system will have to be changed.

I wouldn't worry too much about this problem, though.  If you have a
good static type system, the number of dynamic checks that get added
will generally be pretty low.  The code in question can be added
without changes to Python's current type system.
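
As a sketch of what I mean (hypothetical function and names, assuming
a checker that narrows values typed "any"), an inserted check can be
an ordinary isinstance test against Python's existing types:

    # Hypothetical example of a checker-inserted dynamic check.  The
    # checker can't prove what the caller passes in (it is typed "any"),
    # so a guard using Python's existing type system is inserted before
    # the value is used at a more specific type.
    def use_port(raw):
        if not isinstance(raw, int):
            raise TypeError("expected int, got %s" % type(raw).__name__)
        # From here on, the checker may treat raw as an int.
        return raw + 1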

> > 
> > I think it's silly to do a 1/2 assed job with an inference
> > algorithm... I'd rather not have one at all.  People don't want to
> > memorize a rule beyond "if the type is ambiguous you must declare it
> > explicitly".  Honestly, type inferencing has its problems too.  For
> > example, the inferencer can deduce a more general type than intended.
> > However, minimizing the effort of the programmer definitely seems more
> > pythonesque.
> 
> In my own experience, I've never seen a type inferencer that succeeded
> at only requiring type annotations where they would otherwise be
> ambiguous; that includes ML.  I spent more time guessing what the type
> inferencer considered ambiguous than anything else.  The rule of
> thumb for where annotations are required would be easier for me to get
> if it didn't involve potentially vague notions like deciding what is
> and is not ambiguous.  That's just me, though.

ML is unification-based.  Depending on what kind of implementation of
unification is used, there can be some bizarre corner cases (the
correct implementation is harder/less efficient).  But generally, if
the code you wrote was actually unambiguous, then ML can infer the
principal type.
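
For anyone who hasn't seen it, here's a toy sketch of the unification
step that lets an ML-style inferencer find principal types.  This is
illustrative only (variables are strings starting with a quote, base
types are plain strings, constructors are tuples), not ML's actual
implementation:

    # Unify two type terms, extending the substitution as needed.
    # Example terms: "'a" (a variable), "int" (a base type),
    # ("->", "'a", "int") (a function type), ("list", "'a").
    def unify(t1, t2, subst):
        t1, t2 = resolve(t1, subst), resolve(t2, subst)
        if t1 == t2:
            return subst
        if is_var(t1):
            subst[t1] = t2       # bind the variable (occurs check omitted)
            return subst
        if is_var(t2):
            return unify(t2, t1, subst)
        if (isinstance(t1, tuple) and isinstance(t2, tuple)
                and len(t1) == len(t2) and t1[0] == t2[0]):
            for a, b in zip(t1[1:], t2[1:]):
                subst = unify(a, b, subst)
            return subst
        raise TypeError("cannot unify %r with %r" % (t1, t2))

    def is_var(t):
        return isinstance(t, str) and t.startswith("'")

    def resolve(t, subst):
        while is_var(t) and t in subst:
            t = subst[t]
        return t

Unifying ("->", "'a", "'a") with ("->", "int", "'b") binds both 'a and
'b to int; when nothing forces a binding, a variable simply stays
general, which is exactly the principal-type property.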

In that respect, I do not believe you are correct.  However, I will
agree that it can take people a while to develop a good mental model
of what an inferencer is actually going to deduce.  Like I said
before, I'm perfectly comfortable with requiring explicit types in
order to type check a program.  But there seems to be some interest in
having the automation, despite the drawbacks.  If that is the
case, I'd rather see something powerful and well-designed with few
restrictions than something ad hoc with many restrictions.


> One of things expressed earlier was the idea of leaving the
> implementation of inferencing until after the type system and checker
> were done.  Does that order of events seem reasonable to you?

Well, if you design things right, your inferencer can leverage some of
the infrastructure of your checker quite effectively.  It's definitely
the preferred ordering.


> The way I read what you say below, we're actually agreeing about
> having a special type for the value None; it seems to work best to me
> as a valid value in the set of values of any object type.  That's what
> I meant by 'something like a null type' above.  By doing this, you
> lose the ability of a type checker to distinguish when something
> should be None and when it should not, but this approach makes lots of
> things easier both for the programmer and the implementation of a
> static type system.

No, most languages have a rule that variables cannot have the void
type as their principal type.  This is no reason to allow OR-ing of
types.  Note that it isn't an issue in pretty much any other language,
either.  So I still don't see what you are seeing that forces an OR
construct.
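
To make that concrete, here's the rule I have in mind, as a
hypothetical fragment of a checker's assignability test (illustrative
names only):

    # Hypothetical checker rule: None is a valid value of every object
    # type, but no variable may have NoneType as its principal type.
    def assignable(value_type, declared_type):
        if declared_type == "NoneType":
            return False                # can't declare something void
        if value_type == "NoneType":
            return True                 # None inhabits every object type
        return is_subtype(value_type, declared_type)

    def is_subtype(a, b):
        return a == b or b == "any"     # stand-in for the real test

This gives you None everywhere without any OR construct.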


> > For your second case, modeling a tree where a node contains either
> > other nodes or leaves: There are far better ways to model the problem.
> > First, most trees don't have nodes without values, but let's ignore
> > that for a minute, and assume otherwise.  The natural way to model
> > this problem is with subtyping polymorphism, not with the OR-ing of
> > types:
> > 
> > class NodeBase:  # Theoretically abstract.
> >   def print_tree(self): pass
> > 
> > class NonLeafNode(NodeBase):
> >   left :- NodeBase
> >   right :- NodeBase
> >   def print_tree(self):
> >     self.left.print_tree()
> >     self.right.print_tree()
> > 
> > typedef printable << print()-> None >>
> > class LeafNode< T -> printable >(NodeBase):
> >   value :- T
> >   def print_tree(self):
> >     self.value.print()
> > 
> > "T -> printable" Should read something like "any type T that is
> > printable" (constrained genericity).
> 
> This seems like a good approach, and if None is treated specially as
> above, then recursive types such as:
> 
> typedef IntTree (IntTree, int, IntTree)
> 
> aren't a problem either (at least in terms of the need for OR).

I got confused here for a second... this looks too much like an actual
tuple constructor to me. :)  Whatever syntax ends up getting used, can
we use (x*y*z) to refer to the 3-tuple whose element 0 is of type x,
element 1 of type y, and element 2 of type z?  That's a very common
notation; your example would read "typedef IntTree (IntTree * int *
IntTree)".
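
BTW, going back to the tree example above: for reference, the same
design in plain, runnable Python (dropping the hypothetical type
annotations) looks like this:

    # The quoted design in ordinary Python: subtyping polymorphism,
    # with no OR-ing of types anywhere.
    class NodeBase:                   # theoretically abstract
        def print_tree(self):
            raise NotImplementedError

    class NonLeafNode(NodeBase):
        def __init__(self, left, right):
            self.left = left          # both children are NodeBases
            self.right = right
        def print_tree(self):
            self.left.print_tree()
            self.right.print_tree()

    class LeafNode(NodeBase):
        def __init__(self, value):
            self.value = value        # any printable value
        def print_tree(self):
            print(self.value)

    tree = NonLeafNode(LeafNode(1), NonLeafNode(LeafNode(2), LeafNode(3)))
    tree.print_tree()                 # prints 1, 2, 3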


> > I still assert that OR-ing types should *not* be in a python type
> > system.
> 
> > You're basically saying, "here are things that should require
> > a runtime cast, but we're going to completely ignore that statically
> > and dynamically".
> 
> Huh?  You mean you feel that OR-ing creates this situation?  In the
> worst case, I agree.  I also have been searching for ways to eliminate
> or at least reduce OR-ing myself.  It seems essentially bad for static
> systems.

Yes, OR-ing creates that situation.  I've definitely been arguing that
it makes your static checking less precise, forcing many type errors
to be caught at runtime.  I think that "any" should be the only shady
construct here, personally.  It should suffice for supporting code
that was untyped as written.  There is no need for an OR construct; it
can and should be eliminated.  I'll be very disappointed if it makes
it into the Python type system :)

> 
> > 
> > When are dynamic checks necessary?  Generally, you're trying to do
> > something that can be written as an assignment.  Since types are
> > essentially sets, the LHS has to be a subset of the RHS in order for
> > us to make the determination that an assignment will always yield an
> > object of a legal type.  
> 
> You mean RHS must be a subset of the LHS, right?

Of course; my typo.
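
Spelled out with a tiny (hypothetical) example, the corrected rule
reads: the set of values the RHS can produce must be a subset of the
set the LHS is declared to hold.

    class Animal: pass
    class Dog(Animal): pass

    def example(d, a):
        # Pretend d is declared as Dog and a as Animal.
        a = d                   # fine statically: Dog values are a
                                # subset of Animal values
        if isinstance(a, Dog):  # the reverse direction is where a
            d = a               # dynamic check has to be inserted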

> > Well, any time you have to dynamically cast there's going to be a
> > performance hit.  I'm not really worried about matching, though.  You
> > can do it fairly efficiently, plus it satisfies the principle of
> > localized cost... the feature costs the programmer nothing unless he
> > uses it.  
> 
> This is true so long as extra information isn't carried around and
> kept up to date at run time in order to make matching more efficient. 

Well, you're going to want to keep complete type information around
for the runtime to use anyway, so that doesn't really matter.
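
(Every Python object already carries its full type at runtime, e.g.
type(x) and isinstance(x, SomeClass) work on anything, so a match
construct has a plausible primitive to compile down to without any new
per-object bookkeeping.)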

> > Well, first of all, let me say that covariance provides a much more
> > elegant solution to the problem.  Type variables are not as easy for
> > the average programmer to understand.  If type variables are done
> > right, then they'd basically duplicate language features like
> > genericity.  No one wants to have non-orthogonal constructs, so we'd
> > probably remove genericity, which would result in a type system that's
> > more difficult to explain and use, plus not nearly as well understood.
> > 
> > Another problem with using type variables to solve this problem is
> > that it requires the programmer to anticipate how people will want to
> > use their classes upfront.  If you don't happen to use a type variable
> > the first time you specify a parameter, derived classes cannot change
> > the variance of the parameters without going back and modifying the
> > original code.  Plus, you guys haven't talked about any type variables
> > except "mytype", which is not powerful enough to handle uses of
> > covariance where the argument to a method is of some type other than
> > the type for which the method is a member.
> 
> time to take a closer look at constrained parametric polymorphism :)

This isn't the best solution, IMHO.  I advocate covariance + a poly
catcall rule.
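
To unpack that for anyone following along, the problem covariance
introduces is the classic "catcall", which looks like this in plain
Python (hypothetical classes, illustrative only):

    class Food: pass
    class Grass(Food): pass

    class Animal:
        def eat(self, food):       # accepts any Food
            assert isinstance(food, Food)

    class Cow(Animal):
        def eat(self, food):       # covariantly narrowed to Grass
            assert isinstance(food, Grass)

    def feed(animal):              # animal is statically just an Animal
        animal.eat(Food())         # fine for an Animal, fails for a Cow

    feed(Cow())                    # blows up at runtime; this polymorphic
                                   # call site is what a catcall rule
                                   # rejects statically

The polymorphic catcall rule simply forbids calling a covariantly
redefined method through a reference that might be polymorphic, so the
error above gets caught at compile time.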

> > To get it right, you would essentially be doing the same thing that
> > the global validity approach does.  In particular, you have the exact
> > same problems in that a "closed world" assumption is required.
> > Incremental checking is far more useful, and I think that the
> > polymorphic catcall rule is simple enough (though not if you call it
> > "polymorphic catcall" when you report an error to the end user!).
> 
> I'll look into that more, as well as potential means for making the
> global validity approach work more incrementally.

There's been some work done in this area, but nothing that had
actually been implemented last I checked.  Why is it necessary?  What
do you have against a poly catcall rule?

John