[Types-sig] recursive types, type safety, and flow analysis

Wed, 22 Dec 1999 11:33:43 -0800 (PST)

On Wed, 22 Dec 1999, scott wrote:
> On Wed, Dec 22, 1999 at 01:45:43AM -0800, Greg Stein wrote:
> > On Wed, 22 Dec 1999, scott wrote:
> > 
> > I posted a set of 4 cases a few messages ago. Without control flow
> > analysis, the type checker cannot determine which of the four cases is
> > being used when it analyzes f(). 
> 
> for easy reference, they're at:
> http://www.python.org/pipermail/types-sig/1999-December/000935.html
> 
> 2 of those 4 cases use 'del' in the global space. I'm inclined to
> believe that those 2 cases are beyond the scope of static type
> checking.  
>...
> The other 2 cases seem handleable at compile time under a system which
> builds typedecl information by reference (*) -- iteratively or 2-pass,
> whichever works out best, sans flow-control analysis, and at compile
> time, with a separate typedecl-name namespace mechanism.  Any
> information distinguishing case 1 from case 2 (like redefining g() 
> inside f) would either fall under local namespace type-ing  or something 
> which should, IMO, not be handled by a static type system.

Yes, 2 can be handled and 2 cannot. My point was that while you're
analyzing f(), you cannot know which of the 4 cases are present in the
global script. Therefore you cannot issue warnings for f's use of g().
[ and funny heuristics may just serve to make the warnings issue
  non-deterministic from a human's standpoint ]

> [...]
> > The compiler *will* be able to generally verify types. It just can't
> > handle a determine which of a set of alternatives an object will have at a
> > specific point in type (assuming that object occurs in a different body of
> > code than that which is being analyzed).
> > 
> > Am I being clear enough? It seems like I've said this about three times so
> > far...
> 
> Yes, I got it, or so I think :) But I think we may have 2 different
> expectations of something fairly basic:
> 
> decl f(x: Int) -> Int|None
> decl g(x: Int) -> Int
> 
> def g(x):
>     return f(max(x, 1))
> 
> def f(x):
>     if x > 0:
>         return x
>     else:
>         return None
> 
> I *want* the static typing to complain and to be warned or blow up
> with the message that according to the type information alone, g() is
> not verifiably an Int.  It seems like you want this to work without
> complaints.  Is this correct?

Incorrect.

I said that we can perform type verification, but we cannot know a global
name's type (or existance) at a point in time. We collect (and union if
necessary) the types of globals. Then we analyze the functions.

In your example above, we find the types of f and g (as you listed in the
decl statements). When we turn to analyzing g(), we find a mismatch
between the return value of f() and g's return value and flag it.

Note that you don't have to declare f and g beforehand, but could just
rely on their definition to work:

def g(x: Int)->Int:
    return f(max(x, 1))

def f(x: Int)->Int or None:
    ...

Since we collect the globals' type information before looking at the
function bodies, we know the typedecl for f() before we analyze g.

> > > > The origination of this discussion was based on the recursive type issue.
> > > > If we have runtime objects, then I doubt we could support the recursive
> > > > type thing without some additional work. Or, as I'm suggesting, you do not
> > > > allow an undefined name (as specified by runtime/execution order) to be
> > > > used in a typedecl.
> 
> I still don't see how enforcing your suggestion allows any compile time
> checking at all -- unless you you further qualify it with 'used in a typedecl
> as it operates at run time'.

Given:

Int = type(1)
String = type("")
def f(x=len(sys.path): Int)->String:
  return str(x)

At runtime, Int and String must be defined when the function object is
constructed (because their values are stored into the function object).
This is analogous to requiring that sys.path and len must be defined.

That is the specified runtime behavior. The question: what happens at
compile time? The compiler knows that Int and String are defined (earlier
in the program) and what they represent. It performs the type checking as
we expect it to.

Let's be concrete and say this declaration occurs in the global code body.
As I've previously stated, the global body is analyzed first. It is
analyzed in runtime order! (top to bottom) As it steps through, and
reaches the f() definition, it knows the types/values of Int and String;
it then records that information as part of f's signature for later use.
If the line assigning to String is shift below f(), then we have an error:
when the analyzer reaches f(), it has not yet seen a definition for
String.

>...
> yes, essentially the same way that
> 
> struct node {
>     int i;
>     node * n;
> }; 
> 
> is OK, but
> 
> struct node {
>     int i;
>     node n;
> }
> 
> isn't. (gives 'incomplete type' with gcc).  You need to be able to have
> a reference to a type and alter it as the type declarations are
> processed.

Correct. That is what I'm suggesting.

>...
> > > > I do believe the information goes into the bytecode, but I don't think
> > > > that is the basis for needing to plan now. Instead, we have to define the
> > > > semantics of when/where those typedecl objects exist. Do we have them at
> > > > runtime? 
> > > 
> > > in the above, no, though we do have the ability to find a name
> > > anywhere at compile time.
> 
> I'd like to recant this statement and replace it with:  
> 
> 1) The typedecl information is stored in an application-wide static
> type model which is created at compile time (implies typedecl specific
> import/#include mechanism).
> 
> 2)The model is mapped to something potentially available at run time,
> eg bytecode with associated module, classe and function objects.
> 
> 3)The runtime environment can do with that information what it
> pleases, but 1) and 2) need to be done first, and have a lot of
> potential for use, even without 3).

Yes, yes, and yes.

And:

4) the design needs to incorporate (3) even though we may be deferring its
   implementation.

>...
> > checking. If you're talking about a name having multiple types over a
> > period of time, then I disagree: we can handle that case.
> 
> perhaps for local variables, but I don't see how with global variables
> unless that global variable is explicitly stated to be a union by the
> programmer, and the type model works out OK -- with atleast the option
> of working with my expectation of static typing of (global) unions as
> described above.

The global case is the same as the local case. Sequence through the
statements, looking at what happens to the types at each point. Take this
example:

  a = 1
  f(a)
  a = "foo"
  f(a)

As we step through, we know that "a" starts as an Int and that we call f()
with an Int. "a" then becomes a String and we call f() with a String. Now
that we've processed the global body, we start processing the function
bodies. But first, we create a union of all the types that "a" ever used;
effectively, the functions see the following declaration:

  decl global a: Int or String

[ it is possible to consider adding "or Undefined" in there for
  completeness, but that may cause more trouble than it saves ]

So, to answer your question: we can handle varying types. But it does take
a qualification -- it can only be handled within a single code body. When
you consider the global body vs. a function body, then we must be
conservative and take the union of all the types a name ever had.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/