[Types-sig] Type Inference I

Tim Peters tim_one@email.msn.com
Sun, 19 Dec 1999 22:12:06 -0500


[John Skaller]
> ... accept, temporarily, that we have only THREE types:
> integers, strings, and
> ... a function 'add(x,y)' exists, which throws an exception
> if the types of x and y are not both integers, or both strings.
> ...
> Consider two cases:
>
> 	1) The spec says:
> 	IF the arguments are both ints ..
> 	OR IF the arguments are both strings ..
> 	OTHERWISE an exception is thrown
>
> 	2) The spec says:
> 	IF the arguments are both ints ..
> 	OR IF the arguments are both strings ...
> 	OTHERWISE THE BEHAVIOUR IS UNDEFINED
>
> There is a huge difference between these two cases for
> a compiler. In case (2), the compiler can ASSUME
> that given the call
>
> 	add(x,1)
>
> that x must be an integer.

The philosophy of (at least) CPython is that core dumps and stack faults are
never the user's fault -- any time that happens, it's a bug in Python
itself.  Some of those bugs will never get fixed <0.9 wink>, but they're
considered to be Python bugs all the same.  This is unlike C/C++/Fortran
etc, and is one of Python's selling points relative to them.

Now the kind of assumption you want to make above will lead to generating
code that *can* cause core dumps when the assumption is false.  For example,
if x is actually a file object, you may well generate code that adds 1 to
some internal field of the underlying FILE* struct.  Not good -- not in
Python.  Spec #1 is the *intent* of the language.

> ...
> On the other hand, in case (1), the compiler cannot deduce
> anything, at least from the given fragment, so it can NOT
> generate fast code: it has to call
>
> 	PyAdd(x,One)
>
> or, perhaps do something like:
>
> 	if (PyTypeIsInt(x))
> 		x->value ++;
> 	else PyRaise(SomeException)
>
> .. which involves an extra run time check, at least,
> and is therefore much slower.

The transformation is valid provided your inferencer is strong enough to
prove that x *is* an int at this point.

If it's not, it may be no significant loss anyway:  most program time is
spent inside loops, and runtime type checks can often be floated out; when
they can't, the compiler can provide feedback that a user who gives a rip
about speed can act on.

> Therefore, there is a performance advantage in adopting
> spec (2) as a language specification, instead of (1).

Yes, but I doubt Python will ever go that route.  It's harder to optimize
Python than Fortran -- but, in return, you get to program in *Python*
<wink>.

> Note this does not mean the generated code will crash,
> if x is not an integer. What it means is that if the
> compiler detects that x is not an integer, it can
> generate a compile time error. It is NOT allowed to
> do that with specification (1).

No, but under #1, and under the same assumption (that the compiler detects
that x is not an int), it can generate code to raise the exception and skip
the PyAdd business.  That is, *if* the inferencer can *either* say
"definitely an int" or "definitely not an int", it makes no practical
difference whether spec #1 or #2 is in effect.

> So my point is: the Python documentation contains
> many examples where it says 'such and such an exception
> is thrown', and this prevents generating fast code,
> and it prevents early error detection.

I'm sure that if anything comes of this SIG, people will ask for a compile
*option* to warn about (or error on) every nastiness the analysis code can
detect.

> ...
> But I have spent over five years on a standards committee,
> and have some vague idea of the impact of specifications --
> and in particular lack of them -- on the ability to generate
> fast, conforming code.

Speed wasn't one of Python's primary goals; had you been on the ECMAS
JavaScript committee instead, you wouldn't have heard anyone arguing about
speed <0.9 wink>.  C++ is the Fortran of OO languages ...

not-that-speed-kills-ly y'rs  - tim