[Types-sig] Type Inference I

skaller skaller@maxtal.com.au
Sun, 19 Dec 1999 03:04:16 +1100

Greg Stein wrote:

OK Greg, let's see where we agree and what we understand.

First, interpreted Python is too damn slow for some applications.
Also, errors sometimes get reported later than we'd like.

We'd both like to:

	(OPT)	be able to translate Python sources into C
		which runs faster than interpreted Python

	(ERR)	find errors in a program, before running it

I hope we agree on these points so far.

Now, here is something I believe, mainly from comments
made at various times by Guido, Tim, and others:
people have tried compiling python before, and found that
the resulting C code didn't run much faster than the
interpreter. That's mainly because these compilers didn't
know anything about the types; they just generated API
calls corresponding to what the byte code interpreter would
execute -- and the interpreter is pretty fast already.

So the question arises: how well can we do if we know
what the types of things are, or at least some of them?
I am going to assume, without evidence, that we can do better,
and I'm going to assume you agree.

So now the question arises: how can we find out the types?
Now, I am going to TELL you, that there is some evidence,
that we can in fact do surprisingly well without changing
Python one little bit. We can do better, if we analyse a whole
program. We can do better, if we make some assumptions.
I'd like you to accept this, without argument, because I cannot
prove it: I've only done a Mickey Mouse experiment, but JimH
has done a less Mickey Mouse one, I'm told.
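
For the flavour of it, here is a Mickey Mouse inferencer of my own
devising (everything below is a hypothetical sketch, not code from any
real experiment): it propagates types forward through simple
assignments, which is already enough to type a surprising amount of
straight-line code.

```python
# HYPOTHETICAL SKETCH: a toy forward type inferencer. It knows only
# int and str literals, and a binary '+' that preserves the operand
# type when both sides agree.
def infer(stmts):
    """stmts: list of (target, expr) pairs; expr is an int or str
    literal, or a tuple ('+', name, name).
    Returns {name: 'int' | 'str' | None}, None meaning unknown."""
    env = {}
    for target, expr in stmts:
        if isinstance(expr, tuple) and expr[0] == '+':
            _, a, b = expr
            ta, tb = env.get(a), env.get(b)
            # '+' yields the common type when both operands agree
            env[target] = ta if ta == tb else None
        elif isinstance(expr, int):
            env[target] = 'int'
        elif isinstance(expr, str):
            env[target] = 'str'
        else:
            env[target] = None
    return env

program = [('x', 1), ('y', 2), ('z', ('+', 'x', 'y'))]
print(infer(program))  # {'x': 'int', 'y': 'int', 'z': 'int'}
```

Even this much, run over a whole program, pins down the types of many
locals; whole-program analysis and extra assumptions only widen the net.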

So now the question arises: what is required to make the
type inference we can do WITHOUT changing Python one
little bit even better, if we make some changes?

So, when you say 'we' are not going to do this or that
kind of inference, you are missing the point.
I surely AM going to. Others certainly WILL be going to.
This is how it is done. When you are writing a compiler,
you use every bit of information you can to make
it go faster.

So the point is how to give the compiler more information,
while minimising the impact on 'python' (you can read that
as 'keeping it pythonic' if you like).

Now, my point, in Type Inference I and II, is that static
type declarations are only ONE way of providing 
more information, and they are not even the most important
one. In fact, type inference is hampered by quite a lot 
of other factors. 

I'm sure you will agree on some. For example, I'm sure
you understand how 'freezing' a module, or banning
rebinding of variables after importing, or disallowing

	module.attribute = xxx

will help type inference: you clearly understand that
python is very dynamic, which makes static analysis
difficult. Right?

So what I am trying to do is list all the things
OTHER than adding optional type declarations,
which might contribute to our agreed upon aim,
namely, to provide a compiler with enough information
to generate fast code.

I think it is clear this is in the scope of the SIG's charter,
for example, there seems to be a consensus that

	module.attribute = xxx

is going to be disallowed -- because if it isn't,
sophisticated global control flow analysis is required
to even be sure _which_ function is being called 
at some point in the program. Tim Peters said that
the standard algorithm for that might not even terminate.
Clearly, this is a problem for a compiler :-)
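
To see why: any code, anywhere in the program, may rebind an attribute
of any module at run time, so a compiler cannot statically resolve
which function a qualified call actually invokes. (Mutating math below
is purely for illustration.)

```python
import math

def hypot(a, b):
    # Which 'sqrt' is this? A compiler cannot know without
    # sophisticated global control flow analysis.
    return math.sqrt(a * a + b * b)

print(hypot(3, 4))       # 5.0

math.sqrt = lambda v: 0  # legal today: module.attribute = xxx
print(hypot(3, 4))       # now 0 -- the call target changed under our feet
```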

Now I want to go back to the original example I gave,
and I want you to accept, temporarily, that we have
only THREE types: integers, strings, and files. And assume
that a function 'add(x,y)' exists, which throws
an exception if the types of x and y are not
both integers, or both strings. 
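
In Python, the hypothetical add() under a spec of kind (1) might be
sketched like this (the isinstance tests stand in for whatever the
runtime would really do):

```python
def add(x, y):
    # Defined for int+int and str+str; OTHERWISE an exception is thrown.
    if isinstance(x, int) and isinstance(y, int):
        return x + y
    if isinstance(x, str) and isinstance(y, str):
        return x + y
    raise TypeError("add: arguments must be two ints or two strings")

print(add(1, 2))      # 3
print(add("a", "b"))  # ab
```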

I want you to accept that, given a function call:

	add(x, 1)

deducing that x is an integer is useful
to a type inferencer, IF it can be done.

The question is: can it be done?

And the answer is: it depends on the language specification.

Consider two cases:

	1) The spec says: 
	IF the arguments are both ints ..
	OR IF the arguments are both strings ..
	OTHERWISE an exception is thrown

	2) The spec says:
	IF the arguments are both ints ..
	OR IF the arguments are both strings ...

There is a huge difference between these two cases for
a compiler. In case (2), given the call

	add(x, 1)

the compiler can ASSUME that x must be an integer.
This is a valid type deduction,
because the compiler doesn't care what happens if the
program has undefined behaviour: the assumption that
x is an integer is STILL CORRECT, because it cannot have
any consequences which break the language specification.
In this case, the compiler could, for example,
just keep x in a C int variable, and add 1 to it
by using the code x + 1 -- which is much faster than

	PyAdd(x, One)

On the other hand, in case (1), the compiler cannot deduce
anything, at least from the given fragment, so it can NOT
generate fast code: it has to call 


or, perhaps do something like:

	if (PyTypeIsInt(x))
		x->value++;
	else
		PyRaise(SomeException);

.. which involves an extra run time check, at least,
and is therefore much slower.

Therefore, there is a performance advantage in adopting
spec (2) as a language specification, instead of (1).
Note this does not mean the generated code will crash,
if x is not an integer. What it means is that if the
compiler detects that x is not an integer, it can
generate a compile time error. It is NOT allowed to
do that with specification (1).

So my point is: the Python documentation contains
many examples where it says 'such and such an exception
is thrown', and this prevents generating fast code,
and it prevents early error detection. The point
is that throwing an exception is _well defined_ behaviour,
and it would be better if the specification said
the program was in error. That way, a compiler
can report the error at compile time.

At the moment, no errors can be reported by a compiler,
because there is no such thing as an erroneous Python
program -- someone may catch the exception that
the specification says is thrown and do something
they think is OK, and would rightly claim the compiler
is breaking their program and not implementing the
Python language faithfully.

Just to make it clear, an example:

	try: return x + 1
	except TypeError: return str(x) + "1"

is a valid python code fragment, and it relies on
x + 1 throwing an exception if x is not an integer.
So IF we continue to allow this, we cannot deduce
that x must be an integer, and this prevents optimising
the generated code, both here and in other places in the program.

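Wrapped as a runnable function ('bump' is just an illustrative name),
both paths are exercised:

```python
def bump(x):
    try:
        return x + 1
    except TypeError:            # relies on x + 1 raising TypeError
        return str(x) + "1"

print(bump(41))   # 42
print(bump("4"))  # 41 (string concatenation: "4" + "1")
```
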
You might just think, seeing

	x + 1

that if x is not an integer, the code must be an error,
but the example above shows that you'd be wrong
if you said that.

For this reason, I think it is important that the Types SIG
also examine when it is legitimate to catch an exception
at run time, and do something, and when the code is just 
plain wrong, and a compiler can reject it.

My proposal, and I thought it was fairly 'concrete' as
you required, was that apart from EnvironmentErrors,
all _standard_ exceptions not trapped within a function body
(I said 'lexically local' before, but now I'll be more
specific), are in fact programming errors: the programmer
may NOT rely on catching any standard exceptions
other than environment errors, generated by code inside
a client-written function.
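
To make the proposal concrete (a sketch of the intended semantics, not
current Python; the function names are illustrative): trapping an
environment error raised from client code stays legal, while relying on
a standard TypeError raised inside a client-written function becomes an
error a compiler may reject.

```python
def read_config(path):
    try:                          # IOError from open() is an
        with open(path) as f:     # EnvironmentError: still legal
            return f.read()       # to trap under the proposal
    except EnvironmentError:
        return ""

def incr(x):
    try:                          # under the proposal, relying on this
        return x + 1              # TypeError would be a programming
    except TypeError:             # error, rejectable at compile time
        return 0
```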

Note this is only a proposal, I'm not sure if I like it,
but I hope the reason for proposing it for discussion is
easier to understand now. Note: I find this difficult too.
I'm not a compiler writer. But I have spent over five years 
on a standards committee, and have some vague idea of the impact
of specifications -- and in particular lack of them --
on the ability to generate fast, conforming code.
The way I learned it, was listening to the arguments
of compiler writers, explaining the impact of various
possible specifications on opportunities for optimisation.
Believe me, some of the arguments are pretty devious :-)

John Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
voice: 61-2-9660-0850