[Types-sig] Type Inference I

skaller skaller@maxtal.com.au
Sun, 19 Dec 1999 11:42:48 +1100


Greg Stein wrote:
 
> But to state up front: I don't think we want to rely on whole-program
> analysis. 

	Right. I accept this as a reasonable requirement.
Perhaps to explain my viewpoint: it is worthwhile seeing
just how far it is possible to go, with no changes,
and with a few small changes like optional type checking,
in each case: whole-program analysis, single module, and
single function.

	The reason it is worth considering whole-program
analysis is that, because there is a LOT more information
available, such a tool can do a much better job, and therefore
fewer changes are needed to the Python language. Establishing
the _minimum_ changes required is useful, even if we both
agree we'd like to do more -- and I share your desire to
work at a much finer-grained level like modules or functions.

	Viperc has taken the whole-program approach,
NOT because I like it, but because, at the moment,
there is no real alternative. I'd sure like to see
things that made per-module compilation possible!!

> At the moment, I am assuming that a type-checking tool will not
> be part of the [byte-code] compiler -- that it is just too much and too
> slow to directly include. 

	You are probably right. However, I think the issue here
is the Python language, and what _might_ be done if we change it,
rather than any particular tool. That is, we should be examining
"what can we do, if we make these changes" and not "what tool
can we write for the CPython 1.6 distribution". I'm not saying
a tool cannot be written, just that the issue isn't writing the tool,
but changing the language so tool writers can actually get better
results.

> To that end, I think we might eventually want to integrate something.

	Actually, I tend to think the most likely tool which will
get people excited is a compiler that generates C code: nothing
to do with the bytecode interpreter at all. But I could be wrong.
[The JPython people, for example, won't care :-]

> And to do that, we definitely cannot rely on whole-program analysis. 

	Agreed. It makes sense to consider how to change Python
so a more localised tool can work well.

> In other words, if we depend on whole-program analysis, then I don't think the
> builtin, byte-code compiler will ever be able to take advantage of type
> information.

	You are probably right, but I don't think the bytecode
compiler is the target. Well, I _didn't_ think that, until you said
that's what you were interested in right now. Forgive my
misunderstanding!

> I agree that you want to use every bit of information possible. I disagree
> that I'm missing the point: I think we are discussing what will happen to
> the native compiler. To that end, I *am* positing that 'we' will do <this>
> or <that>.

	I see. This appears to be where we are crossing wires.
It hadn't even occurred to me that this had anything at all to do with
the bytecode compiler. My assumption was people were interested in:

	a) a stand-alone type checker to help diagnose errors
	b) a compiler to convert functions, modules, or whole programs 
		into C.

	Thanks for pointing out my assumptions were overly restrictive:
you are right, the bytecode compiler might benefit from analysis too.
 
> > Now, my point, in Type Inference I and II, is that static
> > type declarations are only ONE way of providing
> > more information, and they are not even the most important
> > one. In fact, type inference is hampered by quite a lot
> > of other factors.
> 
> This was entirely unclear. I saw it as some kind of weird ramble about
> changing Python's exception behavior in some unclear way, and for some
> unknown purpose.

	I accept it was a weird ramble. Sorry. I was trying my
best to explain something which is, in fact, difficult
for me to understand. Perhaps that's why the ramble was so
long-winded and rambly (or perhaps I write like that anyhow :-)
 
> > I'm sure you will agree on some. For example, I'm sure
> > you understand how 'freezing' a module, or banning
> > rebinding of variables after importing, or, disallowing
> >
> >       module.attribute = xxx
> >
> > will help type inference: you clearly understand that
> > python is very dynamic, which makes static analysis
> > difficult. Right?
> 
> Yup.

	So, you agree with my point that there are OTHER
things than optional type declarations that can improve
the situation wrt typing/optimisation?
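
	To make the rebinding problem concrete, here is a
minimal sketch (my own example, not one from this thread):

	import math

	def hyp(a, b):
		# inferable as returning a float -- but ONLY if
		# math.sqrt cannot be rebound after import
		return math.sqrt(a*a + b*b)

	math.sqrt = lambda x: "gotcha"	# legal Python today, and it
					# silently invalidates the
					# inference above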

	You understand the freezing issue, but not
the exception handling one? I don't fully understand it either.
That's why the ramble, to promote discussion: if I fully
understood it, I would have posted a tightly worded proposal
instead.

> >... module attribute assignments ...
> >... add() example, explaining exceptions mess up compiler ...
> >
> >       1) The spec says:
> >       IF the arguments are both ints ..
> >       OR IF the arguments are both strings ..
> >       OTHERWISE an exception is thrown
> >
> >       2) The spec says:
> >       IF the arguments are both ints ..
> >       OR IF the arguments are both strings ...
> >       OTHERWISE THE BEHAVIOUR IS UNDEFINED
> >...
> > Therefore, there is a performance advantage in adopting
> > spec (2) as a language specification, instead of (1).
> > Note this does not mean the generated code will crash,
> > if x is not an integer. What it means is that if the
> > compiler detects that x is not an integer, it can
> > generate a compile time error. It is NOT allowed to
> > do that with specification (1).
> 
> Interesting point. As a user of Python, I like (1) and do not want to see
> Python to change to use (2). Sure, it hurts the compiler, but it ensures
> that I always know what will happen.

	Right. But you can see the tension between these two
specifications and the impact on performance??
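
	To pin down the difference, consider a sketch
(an add function of my own invention):

	def add(x, y):
		return x + y	# int+int or string+string intended

	add(1, [])
	# under spec (1): the program is valid, and the call
	#   MUST raise an exception at run time
	# under spec (2): the behaviour is undefined, so a
	#   compiler MAY reject the call at compile time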

	So now: let me put it to you that we could
try for specification (3) -- a compromise
between (1) and (2) which provides BOTH advantages,
with some restrictions:

	(3) ....
	
	OTHERWISE, IF the function call is
	enclosed in a try block within the same
	function as the call, AND that try block
	has a handler which explicitly catches
	the exception (or a base class thereof),
	or any exception at all, THEN an exception
	will be thrown; OTHERWISE the program
	is in error.

EXAMPLE:

	def f(x):
		try:
			return x + 1
		except:
			return str(x)

REQUIRED: if x + 1 is not defined for the given x, an
exception must be thrown at run time, so the function f
will never fail. No compile time diagnostics can be
printed. It is hard to optimise the code, but it does
what Python does right now.

EXAMPLE:

	def f(x):
		return x + 1

PERMITTED: a static analyser can report a compile
time error, if it sees a call like:

	from types import StringType

	def g(x):
		if type(x) is StringType:
			return f(x)

THIS IS A CHANGE FROM CPYTHON. We're saying
that this code IS AN ERROR, and that a compiler
can REJECT the program as invalid python.

Let me explain the rationale: even a simple,
local, per-function analyser can _see_ that
a function call is wrapped inside a try/except
clause, and can examine the exceptions that
will be handled .. if the exception is a Python
defined one, and is named with the Guido-given
name (rather than through a variable bound to it),
then this case is 'static' enough to determine
that the user DELIBERATELY executed possibly
faulty code, with the aim of handling the exception.

In this case, we should respect the user's wishes.
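
Here is a sketch of the distinction I mean (both
functions are my own illustrations):

	def f(x):
		try:
			return x + 1
		except TypeError:	# the Guido-given name:
			return None	# 'static' enough

	E = TypeError

	def g(x):
		try:
			return x + 1
		except E:		# only a variable bound to it:
			return None	# NOT static enough for the rule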

Now, if the user puts the handler well down the stack,
somewhere else .. then the user is NOT deliberately
trying to use exception handling to do type checking,
they're just trying to make the program continue to
run without falling over, or print some diagnostic
before terminating .. in other words, in this
case, it is reasonable to assume that the function
call is a programming error. That is,
THE CODE IS NOT VALID PYTHON. In the first example,
the code IS valid Python though.
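
To show the non-local case concretely (again my own
sketch, assuming rule (3) above):

	def f(x):
		return x + 1		# no handler in THIS function

	def main():
		try:
			f("hello")	# the handler here is a frame
		except TypeError:	# away from the faulty x + 1,
			pass		# so under rule (3) the program
					# is in error, and a compiler
					# may reject it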


	The question I would ask is: does this
rule cover enough existing code that, if it is
taken as a Python language rule, it will
not break too many programs when a compiler
REJECTS the code in the case that the handler
is not local to the function call??

	And the second question is: will this
really provide opportunities for better code generation
by a compiler, or better diagnostics from a static
analyser?

	Finally: is there a better rule??

	I'm really hoping here that you (Greg)
will like the idea, because it helps a per function
or per module static checker more than a whole program
analyser.

	It is clear Guido is willing to add restrictions
on the existing language rules, to allow better
static analysis for ERR or OPT: he already said that
module freezing is a goer. I do not know if the
rule above is a goer: I don't really know what all
the issues are.

> While true, I think the compiler will still know enough about the type
> information to generate very good code. 

	I'm not so sure. I tried it, and the result was that
because of the EH (exception handling) issue I mentioned,
it wasn't possible EVEN with global whole-program analysis
to get good results -- without doing a full control flow
analysis as well. And that, unlike type inference, is probably
too hard.

> If somebody needs to squeeze even
> more performance, then I'd say Python is the wrong language for the job. I
> do understand that you believe Python can be the right language, IFF it
> relaxes its specification. I hope it doesn't, though, as I like the
> rigorous definition.

	I would say Python needs to _tighten_ its specification,
not relax it, so we had better watch out when we use these terms
that we don't cross wires, since we clearly agree on the semantics
we're referring to :-) [Technically, I think you're right,
and I'm wrong]

> It only prevents it in the presence of exception handlers. 

	Yes. But 'presence' is a dynamic thing, not a lexical one.
You need whole program control flow analysis to detect presence
statically. The kind of rule I proposed above changes that.
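
	A sketch of what I mean by dynamic presence
(my own example):

	def f(x):
		return x + 1

	def a(x):
		return f(x)		# no handler on this path

	def b(x):
		try:
			return f(x)	# handler present on this path
		except TypeError:
			return None

	# whether a handler is 'present' when x + 1 executes
	# depends on the call path (a or b), not on the text
	# of f: hence the need for whole-program flow analysis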

>You can still
> do a lot of optimization outside of them. And a PyIntType_Check() test
> here and there to validate your assumptions is technically more expensive
> than not having it, but (IMO) it is not expensive in absolute terms.

	That depends on where the check is. If it is in 
a tight inner loop, where the code would otherwise

	a) do no function calls
	b) all fit in the machine's hardware cache
	c) reduce to register operations

then I'm sure you would agree you were wrong.
And these are the cases of most interest, because it
is the really tight Python-coded inner loops where
Python falls so badly behind C in performance.
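
	Here is the kind of inner loop I have in mind
(an illustrative sketch only):

	def dot(a, b, n):
		s = 0
		for i in range(n):
			s = s + a[i] * b[i]
		return s

	# in CPython, every +, *, and [] above goes through a
	# dynamic type dispatch; with the types pinned down, a
	# compiler could reduce the body to register operations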

With just the right specifications and tweaks to the
language, plus static type declarations, it may well
be possible to compile down to extremely fast C code.

> All right. This is clear now. And it is clearly something that I would not
> want to see :-)

	So you are willing to throw out optimisation
opportunities in favour of preserving the existing
semantics. This is a valid viewpoint. I accept one vote
against any change here. I'd like to see other people's
opinions, and then perhaps a consensus can be
reached: I am quite willing to accept whatever
is decided by Guido in the end .. I can always add
features to Viper that make it much faster than Python :-)
But I'd prefer not to, I'd rather it compile Guido Python
with good performance :-)

-- 
John Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
voice: 61-2-9660-0850