[Types-sig] Type Inference I

skaller skaller@maxtal.com.au
Sat, 18 Dec 1999 09:06:54 +1100


I've just been reading the December Types-Sig archive.
Wow, from an almost dead SIG ...

I've got a lot of comments, and I'm not sure whether to roll them
all into one post, or lots of small ones. So I'll try a small one,
to see if I get suitably flamed :-)

Proposition: it is possible to do whole program type analysis of
existing Python programs, and generate optimised code.

Evidence: JimH has in fact done this, and claimed some
astounding (almost unbelievable) results.
I have done some 'Mickey Mouse' testing, and found that
in some cases, it can indeed work.

So I think the proposition that it can be _done_ is proven,
but how effective is it? What obstacles stand in the way?
Here are some thoughts.
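To make 'whole program analysis' concrete, here is a toy sketch (my own
illustration, not JimH's tool; the names and the call-site representation
are made up): if the analyser can see every call site in the program, it
can collect the concrete types each function is actually called with, and
specialise when only one type ever appears.

```python
# Toy whole-program "inference": collect the concrete types seen at
# every call site of each function. (Hypothetical illustration only.)
def infer_arg_types(call_sites):
    # call_sites maps a function name to the argument tuples it is
    # called with anywhere in the program.
    inferred = {}
    for name, sites in call_sites.items():
        types = set()
        for args in sites:
            for a in args:
                types.add(type(a).__name__)
        inferred[name] = types
    return inferred

# 'inc' is only ever called with ints, so a compiler could emit
# specialised integer code for it; 'show' stays polymorphic.
sites = {"inc": [(1,), (2,), (40,)], "show": [("x",), (3,)]}
result = infer_arg_types(sites)
```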

OBSTACLES TO TYPE INFERENCE IN PYTHON, I: POOR SPECIFICATIONS
-------------------------------------------------------------

First, the biggest obstacle to doing this is the state of the
current language definition! The exclamation mark is there
because I know you didn't expect this. [You thought that
optional type declarations would be the biggest help]
Of course, Guido (and ONLY Guido) could go ahead and
do some type inferencing, and then 'explain' some extra
restrictions on Python. But no other third party could
do this and claim to faithfully compile Python.

I've presented this argument before, but I'll give it again
anyhow. At present, there is no such thing as an 'erroneous'
or 'ill-formed' Python program. Every single file on your
computer is a correct Python program. If you run Python
on some arbitrary file, and it throws a SyntaxError,
then the program is correct: it has well-defined
semantics, namely to raise a SyntaxError.
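The point can be demonstrated today (the helper name is mine, for
illustration): feeding arbitrary text to the compiler yields a
SyntaxError with fully specified semantics, which a program may
legitimately catch and rely on.

```python
# Any text at all is a 'correct' program under the current rules:
# if it doesn't parse, the specified behaviour is to raise SyntaxError.
def run_arbitrary_file(text):
    try:
        code = compile(text, "<arbitrary file>", "exec")
    except SyntaxError:
        return "well-defined outcome: SyntaxError"
    exec(code)
    return "parsed and executed"

outcome = run_arbitrary_file("x = ) (")   # clearly not Python syntax
```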

Perhaps you think this example is extreme (it is,
deliberately), but when we come down to compiling files
which are more 'obviously' Python, this issue
almost completely prevents any optimisation
by type inference -- and that is why the most
important change for Python is to very carefully
define what is NOT correct Python.
It is very important that in these cases,
the language specification does NOT say an exception
is raised because that is _precisely_ what the problem is.
Raising an exception is well specified behaviour, and when
it happens according to specification, the client code
which causes it to happen is CORRECT python -- precisely
because the behaviour is specified.

For example, consider:

	x = 1
	y = open("something")
	try: x + y
	except: print "OK"

This code is CORRECT Python at the moment (AFAIK).
It is NOT 'illegal' to add a file and an integer;
it is perfectly correct to do it, and then handle
the resulting exception.

There is no hope for any kind of type inference 
until this is fixed. What must be said is that
this case is an error, and that Python can
do anything if the user does this: the result
of executing the code MUST be undefined.

The fact that a particular implementation throws an exception
is good behaviour on the part of that particular implementation,
but it must NOT be required -- because that would prevent
a compiler from rejecting the program.

FIXING THE PROBLEM -- WITHOUT DOING TOO MUCH WORK
-------------------------------------------------

It is relatively easy to fix SyntaxError: it is easy to say
that a 'python' program is one that (at least) conforms to the grammar.

It is not so easy to fix all the other cases, because 
in _some_ of these cases, we would want 'undefined'
behaviour, and in other cases, we would actually
_want_ to throw an exception. For example, it is common
to say this:

	try:
		import X
	except ImportError:
		import Y


and many people think this is reasonable, and changing
the existing semantics would break a lot of existing
code. It is possible to manually look at EVERY possible
place in the language specification, and decide
in which cases an exception must be raised, and in which
cases the _program_ is plain wrong, and the result is
undefined -- and perhaps go on to specify implementation
details such as "The current CPython implementation raises
an XXX exception here".

But that is a lot of work. The example of "SyntaxError"
suggests an alternative: we examine exceptions directly,
and specify which ones must be thrown, and which ones
are the result of an invalid program. This _might_
require some reworking of the exception tree
(I can't spell heirarchy :-)

The way to do this is to pretend you are a compiler
implementor for Python, and want to know which things
you MUST generate code for, which things you
will simply assume (and let the client's program
go haywire or coredump if the assumption is violated),
and which things you can reject at compile time as WRONG.

For example, OS errors clearly require runtime support:
they're not (necessarily) program errors.

On the other hand, TypeError and ValueError are difficult.
Here's why: when the top level catches a type error,
it is almost certainly a program error. But in the case:

	def f(arg):
		try:
			return "<" + arg + ">"
		except TypeError:
			return repr(arg)

the client is doing something sensibly pythonic <g>.
Here, a much more complex rule may be required:

	[Badly worded WRAPPING rule]
	----------------------------

	"If an operator has 'invalid' argument
	combinations, then, unless it is wrapped
	in a lexically enclosing try block which
	catches a TypeError, the program is ill formed."

In other words, if you really want to catch a TypeError
when evaluating some expression, you are REQUIRED to
make sure it is lexically enclosed in a try block
which explicitly catches the TypeError. The reason
is clear -- a type inference algorithm can note
the try/except block here, and generate code
that does type checking. But in the case of plain old:

	"<" + arg + ">"

_without_ lexical enclosure, the type inferencer is
entitled to assume that 'arg' is a string.
And then, when the client calls the function containing
this fragment, a compiler may reject the program,
because it can see that 'arg' is not a string, as required
by the semantics.
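To illustrate what the wrapping rule buys the compiler (the function
names here are mine, purely for illustration): the unwrapped version
licenses the assumption that 'arg' is a string, while the wrapped
version forces a runtime type check to be kept.

```python
# Under the proposed rule, a compiler may specialise 'tag': with no
# lexically enclosing try/except TypeError, 'arg' must be a string.
def tag(arg):
    return "<" + arg + ">"      # licensed assumption: arg is a string

# The wrapped form keeps the dynamic check, so any type is acceptable:
def tag_or_repr(arg):
    try:
        return "<" + arg + ">"
    except TypeError:
        return repr(arg)

a = tag("hello")        # correct under the rule
b = tag_or_repr(42)     # also correct: the TypeError is wrapped
# tag(42) would be rejected at compile time under the proposed rule.
```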

The key word in that sentence is 'required'.
At present, the semantics don't really _require_ anything,
and so optimising python by local type inference is impossible.
[This does not prevent whole program analysis, but it does
make the results much less effective]

Finally, I note that contrary to my simplified assumptions,
the current reference DOES in fact say, in places,
that something is 'illegal' (or whatever).

The contention of this particular article is that this is
the first, and most important, work that can be done
by people wanting to compile python to efficient code:
clean up the semantics.

In many cases, I feel Guido will already have an opinion
on what is 'intended' to be an error in the program, and what is
'intended' to throw an exception: that is, I think Guido
could sensibly resolve some of the cases that other SIG members
disagreed on, or found difficult. Here is the 1.5.2 exception tree,
with comments:


Exception(*)
 |
 +-- SystemExit
 +-- StandardError(*)
      |
      +-- KeyboardInterrupt    NO! This should be TOP LEVEL **
      +-- ImportError          RAISED ONLY IF WRAPPED, ELSE ERROR ****
      +-- EnvironmentError(*)  MUST BE RAISED IN COMPILED CODE
      |    |
      |    +-- IOError
      |    +-- OSError(*)
      |
      +-- EOFError             MUST BE RAISED IN COMPILED CODE
      +-- RuntimeError
      |    |
      |    +-- NotImplementedError(*)  UNDEFINED BEHAVIOUR
      |
      +-- NameError            UNDEFINED BEHAVIOUR (except in 'exec', 'eval')
      +-- AttributeError       UNDEFINED (use getattr to get defined behaviour)
      +-- SyntaxError          UNDEFINED
      +-- TypeError            UNDEFINED UNLESS WRAPPED LOCALLY
      +-- AssertionError       UNDEFINED
      +-- LookupError(*)       UNDEFINED UNLESS WRAPPED LOCALLY
      |    |
      |    +-- IndexError
      |    +-- KeyError
      |
      +-- ArithmeticError(*)    UNDEFINED
      |    |
      |    +-- OverflowError
      |    +-- ZeroDivisionError
      |    +-- FloatingPointError
      |
      +-- ValueError           UNDEFINED UNLESS WRAPPED
      +-- SystemError          UNDEFINED
      +-- MemoryError          UNDEFINED


The outcome of this is that, really, the only times Python
guarantees to raise an exception are for environment errors,
or for typing/indexing/lookup errors which are locally
wrapped.
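The annotated tree can be restated as data a hypothetical compiler
writer might consult. This is only my reading of the comments above
(the groupings and function name are made up; KeyboardInterrupt and
SystemExit are left out, since the tree proposes moving them to the
top level):

```python
# Hypothetical compiler-writer's summary of the annotated tree above:
# which exceptions generated code must support, and which mark an
# ill-formed program (undefined behaviour unless locally wrapped).
MUST_RAISE = {"IOError", "OSError", "EOFError"}
UNDEFINED = {"NameError", "AttributeError", "SyntaxError",
             "AssertionError", "ArithmeticError", "SystemError",
             "MemoryError"}
UNDEFINED_UNLESS_WRAPPED = {"TypeError", "ValueError",
                            "IndexError", "KeyError", "ImportError"}

def compiled_semantics(exc_name, wrapped=False):
    if exc_name in MUST_RAISE:
        return "must raise"
    if exc_name in UNDEFINED_UNLESS_WRAPPED:
        return "must raise" if wrapped else "undefined"
    if exc_name in UNDEFINED:
        return "undefined"
    return "unclassified"
```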

The biggest problem here is clearly ImportError,
since it cannot be raised _unless_ the importer
catches an exception -- and the only possible
exception it could catch would be an environment error,
which is unlikely, except if the module cannot
be found on the file system.

I'm NOT sure I like the wrapping rule. An alternative
is just to say that the client is required to test,
instead of relying on exceptions. But we need _something_.
Comments are needed here: this is the hard part (finding
a suitably 'pythonic' rule).
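The test-instead-of-catch alternative would look like this: the earlier
TypeError example, rewritten so that no exception can occur and nothing
need be wrapped.

```python
# Alternative to the wrapping rule: the client tests the type up
# front, so no TypeError is possible and no try block is needed.
def f(arg):
    if type(arg) is type(""):
        return "<" + arg + ">"
    return repr(arg)
```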

------------------------------------------------------------
** KeyboardInterrupt: this is wrong wrong wrong!! <g>
OCaml does this too. The reason is
that Ctrl-C (or whatever) can occur _anywhere_ in a program,
and catching it therefore clashes with:

	try: something
	except: handler

Here, 'except' is probably intended to catch SYNCHRONOUS
errors caused by code in 'something', but it
inadvertently catches a KeyboardInterrupt as well.

SOLUTION: At least, KeyboardInterrupt should be a
top level exception, or, better, divide the exception
tree into two kinds: synchronous and asynchronous.
At least then the programmer can write:

	try: something
	except SynchronousException: handler

which will still let the Ctrl-C escape through to
the top level and kill the program (or, get
explicitly handled).
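A sketch of the two-rooted tree (ALL class names here are hypothetical;
nothing like this exists in 1.5.2):

```python
# Hypothetical split: handlers for synchronous errors no longer
# swallow asynchronous ones like Ctrl-C.
class SynchronousException(Exception): pass
class AsynchronousException(Exception): pass

class MyTypeError(SynchronousException): pass
class MyKeyboardInterrupt(AsynchronousException): pass

def attempt(thunk):
    try:
        return thunk()
    except SynchronousException:
        return "handled"
    # MyKeyboardInterrupt is NOT caught: it escapes to the top level.

def sync_failure():
    raise MyTypeError("synchronous error")

r = attempt(sync_failure)   # the synchronous error is handled
```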

A better solution, perhaps, is to get rid of
KeyboardInterrupt altogether, and handle signals by a different
mechanism (perhaps in the signal module, perhaps with
core language support).

-- 
John Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
voice: 61-2-9660-0850