[Types-sig] type-assert operator optimizations

skaller skaller@maxtal.com.au
Thu, 23 Dec 1999 10:18:01 +1100


Greg Stein wrote:
 
> The assert statement has different meanings based on the compiler you use.
> That's all. Nothing funny about that.
> 
> In one case, it is defined to test a value and raise an
> exception. In the other, it is a no-op.

	The semantics of a programming language are a property
of the LANGUAGE and not the compiler.

	So what you are saying is that there are TWO Python
languages, which you disagreed with before.
 
> But you're still missing the point. "assert" has a very specific
> definition, and that is to raise an exception if its value is zero. It is
> also defined to be a no-op in an optimizing compiler.

	Make up your mind :-)
 
> I do not see where you're going with all this. 

	I know. I've got five or so years' intensive
experience with standardisation committees -- and quite a few
more as a 'lingerer'. :-) It takes a long
while to understand some of the subtle distinctions and issues,
and I can't claim to understand them all myself.

	This is NOT a put-down -- I know you are not
stupid, etc etc .. I just happen to know something
about these issues because I spent lots of time
dealing with them.  Be glad I have done so, and spared
you most of the agony :-)

	A programming language isn't defined by an implementation.
Semantics come from documented specifications -- and sometimes the
specifications don't agree with practice, are suboptimal,
and so on -- which makes discussing changes to the specifications
all that much harder.

	At present, the python language reference
confuses what a single implementation does with semantics.
[But it's fairly good, because the distinction IS made
in some places.] This is because it was written as a
description of a single implementation. However, there are now
many implementations -- several versions of CPython
including 1.5.0, 1.5.1, 1.5.2, 1.6 (under development)
and 2.0, as well as versions of JPython, Viper,
and other tools like type checkers. And then,
it runs on multiple platforms. Whew! That's a lot
of Pythons. What do they ALL have in common?
What will run on ALL these implementations on ALL platforms?

	It is no longer sensible to merely describe
what a particular version of CPython does on one platform: 
the Python Language itself needs an implementation independent 
specification. This is so that (ultimately) programmers can
write programs and DEPEND on a particular behaviour -- even
when compiled with an optimising compiler (and I'm not talking
about merely optimising byte code).

	One of my complaints about Python was that
it requires exceptions in far too many places, where
in fact one would consider that the program is in error:
the fact that CPython 1.5.2 raises exceptions in these cases
ought to be considered a quality of implementation issue,
and NOT mandatory semantics.

	The reason is that all these requirements for
exceptions prevent effective compilation, by preventing
many optimisations from being performed, and by preventing
early error diagnosis. Assert statements are,
effectively, an exception to this -- there the intention
is quite clearly reflected in the optimising compiler:
the language semantics required for assert are 'none'.
This specification only applies to correct programs, so that
each implementation is free to choose a behaviour
for incorrect ones -- throw an exception, do nothing,
reject the program -- or even core dump.
The point is that the semantics of assert exist
to allow the programmer to declare intentions, and discover
whether the program is in fact wrong.
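CPython's own behaviour already reflects this split: assert raises AssertionError in a normal run, but under "python -O" the statement compiles away entirely, so a correct program can never depend on it. A minimal sketch (the function names here are mine, purely illustrative):

```python
# assert declares an intention; CPython raises AssertionError when the
# condition is false, but "python -O" strips the statement entirely,
# so only an incorrect program ever notices the difference.
def mean(values):
    assert len(values) > 0, "mean() of an empty sequence"
    return sum(values) / len(values)

# An explicit test survives optimisation, unlike the assert above --
# use one when the check is part of the program's required semantics.
def mean_checked(values):
    if not values:
        raise ValueError("mean() of an empty sequence")
    return sum(values) / len(values)
```

Run normally, mean([]) raises AssertionError; under -O it falls through to a ZeroDivisionError instead -- exactly the "implementation is free to choose" behaviour described above.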

	The same applies in many other areas,
where one can NOT produce an efficient code generator
unless some of the requirements are relaxed.
For example, if a program recurses past some reasonable
limit, CPython reports the error. But this cannot
be a sensible language requirement, because 
stack overflow is VERY expensive to check for in
compiled code. So what happens on stack overflow is a
Quality Of Implementation (QOI) issue, NOT part
of the language specification. [At least, this is
a sensible position if one wants efficient code to
be generated] Did YOU check for stack overflow
in the last extension module you wrote? <g>
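In CPython the claim is easy to observe: the frame limit is a run-time setting, and overflowing it raises an exception (RecursionError in modern CPython; the interpreters of this era used RuntimeError) rather than crashing -- a choice this implementation makes, not one a native-code compiler could afford. A sketch:

```python
import sys

# CPython counts interpreter frames against sys.getrecursionlimit()
# and raises RecursionError at the limit -- natively compiled code
# could not afford this check, and might simply smash the stack.
def depth(n):
    return depth(n + 1)

def probe():
    try:
        depth(0)
    except RecursionError:
        return "limit enforced"
    return "no limit"
```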

	What I expect you (Greg) will find
when you try to write the type checking stuff,
is that just a few judicious language changes
-- mainly restrictions on what is considered
a correct Python program -- will vastly improve
what you can deduce. You will need to experience
this yourself, to really begin to understand
how sensitive optimisation is to small changes
in semantic requirements. [And when you do,
please report your experiences!]

	There is a classic example of this:
FORTRAN vs C. It is well known that Fortran
produces considerably more efficient code than C.
It is also well known why, and that in retrospect,
C was just a bit too dynamic. The problem
relates to aliasing, especially of function
arguments -- Fortran does NOT permit this.
C does, and it pays dearly.

	The C committee tried to fix this
with a new keyword 'noalias', and they stuffed up the design,
and had to withdraw it. A new keyword, 'restrict', has been
accepted for C9X: the semantics are subtly different.
It remains to be seen whether this allows C to catch up
with Fortran.

	Now, I myself have a desire to compile
Python to efficient code. I know that the lack of
explicit type declarations, by itself, does not
prevent this -- although allowing explicit declarations
can certainly improve performance, provided that a failure
to comply means that the program is in error -- and NOT
that an exception be thrown: that would defeat the purpose
of the optimisation (though not entirely).

	It is in fact quite conventional
in most programming languages to provide assertions
with NO semantics, to allow debugging dynamic situations
(for example, Eiffel).

	I can't see any reason Python shouldn't
follow this conformance model. I also cannot see a reason
why CPython should not raise exceptions in these cases.
But it would be doing so because that is what a quality
INTERPRETER would do, and NOT because it was a requirement
of the language.

	That is why I tried to distinguish two uses
of exceptions: for reporting program errors, and for
reporting environmental problems (like end of file,
or cannot open file) -- in the latter case, we would
not dispute that the program is correct and should handle
the exception (whether or not it is interpreted or compiled).

	In the other cases there is a tradeoff -- the more
code that is banned, the faster the code we can generate,
and the more likely we are to get a core dump or unexpected
behaviour. In trying to decide where the tradeoff should
lie, your position at one extreme is not helping.

	The optimisations which can be performed
by NOT requiring assert statements to raise exceptions
are minimal but real. The reason that they're minimal
is that, apart from eliding the actual test, it is hard
to 'understand' what an assertion is asserting (for a compiler).

	On the other hand, your typecheck operator seems to
provide MORE information, precisely because it is a far more
specific test: assert statements can check almost anything,
whereas the typecheck operator very specifically asserts
something has a particular type, allowing inferencing
to do a lot more work, and a lot of typechecking and
method lookup to be reduced or eliminated.

	You designed it for this reason!
But we will lose quite a bit of performance if we
have to keep checking at run time, and raising
exceptions. The reason is that if the exception
is caught, the type that it asserts an object
has cannot be assumed anymore -- the object can
have any type:

	try:
		y = x ! Int	# asserts y and x are Ints
	except TypeError:
		pass
	... code using y and x ...

We cannot say anything about the type of y and x here.
To be sure, we can do some control flow analysis
in this case, and notice that we lose information
where the TypeError is caught.
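The `!` operator is hypothetical, but its run-time half can be imitated with an ordinary helper (the name `typecheck` is mine, not from the proposal), which makes the lost-information point concrete:

```python
def typecheck(value, expected):
    # hypothetical stand-in for the proposed "value ! expected"
    # operator: return value if it has the asserted type,
    # otherwise raise TypeError.
    if not isinstance(value, expected):
        raise TypeError("expected %s" % expected.__name__)
    return value

def f(x):
    try:
        y = typecheck(x, int)   # inside the try, y is known to be an int
    except TypeError:
        y = x                   # the handler discards that knowledge
    return y                    # here y may have any type at all
```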

So you COULD argue that requiring an exception be
thrown doesn't cost that much (only the test) 
because information is lost only outside the
'try' body. I might even agree. But the point
is that we would then be discussing the cost of various
possible restrictions -- not insisting that every exception
must be generated exactly as in the CPython 1.5.2 interpreter.
We do not HAVE to require the CPython 1.5.2 behaviour
in a compiler, we should just weigh up the tradeoffs.

Just to give you an example:

	try: y = x ! int
	except ValueError: ...

You might think this didn't catch a failed
typecheck, and that y and x had to be ints .. until
you realised that because Python is so dynamic,
someone may have said:

	ValueError = TypeError 

somewhere. So you might have to throw away all
type information whenever you saw ANY except clause,
just to be sure you conformed to the requirements.
And you might just want to ban changing the meaning
of the standard exceptions!
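The rebinding hazard is real Python, not hypothetical: the name in an except clause is looked up at run time, so shadowing a standard exception name silently changes what a handler catches. A sketch (which no sane program should imitate):

```python
ValueError = TypeError   # shadow the builtin name at module scope

def handler():
    try:
        raise TypeError("wrong type")
    except ValueError:   # this name now resolves to TypeError...
        return "caught"  # ...so the TypeError is caught after all
    return "missed"
```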

	I have a final word. I have a vested interest
in exceptions being thrown in many cases: my interscript
program relies on catching exceptions when client script
contains errors: it reports the error, and then continues
on. The program is robust: a useful property for a literate
programming tool, where bugs result in scrambled output,
rather than a core dump. But I ALSO want interscript
to run at least 1000 times faster. A compromise is needed
which supports BOTH usages. So I'm NOT in the
'exceptions are evil' camp after all; I'm in the
'what compromises preserve as much existing behaviour
as possible while still allowing effective optimisation
and type checking?' camp.

	I.e. I think we're on the same side, or ought to be:
looking for a suitable compromise. Probably, we can at best
analyse -- and try out -- various possibilities, with Guido's
Guidance <g> to help, and knowing he's making the final
decision (at least for CPython <g>) anyhow.

-- 
John Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
voice: 61-2-9660-0850