[Types-sig] Type Inference II

Sat, 18 Dec 1999 12:49:27 +1100

In part I of this, I discussed the issues related to 
the Python language specification and exception handling
and the impact on type inference. 

Summary: certain operations which can throw exception
limit the effectiveness of type inference, but it may
be relatively easy to clean up the semantics by specifying
that many of the exceptions must be caught locally
or the program is in error.  In this part, I continue
by looking at other parts of the language which may
restrict the ability to generate optimised code.

TYPE INFERENCE II: SOME SPECIAL CASES
-------------------------------------

Freezing.
---------

Type inference can be severely inhibited by the fact
that Python permits programmers to modify modules
after they have been imported. Indeed, it is not
only _type_ inference that is affected, but also 
inability to cache values, or inline functions,
that is affected.

For example, if a module m contains a variable 'x'
and a function f, and we know 'x' is not modified
after the module is loaded, then we can replace 

	m.x

with the actual value of x. There are a lot of such
constants in Python library modules, for example,
in socket and errno.

A similar argument applies to functions: we cannot
inline a function call, if we do not know the body
of the function. In the case of a call like:

	m.f(1,2,3)

we cannot easily inline f, because we don't know the
user hasn't replaced the original f with something else.

It may seem at first this issue is independent of
'type inference', but that is not the case. First,
consider C and C++, in which 'const' is part of the type
system. Second, consider inference applied to an expression

	y + m.x

Here, if we know m.x is an integer -- we don't care about
the value -- we can deduce that y is also an integer
(ignoring __add__ methods for clarity). But if m.x
could be rebound after module loading, we don't know
what the type is.

Now, I want to explain a bit about how Viperc is going to work.
[At least, my current ideas .. which may change :-]
Viperc begins by loading modules: it _executes_ the modules
using the interpreter Viperi. There's no compilation here!
But AFTER the modules are imported dynamically -- with the full
power of all the dynamism of the interpreter -- the modules
are 'frozen'. The compiler assumes that the modules cannot
be tampered with.

As a result, Viperc can do type inference on the mainline,
since the type of _every_ module attribute is known,
and indeed, the values are known too. Indeed, it can inline
the functions, and other nice 'global analysis' -- but it
is NOT working with the 'source code' or 'AST' for a module:
it is working directly with built, run time, objects.

Of course, none of this can work, if the mainline is allowed
to modify the module contents after the modules are imported:
the importing is done at compile time using the interpreter,
the modules don't even _exist_ in the generated executable.

[There is a more difficult case: freezing classes, and even
instances. Perhaps more, later.]

Ok, now I'm going to backstep. Assuming all imported modules
are frozen after importing is overkill. I think -- not sure --
that we need bit more flexibility than that: the optimisation
is massive, but it is useless if it kills sensible semantics.

To see what 'sensible semantics' means, I will first consider
where freezing is more or less mandatory: the answer is,
when you have a threaded implementation or other re-entering
code, then modifying global variables is a bad idea.

But not everyone is doing that. For a start, it isn't
always clear when importing is finished: interscript,
for example imports modules 'on the fly' in response
to user requests -- this is necessary, because some of the
modules represent typesetters (LaTex, HTML etc), and
the set of these is 'user installable'.

Secondly, it is common to 'patch' functions in modules,
just once, in other modules: for example MA-Lemburg's
Tools do that. There is an obvious 'semantic equivalence'
requirement here so code that doesn't need the extra functionality
will still work, but there is still a use for dynamically
changing module variables after modules are loaded.

What can help here? I don't know. Some ideas include

	const MYCONSTANT = 1

with a specification that changing a 'const' value
after initialisation is an error. For imports,

	import const MODULE

says that changing any attribute of MODULE after importing
IN THE CALLING MODULE is an error (this does not stop
other importers changing it though .. :-(

I'd sure like to see some 'pythonic' ideas here.

LOOP VARIABLES
--------------

Another case which inhibits optimisation is loop variables.
In the loop:

	for x in y: ....

is it allowed to assign to x? What about mutating y?
What about mutating x? [Also, an aside: the code

	for x[1] in y: ...
	for x.attr in y: ...

is allowed but I can't see a real use for it.
Is there one? Could we simplify the syntax,
and required the loop control to be a whole
variable, or tuple of whole variables (recurively),
so that the names involved are always bound directly?

The idea with loop optimisation is that we can:

	(1) keep the loop index in a register 
	(2) cache the sequence
	(3) generate sequence values for 'range(..)' lazily

Tightening up for loops will break code that does things like:

	(1) do extra increments on an loop variable to skip cases
	(2) mutate a list while scanning it

but these seem to lead to newbie posts on c.l.p anyhow.

RESTRICTED SCOPE EXTENSION/RENAMING?
------------------------------------

At present it is sometimes necessary to destroy temporary variables
like this:

	x = value
	...
	del x

There are some examples in the standard library; this
occurs when a temporary is created doing calculations
in a module, but is not meant to be there to use
after the module is imported. Another case occurs like this:

	_f = f # protect our f
	from MODULE import * # might destroy f
	f = _f # set f back
	del _f   # get rid of temporary _f

This is ugly. It also makes inference harder, because
there is now a control flow issue.

One idea I had was this:

	import X as Y   # import X, but name it Y in this module
	from M import x as v

A related idea is:

	import private X
	private x = 1
	private def f(qqq): ...

which makes these name visible in the module, so you can use
unqualified lookup within the module, but so that

	MODULE.X  # fails, X is not visible via M
	MODULE.x  # fails
	MODULE.f  # fails

that is, so external clients cannot see X as an attribute
of MODULE. This is actually easy to implement in Viper,
since it uses a concept of 'environments' for unqualified
lookup: a private name would put into a special private
dictionary which is looked up inside the module,
but which is not part of the module dictionary.
[The 'envionment' looks up both dictionaries, qualified
acces via operator dot does not]

Yet another idea is genuinely temporary variables:

	temporary x = 1

which are automatically destroyed at the end of module loading.

I note python currently supports privacy by name mangling,
but really, this is a hack: for Python 2, a more sophisticated
architecture would be better.

------------

A final comment: it is useful to _implement_ ideas to test
them out. It is damn hard to do that with CPython, because it
it written in C, unless you have extensive familiarity with
the source.

Viper, on the other hand, is written in ocaml, and it is MUCH
easier to play with extensions (and write compilers) in ocaml
than in C. Viper is available for evaluation, if anyone wants to play
with it,
but you will need to know some ML. OTOH, I will be playing with some
extensions anyhow, so if people here have some ideas to try out,
I might be able to implement them relatively easily.
[For example: I extended the grammar to include list comprehensions
in 15 minutes, I expect that implementing the semantics will take
less than an hour. This will be available in the second alpha.]

One thing that is required, though, is a _concrete_ syntax.
I can't implement an idea, even if the semantics are specified,
if there is no syntax! I'm very keen to try 'optional
type declarations', but I need a definite syntax,
not just for the declaration (easy, but also easy to argue about
for a long time), but also for the type (I think this is quite hard).
[Remember Viper uses an extended type system, a type can be any object!]

-- 
John Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
voice: 61-2-9660-0850