[pypy-dev] Brainstorming at Python UK

Tue Apr 27 23:15:50 CEST 2004

Hi,

Here is as promized a few words about the "brainstorming" at Python UK about
the annotation stuff.

The general idea is to leverage what we have already built for translation,
including the annotation-propagation framework, but replace the annotations
themselves with something more reasonable.  In other words if you don't know
exactly what annotations are, fine: now we want to drop them.

"Annotations" were the mean through which typing information about specific
variables were stored.  We'd like to replace them with explicit "abstract
objects", which would be instances of simple classes like:

class SomeObject:
  "Nothing is known about this value."

class SomeInteger:
  def __init__(self, start=-sys.maxint-1, end=sys.maxint+1):
    self.start = start
    self.end   = end

class SomeList:
  def __init__(self, anyitem):
    self.anyitem = anyitem     # another SomeXxx instance

So the "type inference" phase of the translation would take a control flow
graph as input, as it currently does; then its goal is to associate one
SomeXxx instance to each variable.  This information can then be used by the
code generator (Pyrex or Lisp).

A new idea inspired by Starkiller, which is an important difference with the
old way annotations worked, is that all the SomeXxx instances are always
immutables.  For example, SomeInteger(0,10) is and will always mean an integer
in range(0,10).  If later a more general value is found that could arrive into
the same variable, then the variable gets associated to a new, more general
instance like SomeInteger(-10,10).  This change triggers a recomputation of
the type inference, and the new SomeInteger(-10,10) will eventually propagate
forward to wherever the variable is used.  Here is an example:

  Block(v1,v2,v3):
    v4 = v1+v2
    v5 = v4+v3
    jump to Block2 with arg (v5)

If the first time Block is entered, v1, v2 and v3 all contained
SomeInteger(0,10), then type inference will deduce that v4 contains
SomeInteger(0,19) and v5 contains SomeInteger(0,28), and this
SomeInteger(0,28) will be sent to Block2 as its input argument, and Block2
will be analysed accordingly.  But if later, we find out that Block can also
be entered with v1 being SomeInteger(-5,5), then we compute the union of this
new value with the previous one: SomeInteger(0,10)|SomeInteger(-5,5) =
SomeInteger(-5,10).  Then we re-propagate the values into Block, disregarding
what we previously found, and get SomeInteger(-5,28) into v5.  Then we send
this value to Block2, and the same process is repeated there: if this is more
general than what we have already seen for Block2, then we generalize and
recompute Block2.  This process continues until we reach a fixpoint.

(Of course in practice I suppose that we won't have things like
SomeInteger(0,10), but rather e.g. SomeInteger(0,sys.maxint+1) to mean a
non-negative integer.  We could just define SomeInteger to have a flag "can I
be negative".  The purpose of knowing when integers are not negative is
typically to simplify list indexing: lst[i] is easier to implement if we know
that i>=0.)

Having immutable SomeXxx instances even for mutable objects is very useful for
the object-oriented part of the analysis.  Say that a SomeInstance would
represent a Python instance, in the old-style way: a __class__ and some
attributes.  The SomeInstance is created at the point in the flow graph that
creates the instance:

   v2 = simple_call(<class X>)

This would register somewhere that the class X can be instanciated from this
point in the flow graph.  It would then create and store into v2 a
SomeInstance of class X, which initially has no known attribute. If an
attribute is added later on v2, then the SomeInstance detects it is not
general enough.  It kills the inference process; it records the existence of
the new attribute in some data structure on the class X; and it marks all the
creation point of instances of X as invalid again.  This will restart the type
inference from there, and the creation points will now build SomeInstances
that can accomodate the extra attribute.  The same occurs if we figure out
that an existing attribute wasn't general enough, e.g. when we store a
possibly negative value into an attribute known to be SomeInteger(0,10).

It seems that we will end up cancelling and restarting large parts of the
analysis over and over again this way, but it may not be a big problem in
practice: we can expect that a lot of information about the attributes will be
known already after the __init__ call completed.  We may stop and re-analyse
__init__ itself 7 times consecutively if it defines 7 attributes on self, each
time progressing a bit further until the next new attribute kills us, but that
shouldn't be a big problem because it is fairly local (but we'll have to try
to analyse bigger programs to really know).  This is much cleaner this way
than with the annotation stuff.

A bientôt,

Armin.