
Hi,

Here are, as promised, a few words about the "brainstorming" at Python UK about the annotation stuff.

The general idea is to leverage what we have already built for translation, including the annotation-propagation framework, but replace the annotations themselves with something more reasonable. In other words, if you don't know exactly what annotations are, fine: now we want to drop them. "Annotations" were the means through which typing information about specific variables was stored. We'd like to replace them with explicit "abstract objects", which would be instances of simple classes like:

    import sys

    class SomeObject:
        "Nothing is known about this value."

    class SomeInteger:
        def __init__(self, start=-sys.maxint-1, end=sys.maxint+1):
            # half-open range: start <= value < end
            self.start = start
            self.end = end

    class SomeList:
        def __init__(self, anyitem):
            self.anyitem = anyitem   # another SomeXxx instance

So the "type inference" phase of the translation would take a control flow graph as input, as it currently does; then its goal is to associate one SomeXxx instance to each variable. This information can then be used by the code generator (Pyrex or Lisp).

A new idea inspired by Starkiller, and an important difference from the old way annotations worked, is that the SomeXxx instances are always immutable. For example, SomeInteger(0,10) means, and will always mean, an integer in range(0,10). If later a more general value is found that could arrive in the same variable, then the variable gets associated to a new, more general instance like SomeInteger(-10,10). This change triggers a recomputation of the type inference, and the new SomeInteger(-10,10) will eventually propagate forward to wherever the variable is used. Here is an example:

    Block(v1,v2,v3):
        v4 = v1+v2
        v5 = v4+v3
        jump to Block2 with arg (v5)

If the first time Block is entered, v1, v2 and v3 all contain SomeInteger(0,10), then type inference will deduce that v4 contains SomeInteger(0,19) and v5 contains SomeInteger(0,28); this SomeInteger(0,28) will be sent to Block2 as its input argument, and Block2 will be analysed accordingly. But if later we find out that Block can also be entered with v1 being SomeInteger(-5,5), then we compute the union of this new value with the previous one:

    SomeInteger(0,10) | SomeInteger(-5,5) = SomeInteger(-5,10)

Then we re-propagate the values into Block, disregarding what we previously found, and get SomeInteger(-5,28) into v5. We send this value to Block2, and the same process is repeated there: if it is more general than what we have already seen for Block2, then we generalize and recompute Block2. This process continues until we reach a fixpoint.

(Of course, in practice I suppose that we won't have things like SomeInteger(0,10), but rather e.g. SomeInteger(0,sys.maxint+1) to mean a non-negative integer. We could just define SomeInteger to have a flag "can I be negative". The purpose of knowing when integers are not negative is typically to simplify list indexing: lst[i] is easier to implement if we know that i>=0.)

Having immutable SomeXxx instances even for mutable objects is very useful for the object-oriented part of the analysis. Say that a SomeInstance would represent a Python instance, in the old-style way: a __class__ and some attributes. The SomeInstance is created at the point in the flow graph that creates the instance:

    v2 = simple_call(<class X>)

This would register somewhere that the class X can be instantiated from this point in the flow graph.
It would then create and store into v2 a SomeInstance of class X, which initially has no known attribute. If an attribute is added later on v2, then the SomeInstance detects that it is not general enough. It kills the inference process; it records the existence of the new attribute in some data structure on the class X; and it marks all the creation points of instances of X as invalid again. This will restart the type inference from there, and the creation points will now build SomeInstances that can accommodate the extra attribute. The same occurs if we figure out that an existing attribute wasn't general enough, e.g. when we store a possibly negative value into an attribute known to be SomeInteger(0,10).

It seems that we will end up cancelling and restarting large parts of the analysis over and over again this way, but it may not be a big problem in practice: we can expect that a lot of information about the attributes will already be known once the __init__ call has completed. We may stop and re-analyse __init__ itself 7 times consecutively if it defines 7 attributes on self, each time progressing a bit further until the next new attribute kills us, but that shouldn't be a big problem because it is fairly local (we'll have to try to analyse bigger programs to really know). This is much cleaner than the annotation stuff.

A bientôt,

Armin.
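
As an illustration of the fixpoint process described above, here is a minimal runnable sketch. It is not PyPy code: the union, add and analyse_block helpers are invented for the example, with SomeInteger(start, end) meaning an integer in the half-open range(start, end).

    class SomeInteger(object):
        "Immutable abstract value: an integer in range(start, end)."
        def __init__(self, start, end):
            self.start, self.end = start, end
        def __eq__(self, other):
            return (self.start, self.end) == (other.start, other.end)
        def __repr__(self):
            return 'SomeInteger(%d,%d)' % (self.start, self.end)

    def union(s1, s2):
        # Unions build a *new* instance; existing ones are never mutated.
        return SomeInteger(min(s1.start, s2.start), max(s1.end, s2.end))

    def add(s1, s2):
        # Abstract v1+v2: the result ranges over all possible sums.
        return SomeInteger(s1.start + s2.start, s1.end + s2.end - 1)

    def analyse_block(v1, v2, v3):
        # The example Block above, over abstract values.
        v4 = add(v1, v2)
        v5 = add(v4, v3)
        return v5                       # input argument sent to Block2

    ten = SomeInteger(0, 10)
    out1 = analyse_block(ten, ten, ten)
    print(out1)                         # SomeInteger(0,28)

    # Later, v1 turns out to possibly be SomeInteger(-5,5): generalize
    # the input and re-propagate, disregarding the previous result.
    out2 = analyse_block(union(ten, SomeInteger(-5, 5)), ten, ten)
    print(out2)                         # SomeInteger(-5,28)

    # Fixpoint: re-propagating no longer generalizes anything.
    assert union(out1, out2) == out2
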

Hi Armin, [Armin Rigo Tue, Apr 27, 2004 at 10:15:50PM +0100]
Maybe it makes sense to talk about low-level information/representation objects rather than "abstract objects", which isn't a very expressive term, IMO.
Makes sense to me ...
Actually, we have triggered re-computation in previous schemes, too. Keeping the low-level representation objects "virtually immutable" seems like a good simplifying idea ... if it works ...
Ok, i can see this working for variables containing "naturally" immutable objects like integers, strings and numbers. But how does the example apply to building a list in a loop? I am a bit doubtful about a "virtually immutable" SomeList object unless you intend to use a low-level representation like, e.g.:

    class SomeGrowingList:
        def __init__(self, someitem):
            # a SomeXxx instance general enough to hold
            # all possible items of the list
            self.r_items = someitem
            self.r_indexes = SomeInteger(0, sys.maxint-1)

Is something like this the underlying idea that allows "virtually immutable" low-level representations of would-otherwise-be-mutable objects? Note that i guess marking low-level representations with 'r_' or some such might make sense. This is similar to what we do for app-level representation objects with the 'w_' prefix, which indicates they are 'wrapped' and only an object space knows how to operate on them.
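
To make the "virtually immutable" point concrete, here is a hypothetical helper building on the SomeGrowingList sketch above (union_items stands for whatever union operation the item representations support; none of this is existing code):

    def union_growing_lists(l1, l2, union_items):
        # Never mutate l1 or l2: a more general list representation is a
        # brand-new SomeGrowingList whose item set covers both inputs.
        return SomeGrowingList(union_items(l1.r_items, l2.r_items))
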
It should probably store it in the SomeInstance instance associated with class X, right?
Yes, i agree that it seems so. But we have had schemes coming and failing before, so i am eager to get an idea of how it works for the problem cases (lists, instances, ...) we can identify. Also, how to represent exceptions at a lower level, and then translate them to the target language, is not clear yet.

cheers,

holger

Hello Holger, On Wed, Apr 28, 2004 at 11:45:08AM +0200, holger krekel wrote:
Yes, though they are not really representations yet. "Representation" objects would be more dependent on the code generator. They are really "types" in the theoretical sense: sets of values. We could say "subset objects" and use the s_ prefix for SomeXxx instances.
There are two different issues here. One is how to make sure that the analysis eventually terminates; a problem would arise if a subset object were replaced by a gradually more general one all the time, endlessly. The other is mutability.

An example of the former:

    SomeInteger(0,1) -> SomeInteger(0,2) -> SomeInteger(0,3) -> SomeInteger(0,4) -> ...

To prevent that, we could use simpler subset objects that don't have this problem, e.g. SomeIntegers that can only be either "known-to-be-positive" or not. For lists we can just ignore the index and assume that the list has an unknown length, so that SomeList only has to deal with an 's_item' attribute general enough for all items; if 's_item' cannot become more and more general ad infinitum, then SomeList instances cannot either. If we tried to record too much about the length, we could get:

    SomeList(length=0) -> SomeList(length=0 or 1) -> SomeList(length=0 or 1 or 2) -> ...

The other issue is that of mutable objects (lists or instances). The idea of recording the creation points and re-analysing from there if necessary applies to lists as well as instances:

    Block1:
        v1 = list()

This creates a list of "impossible" elements -- i.e. a list whose elements are values in the empty set, which is the least general set:

    cp = CreationPoint(Block1, r_item=SomeImpossibleValue())
    v1 = SomeList(r_item=cp.r_item, creation_point=cp)

If later we figure out that the list could contain integers, e.g. because we see an operation like:

    simple_call(v1.append, v2)    # v2 is SomeInteger()

then we stop the analysis, and update the creation point object (which is not immutable, unlike SomeXxx instances) to contain the more general value:

    v1.creation_point.r_item = union(v1.creation_point.r_item, v2)

And we restart the analysis from Block1:

    cp = find_existing_creation_point()
    v1 = SomeList(r_item=cp.r_item, creation_point=cp)

So the previous SomeList instance is never modified, but just dropped and replaced by a new one. The creation point itself is updated to include information about how general the lists it creates should be. It is much cleaner to do it this way than to patch the SomeList directly, because the SomeList could already have been used in a lot of places, which would then have to be manually invalidated because the SomeList became more general (this is what we did with annotations). The dependencies are clear: only the CreationPoint object is modified, and it is only read during the creation operation, so only the block containing this operation has to be invalidated.

Similarly, for object instances, the SomeInstance is never modified. It contains a reference to an InstanceCreationPoints, which records:

* all the blocks where an instance of a specific class is created;
* which attributes these instances should be given.

Only the InstanceCreationPoints is updated when an attribute must be added or generalized; the already-existing SomeInstances are not altered or actively deleted, but just forgotten. The next generation of SomeInstances for the same InstanceCreationPoints will replace them.

To do that, the subset objects for mutable values, SomeList and SomeInstance, could have a "generation number": an object for the same creation point but with a more recent generation number is more general than (and thus replaces, during union) an object from an older generation. Using objects from outdated generations could raise an exception that stops further analysis, thus waiting for an object from the latest generation to reach that point and re-trigger the analysis.

A bientôt,

Armin.
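
Here is a self-contained sketch of the creation-point and generation-number scheme described above. It is illustrative only, not the code in the branch: the NeedsReflowing exception and the list_append helper are invented names, and the union is reduced to the single case the example needs.

    class SomeImpossibleValue(object):
        "Abstract value in the empty set: no concrete value is possible."

    class SomeInteger(object):
        "Immutable abstract value: some integer."

    def union(s1, s2):
        # Toy union: the empty set is the identity element; otherwise
        # assume both sides are equally general (enough for this example).
        if isinstance(s1, SomeImpossibleValue):
            return s2
        if isinstance(s2, SomeImpossibleValue):
            return s1
        return s1

    class CreationPoint(object):
        # Mutable: records how general the lists created at one block
        # must be, and a generation counter bumped at each generalization.
        def __init__(self, block):
            self.block = block
            self.r_item = SomeImpossibleValue()
            self.generation = 0

    class SomeList(object):
        # Immutable snapshot of a creation point at a given generation.
        def __init__(self, cp):
            self.creation_point = cp
            self.r_item = cp.r_item
            self.generation = cp.generation

    class NeedsReflowing(Exception):
        "Analysis must restart from the given creation block."

    def list_append(s_list, s_item):
        cp = s_list.creation_point
        if s_list.generation != cp.generation:
            raise NeedsReflowing(cp.block)    # outdated snapshot
        more_general = union(cp.r_item, s_item)
        if more_general is not cp.r_item:
            # Generalize the creation point; old snapshots are simply
            # forgotten, and analysis restarts from the creation block.
            cp.r_item = more_general
            cp.generation += 1
            raise NeedsReflowing(cp.block)

    # Usage: v1 = list() in Block1, then v1.append(<some integer>).
    cp = CreationPoint('Block1')
    v1 = SomeList(cp)
    try:
        list_append(v1, SomeInteger())
    except NeedsReflowing:
        v1 = SomeList(cp)     # re-analyse Block1: fresh, more general list
    assert isinstance(v1.r_item, SomeInteger)
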

Hi, On Tue, Apr 27, 2004 at 10:15:50PM +0100, Armin Rigo wrote:
I experimented in this direction at:

    http://codespeak.net/svn/pypy/branch/typeinference/

* annotation.model -- contains the basic Some* classes
* annotation.binaryop -- has the logic for the union, plus other binary ops
* annotation.test.test_model

Together they replace the horribly obscure annotation.annset and its merge nightmare, which is gone together with its test suite.

* annotation.factory -- factories remember how to create mutable objects

The rest (in pypy/translator) is in a no-test-passing state, but very simple functions like simple_func (i -> i+1) can already be successfully analysed and Pyrexed with translator/translator.py.

Armin
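
A session along the following lines should then work; note that this is a guess at the interface, since the mail only mentions translator/translator.py, and the Translator method names shown are assumptions:

    # Hypothetical session -- the Translator methods (annotate, pyrex)
    # are assumptions for illustration, not documented above.
    from pypy.translator.translator import Translator

    def simple_func(i):
        return i + 1

    t = Translator(simple_func)
    t.annotate([int])       # type inference: the argument is an integer
    print t.pyrex()         # emit the generated Pyrex source
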

Hello all, On Sat, May 01, 2004 at 08:11:40AM +0100, Armin Rigo wrote:
The branch has been merged back to the trunk. All tests pass (minus one in pypy/interpreter that didn't pass before), and there are a few new tests too. A good starting point for understanding annotation is pypy/annotation/model.py.

Enjoy!

Armin
