From jeremy@alum.mit.edu  Wed Apr 10 05:04:08 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Wed, 10 Apr 2002 00:04:08 -0400 (EDT)
Subject: [Compiler-sig] progress on new AST
Message-ID: <0GUC009TO3AVOB@mtaout03.icomcast.net>

I've been working on a new AST defined in ASDL (the Zephyr abstract
syntax definition language).  I've checked in the current work in
python/nondist/sandbox/ast. 

There is a python.asdl that defines an AST that is reasonably
complete, although it has rough edges (slices, etc.).  I've also
written a simple C code generator that turns the ast definition into C
code that defines structs and constructor functions.

I think the next step is to work on a transformer that translates the
concrete syntax into the AST.  I'd also like to write code to
generate ASDL pickles, but that's a lower priority.

Note that while there is a next step, I don't think any of the earlier
steps is done.  I expect the AST will change several times before
we're done.  

I expect the C representation will also change.  I believe, for
example, that Guido prefers something more like the current
representation, which uses a variable-length array of child pointers
along with the NCH(), CHILD(), and REQ() macros.  I've been trying to
avoid this style, because it leaves the author stuck with a bunch of
small integers in place of names.
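
The trade-off between the two styles can be sketched in Python.  This
is only an illustration under assumed names (GenericNode and the
sample field values are hypothetical); it is not the generated C code.

```python
from collections import namedtuple

# Style 1 (hypothetical): a generic node with a variable-length child
# list, in the spirit of the NCH()/CHILD() macros.
GenericNode = namedtuple("GenericNode", ["type", "children"])

# Style 2 (hypothetical): one record per constructor, with named fields.
ClassDef = namedtuple("ClassDef", ["name", "bases", "body"])

generic = GenericNode("classdef", ["C", [], ["pass"]])
named = ClassDef(name="C", bases=[], body=["pass"])

# With the generic node, the author must remember that child 2 is the body.
body_by_index = generic.children[2]

# With named fields, the access documents itself.
body_by_name = named.body
```

Both accesses return the same value; the difference is only in what
the reader of the code has to keep in their head.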

Anyone interested in pitching in?  I'd be happy to have feedback or
help.

Jeremy





From skip@pobox.com  Wed Apr 10 05:23:36 2002
From: skip@pobox.com (Skip Montanaro)
Date: Tue, 9 Apr 2002 23:23:36 -0500
Subject: [Compiler-sig] progress on new AST
In-Reply-To: <0GUC009TO3AVOB@mtaout03.icomcast.net>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net>
Message-ID: <15539.48712.780052.112027@12-248-41-177.client.attbi.com>

    Jeremy> I've been working on a new AST defined in ASDL (the Zephyr
    Jeremy> abstract syntax definition language).  I've checked in the
    Jeremy> current work in python/nondist/sandbox/ast.
    ...
    Jeremy> Anyone interested in pitching in?  I'd be happy to have feedback
    Jeremy> or help.

What needs to be pitched?  I'm generally more familiar with pitching stuff
out, but not in a software setting.

Unfamiliar as I am with where this is headed, I will abstract a post to
c.l.py from a couple days ago that has so far gone unanswered and ask a
question:

    How difficult is it to change the parser so that the following:

    ==guettli@sonne:~/tmp$ python ~/scripts/replace_recursive.py
      File "/home/guettli/scripts/replace_recursive.py", line 17
        in=open(temp)
         ^
    SyntaxError: invalid syntax

    will print "SyntaxError: invalid syntax. 'in' is a reserved word"?

Will the new ASDL code eventually lead to more user-friendly error messages
and decent enough error recovery that it won't have to give up after the
first syntax error it encounters?

Skip



From jeremy@zope.com  Wed Apr 10 13:55:26 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Wed, 10 Apr 2002 08:55:26 -0400
Subject: [Compiler-sig] progress on new AST
In-Reply-To: <15539.48712.780052.112027@12-248-41-177.client.attbi.com>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net>
 <15539.48712.780052.112027@12-248-41-177.client.attbi.com>
Message-ID: <15540.13886.642125.41974@slothrop.zope.com>

>>>>> "SM" == Skip Montanaro <skip@pobox.com> writes:

  SM> What needs to be pitched?  I'm generally more familiar with
  SM> pitching stuff out, but not in a software setting.

Reviewing the AST to make sure it accurately describes Python.  Once I
get started on the transformer, there will be lots of code to write.
I don't know how easy it will be to split that task up.

  SM> Will the new ASDL code eventually lead to more user-friendly
  SM> error messages and decent enough error recovery that it won't
  SM> have to give up after the first syntax error it encounters?

Unfortunately, no.  The ASDL stuff describes the AST -- a compiler
intermediate representation.  The error recovery needs to be added to
the parser.

Jeremy




From skip@pobox.com  Wed Apr 10 15:40:16 2002
From: skip@pobox.com (Skip Montanaro)
Date: Wed, 10 Apr 2002 09:40:16 -0500
Subject: [Compiler-sig] progress on new AST
In-Reply-To: <15540.13886.642125.41974@slothrop.zope.com>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net>
 <15539.48712.780052.112027@12-248-41-177.client.attbi.com>
 <15540.13886.642125.41974@slothrop.zope.com>
Message-ID: <15540.20176.479968.66870@12-248-41-177.client.attbi.com>

    SM> What needs to be pitched?  I'm generally more familiar with pitching
    SM> stuff out, but not in a software setting.

    Jeremy> Reviewing the AST to make sure it accurately describes Python.
    Jeremy> Once I get started on the transformer, there will be lots of
    Jeremy> code to write.  I don't know how easy it will be to split that
    Jeremy> task up.

Reading?  I think I can read.

    SM> Will the new ASDL code eventually lead to more user-friendly error
    SM> messages and decent enough error recovery that it won't have to give
    SM> up after the first syntax error it encounters?

    Jeremy> Unfortunately, no.  The ASDL stuff describes the AST -- a
    Jeremy> compiler intermediate representation.  The error recovery needs
    Jeremy> to be added to the parser.

My mistake.  I was looking at the checkin messages and thinking I was
looking at grammar changes.

Skip





From bckfnn@worldonline.dk  Wed Apr 10 18:05:25 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Wed, 10 Apr 2002 17:05:25 GMT
Subject: [Compiler-sig] progress on new AST
In-Reply-To: <0GUC009TO3AVOB@mtaout03.icomcast.net>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net>
Message-ID: <3cb46b73.25619278@mail.wanadoo.dk>

[Jeremy Hylton]

>I've been working on a new AST defined in ASDL (the Zephyr abstract
>syntax definition language).  I've checked in the current work in
>python/nondist/sandbox/ast.

Thanks.

>There is a python.asdl that defines an AST that is reasonably
>complete, although it has rough edges (slices, etc.). 

Keep in mind that I'm a newbie at reading asdl, but how is it expressed
that a 'Module' contains a list of 'stmts', while a FunctionDef only
contains one 'name'?

>I've also
>written a simple C code generator that turns the ast definition into C
>code that defines structs and constructor functions.

I'm playing around with generating java code and all the needed
information seems to be available, but I can't quite make sense of the
basic idea behind the datastructures we are generating from. What is a
Sum and what is a Product in this sense?

regards,
finn



From jeremy@zope.com  Wed Apr 10 23:55:18 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Wed, 10 Apr 2002 18:55:18 -0400
Subject: [Compiler-sig] progress on new AST
In-Reply-To: <3cb46b73.25619278@mail.wanadoo.dk>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net>
 <3cb46b73.25619278@mail.wanadoo.dk>
Message-ID: <15540.49878.108194.364668@slothrop.zope.com>

>>>>> "FB" == Finn Bock <bckfnn@worldonline.dk> writes:

  FB> [Jeremy Hylton]
  >> There is a python.asdl that defines an AST that is reasonably
  >> complete, although it has rough edges (slices, etc.).

  FB> Keep in mind that I'm a newbie at reading asdl,

I'd recommend you read Dan Wang's DSL 97 paper:
http://www.cs.princeton.edu/~danwang/Papers/dsl97/dsl97-abstract.html.
It's an easy read.  It describes the ASDL syntax and shows small
examples of an AST and code generated for C and Java.  

  FB> Keep in mind that I'm a newbie at reading asdl, but how is it
  FB> expressed that a 'Module' contains a list of 'stmts', while a
  FB> FunctionDef only contains one 'name'?

Your question pointed out an embarrassing bug in the python.asdl file
:-).  If we take an example "constructor" (with fix applied):

    stmt = ClassDef(identifier name, expr* bases, stmt* body)

The lhs is the name of the type, the rhs is a constructor signature.
The constructor takes three arguments.  For each argument, the type is
on the left and the name is on the right.  identifier is a builtin
type.  expr and stmt are defined in python.asdl.  There are two type
modifiers, * and ?.  The * means a sequence of 0 or more.  The ? means
optional.

So a class has a single name, an arbitrary number of base class
expressions, and an arbitrary number of stmts.  

The bug is that Module, FunctionDef, and ClassDef were defined to
contain a single statement.  I'm sure that's what confused you.
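
The notation above can be made concrete with a small Python sketch
that splits a constructor signature into its fields and records the
* and ? modifiers.  The function names here are hypothetical and this
is not the parser used in the sandbox; it only handles the simple
one-line form shown above.

```python
def parse_field(spec):
    """Split an ASDL field such as 'expr* bases' into its parts."""
    type_part, name = spec.split()
    return {
        "type": type_part.rstrip("*?"),
        "name": name,
        "seq": type_part.endswith("*"),   # * means sequence of 0 or more
        "opt": type_part.endswith("?"),   # ? means optional
    }


def parse_constructor(text):
    """Parse a one-line signature like 'ClassDef(identifier name, ...)'."""
    name, _, rest = text.partition("(")
    specs = [s.strip() for s in rest.rstrip(")").split(",") if s.strip()]
    return name.strip(), [parse_field(s) for s in specs]


ctor, fields = parse_constructor(
    "ClassDef(identifier name, expr* bases, stmt* body)")
```

For the ClassDef example, `fields` describes one plain identifier and
two sequences, which matches the reading given above.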

  >> I've also written a simple C code generator that turns the ast
  >> definition into C code that defines structs and constructor
  >> functions.

  FB> I'm playing around with generating java code and all the needed
  FB> information seems to be available, but I can't quite make sense
  FB> of the basic idea behind the datastructures we are generating
  FB> from. What is a Sum and what is a Product in this sense?

A Sum is a set of type constructors -- so stmt is a sum type.  A
Product is like listcomp -- a single unnamed constructor.  For a sum
type, a value can be any one of the constructors.  For a product,
there is only one constructor.

The DSL paper represents a sum as a C union with a struct element for
each constructor.  It is silent on products, but I've chosen to
represent it as a single struct.
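
For readers who think in Python rather than C, the same distinction
can be sketched with classes.  This is an assumed mapping for
illustration only (it is not what the C generator emits): a sum type
becomes a base class with one subclass per constructor, and a product
type becomes a single record.

```python
from dataclasses import dataclass
from typing import Optional


class stmt:
    """Sum type: a stmt value is exactly one of the constructors below."""


@dataclass
class Return(stmt):
    value: object


@dataclass
class Assert(stmt):
    test: object
    msg: Optional[object] = None   # 'expr? msg' in the ASDL


@dataclass
class keyword:
    """Product type: a single unnamed constructor, so one record suffices."""
    arg: str
    value: object


def constructor_name(node):
    # For a sum type, the concrete class tells us which constructor was used.
    return type(node).__name__
```

With this mapping, code that handles a stmt has to look at which
constructor it got, while code that handles a keyword never has to ask.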

Feel free to check in any Java-generating code in the sandbox.

Jeremy





From bckfnn@worldonline.dk  Thu Apr 11 13:38:13 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Thu, 11 Apr 2002 12:38:13 GMT
Subject: [Compiler-sig] progress on new AST
In-Reply-To: <15540.49878.108194.364668@slothrop.zope.com>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net> <3cb46b73.25619278@mail.wanadoo.dk> <15540.49878.108194.364668@slothrop.zope.com>
Message-ID: <3cb56519.1150524@mail.wanadoo.dk>

[Jeremy]

>I'd recommend you read Dan Wang's DSL 97 paper:
>http://www.cs.princeton.edu/~danwang/Papers/dsl97/dsl97-abstract.html.
>It's an easy read.  It describes the ASDL syntax and shows small
>examples of an AST and code generated for C and Java.  

Thanks.

>
>  FB> Keep in mind that I'm a newbie at reading asdl, but how is it
>  FB> expressed that a 'Module' contains a list of 'stmts', while a
>  FB> FunctionDef only contains one 'name'?
>
>Your question pointed out an embarrassing bug in the python.asdl file
>:-).  If we take an example "constructor" (with fix applied):
>
>    stmt = ClassDef(identifier name, expr* bases, stmt* body)
>
>The lhs is the name of the type, the rhs is a constructor signature.
>The constructor takes three arguments.  The type is on the left, the
>name is on the right.  identifier is a builtin type.  expr and stmt
>are defined in python.asdl.  There are two type modifiers * and ?.
>The * means sequence of 0 or more.  The ? means optional.
>
>So a class has a single name, an arbitrary number of base class
>expressions, and an arbitrary number of stmts.  
>
>The bug is that Module, FunctionDef, and ClassDef were defined to
>contain a single statement.  I'm sure that's what confused you.

Indeed, I couldn't quite make it add up. I then guess the same problem
still exists for the remaining uses of 'stmt' in For, While, If,
TryExcept and TryFinally?

Will the optional 'else:' part of For, While and If be handled as a
zero-length list of stmts? Or maybe the optional '?' operator can be
used for an optional sequence?


>  >> I've also written a simple C code generator that turns the ast
>  >> definition into C code that defines structs and constructor
>  >> functions.
>
>  FB> I'm playing around with generating java code and all the needed
>  FB> information seems to be available, but I can't quite make sense
>  FB> of the basic idea behind the datastructures we are generating
>  FB> from. What is a Sum and what is a Product in this sense?
>
>A Sum is a set of type constructors -- so stmt is a sum type.  A
>Product is like listcomp -- a single unnamed constructor.  For a sum
>type, a value can be any one of the constructors.  For a product,
>there is only one constructor.

Thanks, that helped.

>The DSL paper represents a sum as a C union with a struct element for
>each constructor.  It is silent on products, but I've chosen to
>represent it as a single struct.


>Feel free to check in any Java-generating code in the sandbox.

Will do, eventually.

There are some restrictions on the java code, typically naming, that I
have to deal with somehow, and I'm not sure what can be changed in the
.asdl and what must be handled in my generator. 

- Would it be OK to rename the 'final' arg in TryFinally to e.g.
'finalbody'? 'final' is a java reserved word.

- Would it be OK to change the name of 'String' and 'Number'? Java
classes with these names already exist in the java.lang package and it
is annoying to work with user classes with these names.


For example:

Index: python.asdl
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/ast/python.asdl,v
retrieving revision 1.8
diff -u -r1.8 python.asdl
--- python.asdl 10 Apr 2002 23:03:32 -0000      1.8
+++ python.asdl 11 Apr 2002 12:34:31 -0000
@@ -24,7 +24,7 @@
              -- 'type' is a bad name
              | Raise(expr? type, expr? inst, expr? tback)
              | TryExcept(stmt body, except* handlers)
-             | TryFinally(stmt body, stmt final)
+             | TryFinally(stmt body, stmt finalbody)
              | Assert(expr test, expr? msg)

              -- may want to factor this differently perhaps excluding
@@ -59,8 +59,8 @@
                         expr? starargs, expr? kwargs)
             | Repr(expr value)
             | Lvalue(assign lvalue)
-            | Number(string n) -- string representation of a number
-            | String(string s) -- need to specify raw, unicode, etc?
+            | Num(string n) -- string representation of a number
+            | Str(string s) -- need to specify raw, unicode, etc?
             -- other literals? bools?

        -- the subset of expressions that are valid as the target of


regards,
finn



From bckfnn@worldonline.dk  Thu Apr 11 20:53:54 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Thu, 11 Apr 2002 19:53:54 GMT
Subject: [Compiler-sig] progress on new AST
In-Reply-To: <0GUC009TO3AVOB@mtaout03.icomcast.net>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net>
Message-ID: <3cb5e798.34557581@mail.wanadoo.dk>

[Jeremy]

>I've been working on a new AST defined in ASDL ...

Another question: What's with the Lvalue constructor? It strikes me as a
somewhat strange way of describing an 'expr'. Is it just a consequence of
some asdl limitation? Is there a reason for having an Lvalue node in the
AST?

regards,
finn



From jeremy@zope.com  Thu Apr 11 21:15:51 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Thu, 11 Apr 2002 16:15:51 -0400
Subject: [Compiler-sig] progress on new AST
In-Reply-To: <3cb5e798.34557581@mail.wanadoo.dk>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net>
 <3cb5e798.34557581@mail.wanadoo.dk>
Message-ID: <15541.61175.477803.278880@slothrop.zope.com>

>>>>> "FB" == Finn Bock <bckfnn@worldonline.dk> writes:

  FB> [Jeremy]
  >> I've been working on a new AST defined in ASDL ...

  FB> Another question: What's with the Lvalue constructor? It strikes
  FB> me as a somewhat strange way of describing an 'expr'. Is it just
  FB> a consequence of some asdl limitation? Is there a reason for
  FB> having an Lvalue node in the AST?

The Lvalue node captures the notion that a limited subset of
expressions can occur in two contexts -- as an expression or as the
target of an assignment.  A single constructor can appear only once;
otherwise its type would be ambiguous.  Example: Name() can be an
expression and the target of an assignment.  A ListComp() is an
expression, but can not be assigned to.  The extra Lvalue()
constructor captures the distinction.

Jeremy





From jeremy@zope.com  Thu Apr 11 21:41:38 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Thu, 11 Apr 2002 16:41:38 -0400
Subject: [Compiler-sig] progress on new AST
In-Reply-To: <3cb56519.1150524@mail.wanadoo.dk>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net>
 <3cb46b73.25619278@mail.wanadoo.dk>
 <15540.49878.108194.364668@slothrop.zope.com>
 <3cb56519.1150524@mail.wanadoo.dk>
Message-ID: <15541.62722.977090.518394@slothrop.zope.com>

The suggested changes look fine.  I'll add them today.

Jeremy




From bckfnn@worldonline.dk  Sat Apr 13 14:01:58 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sat, 13 Apr 2002 13:01:58 GMT
Subject: [Compiler-sig] Lvalue
In-Reply-To: <15541.61175.477803.278880@slothrop.zope.com>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net> <3cb5e798.34557581@mail.wanadoo.dk> <15541.61175.477803.278880@slothrop.zope.com>
Message-ID: <3cb803f6.4679168@mail.wanadoo.dk>

[Jeremy]

>The Lvalue node captures the notion that a limited subset of
>expressions can occur in two contexts -- as an expression or as the
>target of an assignment.

Ok, but is it important to express that notion in the AST typesystem?
I'm sure you have given this more thought than I have, but I tend to
disagree. Maybe it is just that I loathe to see an AST like:

: Expr[value=Lvalue[lvalue=Attribute[value=Lvalue[lvalue=Name[id=A]], attr=b]]]

to capture the expression "A.b".

I would prefer an asdl without the "assign" type and instead:

	      | Del(expr* targets)
	      | Assign(expr* targets, expr value)
	      | AugAssign(expr target, operator op, expr value)

Yes, I know that would allow a user to manually create a semantically
incorrect AST tree, but IMO that is what TypeErrors are good at
expressing.

Just my 2 cents.

regards,
finn



From bckfnn@worldonline.dk  Sat Apr 13 14:02:40 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sat, 13 Apr 2002 13:02:40 GMT
Subject: [Compiler-sig] if .. elif:
In-Reply-To: <0GUC009TO3AVOB@mtaout03.icomcast.net>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net>
Message-ID: <3cb829ba.14347340@mail.wanadoo.dk>

Hi

Maybe I'm missing something, but I think the If() constructor is a bit
too simple to handle a list of 'elif:' parts.

	If(expr test, stmt* body, stmt* orelse)


This here is my take on a solution:


Index: python.asdl
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/ast/python.asdl,v
retrieving revision 1.9
diff -w -u -r1.9 python.asdl
--- python.asdl 11 Apr 2002 21:20:19 -0000      1.9
+++ python.asdl 13 Apr 2002 12:55:43 -0000
@@ -19,7 +19,7 @@
              -- need a better solution for that
              | For(expr target, expr iter, stmt* body, stmt* orelse)
              | While(expr test, stmt* body, stmt* orelse)
-             | If(expr test, stmt* body, stmt* orelse)
+             | If(ifpart* tests, stmt* orelse)

              -- 'type' is a bad name
              | Raise(expr? type, expr? inst, expr? tback)
@@ -96,4 +96,6 @@

         -- keyword arguments supplied to call
         keyword = (identifier arg, expr value)
+
+        ifpart = (expr test, stmt* body)
 }


regards,
finn



From bckfnn@worldonline.dk  Sat Apr 13 17:35:59 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sat, 13 Apr 2002 16:35:59 GMT
Subject: [Compiler-sig] Handler suite in except.
Message-ID: <3cb85bc7.27160504@mail.wanadoo.dk>

Hi,

Another rough edge: I can't see where the except: codeblock should go.
I'm guessing that 'except' should read:

     except = (expr type, identifier? name, stmt* body)

?

regards,
finn



From bckfnn@worldonline.dk  Sat Apr 13 22:13:43 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sat, 13 Apr 2002 21:13:43 GMT
Subject: [Compiler-sig] Import and ImportFrom
Message-ID: <3cb89c85.43734336@mail.wanadoo.dk>

Hi,

I think a refactoring of Import is required in order to support a list
of aliases:

   from p import a, b as c, d as e
   import a, b as c, d as e

How about this definition:

	      | Import(alias* names)
	      | ImportFrom(identifier module, alias* names)

	alias = (identifier name, identifier? asname)

?
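
As a rough check that the proposed alias product covers both import
forms, here is a minimal Python sketch.  The parse_aliases helper is
hypothetical and only splits the text after 'import'; it is not part
of any existing transformer.

```python
from collections import namedtuple

# Mirrors the proposed product: alias = (identifier name, identifier? asname)
alias = namedtuple("alias", ["name", "asname"])


def parse_aliases(text):
    """Turn 'a, b as c, d as e' into a list of alias records."""
    result = []
    for item in text.split(","):
        words = item.split()
        if len(words) == 1:
            result.append(alias(words[0], None))       # no 'as' part
        elif len(words) == 3 and words[1] == "as":
            result.append(alias(words[0], words[2]))
        else:
            raise ValueError("bad import item: %r" % item)
    return result


names = parse_aliases("a, b as c, d as e")
```

The same list of alias records would serve as the names field of both
Import and ImportFrom.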

regards,
finn



From jeremy@zope.com  Sun Apr 14 05:04:33 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Sun, 14 Apr 2002 00:04:33 -0400
Subject: [Compiler-sig] Re: Lvalue
In-Reply-To: <3cb803f6.4679168@mail.wanadoo.dk>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net>
 <3cb5e798.34557581@mail.wanadoo.dk>
 <15541.61175.477803.278880@slothrop.zope.com>
 <3cb803f6.4679168@mail.wanadoo.dk>
Message-ID: <15544.65489.151544.482696@slothrop.zope.com>

>>>>> "FB" == Finn Bock <bckfnn@worldonline.dk> writes:

  FB> [Jeremy]
  >> The Lvalue node captures the notion that a limited subset of
  >> expressions can occur in two contexts -- as an expression or as
  >> the target of an assignment.

  FB> Ok, but is it important to express that notion in the AST
  FB> typesystem?  I'm sure you have given this more thought than I
  FB> have, but I tend to disagree. Maybe it is just that I loathe to
  FB> see an AST like:

  FB> :
  FB> Expr[value=Lvalue[lvalue=Attribute[value=Lvalue[lvalue=Name[id=A]],
  FB> attr=b]]]

  FB> to capture the expression "A.b".

I was (am?) undecided about whether the typesystem should express the
limitation on expressions that are targets of assignments.  The
example above seems a strong argument against an explicit lvalue
type. 

I wonder, though, if we might have a separate set of constructors for
the assign type.  Perhaps instead of Lvalue we have AssignAttribute,
AssignName, etc.  The AST from the compiler package in the std library
uses this approach, although I don't like the names it uses.  (I don't
like the names I just used either.)

It seems useful to distinguish the two cases, because they are handled
differently inside the compiler.  The bytecode generated is
different, and there needs to be some way for the code generator to
distinguish the cases.  Using Lvalue or just Expr means that the code
generator needs to track context explicitly to see if the Lvalue is
being used as an expression or an assignment.  If the constructor is
different for the two cases, there is no need to track the context
separately. 
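
A minimal Python sketch of the two options, assuming hypothetical node
classes and a toy emitter (LOAD_NAME and STORE_NAME are the CPython
opcode names; everything else is made up for illustration):

```python
from collections import namedtuple

Name = namedtuple("Name", ["id"])              # shared by both contexts
AssignName = namedtuple("AssignName", ["id"])  # hypothetical store-only node


def emit_with_context(node, storing):
    # One constructor for both uses: the caller has to pass the context
    # down through every recursive call.
    return ("STORE_NAME" if storing else "LOAD_NAME", node.id)


def emit_by_constructor(node):
    # Separate constructors: the node type alone selects the opcode.
    opcodes = {"Name": "LOAD_NAME", "AssignName": "STORE_NAME"}
    return (opcodes[type(node).__name__], node.id)


load = emit_by_constructor(Name("x"))
store = emit_by_constructor(AssignName("x"))
```

The second emitter needs no extra argument, at the cost of roughly
doubling the number of constructors for assignable expressions.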

Jeremy




From jeremy@zope.com  Sun Apr 14 05:09:53 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Sun, 14 Apr 2002 00:09:53 -0400
Subject: [Compiler-sig] Re: if .. elif:
In-Reply-To: <3cb829ba.14347340@mail.wanadoo.dk>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net>
 <3cb829ba.14347340@mail.wanadoo.dk>
Message-ID: <15545.273.951129.448987@slothrop.zope.com>

>>>>> "FB" == Finn Bock <bckfnn@worldonline.dk> writes:

  FB> Hi Maybe I'm missing something, but I think the If() constructor
  FB> is a bit too simple to handle a list of 'elif:' parts.

  FB> 	If(expr test, stmt* body, stmt* orelse)

I was expecting to encode 'elif' parts as a series of new If()
constructors in the orelse slot.

if x == 1:
    print 1
elif x == 2:
    print 2
else:
    print 3

If(Compare(Lvalue(Name(x)), Num("1")),
   [Print(NULL, [Num("1")], False)],
   [If(Compare(Lvalue(Name(x)), Num("2")),
       [Print(NULL, [Num("2")], False)],
       [Print(NULL, [Num("3")], False)])])

Does that make sense?

(And, yuck, the Lvalue is a pain.)
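
To show that the nested encoding and a flat list of (test, body) parts
carry the same information, here is a small Python sketch.  The If
record and both helper functions are hypothetical, and tests and
bodies are plain strings rather than real AST nodes.

```python
from collections import namedtuple

If = namedtuple("If", ["test", "body", "orelse"])


def nest_ifparts(parts, orelse):
    """Fold [(test, body), ...] plus a final else into nested If nodes."""
    if not parts:
        raise ValueError("an if statement needs at least one test")
    result = orelse
    for test, body in reversed(parts):
        result = [If(test, body, result)]
    return result[0]


def flatten_if(node):
    """Recover the (test, body) list and final else from nested If nodes."""
    parts = []
    while True:
        parts.append((node.test, node.body))
        if len(node.orelse) == 1 and isinstance(node.orelse[0], If):
            node = node.orelse[0]
        else:
            return parts, node.orelse


parts = [("x == 1", ["print 1"]), ("x == 2", ["print 2"])]
tree = nest_ifparts(parts, ["print 3"])
```

One caveat of the nested form: flatten_if cannot tell an 'elif' from
an 'else:' whose only statement is another 'if', since both produce
the same tree.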

Jeremy





From bckfnn@worldonline.dk  Sun Apr 14 11:10:15 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sun, 14 Apr 2002 10:10:15 GMT
Subject: [Compiler-sig] Changes to TryExcept
Message-ID: <3cb953a5.3802147@mail.wanadoo.dk>

Hi,

I have checked in a fix to TryExcept and except. If I have misunderstood
how it should be used, please scold me gently.

regards,
finn

Index: python.asdl
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/ast/python.asdl,v
retrieving revision 1.10
diff -w -u -r1.10 python.asdl
--- python.asdl 14 Apr 2002 09:23:01 -0000      1.10
+++ python.asdl 14 Apr 2002 10:01:10 -0000
@@ -23,7 +23,7 @@

              -- 'type' is a bad name
              | Raise(expr? type, expr? inst, expr? tback)
-             | TryExcept(stmt* body, except* handlers)
+             | TryExcept(stmt* body, except* handlers, stmt* orelse)
              | TryFinally(stmt* body, stmt* finalbody)
              | Assert(expr test, expr? msg)

@@ -82,7 +82,7 @@

        -- not sure what to call the first argument for raise and except

-       except = (expr type, identifier? name)
+       except = (expr? type, assign? name, stmt* body)

        -- XXX need to handle 'def f((a, b)):'
        arguments = (identifier* args, identifier? vararg,




From bckfnn@worldonline.dk  Sun Apr 14 14:47:32 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sun, 14 Apr 2002 13:47:32 GMT
Subject: [Compiler-sig] funcdef parameters
Message-ID: <3cb96341.7797742@mail.wanadoo.dk>

Hi,

I'm trying to understand the 'arguments' production.

	arguments = (identifier* args, identifier? vararg, 
		     identifier? kwarg, expr* defaults)

First, I'll ignore the possibility of tuple parameters.

I'm guessing that 'vararg' is what is called 'starargs' in the Call
ctor. I'm also guessing that 'defaults' contain the keyword values like
this:


   def foo(a, b, c=1, d=2, *lst, **kw): pass
->
   arguments([a, b, c, d], lst, kw, [1, 2])

or maybe?

   arguments([a, b, c, d], lst, kw, [None, None, 1, 2])
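
The two encodings hold the same information as long as defaults can
only be given for trailing parameters.  A minimal Python sketch of the
conversion, using a hypothetical helper and plain values in place of
expr nodes:

```python
def pair_defaults(args, defaults):
    """Pair each argument name with its default, or None if it has none.

    Assumes the short form, where 'defaults' lists only the values for
    the last len(defaults) arguments.
    """
    n_required = len(args) - len(defaults)
    if n_required < 0:
        raise ValueError("more defaults than arguments")
    padded = [None] * n_required + list(defaults)
    return list(zip(args, padded))


pairs = pair_defaults(["a", "b", "c", "d"], [1, 2])
padded_form = [default for _, default in pairs]
```

Here `padded_form` is the second encoding derived from the first, so
the shorter list loses nothing.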



Second, we have to add tuple parameters and the simplest way I can see
looks like this.

@@ -85,8 +85,11 @@
        except = (expr? type, assign? name, stmt* body)

        -- XXX need to handle 'def f((a, b)):'
-       arguments = (identifier* args, identifier? vararg,
+       arguments = (fpdef* args, identifier? vararg,
                     identifier? kwarg, expr* defaults)
+
+        fpdef = FpList(fpdef* list)
+              | FpName(identifier id)

         -- keyword arguments supplied to call
         keyword = (identifier arg, expr value)


regards,
finn



From bckfnn@worldonline.dk  Sun Apr 14 14:49:13 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sun, 14 Apr 2002 13:49:13 GMT
Subject: [Compiler-sig] Re: if .. elif:
In-Reply-To: <15545.273.951129.448987@slothrop.zope.com>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net> <3cb829ba.14347340@mail.wanadoo.dk> <15545.273.951129.448987@slothrop.zope.com>
Message-ID: <3cb98861.17302519@mail.wanadoo.dk>

[Jeremy]

>I was expecting to encode 'elif' parts as a series of new If()
>constructors in the orelse slot.
>
>if x == 1:
>    print 1
>elif x == 2:
>    print 2
>else:
>    print 3
>
>If(Compare(Lvalue(Name(x)), Num("1")),
>   [Print(NULL, [Num("1")], False)],
>   [If(Compare(Lvalue(Name(x)), Num("2")),
>       [Print(NULL, [Num("2")], False)],
>       [Print(NULL, [Num("3")], False)])])
>
>Does that make sense?

I guess it will work, but I don't like it much. It doesn't feel honest
to the actual python syntax. 

regards,
finn



From bckfnn@worldonline.dk  Sun Apr 14 16:23:57 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sun, 14 Apr 2002 15:23:57 GMT
Subject: [Compiler-sig] Question about slice
Message-ID: <3cb98fb0.19172778@mail.wanadoo.dk>

Hi,

Yet another question, this time about the slice ctors and how they
should be used. I made a little change locally:

-             | ExtSlice(expr* dims)
+             | ExtSlice(slice* dims)

and I have used the Slice and ExtSlice like this (only showing the
actual slice):

L[1] -->
    Slice[lower=Num[n=1], upper=null]

L[1:2] -->
    Slice[lower=Num[n=1], upper=Num[n=2]]

L[1:2, 3] -->
    ExtSlice[dims=[
        Slice[lower=Num[n=1], upper=Num[n=2]], 
        Slice[lower=Num[n=3], upper=null]
    ]]


Is that about right? It seems to work OK.

Except that jython happens to support a step argument in its
slice syntax. 

Jython 2.1+ on java1.4.0-beta3 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> "1234567890"[::2]
'13579'
>>>

How do you feel about adding a step argument to the Slice ctor?
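
For reference, Python's built-in slice object already has three
fields (start, stop, step), so a three-field Slice constructor would
map onto it directly.  A small sketch, assuming a hypothetical Slice
record with an added step field and plain integers in place of Num
nodes:

```python
from collections import namedtuple

# Hypothetical: the existing (lower, upper) pair plus the proposed step.
Slice = namedtuple("Slice", ["lower", "upper", "step"])


def to_builtin(node):
    """Convert the hypothetical Slice record into a built-in slice object."""
    return slice(node.lower, node.upper, node.step)


every_other = Slice(lower=None, upper=None, step=2)
result = "1234567890"[to_builtin(every_other)]
```

This reproduces the "1234567890"[::2] example above through the
three-field form.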

regards,
finn



From bckfnn@worldonline.dk  Sun Apr 14 16:43:19 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sun, 14 Apr 2002 15:43:19 GMT
Subject: [Compiler-sig] Jython progress
Message-ID: <3cb99f68.23197065@mail.wanadoo.dk>

Hi,

I can now transform all the standard Lib .py files from CPython with the
AST tree builder I have made for jython. I have used a slightly modified
python.asdl, but most of the changes have been discussed here
previously.

A cute little detail about my approach is that I have no intermediate
parse tree structure. Instead I can create the AST nodes directly.

Obviously I can't generate java bytecode from the AST yet; that is the
next phase that I will work on (and I'm sure that bugs will surface
when I start).

Below are the changes I made. I can work without the 'ifpart' change,
but the rest should IMO be committed in some form.

regards,
finn



Index: python.asdl
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/ast/python.asdl,v
retrieving revision 1.11
diff -w -u -r1.11 python.asdl
--- python.asdl	14 Apr 2002 10:10:02 -0000	1.11
+++ python.asdl	14 Apr 2002 15:28:07 -0000
@@ -6,7 +6,7 @@
 
 	stmt = FunctionDef(identifier name, arguments args, stmt* body)
 	      | ClassDef(identifier name, expr* bases, stmt* body)
-	      | Return(expr value) | Yield(expr value)
+	      | Return(expr? value) | Yield(expr value)
 
 	      | Del(assign* targets)
 	      | Assign(assign* targets, expr value)
@@ -19,7 +19,7 @@
 	      -- need a better solution for that
 	      | For(expr target, expr iter, stmt* body, stmt* orelse)
 	      | While(expr test, stmt* body, stmt* orelse)
-	      | If(expr test, stmt* body, stmt* orelse)
+	      | If(ifpart* tests, stmt* orelse)
 
 	      -- 'type' is a bad name
 	      | Raise(expr? type, expr? inst, expr? tback)
@@ -66,7 +66,7 @@
 
         slice = Ellipsis | Slice(expr? lower, expr? upper) 
 	      -- maybe Slice and ExtSlice should be merged...
-	      | ExtSlice(expr* dims) 
+	      | ExtSlice(slice* dims) 
 
 	boolop = And | Or 
 
@@ -85,11 +85,17 @@
 	except = (expr? type, assign? name, stmt* body)
 
 	-- XXX need to handle 'def f((a, b)):'
-	arguments = (identifier* args, identifier? vararg, 
+	arguments = (fpdef* args, identifier? vararg, 
 		     identifier? kwarg, expr* defaults)
 
+        fpdef = FpList(fpdef* list)
+              | FpName(identifier id)
+
         -- keyword arguments supplied to call
         keyword = (identifier arg, expr value)
+
+        ifpart = (expr test, stmt* body)
+
 
         -- import name with optional 'as' alias.
         alias = (identifier name, identifier? asname)



From bckfnn@worldonline.dk  Sun Apr 14 21:45:50 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sun, 14 Apr 2002 20:45:50 GMT
Subject: [Compiler-sig] Visitor pattern
Message-ID: <3cb9e7dc.41745046@mail.wanadoo.dk>

Hi,

Any thoughts about how a visitor pattern should be added to the AST
nodes? I took a quick look at the visitor in the 'compiler' package but
it wasn't immediately obvious to me how it works.

My own thoughts go like this (I'm clearly thinking in java here). In
each AST node a method is generated:

    public Object accept(Visitor visitor) throws Exception {
        return visitor.visit_ClassDef(this);
    }

and a Visitor interface is generated like this:

public interface Visitor {
    public Object visit_Module(Module node) throws Exception;
    public Object visit_FunctionDef(FunctionDef node) throws Exception;
    public Object visit_ClassDef(ClassDef node) throws Exception;
    ...
}


As a result, the visitor implementation is itself responsible for
calling the .accept() method on all its children and there is no default
recursion.

Something as simple as that would fulfill the needs for visitor patterns
as used by jython itself, but if you are thinking about something more
powerful I might as well use that in jython's CodeCompiler.
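
The same shape can be sketched in Python to show what "no default
recursion" means in practice.  The node classes and the NameCollector
visitor below are hypothetical stand-ins, not generated code.

```python
class ClassDef:
    def __init__(self, name, body):
        self.name = name
        self.body = body

    def accept(self, visitor):
        # Double dispatch: the node picks the visitor method by its own type.
        return visitor.visit_ClassDef(self)


class FunctionDef:
    def __init__(self, name, body):
        self.name = name
        self.body = body

    def accept(self, visitor):
        return visitor.visit_FunctionDef(self)


class NameCollector:
    """Collects definition names; recursion is the visitor's own job."""

    def __init__(self):
        self.names = []

    def visit_ClassDef(self, node):
        self.names.append(node.name)
        for child in node.body:      # explicit descent into the children
            child.accept(self)

    def visit_FunctionDef(self, node):
        self.names.append(node.name)
        # Deliberately no descent: nested definitions are not collected.


tree = ClassDef("C", [FunctionDef("f", [ClassDef("Inner", [])])])
collector = NameCollector()
tree.accept(collector)
```

Because visit_FunctionDef never calls accept() on its children, the
nested class Inner is skipped, which is exactly the control (and the
burden) this design gives each visitor.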

regards,
finn



From jeremy@zope.com  Mon Apr 15 01:50:26 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Sun, 14 Apr 2002 20:50:26 -0400
Subject: [Compiler-sig] Visitor pattern
In-Reply-To: <3cb9e7dc.41745046@mail.wanadoo.dk>
References: <3cb9e7dc.41745046@mail.wanadoo.dk>
Message-ID: <15546.9170.111452.105217@slothrop.zope.com>

I've been meaning to get the compiler package's visitor documented,
but I'm not yet sure where to start.  (That is, I'm not sure which
things are novel or unusual and which are obvious.)

I think there are three key features:

    - Embed the name of the class being visited in the visitor
      method.  This wouldn't be necessary in Java, I assume, because
      each method would have the same name but its argument list
      would be different.

    - The AST walker exploits knowledge of the AST structure to
      provide a default visit() method that will work on any node
      type.   

    - Responsibility for implementing the pattern is divided between a
      visitor and a walker.

The visitor implements a visitXXX() method for each node of interest.
The walker takes a visitor instance and the root of a node graph.  It
applies the visitor methods to the node graph starting with the root.

The second seems to apply regardless of language and can be quite
convenient.  If you're writing a simple symbol table visitor, you only
care about a few of the node types.  The If stmt, e.g., has no effect
on the symbol table, only its children do.  The default method makes
it possible to write a visitor that only has methods for the nodes it
cares about.

If we have a full specification of the AST (we do: python.asdl) and we
assume that the spec includes the children of each node in a "natural"
order, then we can generate a visitor method automatically.  By
natural, I mean a pre-order traversal of the tree, which has always
been what I've needed.

In the compiler package, each node type has a method getChildren()
that returns a tuple containing the children in the natural order.  

class If:

    def getChildren(self):
        return self.test, self.body, self.orelse

We should be able to define these as well, since the python.asdl
presents the children in the natural order.

If we have a visitor that isn't interested in If nodes, then it simply
doesn't define a visitIf() method.  If the AST contains an If node,
the default method of the walker handles it instead.

The default method on the walker does the following:

    for child in node.getChildren():
        if child is not None:
            self.visit(child)

The visit() method is defined by the walker.  It takes an arbitrary
node as its argument, looks up the appropriate method on the
visitor, and calls it.  The visitor can also use this method to
delegate responsibility for method lookup to the walker.
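
The division of labour described above can be sketched in a few lines.
This is only an illustration, not the compiler package's actual code:
the node classes, the Walker class, and the walk() helper are
hypothetical stand-ins, and the visitor relies on the default method
for every node type it does not name.

```python
# Hypothetical stand-ins for AST node classes; only getChildren() matters.
class Node:
    def getChildren(self):
        return ()


class Name(Node):
    def __init__(self, id):
        self.id = id


class If(Node):
    def __init__(self, test, body, orelse=None):
        self.test, self.body, self.orelse = test, body, orelse

    def getChildren(self):
        return self.test, self.body, self.orelse


class Walker:
    """Looks up visitXXX on the visitor; falls back to a default traversal."""

    def __init__(self, visitor):
        self.visitor = visitor
        # Let the visitor delegate child dispatch back to the walker.
        visitor.visit = self.visit

    def default(self, node):
        for child in node.getChildren():
            if child is not None:
                self.visit(child)

    def visit(self, node):
        name = "visit" + type(node).__name__
        method = getattr(self.visitor, name, self.default)
        return method(node)


class NameCollector:
    """A visitor that only cares about Name nodes; there is no visitIf()."""

    def __init__(self):
        self.names = []

    def visitName(self, node):
        self.names.append(node.id)


def walk(tree, visitor):
    Walker(visitor).visit(tree)
    return visitor


collector = walk(If(Name("a"), Name("b"), None), NameCollector())
print(collector.names)
```

Because NameCollector has no visitIf(), the If node is handled by the
walker's default method, which simply visits the children in order.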

Does that help?  

I'm not sure how much of this maps naturally from Python to Java.

Jeremy




From bckfnn@worldonline.dk  Mon Apr 15 15:01:34 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Mon, 15 Apr 2002 14:01:34 GMT
Subject: [Compiler-sig] Visitor pattern
In-Reply-To: <15546.9170.111452.105217@slothrop.zope.com>
References: <3cb9e7dc.41745046@mail.wanadoo.dk> <15546.9170.111452.105217@slothrop.zope.com>
Message-ID: <3cbab5ce.6552301@mail.wanadoo.dk>

[Jeremy]

>I've been meaning to get the compiler package's visitor documented,
>but I'm not yet sure where to start.  (That is, I'm not sure which
>things are novel or unusual and which are obvious.)
>
>I think there are three key features:
>
>    - Embed the name of the class being visited in the visitor
>      method.  This wouldn't be necessary in Java, I assume, because
>      each method would have the same name but its argument list
>      would be different.

Java can do that to some extent, but let's ignore that and just decide
that the visitor has methods like visitAssign() and visitIf().

>
>    - The AST walker exploits knowledge of the AST structure to
>      provide a default visit() method that will work on any node
>      type.   

I think it is the walker idea that threw me off. When I hear 'visitor'
together with trees I immediately think of the visitor pattern in the GoF
book, and there is no walker in that pattern.

>    - Responsibility for implementing the pattern is divided between a
>      visitor and a walker.

And with no responsibility placed in the nodes (other than the
getChildren() method), right? That is OK, but it is a bit of a stretch
to call it a visitor (IMHO).

>The visitor implements a visitXXX() method for each node of interest.
>The walker takes a visitor instance and the root of a node graph.  It
>applies the visitor methods to the node graph starting with the root.
>
>The second seems to apply regardless of language and can be quite
>convenient.  If you're writing a simple symbol table visitor, you only
>care about a few of the node types.  The If stmt, e.g., has no effect
>on the symbol table, only its children do.  The default method makes
>it possible to write a visitor that only has methods for the nodes it
>cares about.
>
>If we have a full specification of the AST (we do: python.asdl) and we
>assume that the spec includes the children of each node in a "natural"
>order, then we can generate a visitor method automatically.  By
>natural, I mean a pre-order traversal of the tree, which has always
>been what I've needed.

That is interesting. I can think of only one case where I would need
automatic traversal: when detecting fastlocals. When generating code I need to
control the traversal order myself.

>In the compiler package, each node type has a method getChildren()
>that returns a tuple containing the children in the natural order.  
>
>class If:
>
>    def getChildren(self):
>        return self.test, self.body, self.orelse

Yuck! For performance reason in jython, I would much rather call a
method on the node that does the traversing:

class If:
    def traverse(self, walker):
        walker.dispatch(self.test)
        for stmt in self.body:
            walker.dispatch(stmt)
        for stmt in self.orelse:
            walker.dispatch(stmt)

That is because calling a method can be cheaper than creating a tuple.

I suppose that I will do something like that and then hide the
'traverse' method from python.

>We should be able to define these as well, since the python.asdl
>presents the children in the natural order.
>
>If we have a visitor that isn't interested in If nodes, then it simply
>doesn't define a visitIf() method.  If the AST contains an If node,
>the default method of the walker handles it instead.
>
>The default method on the walker does the following:
>
>    for child in node.getChildren():
>        if child is not None:
>            self.visit(child)

With my 'If' class above the 'default' method becomes:

    node.traverse(self)

>The visit() method is defined by the walker.  It takes an arbitrary
>node as its argument, looks up the appropriate method on the
>visitor, and calls it.  The visitor can also use this method to
>delegate responsibility for method lookup to the walker.
>
>Does that help?  

Yes it did, thanks.

A question still remains: if I have a visitTryExcept() method, how would
I then cause or prevent the default traversal of the children? In the
example below I assume that a visitXXX function must deal with its
children itself either one-by-one or by calling 'default()'.

>I'm not sure how much of this maps naturally from Python to Java.

I'm guessing the main problem will be whether the concrete visitor class
must inherit from some base class. I like that requirement, you probably
don't.

The requirement of a visitor base class is easily met if the visitor and
walker are joined into one class. That way an example visitor to find
potential fastlocals would look like this:


class FastLocalVisitor(ASTVisitor):
    def __init__(self):
        self.infunc = False
        self.mode = 'GET'

    def visitFuncDef(self, node):
        self.infunc = True
        self.default(node)
        self.infunc = False

    def visitName(self, node):
        if self.infunc and self.mode == 'SET':
            print node.id, "is a potential fastlocal"

    def visitTryExcept(self, node):
        self.mode = 'SET'
        for e in node.handlers:
            self.dispatch(e.name)
        self.mode = 'GET'
        for e in node.handlers:
            self.dispatch(e.type)
            self.dispatch(e.body)
        self.dispatch(node.body)
        self.dispatch(node.orelse)

    def visitAssign(self, node):
        self.mode = 'SET'
        for t in node.targets:
            self.dispatch(t)
        self.mode = 'GET'
        self.dispatch(node.value)

    def visitFor(self, node):
        self.mode = 'SET'
        self.dispatch(node.target)
        self.mode = 'GET'
        self.dispatch(node.iter)
        self.dispatch(node.body)
        self.dispatch(node.orelse)

FastLocalVisitor().visit(tree)


So the ASTVisitor class is publicly defined as:

class ASTVisitor:
    def default(self, node):  # Should be called traverse IMO
        """Visit each of the children of node."""

    def dispatch(self, node): # Should be called visit IMO
        """Visit the node (or call self.default())."""

    def visitXXX(self, node):
        """Visit the node of type XXX. Defined in subclass"""

if we want to get fancy, we can consider these methods:

    def post_visitXXX(self, node):
        """Visit the node of type XXX after the children have 
        been visited. Defined in subclass"""

    def unhandled_node(self, node):
        """Called when no visitXXX method exists for the node. Defined
        in subclass"""

    def open_level(self, node):
        """Called just before the visitXXX is called. Defined
        in subclass"""

    def close_level(self, node):
        """Called just after the post_visitXXX is called. Defined
        in subclass"""
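
A rough sketch of how such a combined visitor/walker base class might
be filled in, under some assumptions: nodes carry the traverse() method
proposed earlier in this message, dispatch keys on the Python class
name, and only the post_visitXXX and unhandled_node hooks are shown.
The If, Name, and Tracer classes are made up for the example.

```python
class If:
    def __init__(self, test, body, orelse):
        self.test, self.body, self.orelse = test, body, orelse

    def traverse(self, walker):
        # The node walks its own children; no tuple is built.
        walker.dispatch(self.test)
        for stmt in self.body:
            walker.dispatch(stmt)
        for stmt in self.orelse:
            walker.dispatch(stmt)


class Name:
    def __init__(self, id):
        self.id = id

    def traverse(self, walker):
        pass  # leaf node, nothing to traverse


class ASTVisitor:
    def default(self, node):
        """Visit each of the children of node."""
        node.traverse(self)

    def unhandled_node(self, node):
        """Called when no visitXXX method exists; falls back to default()."""
        self.default(node)

    def dispatch(self, node):
        """Visit the node, then call post_visitXXX if the subclass has one."""
        name = type(node).__name__
        method = getattr(self, "visit" + name, self.unhandled_node)
        result = method(node)
        post = getattr(self, "post_visit" + name, None)
        if post is not None:
            post(node)
        return result


class Tracer(ASTVisitor):
    def __init__(self):
        self.events = []

    def visitName(self, node):
        self.events.append("name " + node.id)

    def post_visitIf(self, node):
        self.events.append("leave If")


tracer = Tracer()
tracer.dispatch(If(Name("t"), [Name("x")], [Name("y")]))
print(tracer.events)
```

In this sketch a visitXXX method that wants the children visited calls
self.default(node) itself; one that does not simply returns.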



I would also like to avoid documenting the use of cls.__name__ in the
dispatching. The cls.__name__ for a Java class should not be depended on
because it looks like this: 'org.python.parser.ast.If'. I can deal with
this in the implementation of ASTVisitor.dispatch() but I wouldn't want
other applications to depend too much on the __name__ of AST nodes. It
would be better if we added a class attribute to the nodes that
contained the official name.

regards,
finn



From bckfnn@worldonline.dk  Tue Apr 16 10:56:59 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Tue, 16 Apr 2002 09:56:59 GMT
Subject: [Compiler-sig] Assign nodes
In-Reply-To: <E16xJWR-0004TZ-00@usw-pr-cvs1.sourceforge.net>
References: <E16xJWR-0004TZ-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <3cbbe809.10645387@mail.wanadoo.dk>

[Jeremy, in a checkin msg]

>Instead of extra Lvalue production, have separate constructors for the
>assign type.  This makes sense because the code generated for an
>expression as the target of an assignment is quite different than an
>expression elsewhere.

An assign node in a 'del' statement is also quite different from an
assignment so you would want a special set of nodes for deletions as
well (DelAttribute, DelSubscript, DelName, DelList and DelTuple) ?

>+ 	-- the subset of expressions that are valid as the target of
>+ 	-- assignments. 
>+ 	assign = AssignAttribute(expr value, identifier attr)
>+ 	     | AssignSubscript(expr value, slice slice)
>+ 	     | AssignName(identifier id)
>+ 	     | AssignList(expr* elts) | AssignTuple(expr *elts)

I think that should be:

>+ 	     | AssignList(assign* elts) | AssignTuple(assign *elts)


And I don't like this change one single bit. Yes sure, it looks prettier
than using Lvalue and it captures useful information, but it makes a
one-pass AST builder a lot harder to do. You probably don't care because
you have an intermediate parse tree with all the context needed to know
if you must create a Name node or an AssignName node. When my parser
detects a name, I have no way of knowing if an equal sign will turn up
later. So I'll have to pick one node type (like Name) and rebuild that
part of the tree later if it turned out that it was part of an
assignment or deletion. Ugly.

As I have said before, I think the desire for correct typing in the tree
is overrated and I would rather remove the 'assign' sum altogether and
add a flag to the Attribute, Subscript, Name, List and Tuple nodes that
captured the use of the node (like 'Get', 'Set' and 'Del').
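
To make the flag idea concrete, here is a small sketch of what a
bottom-up builder could do once it finally sees the '=' or 'del'. The
node classes, the GET/SET/DEL constants, and the set_context() helper
are all hypothetical names for illustration; nothing here is generated
from python.asdl.

```python
GET, SET, DEL = "Get", "Set", "Del"


class Name:
    def __init__(self, id, ctx=GET):
        self.id, self.ctx = id, ctx


class Tuple:
    def __init__(self, elts, ctx=GET):
        self.elts, self.ctx = elts, ctx


class Num:
    def __init__(self, n):
        self.n = n


def set_context(node, ctx):
    """Flip the flag in place instead of rebuilding the sub-tree."""
    if isinstance(node, Tuple):
        for elt in node.elts:
            set_context(elt, ctx)
    elif not isinstance(node, Name):
        # Nodes without a flag cannot be assignment or deletion targets.
        raise SyntaxError("can't assign to " + type(node).__name__)
    node.ctx = ctx


# The parser builds (a, b) as a plain expression, then sees '='.
target = Tuple([Name("a"), Name("b")])
set_context(target, SET)
print(target.ctx, [elt.ctx for elt in target.elts])
```

The nodes are created once with the default flag and patched later,
which is the cheap operation a one-pass builder wants.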

regards,
finn



From skip@pobox.com  Tue Apr 16 15:55:59 2002
From: skip@pobox.com (Skip Montanaro)
Date: Tue, 16 Apr 2002 09:55:59 -0500
Subject: [Compiler-sig] dipping my toe in...
Message-ID: <15548.15231.393441.547286@12-248-41-177.client.attbi.com>

              -- 'type' is a bad name
              | Raise(expr? type, expr? inst, expr? tback)

How about "exc" instead?

Does the "?" imply the preceding fields must be present if that is?  For
example, if we have an "inst", does that imply we also have a "type" the
same way optional args work in Python?

Skip



From jeremy@zope.com  Tue Apr 16 16:45:31 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Tue, 16 Apr 2002 11:45:31 -0400
Subject: [Compiler-sig] Assign nodes
In-Reply-To: <3cbbe809.10645387@mail.wanadoo.dk>
References: <E16xJWR-0004TZ-00@usw-pr-cvs1.sourceforge.net>
 <3cbbe809.10645387@mail.wanadoo.dk>
Message-ID: <15548.18203.915142.66377@slothrop.zope.com>

>>>>> "FB" == Finn Bock <bckfnn@worldonline.dk> writes:

  FB> And I don't like this change one single bit. Yes sure, it looks
  FB> prettier than using Lvalue and it captures useful information,
  FB> but it makes a one-pass AST builder a lot harder to do.

I felt it simplified the code generator for the compiler package, but
I don't have any experience with a one-pass AST builder.  So it's hard
for me to judge what the tradeoffs are.  You haven't written a code
generator, right?  If so, we've each got experience with one end of
the problem, but not with both.

  FB> You probably don't care because you have an intermediate parse
  FB> tree with all the context needed to know if you must create a
  FB> Name node or an AssignName node.

I'd be interested in taking a look at the one-pass AST builder.
Eventually, I'd like to have one for CPython.  

  FB> When my parser detects a name, I have no way of knowing if an
  FB> equal sign will turn up later. So I'll have to pick one node
  FB> type (like Name) and rebuild that part of the tree later if it
  FB> turned out that it was part of an assignment or deletion. Ugly.

Indeed, and perhaps a compelling case against the richer AST.  I
assume the difference is that CPython's compiler is top-down and yours
is bottom-up?

  FB> As I have said before, I think the desire for correct typing in
  FB> the tree is overrated and I would rather remove the 'assign' sum
  FB> altogether and add a flag to the Attribute, Subscript, Name,
  FB> List and Tuple nodes that captured the use of the node (like
  FB> 'Get', 'Set' and 'Del').

I'll noodle with the code generator in the compiler package and see
what it would look like if the AssignXXX nodes went away.

Jeremy




From bckfnn@worldonline.dk  Tue Apr 16 16:49:46 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Tue, 16 Apr 2002 15:49:46 GMT
Subject: [Compiler-sig] dipping my toe in...
In-Reply-To: <15548.15231.393441.547286@12-248-41-177.client.attbi.com>
References: <15548.15231.393441.547286@12-248-41-177.client.attbi.com>
Message-ID: <3cbc4211.33692707@mail.wanadoo.dk>

On Tue, 16 Apr 2002 09:55:59 -0500, you wrote:

>
>              -- 'type' is a bad name
>              | Raise(expr? type, expr? inst, expr? tback)
>
>How about "exc" instead?
>
>Does the "?" imply the preceding fields must be present if that is?  For
>example, if we have an "inst", does that imply we also have a "type" the
>same way optional args work in Python?

No, asdl cannot capture that requirement in its syntax.

Since we are controlling the code generation, we could decide to
interpret the '?' the same way as optional args, but we would get some
trouble with the lower/upper bounds in Slice and with the
starargs/kwargs where both can be optional.

For easier mapping into Java, I would prefer if we only used positional
arguments, but if we want them badly enough, I can also add support for
keyword args to the ctors.

regards,
finn



From bckfnn@worldonline.dk  Tue Apr 16 17:17:48 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Tue, 16 Apr 2002 16:17:48 GMT
Subject: [Compiler-sig] Assign nodes
In-Reply-To: <15548.18203.915142.66377@slothrop.zope.com>
References: <E16xJWR-0004TZ-00@usw-pr-cvs1.sourceforge.net> <3cbbe809.10645387@mail.wanadoo.dk> <15548.18203.915142.66377@slothrop.zope.com>
Message-ID: <3cbc489a.35366063@mail.wanadoo.dk>

[Jeremy]

>>>>>> "FB" == Finn Bock <bckfnn@worldonline.dk> writes:
>
>  FB> And I don't like this change one single bit. Yes sure, it looks
>  FB> prettier than using Lvalue and it captures useful information,
>  FB> but it makes a one-pass AST builder a lot harder to do.
>
>I felt it simplified the code generator for the compiler package, but
>I don't have any experience with a one-pass AST builder.  So it's hard
>for me to judge what the tradeoffs are.  You haven't written a code
>generator, right?

No, not from scratch. I did add the augassign code to both our bytecode
generator and our javacode generator and so I do appreciate the benefit
of being able to tell an evaluate Slice node from an assignment Slice
node (and from a delete Slice node and from an AugAssign Slice node).

>If so, we've each got experience with one end of
>the problem, but not with both.
>
>  FB> You probably don't care because you have an intermediate parse
>  FB> tree with all the context needed to know if you must create a
>  FB> Name node or an AssignName node.
>
>I'd be interested in taking a look at the one-pass AST builder.
>Eventually, I'd like to have one for CPython.  
>
>  FB> When my parser detects a name, I have no way of knowing if an
>  FB> equal sign will turn up later. So I'll have to pick one node
>  FB> type (like Name) and rebuild that part of the tree later if it
>  FB> turned out that it was part of an assignment or deletion. Ugly.
>
>Indeed, and perhaps a compelling case against the richer AST.  I
>assume the difference is that CPython's compiler is top-down and yours
>is bottom-up?

Yes, jjtree creates the AST bottom-up.

>  FB> As I have said before, I think the desire for correct typing in
>  FB> the tree is overrated and I would rather remove the 'assign' sum
>  FB> altogether and add a flag to the Attribute, Subscript, Name,
>  FB> List and Tuple nodes that captured the use of the node (like
>  FB> 'Get', 'Set' and 'Del').
>
>I'll noodle with the code generator in the compiler package and see
>what it would look like if the AssignXXX nodes went away.

Don't forget the flag!

I prefer the flag (instead of separate classes) because it is a lot
easier and faster to change an int in the sub-tree than it is to
recreate the sub-tree with different classes.

Let's not get too hung up on it, I can also implement it with AssignXXXX
(and DeleteXXXX and AugAssignXXXX) nodes.

regards,
finn



From jeremy@zope.com  Tue Apr 16 17:47:11 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Tue, 16 Apr 2002 12:47:11 -0400
Subject: [Compiler-sig] Assign nodes
In-Reply-To: <3cbc489a.35366063@mail.wanadoo.dk>
References: <E16xJWR-0004TZ-00@usw-pr-cvs1.sourceforge.net>
 <3cbbe809.10645387@mail.wanadoo.dk>
 <15548.18203.915142.66377@slothrop.zope.com>
 <3cbc489a.35366063@mail.wanadoo.dk>
Message-ID: <15548.21903.521485.622844@slothrop.zope.com>

It looks like the code generator, as you mentioned, has to distinguish
between store and del anyway.  So getting rid of the special case for
load just means making it a three-part default test.

As it happens, the case of names is handled by three separate methods
-- loadName(), storeName(), and delName() -- that are called from the
visitor.  So the separate node types don't really buy anything.

Does the following patch look good?

Jeremy

Index: python.asdl
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/ast/python.asdl,v
retrieving revision 1.12
diff -c -c -r1.12 python.asdl
*** python.asdl	16 Apr 2002 03:20:45 -0000	1.12
--- python.asdl	16 Apr 2002 16:43:55 -0000
***************
*** 8,16 ****
  	      | ClassDef(identifier name, expr* bases, stmt* body)
  	      | Return(expr value) | Yield(expr value)
  
! 	      | Del(assign* targets)
! 	      | Assign(assign* targets, expr value)
! 	      | AugAssign(assign target, operator op, expr value)
  
  	      -- not sure if bool is allowed, can always use int
   	      | Print(expr? dest, expr* value, bool nl)
--- 8,16 ----
  	      | ClassDef(identifier name, expr* bases, stmt* body)
  	      | Return(expr value) | Yield(expr value)
  
! 	      | Del(expr* targets)
! 	      | Assign(expr* targets, expr value)
! 	      | AugAssign(expr target, operator op, expr value)
  
  	      -- not sure if bool is allowed, can always use int
   	      | Print(expr? dest, expr* value, bool nl)
***************
*** 55,71 ****
  	     | Num(string n) -- string representation of a number
  	     | Str(string s) -- need to specify raw, unicode, etc?
  	     -- other literals? bools?
! 	     | Attribute(expr value, identifier attr)
! 	     | Subscript(expr value, slice slice)
! 	     | Name(identifier id)
! 	     | List(expr* elts) | Tuple(expr *elts)
! 
! 	-- the subset of expressions that are valid as the target of
! 	-- assignments. 
! 	assign = AssignAttribute(expr value, identifier attr)
! 	     | AssignSubscript(expr value, slice slice)
! 	     | AssignName(identifier id)
! 	     | AssignList(expr* elts) | AssignTuple(expr *elts)
  
          slice = Ellipsis | Slice(expr? lower, expr? upper) 
  	      -- maybe Slice and ExtSlice should be merged...
--- 55,69 ----
  	     | Num(string n) -- string representation of a number
  	     | Str(string s) -- need to specify raw, unicode, etc?
  	     -- other literals? bools?
! 
! 	     -- the following expression can appear in assignment context
! 	     | Attribute(expr value, identifier attr, expr_context ctx)
! 	     | Subscript(expr value, slice slice, expr_context ctx)
! 	     | Name(identifier id, expr_context ctx)
! 	     | List(expr* elts, expr_context ctx) 
! 	     | Tuple(expr *elts, expr_context ctx)
! 
! 	expr_context = Load | Store | Del
  
          slice = Ellipsis | Slice(expr? lower, expr? upper) 
  	      -- maybe Slice and ExtSlice should be merged...
***************
*** 84,90 ****
  
  	-- not sure what to call the first argument for raise and except
  
! 	except = (expr? type, assign? name, stmt* body)
  
  	-- XXX need to handle 'def f((a, b)):'
  	arguments = (identifier* args, identifier? vararg, 
--- 82,88 ----
  
  	-- not sure what to call the first argument for raise and except
  
! 	except = (expr? type, expr? name, stmt* body)
  
  	-- XXX need to handle 'def f((a, b)):'
  	arguments = (identifier* args, identifier? vararg, 




From bckfnn@worldonline.dk  Tue Apr 16 19:09:48 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Tue, 16 Apr 2002 18:09:48 GMT
Subject: [Compiler-sig] Assign nodes
In-Reply-To: <15548.21903.521485.622844@slothrop.zope.com>
References: <E16xJWR-0004TZ-00@usw-pr-cvs1.sourceforge.net> <3cbbe809.10645387@mail.wanadoo.dk> <15548.18203.915142.66377@slothrop.zope.com> <3cbc489a.35366063@mail.wanadoo.dk> <15548.21903.521485.622844@slothrop.zope.com>
Message-ID: <3cbc5a61.39917458@mail.wanadoo.dk>

[Jeremy]

>It looks like the code generator, as you mentioned, has to distinguish
>between store and del anyway.  So getting rid of the special case for
>load just means making it a three-part default test.
>
>As it happens, the case of names is handled by three separate methods
>-- loadName(), storeName(), and delName() -- that are called from the
>visitor.  So the separate node types don't really buy anything.
>
>Does the following patch look good?

At first glance it seems to be exactly what I needed. I'll take a closer
look tomorrow.

regards,
finn



From jeremy@zope.com  Tue Apr 16 22:24:53 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Tue, 16 Apr 2002 17:24:53 -0400
Subject: [Compiler-sig] dipping my toe in...
In-Reply-To: <15548.15231.393441.547286@12-248-41-177.client.attbi.com>
References: <15548.15231.393441.547286@12-248-41-177.client.attbi.com>
Message-ID: <15548.38565.81743.936691@slothrop.zope.com>

>>>>> "SM" == Skip Montanaro <skip@pobox.com> writes:

  SM> -- 'type' is a bad name
  SM>               | Raise(expr? type, expr? inst, expr? tback)

  SM> How about "exc" instead?

Yes.

  SM> Does the "?" imply the preceding fields must be present if that
  SM> is?  For example, if we have an "inst", does that imply we also
  SM> have a "type" the same way optional args work in Python?

It's not that powerful.  It just says that the type is optional.  So
the Raise ctor above would actually accept a raise statement with only
a traceback.
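
Since the "?" alone cannot express that dependency, whoever builds or
checks the AST has to. A minimal sketch of such a check follows; the
Raise class and the check_raise() helper are hypothetical and are not
part of the sandbox code.

```python
class Raise:
    """Stand-in for the Raise node; every field is optional, as in the ASDL."""

    def __init__(self, type=None, inst=None, tback=None):
        self.type, self.inst, self.tback = type, inst, tback


def check_raise(node):
    """Reject combinations the ASDL allows but the raise statement does not."""
    if node.inst is not None and node.type is None:
        raise ValueError("Raise: inst given without type")
    if node.tback is not None and node.inst is None:
        raise ValueError("Raise: tback given without inst")
    return node


check_raise(Raise())                        # bare 'raise'
check_raise(Raise("E", "value", "tb"))      # all three parts present

try:
    check_raise(Raise(tback="tb"))          # traceback only
except ValueError as err:
    message = str(err)
    print(message)
```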

Jeremy




From skip@pobox.com  Tue Apr 16 23:03:13 2002
From: skip@pobox.com (Skip Montanaro)
Date: Tue, 16 Apr 2002 17:03:13 -0500
Subject: [Compiler-sig] dipping my toe in...
In-Reply-To: <15548.38565.81743.936691@slothrop.zope.com>
References: <15548.15231.393441.547286@12-248-41-177.client.attbi.com>
 <15548.38565.81743.936691@slothrop.zope.com>
Message-ID: <15548.40865.696305.300429@12-248-41-177.client.attbi.com>

    SM> Does the "?" imply the preceding fields must be present if that is?
    SM> For example, if we have an "inst", does that imply we also have a
    SM> "type" the same way optional args work in Python?

    Jeremy> It's not that powerful.  It just says that the type is optional.
    Jeremy> So the Raise ctor above would actually accept a raise statement
    Jeremy> with only a traceback.

So the parser is the "guard at the gate" to prevent such stuff from turning
up in the AST?

I ask these because I'm still a little confused about what exactly this
stuff does.  Looking at test.py

    import ast
    print ast.transform("""global a, b, c
    a + b - c * 3
    """)

suggests that somehow it's parsing the Python code, but that's not what's
happening.  Still, it's not clear why ast_transform always returns None.

Sorry for the frontal lobe density...

Skip




From jeremy@zope.com  Tue Apr 16 23:11:19 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Tue, 16 Apr 2002 18:11:19 -0400
Subject: [Compiler-sig] dipping my toe in...
In-Reply-To: <15548.40865.696305.300429@12-248-41-177.client.attbi.com>
References: <15548.15231.393441.547286@12-248-41-177.client.attbi.com>
 <15548.38565.81743.936691@slothrop.zope.com>
 <15548.40865.696305.300429@12-248-41-177.client.attbi.com>
Message-ID: <15548.41351.22256.793404@slothrop.zope.com>

>>>>> "SM" == Skip Montanaro <skip@pobox.com> writes:

  SM> Does the "?" imply the preceding fields must be present if that
  SM> is?  For example, if we have an "inst", does that imply we also
  SM> have a "type" the same way optional args work in Python?

  Jeremy> It's not that powerful.  It just says that the type is
  Jeremy> optional.  So the Raise ctor above would actually accept a
  Jeremy> raise statement with only a traceback.

  SM> So the parser is the "guard at the gate" to prevent such stuff
  SM> from turning up in the AST?

Someone is responsible, but it's not clear who.  

In compile.c, there are a variety of tests that occur during code
generation, like checking for illegal expressions as assignment targets.

In any particular implementation, the front end is going to produce
the AST.  It's got to guarantee that the AST is valid.

The ast module I'm working on will probably do those checks on the
completed AST before returning it.  But I haven't gotten that far
yet.  It might end up detecting the problem while creating the AST.
In fact, the expr_context patch I sent around earlier suggests that
the error would be determined sooner, because only valid expression
types have the expr_context slot.

  SM> I ask these because I'm still a little confused about what
  SM> exactly this stuff does.  Looking at test.py

  SM>     import ast
  SM>     print ast.transform("""global a, b, c
  SM>     a + b - c * 3
  SM>     """)

  SM> suggests that somehow it's parsing the Python code, but that's
  SM> not what's happening.

Actually, it is what's happening.  It compiles the source, then
converts it to an AST.

  SM> Still, it's not clear why ast_transform always returns None.

Only because there isn't anything else useful to return.  The AST is
not a PyObject *.  I thought about writing the pickling code and
returning a pickle, but that seemed like too much work.  Instead, I'm
just exercising the ast transformation code and then throwing away the
result.

(You may also have noticed that I never free any memory. :-)

  SM> Sorry for the frontal lobe density...

It's not your fault.  I'm not good at explaining what I'm currently
doing, and I haven't finished yet :-).

Jeremy




From neal@metaslash.com  Wed Apr 17 13:16:33 2002
From: neal@metaslash.com (Neal Norwitz)
Date: Wed, 17 Apr 2002 08:16:33 -0400
Subject: [Compiler-sig] Assign nodes
References: <E16xJWR-0004TZ-00@usw-pr-cvs1.sourceforge.net> <3cbbe809.10645387@mail.wanadoo.dk>
Message-ID: <3CBD67A1.F3B58841@metaslash.com>

Jeremy:

In ast_for_augassign (astmodule.c), you are attempting to find
the assign type, but is it correct for <<= and >>= ?

Don't you need to add something like:

	case '<':
	        if (STR(n)[1] == '<')
			return LShift;
		fprintf(stderr, "invalid augassign: %s", STR(n));
		return 0;

And the same for the '>'?
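
For comparison, the same mapping written as a table lookup in Python.
This is only a sketch of the idea, not a proposal for astmodule.c:
apart from LShift, the operator names are my assumption about what the
operator sum contains, and operator_for_augassign() is a made-up
helper.

```python
# Assumed operator names; only LShift appears in the C snippet above.
AUGASSIGN_OPS = {
    "+=": "Add", "-=": "Sub", "*=": "Mult", "/=": "Div", "%=": "Mod",
    "**=": "Pow", "//=": "FloorDiv",
    "<<=": "LShift", ">>=": "RShift",
    "|=": "BitOr", "^=": "BitXor", "&=": "BitAnd",
}


def operator_for_augassign(token):
    """Map the full augassign token to an operator name, or fail loudly."""
    try:
        return AUGASSIGN_OPS[token]
    except KeyError:
        raise ValueError("invalid augassign: %s" % token)


print(operator_for_augassign("<<="))
print(operator_for_augassign(">>="))
```

Looking at the whole token avoids having to inspect the second
character separately for the shift, power, and floor-division forms.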

Also, do you want me to do some little cleanups and checkin?
Or would you prefer to discuss here first?

Do you have any small pieces you want me to try to help with?

Neal



From jeremy@zope.com  Wed Apr 17 16:47:40 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Wed, 17 Apr 2002 11:47:40 -0400
Subject: [Compiler-sig] Assign nodes
In-Reply-To: <3CBD67A1.F3B58841@metaslash.com>
References: <E16xJWR-0004TZ-00@usw-pr-cvs1.sourceforge.net>
 <3cbbe809.10645387@mail.wanadoo.dk>
 <3CBD67A1.F3B58841@metaslash.com>
Message-ID: <15549.39196.341872.192917@slothrop.zope.com>

>>>>> "NN" == Neal Norwitz <neal@metaslash.com> writes:

  NN> Jeremy: In ast_for_augassign (astmodule.c), you are attempting
  NN> to find the assign type, but is it correct for <<= and >>= ?

  NN> Don't you need to add something like:

  NN> 	case '<':
  NN> 	        if (STR(n)[1] == '<')
  NN> 			return LShift;
  NN> 		fprintf(stderr, "invalid augassign: %s", STR(n));
  NN> 		return 0;

  NN> And the same for the '>'?

What is this checking for?

  NN> Also, do you want me to do some little cleanups and checkin?  Or
  NN> would you prefer to discuss here first?

Feel free to clean up first and ask questions later.

  NN> Do you have any small pieces you want me to try to help with?

I was thinking it would be helpful for someone to work on the pickler
code.  I started, but didn't get very far.  I haven't given a lot of
thought about how to expose the AST objects to Python.  It seems like
making them PyObjects introduces a lot of overhead that doesn't serve
any purpose in the common case.

    (I'm assuming that the common case is the compiler using the AST
    internally to generate bytecode and then throwing it away.)

Pickling the AST from C and unpickling it from Python seems like a
simple way to share the AST without needing a PyObject-style
interface.

Two other possible projects are better memory management and better
error handling.  I decided to basically punt on that for now and
re-visit it after the astmodule is complete.  I suspect that an arena
style of allocation may be useful, where memory for the AST is
allocated from the arena and the arena is freed by one call when the
AST is no longer needed.

Jeremy





From ecn@metaslash.com  Thu Apr 18 04:02:24 2002
From: ecn@metaslash.com (Eric C. Newton)
Date: Wed, 17 Apr 2002 23:02:24 -0400
Subject: [Compiler-sig] AST observations
Message-ID: <20020417230224.A21385@ecn>

I've noticed a few things with the current implementation of ASTs in
Python 2.2 from my work on PyChecker2.  These are just some notes.

compiler.visitor:

   ASTVisitor
        As the comment says, it's not a visitor, it's a walker.
        Someone mentioned earlier that this is rather confusing.

   There is this comment:

        "If the visitor method returns a true value, the
        ASTVisitor will not traverse the child nodes."

   I see no code which checks the return value.

   Performance:

        For me, the _cache mechanism actually slows down
        visitation. I found that pre-caching the method names in
        preorder() _is_ faster.

        Most of my dispatching uses the default dispatcher; the
        getChildNodes() method, along with compiler.ast.flatten
        and compiler.ast.flatten_nodes are significant overheads.

        All that said, I re-wrote it using a number of techniques:

           removed the ability to add variable arguments to the walk
                (no improvement)

           fewer transformations from lists to tuples (minor improv.)

           smarter construction of lists (minor improv.)

           custom getChildNodes() for each class to eliminate
           calls to flatten() (minor improv.)

           pass a visit() function to a visitChildren() method (slower!)

           write a default recursive dispatcher in C (slower!)
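
A minimal sketch of the pre-caching idea mentioned above, assuming
hypothetical Node classes with a plain `children` list (not the real
compiler.ast classes).  The walker resolves each visitXXX method once
in its constructor instead of on every dispatch.  As a simplification
it always descends into children, which the real walker does not do.

```python
class Node:
    """Hypothetical stand-in for a compiler.ast node."""
    def __init__(self, *children):
        self.children = list(children)

class Module(Node): pass
class Name(Node): pass
class Const(Node): pass

class PrecachedWalker:
    """Walker that looks up every visitXXX method once, up front."""
    def __init__(self, visitor, node_classes):
        self.visitor = visitor
        self.table = {}
        for cls in node_classes:
            # None means "no method on the visitor for this class".
            self.table[cls] = getattr(visitor, "visit" + cls.__name__, None)

    def walk(self, node):
        method = self.table.get(node.__class__)
        if method is not None:
            method(node)
        # Simplification: always descend into the children.
        for child in node.children:
            self.walk(child)

class NameCounter:
    def __init__(self):
        self.count = 0

    def visitName(self, node):
        self.count += 1

tree = Module(Name(), Const(), Module(Name()))
counter = NameCounter()
PrecachedWalker(counter, [Module, Name, Const]).walk(tree)
```

Whether this beats a lazily filled cache depends on how many node
classes a given visitor actually handles, so it needs measuring.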

   Convention:

        Setting the "visit" method on the Visitor is, ahem, a novel
        approach.  It's a convention that PyChecker (the use of, not
        the development of) doesn't like ("unknown method visit()").
        I don't know if I don't like it, but it was unexpected.
        Passing the walker to the dispatch function is the sort of
        thing I would expect.

In general, pychecker2 calls "walk(node, Visitor())" a LOT.  The first
version of pychecker did a lot of things in a single pass.  That is
pretty efficient, but it's harder to add more checks without creating
really dense code.  I'm trying to structure pychecker2 around lots of
independent checks, so it will be easier to contribute new code.  The
consequence is: I would really like the visitor stuff to run
efficiently.

I gave up on trying to use recursion to figure out a Node's parents in
an AST tree.  Very often I need to know the parents of the Node I'm
looking at, and using recursion to hold this information was becoming
cumbersome.  For the time being, I'm adding a parent link to all AST
nodes just after a file is parsed.  An alternative data-structure
(heap?) would be just as swell so long as I can compute this
efficiently.

Line numbers appear to be added to AST nodes in arbitrary ways.

Some interesting projects which re-write existing code might like
other token information, like comments.

People have requested features for pychecker, like detecting
unnecessary parens and semicolons, which is not possible, since these
are not part of the AST.

-Eric



From bckfnn@worldonline.dk  Thu Apr 18 09:52:00 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Thu, 18 Apr 2002 08:52:00 GMT
Subject: [Compiler-sig] AST observations
In-Reply-To: <20020417230224.A21385@ecn>
References: <20020417230224.A21385@ecn>
Message-ID: <3cbe77b9.3930812@mail.wanadoo.dk>

[Eric C. Newton]

>I've noticed a few things with the current implementation of ASTs in
>Python 2.2 from my work on PyChecker2.  These are just some notes.
>[...]
>
>   There is this comment:
>
>        "If the visitor method returns a true value, the
>        ASTVisitor will not traverse the child nodes."
>
>   I see no code which checks the return value.

And it is IMO a very good thing that it isn't implemented as
documented. If the visitor/walker hijacked the return value of visitXXXX
methods for its own purposes, the visitor would be completely useless in
both our bytecode compiler and our javacode compiler.

>   Performance:
>
>        For me, the _cache mechanism actually slows down
>        visitation. I found that pre-caching the method names in
>        preorder() _is_ faster.
>
>        Most of my dispatching uses the default dispatcher; the
>        getChildNodes() method, along with compiler.ast.flatten
>        and compiler.ast.flatten_nodes are significant overheads.
>
>        All that said, I re-wrote it using a number of techniques:
>
>           removed the ability to add variable arguments to the walk
>                (no improvement)
>
>           fewer transformations from lists to tuples (minor improv.)
>
>           smarter construction of lists (minor improv.)
>
>           custom getChildNodes() for each class to eliminate
>           calls to flatten() (minor improv.)
>
>           pass a visit() function to a visitChildren() method (slower!)
>
>           write a default recursive dispatcher in C (slower!)
>
>   Convention:
>
>        Setting the "visit" method on the Visitor is, ahem, a novel
>        approach. 

I think you are way too kind here. It sucks; it is plainly a hack and it
is quite hard to map into efficient java.

>        It's a convention that PyChecker (the use of, not
>        the development of) doesn't like ("unknown method visit()").
>        I don't know if I don't like it, but it was unexpected.
>        Passing the walker to the dispatch function is the sort of
>        thing I would expect.
>
>In general, pychecker2 calls "walk(node, Visitor())" a LOT.  The first
>version of pychecker did a lot of things in a single pass.  That is
>pretty efficient, but it's harder to add more checks without creating
>really dense code.  I'm trying to structure pychecker2 around lots of
>independent checks, so it will be easier to contribute new code.  The
>consequence is: I would really like the visitor stuff to run
>efficiently.

I also want the visitor pattern to be superfast because I want to use it
for the on-the-fly javabytecode generation. If it turns out that the
chosen visitor pattern isn't sufficiently efficient I'll be forced to
make our own visitor pattern in parallel with the one in the compiler
package.

>I gave up on trying to use recursion to figure out a Node's parents in
>an AST tree.  Very often I need to know the parents of the Node I'm
>looking at, and using recursion to hold this information was becoming
>cumbersome.  For the time being, I'm adding a parent link to all AST
>nodes just after a file is parsed.  An alternative data-structure
>(heap?) would be just as swell so long as I can compute this
>efficiently.

I don't need a parent link for codegeneration (if I did, it would have
been added already <wink>) so from my primary POV, adding a parent link
is pure memory overhead. 

OTOH in my treebuilder I have hooks that would allow me to create a
dictionary of node->parentnode mappings while I'm creating the AST tree.

So how is this for an alternative idea: The main methods (parse() and
parseFile()) grow an optional dict=None argument. When that argument is
not None, each created AST node is inserted as a key with its parent
node as the value.
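
A rough sketch of how such an optional parent dictionary could work.
Everything here is hypothetical: the Node class, the build() function
and the nested-tuple input stand in for the real treebuilder, and
mapping the root to None is my assumption.

```python
class Node:
    """Hypothetical AST node; not the compiler package's class."""
    def __init__(self, kind):
        self.kind = kind
        self.children = []

def build(spec, parents=None, parent=None):
    """Build a Node tree from nested (kind, [child specs]) tuples.

    When `parents` is a dict, every created node is inserted as a key
    with its parent node as the value (None for the root).
    """
    kind, child_specs = spec
    node = Node(kind)
    if parents is not None:
        parents[node] = parent
    for child_spec in child_specs:
        node.children.append(build(child_spec, parents, node))
    return node

spec = ("Module", [("Function", [("Name", [])]), ("Name", [])])
parents = {}
root = build(spec, parents)
func = root.children[0]
name = func.children[0]
```

Callers that pass no dictionary pay nothing beyond one None check per
node, which is the point of making the argument optional.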
   
>Line numbers appear to be added to AST nodes in arbitrary ways.
>
>Some interesting projects which re-write existing code might like
>other token information, like comments.
>
>People have requested features for pychecker, like detecting
>unnecessary parens and semicolons, which is not possible, since these
>are not part of the AST.

Again it seems like we have been overly focused on codegen. I'm looking
forward to seeing Jeremy's thoughts on adding parsetree info to the AST.

regards,
finn



From bckfnn@worldonline.dk  Thu Apr 18 10:37:52 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Thu, 18 Apr 2002 09:37:52 GMT
Subject: [Compiler-sig] AST observations
In-Reply-To: <20020417230224.A21385@ecn>
References: <20020417230224.A21385@ecn>
Message-ID: <3cbe9072.10260573@mail.wanadoo.dk>

[Eric C. Newton]

>I gave up on trying to use recursion to figure out a Node's parents in
>an AST tree.  Very often I need to know the parents of the Node I'm
>looking at, and using recursion to hold this information was becoming
>cumbersome.

If you only need the ancestry of the current node in the visitXXXX
methods, then the open_level(), close_level() visitor methods that I
suggested previously should work nicely:

class MyVisitor(ASTVisitor):
    def __init__(self):
        self.ancestry = []

    def open_level(self, node):
        self.ancestry.append(node)

    def close_level(self, node):
        self.ancestry.pop()

    def visitFunctionDef(self, node):
        print "parent is", self.ancestry[-2]
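
For readers who want to run the idea, here is a hedged sketch of the
same ancestry stack written against the modern standard-library ast
module (which is an assumption; it is not the compiler package under
discussion).  ast.NodeVisitor has no open_level()/close_level() hooks,
so the sketch pushes and pops inside an overridden visit().

```python
import ast

class AncestryVisitor(ast.NodeVisitor):
    def __init__(self):
        self.ancestry = []
        self.parents_of_functions = []

    def visit(self, node):
        self.ancestry.append(node)       # plays the role of open_level()
        try:
            super().visit(node)
        finally:
            self.ancestry.pop()          # plays the role of close_level()

    def visit_FunctionDef(self, node):
        # ancestry[-1] is the node itself, ancestry[-2] is its parent.
        parent = self.ancestry[-2]
        self.parents_of_functions.append(type(parent).__name__)
        self.generic_visit(node)

source = "class C:\n    def f(self):\n        pass\n\ndef g():\n    pass\n"
visitor = AncestryVisitor()
visitor.visit(ast.parse(source))
```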


regards,
finn



From ecn@metaslash.com  Thu Apr 18 12:14:08 2002
From: ecn@metaslash.com (Eric C. Newton)
Date: Thu, 18 Apr 2002 07:14:08 -0400
Subject: [Compiler-sig] AST observations
In-Reply-To: <3cbe9072.10260573@mail.wanadoo.dk>; from bckfnn@worldonline.dk on Thu, Apr 18, 2002 at 09:37:52AM +0000
References: <20020417230224.A21385@ecn> <3cbe9072.10260573@mail.wanadoo.dk>
Message-ID: <20020418071408.A24756@ecn>

Yes, I did a lot of this, too.

I gave up when I wanted to re-use something like a "find all Name
Nodes" visitor, and then look up the parents later.

I can certainly keep a reverse map.  I'm already keeping a reverse map
from symbol to node, too.

On Thu, Apr 18, 2002 at 09:37:52AM +0000, Finn Bock wrote:
> [Eric C. Newton]
> 
> >I gave up on trying to use recursion to figure out a Node's parents in
> >an AST tree.  Very often I need to know the parents of the Node I'm
> >looking at, and using recursion to hold this information was becoming
> >cumbersome.
> 
> If you only need the ancestry of the current node in the visitXXXX
> methods, then the open_level(), close_level() visitor methods that I
> suggested previously should work nicely:
> 
> class MyVisitor(ASTVisitor):
>     def __init__(self):
>         self.ancestry = []
> 
>     def open_level(self, node):
>         self.ancestry.append(node)
> 
>     def close_level(self, node):
>         self.ancestry.pop()
> 
>     def visitFunctionDef(self, node):
>         print "parent is", self.ancestry[-2]
> 
> 
> regards,
> finn



From jeremy@zope.com  Thu Apr 18 16:48:46 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Thu, 18 Apr 2002 11:48:46 -0400
Subject: [Compiler-sig] AST observations
In-Reply-To: <20020417230224.A21385@ecn>
References: <20020417230224.A21385@ecn>
Message-ID: <15550.60126.578441.518744@slothrop.zope.com>

Starting with the last stuff first...

>>>>> "ECN" == Eric C Newton <ecn@metaslash.com> writes:

  ECN> Line numbers appear to be added to AST nodes in arbitrary ways.

Indeed.  It hasn't been obvious when to add line numbers.  The
original transformer added line numbers to statements, as far as I
could tell.  This didn't seem sufficient, because, e.g., the except
handler lines aren't individual statements.

  ECN> Some interesting projects which re-write existing code might
  ECN> like other token information, like comments.

Yes.  Refactoring tools would really like to have detailed position
information about each character.  I wouldn't find it acceptable if
such a tool reformatted my code.

  ECN> People have requested features for pychecker, like detecting
  ECN> unnecessary parens and semicolons, which is not possible, since
  ECN> these are not part of the AST.

That's by design.  An AST is a compiler intermediate representation
and parens and semicolons aren't part of the intermediate
representation.  If the analysis has to do with the syntax of the
language, I don't think the AST is the right place to check it.

How do you tell when a pair of parens is unnecessary, BTW?  I've often
used parens around the test part of an if statement so that emacs
formats it nicely when it takes up more than one line.  I find this a
completely acceptable use of "unnecessary" parens.

But regardless of whether the AST should be used for simple syntax
checking (maybe parens aren't just a syntactic issue), it would be
really helpful to decorate the AST with information about the tokens
that make up each node.

I don't know enough about the Python parser to know if it's possible
to get the parser to pass it along to the AST transformer in the
compiler.  I have the impression that things like comments get tossed
pretty early on.

In general, the AST doesn't want to have all the detailed token
information, because it doesn't care about them.  It would waste time
and space to record the information for the compiler.

So if a particular app needs the extra token info, it seems like we
could use the tokenize module to collect the token info and associate
it with the AST.  I'm not sure how this would work in any detail, or I
would have tried it already :-).
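
One way this might look, as a sketch only: it uses today's
standard-library tokenize and ast modules (an assumption, since the
AST discussed here is a different one) and matches comments to
statements purely by line number.  The helper names are hypothetical.

```python
import ast
import io
import tokenize

def comments_by_line(source):
    """Map line number -> comment text, collected with tokenize."""
    comments = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comments[tok.start[0]] = tok.string
    return comments

def attach_comments(source):
    """Return (tree, {statement node: comment on its first line})."""
    tree = ast.parse(source)
    comments = comments_by_line(source)
    attached = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.stmt) and node.lineno in comments:
            attached[node] = comments[node.lineno]
    return tree, attached

source = "x = 1  # the answer\ny = 2\n"
tree, attached = attach_comments(source)
notes = {type(node).__name__: text for node, text in attached.items()}
```

Matching by line number is crude: a comment on a line of its own is
dropped here, and a real tool would need column information as well.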

Jeremy





From jeremy@zope.com  Thu Apr 18 16:50:25 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Thu, 18 Apr 2002 11:50:25 -0400
Subject: [Compiler-sig] AST observations
In-Reply-To: <3cbe77b9.3930812@mail.wanadoo.dk>
References: <20020417230224.A21385@ecn>
 <3cbe77b9.3930812@mail.wanadoo.dk>
Message-ID: <15550.60225.950885.692091@slothrop.zope.com>

>>>>> "FB" == Finn Bock <bckfnn@worldonline.dk> writes:

  FB> [Eric C. Newton]
  >> There is this comment:
  >>
  >> "If the visitor method returns a true value, the ASTVisitor will
  >> not traverse the child nodes."
  >>
  >> I see no code which checks the return value.

  FB> And it is IMO a very good thing that it isn't implemented as
  FB> documented. If the visitor/walker hijacked the return value of
  FB> visitXXXX methods for its own purposes, the visitor would be
  FB> completely useless in both our bytecode compiler and our
  FB> javacode compiler.

Yes.  I think the comment just needs to be removed.  It seemed like a
good idea when I started the project, but I don't think I ever found a
use for it.

Jeremy




From bckfnn@worldonline.dk  Thu Apr 18 19:20:47 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Thu, 18 Apr 2002 18:20:47 GMT
Subject: [Compiler-sig] Changes to python.asdl
Message-ID: <3cbf0ab3.41557296@mail.wanadoo.dk>

Hi,

I just committed a few small changes to the python.asdl (something
caused syncmail to blow up so the checkin mail might be missing) and I'm
a little worried that such changes cause a lot of pain to your code for
the CPython AST tree. I don't have a toolchain available so that I can
compile and test your C code yet.

I know that we are still in the sandbox and so we should be allowed to
play, but I want to play nice.

regards,
finn



From jeremy@zope.com  Thu Apr 18 19:31:41 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Thu, 18 Apr 2002 14:31:41 -0400
Subject: [Compiler-sig] AST observations
In-Reply-To: <20020417230224.A21385@ecn>
References: <20020417230224.A21385@ecn>
Message-ID: <15551.4365.375570.60402@slothrop.zope.com>

>>>>> "ECN" == Eric C Newton <ecn@metaslash.com> writes:

  ECN> compiler.visitor:

  ECN>    ASTVisitor
  ECN>         As the comment says, it's not a visitor, it's a walker.
  ECN>         Someone mentioned earlier that this is rather
  ECN>         confusing.

I'm sorry this is confusing, but I think it is one of the standard
variations on the visitor pattern.  It's certainly the case that the
visitors we've all been writing have the signs of being a visitor,
e.g. a method for each class of object.

As for how the traversal occurs, GoF (p. 339) says:

"Who is responsible for traversing the object structure?  A visitor
must visit each element of the object structure.  The question is, how
does it get there?  We can put the responsibility in any of three
places: in the object structure, in the visitor, or in a separate
iterator object."

The text goes on to discuss these alternatives and notes that you
could also use an internal iterator that is a kind of hybrid between
having the traversal in the object structure and using an iterator.
In this case, the iterator calls a method on the visitor with the
object as an argument as opposed to calling a method of the object
with the visitor as the argument.

It might be clearer to merge the walker and the visitor into a single
class using inheritance.  (I think the Walkabout variant described by
Palsberg and Jay does this,
    cf. http://citeseer.nj.nec.com/palsberg97essence.html.)  But I
thought delegation would be clearer and would avoid the need for a
magic base class that all visitors must inherit from.

Jeremy




From jeremy@zope.com  Thu Apr 18 19:36:41 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Thu, 18 Apr 2002 14:36:41 -0400
Subject: [Compiler-sig] Changes to python.asdl
In-Reply-To: <3cbf0ab3.41557296@mail.wanadoo.dk>
References: <3cbf0ab3.41557296@mail.wanadoo.dk>
Message-ID: <15551.4665.848301.588586@slothrop.zope.com>

>>>>> "FB" == Finn Bock <bckfnn@worldonline.dk> writes:

  FB> I know that we are still in the sandbox and so we should be
  FB> allowed to play, I want to play nice.

No problem here.  We've got to get the AST right in the end.  If my
code is going to break, sooner is better than later :-).

Jeremy




From bckfnn@worldonline.dk  Thu Apr 18 19:44:22 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Thu, 18 Apr 2002 18:44:22 GMT
Subject: [Compiler-sig] Slices
Message-ID: <3cbf10ad.43086805@mail.wanadoo.dk>

Hi,

Trying again for some feedback on the slice production. I still can't
figure out how to use the existing Slice and ExtSlice so I tried to make
some changes like this:

         slice = Ellipsis | Slice(expr? lower, expr? upper)
              -- maybe Slice and ExtSlice should be merged...
-             | ExtSlice(expr* dims)
+             | ExtSlice(slice* dims)
+             | Index(expr value)

        boolop = And | Or


Which I use in the following way (only the slice part of the Subscript
is included):

L[1]

    slice=Index[value=Num[n=1]]

L[1:2]

    slice=Slice[lower=Num[n=1], upper=Num[n=2]]

L[1:2, 3]

    slice=ExtSlice[dims=[
        Slice[lower=Num[n=1], upper=Num[n=2]],
        Index[value=Num[n=3]]
    ]]


A single expr is wrapped by an Index(), a slice is of course wrapped
with a Slice(), and a comma-separated list of slices is wrapped by an
ExtSlice() object.
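
The wrapping rule can be illustrated with a small sketch.  The three
dataclasses mirror the proposed productions, while classify() and
Probe are hypothetical helpers that inspect the key Python itself
passes to __getitem__.  Ellipsis and the slice step are not handled,
since the production above has only lower and upper.

```python
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass
class Index:
    value: Any

@dataclass
class Slice:
    lower: Optional[Any]
    upper: Optional[Any]

@dataclass
class ExtSlice:
    dims: List[Any]

def classify(key):
    """Wrap a subscript key the way the proposed slice production does."""
    if isinstance(key, tuple):
        # Comma-separated list: each element is itself a slice.
        return ExtSlice([classify(item) for item in key])
    if isinstance(key, slice):
        return Slice(key.start, key.stop)
    return Index(key)

class Probe:
    """Returns the classified form of whatever subscript it receives."""
    def __getitem__(self, key):
        return classify(key)

L = Probe()
single = L[1]
simple = L[1:2]
extended = L[1:2, 3]
```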

Does it make sense?

regards,
finn



From jeremy@zope.com  Thu Apr 18 19:44:32 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Thu, 18 Apr 2002 14:44:32 -0400
Subject: [Compiler-sig] AST observations
In-Reply-To: <20020417230224.A21385@ecn>
References: <20020417230224.A21385@ecn>
Message-ID: <15551.5136.674941.912241@slothrop.zope.com>

(I hope it's okay that I'm responding in little chunks.  There's been
a lot to digest.)

>>>>> "ECN" == Eric C Newton <ecn@metaslash.com> writes:

  ECN>         Setting the "visit" method on the Visitor is, ahem, a
  ECN>         novel approach.  It's a convention that PyChecker (the
  ECN>         use of, not the development of) doesn't like ("unknown
  ECN>         method visit()").  I don't know if I don't like it, but
  ECN>         it was unexpected.  Passing the walker to the dispatch
  ECN>         function is the sort of thing I would expect.

Ahem is a nicer word than sucks :-).

Anyway, it seemed like a clear delegation to me and the alternative
seemed to involve a lot of helper methods that didn't serve any
functional purpose and required visitors to inherit from a special
base class.  The joy of Python is not in writing reams of boring code,
or something like that.

I think the boring code would go something like this:

class VisitorBase:

      def __init__(self):
          self._visit_hook = None # a hook for the walker
	  
      def register_walker(self, walker):
          self._visit_hook = walker.dispatch

      def unregister_walker(self):
          self._visit_hook = None

      def visit(self, *args, **kwds):
          self._visit_hook(*args, **kwds)

That's a dozen lines of code to do something that still needs to be
explained in the doc string.  So I figured, I'd just use an assignment
and explain it in the doc string.
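
For comparison, a sketch of the assignment-based delegation as I read
this discussion.  The Node class and the Collector visitor are
hypothetical, and the Walker is a cut-down illustration rather than
the actual compiler.visitor code.

```python
class Node:
    """Hypothetical node: a kind name plus a list of children."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

class Walker:
    def preorder(self, tree, visitor):
        self.visitor = visitor
        visitor.visit = self.dispatch    # the assignment under discussion
        self.dispatch(tree)

    def dispatch(self, node):
        method = getattr(self.visitor, "visit" + node.name, self.default)
        return method(node)

    def default(self, node):
        # No visitXXX method on the visitor: just walk the children.
        for child in node.children:
            self.dispatch(child)

class Collector:
    def __init__(self):
        self.seen = []

    def visitFunction(self, node):
        self.seen.append(node.name)
        for child in node.children:
            self.visit(child)            # attribute installed by the walker

    def visitName(self, node):
        self.seen.append(node.name)

tree = Node("Module", [Node("Function", [Node("Name")]), Node("Name")])
collector = Collector()
Walker().preorder(tree, collector)
```

The visitor needs no base class, at the cost of a `visit` attribute
that only exists once a walker has been attached.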

Jeremy




From jeremy@zope.com  Thu Apr 18 19:53:48 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Thu, 18 Apr 2002 14:53:48 -0400
Subject: [Compiler-sig] AST observations
In-Reply-To: <3cbe77b9.3930812@mail.wanadoo.dk>
References: <20020417230224.A21385@ecn>
 <3cbe77b9.3930812@mail.wanadoo.dk>
Message-ID: <15551.5692.594518.799386@slothrop.zope.com>

Finally, I should say that I don't have any strong attachment to the
visitor code in the compiler package.  Nor do I have any strong
attachment to the AST it defines.  I've started over with python.asdl,
and I don't have any problems with starting over on a new visitor
structure.  Let's find the ideas that work best for all our
applications.  I'm probably going to want something visitor like
implemented in C for the builtin bytecode compiler; I don't have any
idea what that will look like yet.

Efficiency was a non-goal for compiler.visitor.  I didn't even
consider whether generating efficient Java code was possible.  (What
would be the point of writing in the subset of Python that can be
translated efficiently to Java? <wink>)

We're kind of stuck with the compiler package as it exists now, since it's
a std part of 2.2.  But there's no reason it can't grow new classes
that provide new or improved functionality.

On the subject of what the right visitor style is, it looks like Eelco
and Joost Visser are doing interesting work in this area on the guide
pattern and visitor combinators, respectively.  (I don't have any idea
if the people are related, although the ideas are at some level :-).
I don't have links handy, but a Google search on name + visitor will
get you there.

Eric-- Feel free to contribute concrete patches that make the existing
visitor code faster.  I tried to speed it up too, once, and didn't
make much progress.  Much as you saw, I didn't see obvious changes
that made a big difference.

Jeremy




From bckfnn@worldonline.dk  Thu Apr 18 21:03:11 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Thu, 18 Apr 2002 20:03:11 GMT
Subject: [Compiler-sig] AST observations
In-Reply-To: <15551.5692.594518.799386@slothrop.zope.com>
References: <20020417230224.A21385@ecn> <3cbe77b9.3930812@mail.wanadoo.dk> <15551.5692.594518.799386@slothrop.zope.com>
Message-ID: <3cbf20a4.47173832@mail.wanadoo.dk>

[Jeremy]

>Finally, I should say that I don't have any strong attachment to the
>visitor code in the compiler package.  Nor do I have any strong
>attachment to the AST it defines.  I've started over with python.asdl,
>and I don't have any problems with starting over on a new visitor
>structure.  Let's find the ideas that work best for all our
>applications.  I'm probably going to want something visitor like
>implemented in C for the builtin bytecode compiler; I don't have any
>idea what that will look like yet.
>
>Efficiency was a non-goal for compiler.visitor.  I didn't even
>consider whether generating efficient Java code was possible.  (What
>would be the point of writing in the subset of Python that can be
>translated efficiently to Java? <wink>)

I see the wink, but the reason is portability between CodeCompiler.java
(which creates java bytecode) and SimpleCompiler.py (which generates
java sourcecode).

Both codegenerators have to do the same work and they use a lot of the
same tricks to do it, and right now they use the same Visitor API. I
would not be happy if the two codegens had to use two different Visitor
APIs just because one is written in java and the other in python.

I would rather not use the official and documented visitor pattern and
roll my own visitors (for both my uses), like you plan on doing for your
C bytecode codegen.

>We're kind of stuck with the compiler package as it exists now, since it's
>a std part of 2.2.  But there's no reason it can't grow new classes
>that provide new or improved functionality.

Which begs the question: where do you intend to put the new AST classes
and the supporting functions? A new module in the 'compiler' package?

>On the subject of what the right visitor style is, ...

It might just be my expectation for a visitor pattern that is wrong and
visitor.py that is right. I believe I understood visitor.py when I read
your explanation.

regards,
finn



From ecn@metaslash.com  Fri Apr 19 03:52:53 2002
From: ecn@metaslash.com (Eric C. Newton)
Date: Thu, 18 Apr 2002 22:52:53 -0400
Subject: [Compiler-sig] AST observations
In-Reply-To: <15551.5136.674941.912241@slothrop.zope.com>; from jeremy@zope.com on Thu, Apr 18, 2002 at 02:44:32PM -0400
References: <20020417230224.A21385@ecn> <15551.5136.674941.912241@slothrop.zope.com>
Message-ID: <20020418225253.A30822@ecn>

> Anyway, it seemed like a clear delegation to me

Right, but I expected the walker to pass itself, rather than giving
the visitor a new method.  I would be quite happy if the object was
assigned, rather than a bound method:

        visitor.walker = self

A bit more conventional, but you won't need to pass the walker to
every visit method.

> So I figured, I'd just use an assignment and explain it in the doc
> string.

That's fine.  I don't see a need for accessors.

-Eric



From ecn@metaslash.com  Fri Apr 19 12:26:52 2002
From: ecn@metaslash.com (Eric C. Newton)
Date: Fri, 19 Apr 2002 07:26:52 -0400
Subject: [Compiler-sig] AST observations
In-Reply-To: <15551.4365.375570.60402@slothrop.zope.com>; from jeremy@zope.com on Thu, Apr 18, 2002 at 02:31:41PM -0400
References: <20020417230224.A21385@ecn> <15551.4365.375570.60402@slothrop.zope.com>
Message-ID: <20020419072652.B30822@ecn>

> I'm sorry this is confusing, but I think it is one of the standard
> variations on the visitor pattern.

Clearly, the class with "visit" methods is a Visitor.  Now this other
thing is also called ASTVisitor, even though it delegates visitation
to a third class with visit methods.

Reminds me of java code, where the same word appears 4 times:

        Thing thing = new Thing(thingy);

> "Who is responsible for traversing the object structure?  A visitor
> must visit each element of the object structure.  The question is, how
> does it get there?  We can put the responsibility in any of three
> places: in the object structure, in the visitor, or in a separate
> iterator object."

Ok, TreeIterator works for me, too. 8-)

> It might be clearer to merge the walker and the visitor into a single
> class using inheritance.  (I think the Walkabout variant described by
> Palsberg and Jay does this,
>     cf. http://citeseer.nj.nec.com/palsberg97essence.html.)  But I
> thought delegation would be clearer and would avoid the need for a
> magic base class that all visitors must inherit from.

The only advantage I can see for this approach is faster visitation:
the base class could have default visit methods that would know how to
iterate over the child nodes.  getChildNodes() would no longer be
necessary.

-Eric



From neal@metaslash.com  Fri Apr 19 14:07:43 2002
From: neal@metaslash.com (Neal Norwitz)
Date: Fri, 19 Apr 2002 09:07:43 -0400
Subject: [Compiler-sig] Error checking macros
Message-ID: <3CC0169F.A8BB29B0@metaslash.com>

I don't know if others have seen the following technique before.  
Eric and I have used it to greatly reduce lines of code for error
checking.  We borrowed this technique from ACE http://www.cs.wustl.edu/~schmidt/ACE.html (I think).

/* START MACRO */

#define ERR_NULL_CHECK(arg, func_name)                                  \
do {                                                                    \
    if (!(arg)) {                                                       \
        PyErr_SetString(PyExc_ValueError,                               \
                        "field " #arg " is required for " func_name);   \
        return NULL;                                                    \
    }                                                                   \
} while (0)

/* END MACRO */

Then code that used to look like this (FunctionDef):

        if (!name) {
                PyErr_SetString(PyExc_ValueError,
                                "field name is required for FunctionDef");
                return NULL;
        }
        if (!args) {
                PyErr_SetString(PyExc_ValueError,
                                "field args is required for FunctionDef");
                return NULL;
        }
        if (!body) {
                PyErr_SetString(PyExc_ValueError,
                                "field body is required for FunctionDef");
                return NULL;
        }

Would become:

	ERR_NULL_CHECK(name, "FunctionDef");
	ERR_NULL_CHECK(args, "FunctionDef");
	ERR_NULL_CHECK(body, "FunctionDef");

Often, the macro has RETURN in the name, or something to indicate
that the macro could return.

Is there any interest in this?

Neal



From ecn@metaslash.com  Fri Apr 19 14:16:44 2002
From: ecn@metaslash.com (Eric C. Newton)
Date: Fri, 19 Apr 2002 09:16:44 -0400
Subject: [Compiler-sig] AST observations
In-Reply-To: <15551.5136.674941.912241@slothrop.zope.com>; from jeremy@zope.com on Thu, Apr 18, 2002 at 02:44:32PM -0400
References: <20020417230224.A21385@ecn> <15551.5136.674941.912241@slothrop.zope.com>
Message-ID: <20020419091644.B2537@ecn>

> Ahem is a nicer word than sucks :-).

Poisonous invective lowers morale.  Except when directed at Neal.

-Eric



From skip@pobox.com  Fri Apr 19 14:21:13 2002
From: skip@pobox.com (Skip Montanaro)
Date: Fri, 19 Apr 2002 08:21:13 -0500
Subject: [Compiler-sig] Error checking macros
In-Reply-To: <3CC0169F.A8BB29B0@metaslash.com>
References: <3CC0169F.A8BB29B0@metaslash.com>
Message-ID: <15552.6601.918072.944843@12-248-41-177.client.attbi.com>

    Neal> I don't know if others have seen the following technique before.
    Neal> Eric and I have used it to greatly reduce lines of code for error
    Neal> checking.

This looks like it would be okay as long as you only call it when you hold
no references to Python objects.  In Python code you frequently can't just
return NULL, but have to DECREF some Python objects first.  I don't know if
this problem will arise here (I don't see a lot of context in your example -
is it just for NULL checking input args?), but I assume it might.

Skip



From neal@metaslash.com  Fri Apr 19 14:36:50 2002
From: neal@metaslash.com (Neal Norwitz)
Date: Fri, 19 Apr 2002 09:36:50 -0400
Subject: [Compiler-sig] Error checking macros
References: <3CC0169F.A8BB29B0@metaslash.com> <15552.6601.918072.944843@12-248-41-177.client.attbi.com>
Message-ID: <3CC01D72.ED1E0CB9@metaslash.com>

Skip Montanaro wrote:
> 
>     Neal> I don't know if others have seen the following technique before.
>     Neal> Eric and I have used it to greatly reduce lines of code for error
>     Neal> checking.
> 
> This looks like it would be okay as long as you only call it when you hold
> no references to Python objects.  In Python code you frequently can't just
> return NULL, but have to DECREF some Python objects first.  I don't know if
> this problem will arise here (I don't see a lot of context in your example -
> is it just for NULL checking input args?), but I assume it might.

In this case, yes, I am only talking about checking input args.
This is for generated code which is very regular.  It's used
to store AST info.

Here's a bit more context.  I actually wrote code to test this.
The old code for a function looked like this:

stmt_ty
ClassDef(identifier name, asdl_seq * bases, asdl_seq * body)
{
        stmt_ty p;
        if (!name) {
                PyErr_SetString(PyExc_ValueError,
                                "field name is required for ClassDef");
                return NULL;
        }
        if (!bases) {
                PyErr_SetString(PyExc_ValueError,
                                "field bases is required for ClassDef");
                return NULL;
        }
        if (!body) {
                PyErr_SetString(PyExc_ValueError,
                                "field body is required for ClassDef");
                return NULL;
        }
        p = (stmt_ty)malloc(sizeof(*p));
        if (!p) {
                PyErr_SetString(PyExc_MemoryError, "no memory");
                return NULL;
        }
        p->kind = ClassDef_kind;
        p->v.ClassDef.name = name;
        p->v.ClassDef.bases = bases;
        p->v.ClassDef.body = body;
        return p;
}



The new code looks like this:

stmt_ty
ClassDef(identifier name, asdl_seq * bases, asdl_seq * body)
{
        stmt_ty p;
        ERR_NULL_CHECK(name, "ClassDef");
        ERR_NULL_CHECK(bases, "ClassDef");
        ERR_NULL_CHECK(body, "ClassDef");
        CHECK_MALLOC(p, stmt_ty);

        p->kind = ClassDef_kind;
        p->v.ClassDef.name = name;
        p->v.ClassDef.bases = bases;
        p->v.ClassDef.body = body;
        return p;
}



From neal@metaslash.com  Fri Apr 19 14:53:31 2002
From: neal@metaslash.com (Neal Norwitz)
Date: Fri, 19 Apr 2002 09:53:31 -0400
Subject: [Compiler-sig] Error checking macros
References: <3CC0169F.A8BB29B0@metaslash.com> <15552.6601.918072.944843@12-248-41-177.client.attbi.com>
Message-ID: <3CC0215B.281E627E@metaslash.com>

I forgot to mention that using the macros drops the lines of code
from 1061 to 646, about 40%.

Neal



From bckfnn@worldonline.dk  Fri Apr 19 15:33:49 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Fri, 19 Apr 2002 14:33:49 GMT
Subject: [Compiler-sig] AST observations
In-Reply-To: <20020419072652.B30822@ecn>
References: <20020417230224.A21385@ecn> <15551.4365.375570.60402@slothrop.zope.com> <20020419072652.B30822@ecn>
Message-ID: <3cc02a9c.23311390@mail.wanadoo.dk>

[Jeremy]

> It might be clearer to merge the walker and the visitor into a single
> class using inheritance.  (I think the Walkabout variant described by
> Palsberg and Jay does this,
>     cf. http://citeseer.nj.nec.com/palsberg97essence.html.)  But I
> thought delegation would be clearer and would avoid the need for a
> magic base class that all visitors must inherit from.

[Eric]

>The only advantage I can see for this approach is faster visitation:
>the base class could have default visit methods that would know how to
>iterate over the child nodes.  getChildNodes() would no longer be
>necessary.

You would not need a base class to get that benefit. I proposed a
'traverse' method on the AST nodes that will iterate over the children
of the node like this:

class Module:
   def traverse(self, walker):
      for stmt in self.body:
          walker.visit(stmt)

That way the information about children is kept in the AST nodes.

You would only need the visitor base class to please Jython.
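
A minimal sketch of the idea, in plain Python. The Name and Expr
classes and the NameCollector walker are hypothetical, made up only for
illustration; the walker looks for a visitXxx method and otherwise falls
back to the node's own traverse():

```python
class Name:
    def __init__(self, id):
        self.id = id

    def traverse(self, walker):
        pass  # leaf node: no children to visit


class Expr:
    def __init__(self, value):
        self.value = value

    def traverse(self, walker):
        walker.visit(self.value)


class Module:
    def __init__(self, body):
        self.body = body

    def traverse(self, walker):
        for stmt in self.body:
            walker.visit(stmt)


class NameCollector:
    """Hypothetical walker: needs no base class and no getChildNodes()."""

    def __init__(self):
        self.names = []

    def visit(self, node):
        # Use a specific visitXxx method if one exists, else let the
        # node hand its children back to us.
        method = getattr(self, "visit" + type(node).__name__, None)
        if method is not None:
            method(node)
        else:
            node.traverse(self)

    def visitName(self, node):
        self.names.append(node.id)


tree = Module([Expr(Name("a")), Expr(Name("b"))])
collector = NameCollector()
collector.visit(tree)
```

The walker only knows about the node types it cares about; everything
about child layout stays in the nodes.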

regards,
finn



From bckfnn@worldonline.dk  Fri Apr 19 15:41:37 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Fri, 19 Apr 2002 14:41:37 GMT
Subject: [Compiler-sig] AST observations
In-Reply-To: <15551.4365.375570.60402@slothrop.zope.com>
References: <20020417230224.A21385@ecn> <15551.4365.375570.60402@slothrop.zope.com>
Message-ID: <3cc02ad4.23366799@mail.wanadoo.dk>

[Jeremy]

>It might be clearer to merge the walker and the visitor into a single
>class using inheritance.  (I think the Walkabout variant described by
>Palsberg and Jay does this,
>    cf. http://citeseer.nj.nec.com/palsberg97essence.html.)

Yes, and so does the Visitor pattern they describe in 2.3. Based on
the performance measurement in chapter 4, at least I hope you understand
why I argue for a static double dispatch Visitor instead of a dynamic
dispatching Walkabout pattern. The added flexibility of dynamic dispatch
is pure YAGNI for me. Since I can control the code generated for each
AST node, it would be plain wrong not to add an 'accept' method.
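
To make the difference concrete, here is a small sketch of static
double dispatch, written in Python for brevity. The Num and BinOp nodes
and the Evaluator visitor are hypothetical stand-ins, not the generated
classes; the point is only that each node's accept() names the visitor
method directly, so no method lookup by class name is needed:

```python
class Num:
    def __init__(self, n):
        self.n = n

    def accept(self, visitor):
        # First dispatch: on the node type (this method).
        # Second dispatch: on the visitor type (the call below).
        return visitor.visitNum(self)


class BinOp:
    def __init__(self, left, op, right):
        self.left = left
        self.op = op
        self.right = right

    def accept(self, visitor):
        return visitor.visitBinOp(self)


class Evaluator:
    """Hypothetical visitor that evaluates a tiny expression tree."""

    def visitNum(self, node):
        return node.n

    def visitBinOp(self, node):
        left = node.left.accept(self)
        right = node.right.accept(self)
        if node.op == "+":
            return left + right
        if node.op == "*":
            return left * right
        raise ValueError("unknown operator: %r" % node.op)


tree = BinOp(Num(2), "+", BinOp(Num(3), "*", Num(4)))
result = tree.accept(Evaluator())
```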

regards,
finn



From bckfnn@worldonline.dk  Fri Apr 19 15:43:36 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Fri, 19 Apr 2002 14:43:36 GMT
Subject: [Compiler-sig] Number classes
Message-ID: <3cc02c99.23820542@mail.wanadoo.dk>

Hi,

I would like to separate the number types in some way. I have used
the patch below in jython but adding a flag to the Num() ctor would
also be fine. I think this is needed because all my codegens need
to generate different code for each of the number types.

Also, I have the information available from my lexer and it would be
a pity to lose the info just to recreate it again in the visitNum()
method.

             | Call(expr func, expr* args, keyword* keywords,
                         expr? starargs, expr? kwargs)
             | Repr(expr value)
-            | Num(string n) -- string representation of a number
+            | IntNum(string n) -- string representation of a integer
+            | FloatNum(string n) -- string representation of a float 
+            | LongNum(string n) -- string representation of a long
+            | ComplexNum(string n) -- string representation of a complex
             | Str(string s) -- need to specify raw, unicode, etc?
             -- other literals? bools?

['Float' and 'Long' are unfortunate classnames in java]

I think the compiler package also has the number type available as
the type of the Const() argument and that solution would be even 
better but I don't know how to express that in asdl. Could we invent
an anonymous 'object' builtin asdl type?

Thoughts?

regards,
finn



From jeremy@zope.com  Fri Apr 19 15:42:31 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Fri, 19 Apr 2002 10:42:31 -0400
Subject: [Compiler-sig] Error checking macros
In-Reply-To: <3CC0215B.281E627E@metaslash.com>
References: <3CC0169F.A8BB29B0@metaslash.com>
 <15552.6601.918072.944843@12-248-41-177.client.attbi.com>
 <3CC0215B.281E627E@metaslash.com>
Message-ID: <15552.11479.408447.247737@slothrop.zope.com>

I think the macros are probably a good idea here.  In general I don't
like macros that hide a return, because it obscures the control flow.
On the third hand, this is all generated code so it doesn't actually
matter what it looks like.

You've probably noticed, though, that I skipped error checking almost
completely in the astmodule.  I'm worried about how much longer the
code will be in *that* module.

Jeremy




From jeremy@zope.com  Fri Apr 19 15:59:39 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Fri, 19 Apr 2002 10:59:39 -0400
Subject: [Compiler-sig] AST observations
In-Reply-To: <3cc02ad4.23366799@mail.wanadoo.dk>
References: <20020417230224.A21385@ecn>
 <15551.4365.375570.60402@slothrop.zope.com>
 <3cc02ad4.23366799@mail.wanadoo.dk>
Message-ID: <15552.12507.204189.483828@slothrop.zope.com>

>>>>> "FB" == Finn Bock <bckfnn@worldonline.dk> writes:

  FB> [Jeremy]
  >> It might be clearer to merge the walker and the visitor into a
  >> single class using inheritance.  (I think the Walkabout variant
  >> described by Palsberg and Jay does this,
  >> cf. http://citeseer.nj.nec.com/palsberg97essence.html.)

  FB> Yes, and so does the Visitor pattern they describe in
  FB> 2.3. Based on the performance measurement in chapter 4, at least
  FB> I hope you understand why I argue for a static double dispatch
  FB> Visitor instead of a dynamic dispatching Walkabout pattern. The
  FB> added flexibility of dynamic dispatch is pure YAGNI for
  FB> me. Since I can control the code generated for each AST node, it
  FB> would be plain wrong not to add an 'accept' method.

Yes, indeed.  I wasn't trying to do anything efficiently, and I
definitely did not care about Java performance when I wrote all the
code.

For use in the core of Jython or CPython, however, performance is an
important consideration.  It makes complete sense to generate all the
visitor dispatch code statically for Java and C.  (I wonder how much
performance difference it makes for 100% pure Python.)

Jeremy




From jeremy@zope.com  Fri Apr 19 16:10:39 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Fri, 19 Apr 2002 11:10:39 -0400
Subject: [Compiler-sig] Number classes
In-Reply-To: <3cc02c99.23820542@mail.wanadoo.dk>
References: <3cc02c99.23820542@mail.wanadoo.dk>
Message-ID: <15552.13167.585346.444380@slothrop.zope.com>

For CPython, I've got a single routine called parsenumber().  It
converts a string to a PyObject * of the appropriate type.  So for my
code, the only way I could find out if I've got an int vs. a complex
is to parse it and check the return type.  But once I've got the
PyObject *, there's no need to pass a string to ctor.

Is the 'object' type you're thinking of for ASDL the generic object type
of the Python implementation?  So I would actually pass the ComplexNum
ctor a PyObject *?

For the code generator, the various number objects are all treated the
same way once they're parsed.  (Unless the compiler was doing some
constant folding, I suppose.)  If there's no difference to the way the
numbers are handled, it would be better to mark a generic Num type
with some flags or attributes that provide the extra info.

Jeremy




From neal@metaslash.com  Fri Apr 19 16:44:57 2002
From: neal@metaslash.com (Neal Norwitz)
Date: Fri, 19 Apr 2002 11:44:57 -0400
Subject: [Compiler-sig] Bug in astmodule?
Message-ID: <3CC03B79.AFF48E1F@metaslash.com>

            if (strcmp(STR(CHILD(n, 1)), "in") == 0)
                return NotIn;
            if (strcmp(STR(CHILD(n, 0)), "is") == 0)
                return IsNot;

Shouldn't the 2nd if use CHILD(n, 1) like the first?

Neal



From bckfnn@worldonline.dk  Fri Apr 19 16:56:38 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Fri, 19 Apr 2002 15:56:38 GMT
Subject: [Compiler-sig] Number classes
In-Reply-To: <15552.13167.585346.444380@slothrop.zope.com>
References: <3cc02c99.23820542@mail.wanadoo.dk> <15552.13167.585346.444380@slothrop.zope.com>
Message-ID: <3cc03743.26549916@mail.wanadoo.dk>

[Jeremy]

>For CPython, I've got a single routine called parsenumber().  It
>converts a string to a PyObject * of the appropriate type.  So for my
>code, the only way I could find out if I've got an int vs. a complex
>is to parse it and check the return type.  But once I've got the
>PyObject *, there's no need to pass a string to ctor.
>
>Is the 'object' type you're thinking of for ASDL the generic object type
>of the Python implementation?  So I would actually pass the ComplexNum
>ctor a PyObject *?

Yes, that was the idea, but then there would only be one Num(object n)
ctor. Is it a useless brainfart?

>For the code generator, the various number objects are all treated the
>same way once they're parsed.  (Unless the compiler was doing some
>constant folding, I suppose.)

Interesting. I need to do something different for each num type:

    public Object visitIntNum(IntNum node) throws Exception {
        module.PyInteger(Integer.parseInt(node.n)).get(code);
        return null;
    }

    public Object visitLongNum(LongNum node) throws Exception {
        module.PyLong(node.n).get(code);
        return null;
    }

>If there's no difference to the way the
>numbers are handled, it would be better to mark a generic Num type
>with some flags or attributes that provide the extra info.

That will work for me too, but as I understand your first paragraph
above, CPython doesn't know the type without parsing the string.

regards,
finn



From jeremy@zope.com  Fri Apr 19 17:58:39 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Fri, 19 Apr 2002 12:58:39 -0400
Subject: [Compiler-sig] Number classes
In-Reply-To: <3cc03743.26549916@mail.wanadoo.dk>
References: <3cc02c99.23820542@mail.wanadoo.dk>
 <15552.13167.585346.444380@slothrop.zope.com>
 <3cc03743.26549916@mail.wanadoo.dk>
Message-ID: <15552.19647.972709.314308@slothrop.zope.com>

>>>>> "FB" == Finn Bock <bckfnn@worldonline.dk> writes:

  >> If there's no difference to the way the numbers are handled, it
  >> would be better to mark a generic Num type with some flags or
  >> attributes that provide the extra info.

  FB> That will work for me too, but as I understand your first
  FB> paragraph above, CPython doesn't know the type without parsing the
  FB> string.

If we have 

expr = Num(object value, num_type type)
num_type = Int | Long | Float | Complex

Then I can parse the string and create a num passing the PyObject *
and setting the appropriate type flag.  I could also handle separate
IntNum, LongNum, etc. ctors, but that seems like more nodes than we
really need.

If the Num node(s) need to have the type specified, then I'd like it
to take an object not a string.
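
A rough sketch of what that could look like, in Python rather than C.
parse_number() here is a hypothetical stand-in for the real
parsenumber(), and it only handles decimal and hex literals plus the
'L' and 'j' suffixes; the Num class mirrors the two-field definition
above:

```python
from dataclasses import dataclass

# num_type = Int | Long | Float | Complex
INT, LONG, FLOAT, COMPLEX = "Int", "Long", "Float", "Complex"


@dataclass
class Num:
    value: object  # the parsed number, not the source string
    type: str      # one of the num_type flags above


def parse_number(text):
    """Parse a numeric literal and tag it with its num_type (sketch only)."""
    lower = text.lower()
    if lower.endswith("j"):
        return Num(complex(text), COMPLEX)
    if lower.endswith("l"):
        # Long literal suffix; strip it before converting.
        return Num(int(text[:-1], 0), LONG)
    if lower.startswith("0x"):
        # Checked before the float test because hex digits include 'e'.
        return Num(int(text, 16), INT)
    if "." in lower or "e" in lower:
        return Num(float(text), FLOAT)
    return Num(int(text), INT)
```

The transformer parses once and the code generators can switch on the
flag without looking at the string again.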

Jeremy






From jeremy@zope.com  Fri Apr 19 18:19:48 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Fri, 19 Apr 2002 13:19:48 -0400
Subject: [Compiler-sig] Bug in astmodule?
In-Reply-To: <3CC03B79.AFF48E1F@metaslash.com>
References: <3CC03B79.AFF48E1F@metaslash.com>
Message-ID: <15552.20916.228865.907994@slothrop.zope.com>

>>>>> "NN" == Neal Norwitz <neal@metaslash.com> writes:

  NN>             if (strcmp(STR(CHILD(n, 1)), "in") == 0)
  NN>                 return NotIn;
  NN>             if (strcmp(STR(CHILD(n, 0)), "is") == 0)
  NN>                 return IsNot;

  NN> Shouldn't the 2nd if, do CHILD(n, 1) like the first?

No.  One is checking "not in" and the other is "is not".

I'll add a comment, though.
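
Spelled out as a small Python sketch (the function name and the list of
token strings are made up for illustration; the real code works on
parse tree nodes through CHILD() and STR()):

```python
def two_token_comp_op(children):
    """Classify a two-token comparison operator from its token strings."""
    # "not in": the distinguishing keyword is the second child.
    if children[1] == "in":
        return "NotIn"
    # "is not": the distinguishing keyword is the first child.
    if children[0] == "is":
        return "IsNot"
    raise ValueError("not a two-token comparison operator: %r" % (children,))
```

Checking child 1 in both tests would compare "not" against "is" for
the second operator and never match.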

Jeremy




From neal@metaslash.com  Fri Apr 19 18:28:12 2002
From: neal@metaslash.com (Neal Norwitz)
Date: Fri, 19 Apr 2002 13:28:12 -0400
Subject: [Compiler-sig] Bug in astmodule?
References: <3CC03B79.AFF48E1F@metaslash.com> <15552.20916.228865.907994@slothrop.zope.com>
Message-ID: <3CC053AC.13306123@metaslash.com>

Jeremy Hylton wrote:
> 
> >>>>> "NN" == Neal Norwitz <neal@metaslash.com> writes:
> 
>   NN>             if (strcmp(STR(CHILD(n, 1)), "in") == 0)
>   NN>                 return NotIn;
>   NN>             if (strcmp(STR(CHILD(n, 0)), "is") == 0)
>   NN>                 return IsNot;
> 
>   NN> Shouldn't the 2nd if, do CHILD(n, 1) like the first?
> 
> No.  One is checking "not in" and the other is "is not".

Oops, I obviously wasn't thinking.  It makes perfect sense.

Neal



From bckfnn@worldonline.dk  Fri Apr 19 19:05:37 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Fri, 19 Apr 2002 18:05:37 GMT
Subject: [Compiler-sig] Number classes
In-Reply-To: <15552.19647.972709.314308@slothrop.zope.com>
References: <3cc02c99.23820542@mail.wanadoo.dk> <15552.13167.585346.444380@slothrop.zope.com> <3cc03743.26549916@mail.wanadoo.dk> <15552.19647.972709.314308@slothrop.zope.com>
Message-ID: <3cc059cd.35392381@mail.wanadoo.dk>

[Jeremy]

>If we have 
>
>expr = Num(object value, num_type type)
>num_type = Int | Long | Float | Complex
>
>Then I can parse the string and create a num passing the PyObject *
>and setting the appropriate type flag.  I could also handle separate
>IntNum, LongNum, etc. ctors, but that seems like more nodes than we
>really need.
>
>If the Num node(s) need to have the type specified, then I'd like it
>to take an object not a string.

I think this is a little overkill. I only needed the 'type' arg when the
value was a string. If we decide to use an 'object value' I have no need
for the type argument.

So this is sufficient for me:

   expr = Num(object value)

and I'm guessing it is for you too.

I just implemented the 'object' type and the only drawback is that the
parser package now depends on the classes in the core package. It is a
little ugly but it is in no way a technical problem.

regards,
finn



From bckfnn@worldonline.dk  Sun Apr 21 13:49:02 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sun, 21 Apr 2002 12:49:02 GMT
Subject: [Compiler-sig] More Jython progress
Message-ID: <3cc2aa24.4619202@mail.wanadoo.dk>

Hi,

Based on the current AST tree and a modified CodeCompiler, I can now
generate Java bytecode. I'm sure there are still a few bugs in the
generated code but it passes our initial (rather small) test suite.

The patch to current jython CVS is available here:

: http://sourceforge.net/tracker/?func=detail&atid=312867&aid=546737&group_id=12867

The next phase for Jython is to port jythonc (the java sourcecode
generator) to use the new AST tree.



Some observations about the AST:


I wonder if it would make sense to map the 'identifier' type to a Name
node. In some cases I create a Name node based on an identifier just so
it can play part of the visitor. From the end of visitFunctionDef():

   set(new Name(node.name, Name.Store));

Since the function name was initially parsed as a Name node anyway I
think it would be better to maintain the original Name node in the
FunctionDef node.


The ListComp and the way I use it bugs me a little. I'll admit it is a
clever way of representing a listcomp but I have been reusing the
visitFor() and visitIf() methods to generate the loop and branching
code. Since I wanted to continue to do that I build a series of For()
and If() statements from the listcomp:

    set(new Name(tmp_append, Name.Store));

    stmtType n = new Expr(new Call(new Name(tmp_append, Name.Load),
                                   new exprType[] { node.target },
                                   new keywordType[0], null, null));

    for (int i = node.generators.length - 1; i >= 0; i--) {
        listcompType lc = node.generators[i];
        for (int j = lc.ifs.length - 1; j >= 0; j--) {
            n = new If(lc.ifs[j], new stmtType[] { n }, null);
        }
        n = new For(lc.target, lc.iter, new stmtType[] { n }, null);
    }
    visit(n);
    visit(new Delete(new exprType[] {
                     new Name(tmp_append, Name.Del) }));

I think it is quite clever to reuse the For() and If() codegen but I
don't like to create new nodes on the fly. These new nodes will not have
the right line numbers or whatever other additional information we
attach to the nodes. I would prefer that the For() and If() nodes
were stored in the ListComp node.


regards,
finn



From bckfnn@worldonline.dk  Sun Apr 21 16:20:08 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sun, 21 Apr 2002 15:20:08 GMT
Subject: [Compiler-sig] Re: [Python-checkins] python/nondist/sandbox/ast asdl.h,1.4,1.5 astmodule.c,1.5,1.6
In-Reply-To: <E16zIb2-0003mj-00@usw-pr-cvs1.sourceforge.net>
References: <E16zIb2-0003mj-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <3cc2d51f.15622413@mail.wanadoo.dk>

[Neal in a checkin message]

>Get build working again, 

Thanks Neal, I'm sorry for breaking the build deliberately without
supplying a fix.

regards,
finn



From neal@metaslash.com  Sun Apr 21 16:30:52 2002
From: neal@metaslash.com (Neal Norwitz)
Date: Sun, 21 Apr 2002 11:30:52 -0400
Subject: [Compiler-sig] Re: [Python-checkins] python/nondist/sandbox/ast asdl.h,1.4,1.5
 astmodule.c,1.5,1.6
References: <E16zIb2-0003mj-00@usw-pr-cvs1.sourceforge.net> <3cc2d51f.15622413@mail.wanadoo.dk>
Message-ID: <3CC2DB2C.3D6ADCA@metaslash.com>

Finn Bock wrote:
> 
> [Neal in a checkin message]
> 
> >Get build working again,
> 
> Thanks Neal, I'm sorry for breaking the build deliberately without
> supplying a fix.

No problem.  I'm glad we can make progress on both CPython & Jython.

Neal



From jeremy@zope.com  Sun Apr 21 17:49:32 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Sun, 21 Apr 2002 12:49:32 -0400
Subject: [Compiler-sig] Re: [Python-checkins] python/nondist/sandbox/ast asdl.h,1.4,1.5 astmodule.c,1.5,1.6
In-Reply-To: <E16zIb2-0003mj-00@usw-pr-cvs1.sourceforge.net>
References: <E16zIb2-0003mj-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <15554.60828.475903.594444@slothrop.zope.com>

>>>>> "NN" == nnorwitz  <nnorwitz@sourceforge.net> writes:

  NN> Add XXX question about why we are using #define rather than
  NN> typedef

I've got an uncommitted change for this, too.  I don't know why it's a
define.  (Who wrote that code? ;-)

Jeremy




From jeremy@zope.com  Mon Apr 22 05:31:53 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Mon, 22 Apr 2002 00:31:53 -0400
Subject: [Compiler-sig] More Jython progress
In-Reply-To: <3cc2aa24.4619202@mail.wanadoo.dk>
References: <3cc2aa24.4619202@mail.wanadoo.dk>
Message-ID: <15555.37433.560666.702582@slothrop.zope.com>

>>>>> "FB" == Finn Bock <bckfnn@worldonline.dk> writes:

  FB> Hi, Based on the current AST tree and a modified CodeCompiler, I
  FB> can now generate javabytecode. I'm sure there are still a few
  FB> bugs in the generated code but it passes our initial (rather
  FB> small) test suite.

  FB> The patch to current jython CVS is available here:

  FB> :
  FB> http://sourceforge.net/tracker/?func=detail&atid=312867&aid=546737&group_id=12867

  FB> The next phase for Jython is to port jythonc (the java
  FB> sourcecode generator) to use the new AST tree.

I had hoped to look over your code this weekend, but didn't get to
it.  The subtleties of converting list comprehensions delayed me <0.6
wink>.  Is it your intent to re-do the compiler(s) in Jython?  In
hindsight, it seems clear that you weren't doing this just to kill
time, but I didn't realize that both Pythons were in for a compiler
overhaul at the same time.

  FB> Some observations about the AST:

I'll have to think about these tomorrow.  I hope it's not too much
trouble that I changed Dict.

  FB> The ListComp and the way I use it bugs me a little. I'll admit
  FB> it is a clever way of representing a listcomp but I have been
  FB> reusing the visitFor() and visitIf() methods to generate the
  FB> loop and branching code. Since I wanted to continue to do that I
  FB> build a series of For() and If() statements from the listcomp:

I just looked at the compiler package and saw that its visitFor() and
visitListFor() are quite similar.  The visitIf() and visitListIf()
aren't very similar, presumably because a lot of logic is in the
visitListComp() method.

    (The compiler package uses a ListComp() object with two children --
    a binding expression and a list of ListCompFor and ListCompIf
    nodes.)

I'd need to think harder about how the two kinds of fors and ifs could
be merged here.  Perhaps you could accomplish this with helper methods
instead of creating throwaway nodes?  _visit_generic_for() that could
be called by either visitFor() or visitlistcomp()?

Jeremy




From jeremy@zope.com  Mon Apr 22 05:35:46 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Mon, 22 Apr 2002 00:35:46 -0400
Subject: [Compiler-sig] sharing AST between C and Python
Message-ID: <15555.37666.880198.190043@slothrop.zope.com>

Has anyone given thought to how to share an AST between the Python
core and user code written in Python?  I think I mentioned earlier
that I was leaning towards an explicit "pickling" phase to copy an
AST across the boundary rather than trying to share references to a
single struct.  Does anyone else have an opinion?

I'm asking because the first draft of astmodule.c is winding down, and
I'll need the pickler soon if I want to do any noodling with the
converted AST.

Jeremy

PS That's pickling in the ASDL sense, which is similar to but not the
same as pickling in the Python sense.




From bckfnn@worldonline.dk  Mon Apr 22 11:37:14 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Mon, 22 Apr 2002 10:37:14 GMT
Subject: [Compiler-sig] More Jython progress
In-Reply-To: <15555.37433.560666.702582@slothrop.zope.com>
References: <3cc2aa24.4619202@mail.wanadoo.dk> <15555.37433.560666.702582@slothrop.zope.com>
Message-ID: <3cc3d9de.2059691@mail.wanadoo.dk>

>>>>>> "FB" == Finn Bock <bckfnn@worldonline.dk> writes:
>
>  FB> Hi, Based on the current AST tree and a modified CodeCompiler, I
>  FB> can now generate javabytecode. I'm sure there are still a few
>  FB> bugs in the generated code but it passes our initial (rather
>  FB> small) test suite.
>
>  FB> The patch to current jython CVS is available here:
>
>  FB> :
>  FB> http://sourceforge.net/tracker/?func=detail&atid=312867&aid=546737&group_id=12867
>
>  FB> The next phase for Jython is to port jythonc (the java
>  FB> sourcecode generator) to use the new AST tree.

[Jeremy]

>I had hoped to look over your code this weekend, but didn't get to
>it.  The subtleties of converting list comprehensions delayed me <0.6
>wink>.  Is it your intent to re-do the compiler(s) in Jython? 

Yes. One is done already, still one compiler to do.

If you want to look, the AST is created by org.python.p2.TreeBuilder
which is called by the actions specified in the JavaCC grammar. The
compiler is located in org.python.c2.CodeCompiler. In the c2 package
there is also a ScopesCompiler that handles the symbol types
(fast_locals, cells, etc) and an ArgListCompiler that deals with default
argument values and argtuple unpacking. All three compiler classes are
using the visitor pattern I outlined a while ago.

>In
>hindsight, it seems clear that you weren't doing this just to kill
>time, but I didn't realize that both Pythons were in for a compiler
>overhaul at the same time.

The size of the overhaul is significantly smaller for jython. Our old
syntax tree was almost node-by-node exactly the same as the new AST but
all the children were anonymous. Except for a few smaller differences
(like listcomp and function arguments) the transformation has been
straightforward.

The main reason I wanted to switch to the new AST is because we have to
create yet another AST visitor, one that does on the fly interpretation
of the python code. I did not want to start on this visitor using the
old tree, I guessed it would be faster to switch the other compilers to
the new AST instead.

>  FB> Some observations about the AST:
>
>I'll have to think about these tomorrow.  I hope it's not too much
>trouble that I changed Dict.

The new way of representing Dict keys and values is rather unnatural for
jython because all the existing support functions and PyDictionary()
ctors assume that the elements are alternating keys and values.

It is not a big issue and I don't think the slowdown of rearranging the
elements will be noticeable.
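
The rearranging amounts to something like the following Python sketch.
It assumes the changed Dict node carries keys and values as two
parallel sequences, and interleave() is a made-up helper name:

```python
def interleave(keys, values):
    """Turn parallel key and value lists into [k0, v0, k1, v1, ...]."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    flat = []
    for key, value in zip(keys, values):
        flat.append(key)
        flat.append(value)
    return flat
```

That is one linear pass per dict display, which is why I don't expect
it to matter.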

>  FB> The ListComp and the way I use it bugs me a little. I'll admit
>  FB> it is a clever way of representing a listcomp but I have been
>  FB> reusing the visitFor() and visitIf() methods to generate the
>  FB> loop and branching code. Since I wanted to continue to do that I
>  FB> build a series of For() and If() statements from the listcomp:
>
>I just looked at the compiler package and saw that its visitFor() and
>visitListFor() are quite similar.  The visitIf() and visitListIf()
>aren't very similar, presumably because a lot of logic is in the
>visitListComp() method.
>
>    (The compiler package uses a ListComp() object with two children --
>    a binding expression and a list of ListCompFor and ListCompIf
>    nodes.)
>
>I'd need to think harder about how the two kinds of fors and ifs could
>be merged here.  Perhaps you could accomplish this with helper methods
>instead of creating throwaway nodes?  _visit_generic_for() that could
>be called by either visitFor() or visitlistcomp()?

I'll think more about it, but the main problem is that visitFor() and
visitIf() are using recursion while a ListComp is a sequence.
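
One possible shape for such a helper, sketched in Python: recurse over
positions in the flat clause sequence instead of over child nodes, so
no throwaway nodes are needed. The function name, the
(target, iterable, ifs) tuples and the text output are all made up for
illustration; a real codegen would emit bytecode where this appends
lines:

```python
def listcomp_lines(elt, generators):
    """Emit pseudo-source for a listcomp by recursing over clause indexes."""
    lines = []

    def emit_clause(gen_index, if_index, depth):
        if gen_index == len(generators):
            # Past the last generator: emit the innermost body.
            lines.append("    " * depth + "append(%s)" % elt)
            return
        target, iterable, ifs = generators[gen_index]
        if if_index == 0:
            # First time we see this generator: open its loop.
            lines.append("    " * depth + "for %s in %s:" % (target, iterable))
            depth += 1
        if if_index < len(ifs):
            lines.append("    " * depth + "if %s:" % ifs[if_index])
            emit_clause(gen_index, if_index + 1, depth + 1)
        else:
            emit_clause(gen_index + 1, 0, depth)

    emit_clause(0, 0, 0)
    return lines


# [x * y for x in xs if x for y in ys]
lines = listcomp_lines("x * y", [("x", "xs", ["x"]), ("y", "ys", [])])
```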

regards,
finn



From jeremy@zope.com  Wed Apr 24 00:12:34 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Tue, 23 Apr 2002 19:12:34 -0400
Subject: [Compiler-sig] More Jython progress
In-Reply-To: <3cc3d9de.2059691@mail.wanadoo.dk>
References: <3cc2aa24.4619202@mail.wanadoo.dk>
 <15555.37433.560666.702582@slothrop.zope.com>
 <3cc3d9de.2059691@mail.wanadoo.dk>
Message-ID: <15557.60002.639949.876456@slothrop.zope.com>

I'm afraid I won't have any more time to work on this until the end of
the week.  A bunch of customer-related projects appeared this week,
and I need to devote some time to them.  

Jeremy

PS If anyone else wants to extend astmodule.c, be my guest.