From ldl@LDL.HealthPartners.COM  Wed Nov  1 01:23:47 2000
From: ldl@LDL.HealthPartners.COM (LD Landis)
Date: Tue, 31 Oct 2000 19:23:47 -0600 (CST)
Subject: [Compiler-sig] Parser Options
In-Reply-To: <20001031170146.4F9471CFE3@dinsdale.python.org> from "compiler-sig-request@python.org" at Oct 31, 2000 12:01:46 PM
Message-ID: <200011010123.eA11NmR27092@LDL.HealthPartners.COM>

Hi,

  I am working on another project that needs (preferably) a
  grammar-driven parser.  I've looked at the approaches that CPython,
  JPython, and Python.NET have taken to this problem, and am thinking
  that there should be a way to devise some 'target language independent'
  scheme for parser generation.

  Also, I find something more along the lines of an Earley algorithm
  quite interesting (recently found the Accent Compiler Compiler), so
  am looking at how that all works, in general.

  So, my question is along these lines: Guido, have you thought about
  (or do you have pointers to ideas on) a way to unify the Python
  language grammar across the now-three implementations?

  The current approaches seem to be lacking nice connectivity between
  AST generation and rule "reduce" actions... It seems that it could be
  possible to generate some sort of an intermediary level that would
  handle (abstractly) the "action code" specification... so that a 
  separate "back end" could generate low-level (C, Python, Net-C-variant)
  code (bindings?).

  I am somewhat interested/motivated to look at this issue, but am not
  interested in starting a duplicate path... would rather hitch up with
  others that have thought longer about the problems.  I have no real
  experience in anything beyond serious yacc/lex hacking (no attribute
  grammar usage... only read about them)...
  
  I want the "tool" to be useful in my other interest, which has some
  keywordish/context sensitivity too... and would like to see the Python
  compiler world benefit as well.  For example, ideas on how to allow
  parsing of a "[<command>[ <options>]]+<nl>" language, where things are 
  context dependent:

     FOR I=0:1:31,127 SET X="WRITE "_I XECUTE X IF I%5=0 WRITE !

  which can equivalently be written:

     F I=0:1:31,127 S X="WRITE "_I X X I I%5=0 W !

  The [<command> <options>] pairing (shown as [C O] below) is:

     F I=0:1:31,127   S X="WRITE "_I   X X   I I%5=0   W !
    [C OOOOOOOOOOOO] [C OOOOOOOOOOOO] [C O] [C OOOOO] [C O]
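
The pairing above can be sketched in Python. This is only a rough
illustration, assuming every command takes exactly one option string and
that spaces inside double quotes never separate tokens (a simplification
of the real language). The function names and the abbreviation table are
made up for this example; the table is read off the two equivalent lines
above.

```python
# Hypothetical abbreviation table, read off the two equivalent lines above.
COMMANDS = {"F": "FOR", "S": "SET", "X": "XECUTE", "I": "IF", "W": "WRITE"}


def split_outside_quotes(line):
    """Split on spaces that are not inside a double-quoted string."""
    tokens, current, in_quotes = [], [], False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == " " and not in_quotes:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def pair_commands(line):
    """Return [(command, options), ...], reading tokens alternately."""
    tokens = split_outside_quotes(line.strip())
    if len(tokens) % 2:
        raise ValueError("expected <command> <options> pairs: %r" % line)
    return [(tokens[i], tokens[i + 1]) for i in range(0, len(tokens), 2)]


def expand(line):
    """Expand abbreviations in command position only; options are untouched."""
    return " ".join(
        "%s %s" % (COMMANDS.get(cmd, cmd), opts)
        for cmd, opts in pair_commands(line)
    )


short = 'F I=0:1:31,127 S X="WRITE "_I X X I I%5=0 W !'
print(pair_commands(short))
print(expand(short))
```

The context dependence shows up in "X X" and "I I%5=0": the same letter
is a command in the first position and a variable in the second, so only
the position in the pair decides how it is read.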

  TIA for any discussion/pointers/ideas!

Cheers,
	--ldl


From skip@mojam.com (Skip Montanaro)  Wed Nov  1 03:11:56 2000
From: skip@mojam.com (Skip Montanaro)
Date: Tue, 31 Oct 2000 21:11:56 -0600 (CST)
Subject: [Compiler-sig] Parser Options
In-Reply-To: <200011010123.eA11NmR27092@LDL.HealthPartners.COM>
References: <20001031170146.4F9471CFE3@dinsdale.python.org>
 <200011010123.eA11NmR27092@LDL.HealthPartners.COM>
Message-ID: <14847.35324.488461.119829@beluga.mojam.com>

    ldl>   Also, I find something more along the lines of an Earley
    ldl>   algorithm quite interesting (recently found the Accent Compiler
    ldl>   Compiler), so am looking at how that all works, in general.

John Aycock's SPARK toolkit uses an Earley algorithm:

    http://www.csr.UVic.CA/~aycock/python/

-- 
Skip Montanaro (skip@mojam.com)
http://www.mojam.com/
http://www.musi-cal.com/


From jeremy@alum.mit.edu  Wed Nov  1 20:11:19 2000
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Wed, 1 Nov 2000 15:11:19 -0500 (EST)
Subject: [Compiler-sig] changes to ast
Message-ID: <14848.30951.662662.646264@bitdiddle.concentric.net>

I made several checkins last week (or the week before), when SF did
not seem to be sending email with the log messages.  I thought I'd
update briefly and do a little hand-wringing about making changes
without consulting anyone.

The key change was to replace the hand-edited ast.py module with an
automatically generated one.  The astgen module uses ast.txt -- both
inside the compiler package -- to generate ast.py, which is still
under CVS control.  I think that's a relatively safe change.
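
As a rough illustration of generating node classes from a text
description: the sketch below is not astgen, and the spec syntax
("Name: field, field") is invented here, since the real ast.txt format is
not shown in this message. It only demonstrates the general idea of
emitting class source from a table instead of editing it by hand.

```python
def generate_node_classes(spec):
    """Return Python source with one class per 'Name: field, field' line."""
    chunks = ["class Node:\n    pass\n"]
    for line in spec.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, rest = line.partition(":")
        fields = [f.strip() for f in rest.split(",") if f.strip()]
        args = "".join(", " + f for f in fields)
        body = "".join("        self.%s = %s\n" % (f, f) for f in fields)
        chunks.append(
            "class %s(Node):\n    def __init__(self%s):\n%s"
            % (name.strip(), args, body or "        pass\n")
        )
    return "\n".join(chunks)


SPEC = """
# hypothetical spec, not the real ast.txt
Name: name
Add: left, right
Pass:
"""

source = generate_node_classes(SPEC)
namespace = {}
exec(source, namespace)   # stands in for writing ast.py and importing it
tree = namespace["Add"](namespace["Name"]("x"), namespace["Name"]("y"))
print(tree.left.name, tree.right.name)
```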

I'm a little concerned about the interface changes I made to ast
nodes.  I eliminated the _children tuple that each node had and
eliminated the support for sequence access (node[0] to get type,
node[1] to get first child node, etc) and updated all the code
elsewhere to reflect that change.  
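
To make the interface difference concrete, here is a small sketch of the
two access styles. Both classes are hypothetical stand-ins written for
this example, not the actual classes in the compiler package.

```python
class OldStyleNode:
    """Node exposing (type, child, child, ...) through indexing."""

    def __init__(self, node_type, *children):
        self._children = (node_type,) + children

    def __getitem__(self, index):
        return self._children[index]


class Add:
    """Node exposing its children as named attributes only."""

    def __init__(self, left, right):
        self.left = left
        self.right = right


old = OldStyleNode("add", "x", "y")
new = Add("x", "y")
print(old[0], old[1], old[2])   # positional: the caller must know the layout
print(new.left, new.right)      # named: the layout is spelled out
```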

I also updated transformer to instantiate Node subclasses directly
instead of going through the Node function to call the appropriate
method.

These changes made the code faster and, I hope, clearer, but I wonder
how many people depended on the old sequence-style access protocol.
If its elimination causes problems for anyone, let me know soon; it can
be restored if it's a serious issue.

Jeremy


From MarkH@ActiveState.com  Wed Nov  1 22:06:09 2000
From: MarkH@ActiveState.com (Mark Hammond)
Date: Thu, 2 Nov 2000 09:06:09 +1100
Subject: [Compiler-sig] changes to ast
In-Reply-To: <14848.30951.662662.646264@bitdiddle.concentric.net>
Message-ID: <LCEPIIGDJPKCOIHOBJEPCELFCFAA.MarkH@ActiveState.com>

> These changes made the code faster

How much faster?  Significantly?  The speed of this is still my biggest
issue, and may get around to looking at the C implemented compiler tools
announced here (by the Wing people?) not too long ago...


> but I wonder how many people depended on the old
> sequence-style access protocol.

.NET does, but I am happy to move to this new scheme - the code is
definitely clearer without using node indexing...

Mark.



From jeremy@alum.mit.edu  Wed Nov  1 22:28:44 2000
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Wed, 1 Nov 2000 17:28:44 -0500 (EST)
Subject: [Compiler-sig] changes to ast
In-Reply-To: <LCEPIIGDJPKCOIHOBJEPCELFCFAA.MarkH@ActiveState.com>
References: <14848.30951.662662.646264@bitdiddle.concentric.net>
 <LCEPIIGDJPKCOIHOBJEPCELFCFAA.MarkH@ActiveState.com>
Message-ID: <14848.39196.736684.639089@bitdiddle.concentric.net>

>>>>> "MH" == Mark Hammond <MarkH@ActiveState.com> writes:

  >> These changes made the code faster
  MH> How much faster?  Significantly?  The speed of this is still my
  MH> biggest issue, and may get around to looking at the C
  MH> implemented compiler tools announced here (by the Wing people?)
  MH> not too long ago...

I don't know if it is significantly faster or not.  I wasn't guided by
a specific goal; I just spent an afternoon seeing if the profiler
showed anything obvious.  There were a couple of hot spots that I
fixed, but nothing dramatic.
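
For anyone who wants to repeat that kind of exercise, a minimal sketch of
a profiler run follows. It uses the cProfile and pstats modules from
current Python, which is an assumption on my part (the profiler used at
the time is not named here), and slow_transform is a made-up stand-in for
whatever entry point is being measured.

```python
import cProfile
import io
import pstats


def slow_transform(n):
    """Stand-in workload; replace with the real transformer entry point."""
    return [str(i) * 2 for i in range(n)]


def profile_call(func, *args):
    """Run func under the profiler and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(5)   # five costliest entries
    return result, out.getvalue()


result, report = profile_call(slow_transform, 1000)
print(report)
```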

I actually haven't compared old times and new times.

Do you have specific performance goals that could be addressed?

What kind of tree does the Wing IDE code produce?  I saw the
announcement, but haven't had time to look at it.

  >> but I wonder how many people depended on the old sequence-style
  >> access protocol.

  MH> .NET does, but I am happy to move to this new scheme - the code
  MH> is definitely clearer without using node indexing...

Ok.

Jeremy



From MarkH@ActiveState.com  Wed Nov  1 22:50:25 2000
From: MarkH@ActiveState.com (Mark Hammond)
Date: Thu, 2 Nov 2000 09:50:25 +1100
Subject: [Compiler-sig] changes to ast
In-Reply-To: <14848.39196.736684.639089@bitdiddle.concentric.net>
Message-ID: <LCEPIIGDJPKCOIHOBJEPMELJCFAA.MarkH@ActiveState.com>

[Jeremy]
> Do you have specific performances goals that could be addressed?

Nothing too specific ATM.  However, my compiler is woefully slow.  The
profiler shows that roughly 1/2 the time is spent in COM (talking to .NET)
and the other half is in the AST transformation code.

I identified some hotspots in my code, but was still left with these rough
ratios.  I didn't really look into the AST code, just noted the fact I
should ;-)

Some of the time spent in COM will be Python's fault, but much of it will be
.NET doing its thing, creating the DLL, doing its "assembly" step, etc - so
the AST code would appear to offer the most potential.

> What kind of tree does the Wing IDE code produce?  I saw the
> announcement, but haven't had time to look at it.

I am in the exact same boat - I have no idea either.

However, I should mention the big advantage to me of keeping the compiler in
.py code is the potential for the compiler to compile itself.  Once this
happens, we can support eval, exec, etc.  The more .py code the better for
this goal.

Mark.



From jpe@archaeopteryx.com  Wed Nov  1 23:06:01 2000
From: jpe@archaeopteryx.com (John Ehresman)
Date: Wed, 1 Nov 2000 18:06:01 -0500 (EST)
Subject: [Compiler-sig] changes to ast
In-Reply-To: <14848.39196.736684.639089@bitdiddle.concentric.net>
Message-ID: <Pine.LNX.4.10.10011011731540.2647-100000@egret>

On Wed, 1 Nov 2000, Jeremy Hylton wrote:
> What kind of tree does the Wing IDE code produce?  I saw the
> announcement, but haven't had time to look at it.

The tree produced by the nodetransformer module in the parsetools package
is similar, but not identical, to the tree produced by the transformer
module.  It can either be generated as a series of nested tuples, where
the 1st item is the name of the node's class, or it can be generated as a
series of class instances if the ast module is available.  There are also
a number of other, more substantial, differences, such as how assignments
are encoded; rather than multiple variants of assigns for tuples, attribs,
& so forth, there is a single assign class with right-hand-side &
left-hand-side children.  At one point, I was going to propose changes to
the ast, but now I wonder if it would be simpler to mimic the Python
transformer exactly -- because it's easier for me to document and justify
the behavior ;).
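
A sketch of the two encodings described above, for illustration only. The
class names and field names here are invented and are not the parsetools
API; the point is just that a nested tuple whose first item names the
node class maps directly onto class instances, and that one assign node
with left-hand-side and right-hand-side children can cover any kind of
target.

```python
class Name:
    def __init__(self, name):
        self.name = name


class Tuple:
    def __init__(self, *elements):
        self.elements = elements


class Assign:
    """Single assignment node: any kind of target goes in lhs."""

    def __init__(self, lhs, rhs):
        self.lhs = lhs
        self.rhs = rhs


NODE_CLASSES = {cls.__name__: cls for cls in (Name, Tuple, Assign)}


def build(node):
    """Convert ('ClassName', child, ...) tuples to instances, recursively."""
    if not isinstance(node, tuple):
        return node                      # leaf value such as a name string
    cls = NODE_CLASSES[node[0]]
    return cls(*[build(child) for child in node[1:]])


# "a, b = c" written as nested tuples (first item is the class name)
as_tuples = ("Assign", ("Tuple", ("Name", "a"), ("Name", "b")), ("Name", "c"))
tree = build(as_tuples)
print(type(tree.lhs).__name__, [n.name for n in tree.lhs.elements], tree.rhs.name)
```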

If I recall correctly, the 2 main hotspots in the python transformer were
converting the parser tree to a tuple and transforming a test parse node
in the common case where it's a simple atom.  You might get a noticeable
improvement with a loop in the test node transform to call directly to the
atom transform if all in between nodes have only one child.  Converting
the parse tree is harder to optimize; I did create a new node wrapper
class which only converts to Python data types as needed, but if all
transformation is done in Python, the entire tree will need to be converted
anyway.
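
The loop suggested for the test node could look roughly like this. It is
a sketch under assumptions: the parse tree is taken to be nested tuples of
the form (symbol, child, ...), and symbols are written as strings for
readability, whereas the real parser module uses numeric codes. The
function name is made up.

```python
def descend_to_atom(node, target="atom"):
    """Follow a chain of single-child nodes, stopping at `target` or at
    the first node that has more than one child (or is a leaf)."""
    while node[0] != target and len(node) == 2 and isinstance(node[1], tuple):
        node = node[1]
    return node


# Hypothetical parse-tree fragment for the expression `x`:
# every level between test and atom wraps exactly one child.
simple = ("test", ("and_test", ("not_test", ("comparison",
          ("expr", ("atom", ("NAME", "x")))))))

# Fragment for `x and y`: and_test has three children, so the walk stops there.
branching = ("test", ("and_test",
             ("not_test", ("atom", ("NAME", "x"))),
             ("NAME", "and"),
             ("not_test", ("atom", ("NAME", "y")))))

print(descend_to_atom(simple))
print(descend_to_atom(branching)[0])
```

If the returned node's symbol is the atom, the transformer can dispatch
to the atom transform directly; otherwise it continues with the normal
per-level dispatch from the node where the walk stopped.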

BTW: The Wing IDE does use this transformer internally, but there's no way
to get at the parse tree directly with the IDE.  The transformer and
supporting modules are packaged up as the parsetools package at
ftp://archaeopteryx.com/pub/parsetools/.

John



From tripp@perspex.com  Mon Nov 27 08:05:51 2000
From: tripp@perspex.com (Tripp Lilley)
Date: Mon, 27 Nov 2000 03:05:51 -0500
Subject: [Compiler-sig] Ugly hacks: modifying instance_getattr
Message-ID: <3A2215DF.4AB8DFF@perspex.com>

This isn't exactly a compilation issue, but it's somewhat related, and I
didn't see an obvious "ugly hacks" or "interpreter developer" list
mentioned anywhere out front.

I have modified my interpreter so that, in the LOAD_ATTR case statement,
it peeks ahead in the code and looks to see if the next opcode is
"CALL_FUNCTION". If so, I'd like to use slightly different getattr steps
to resolve the attribute reference.

For resolution of the call o.XXX( ), I'd like my getattr to use these
steps:

	- if an attribute called __meth_XXX__ exists, return it.
	- if an attribute called __getmethod__ exists, call it to allow
	  it to resolve the attribute. If it returns "None", continue
	  looking for the attribute.
	- continue with "normal" attribute resolution semantics
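
The steps above can be sketched in pure Python as a plain helper
function. This is only a model of the intended lookup order, not the
interpreter change itself; resolve_for_call and the two example classes
are hypothetical names introduced for the illustration.

```python
def resolve_for_call(obj, name):
    """Model of the proposed lookup for `obj.name(...)`."""
    # Step 1: a dedicated __meth_<name>__ attribute wins if it exists.
    try:
        return getattr(obj, "__meth_%s__" % name)
    except AttributeError:
        pass
    # Step 2: give __getmethod__ a chance; None means "keep looking".
    getmethod = getattr(obj, "__getmethod__", None)
    if getmethod is not None:
        found = getmethod(name)
        if found is not None:
            return found
    # Step 3: normal attribute resolution.
    return getattr(obj, name)


class Slotted:
    def __init__(self):
        self.scalar = 42

    def __meth_scalar__(self):
        return "called as method"


class Dynamic:
    def __getmethod__(self, name):
        if name == "ping":
            return lambda: "pong"
        return None

    def plain(self):
        return "plain"


print(Slotted().scalar, resolve_for_call(Slotted(), "scalar")())
print(resolve_for_call(Dynamic(), "ping")(), resolve_for_call(Dynamic(), "plain")())
```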

Basically, the idea is to be able to trap attribute accesses that are
going to be immediately used as method invocations. Why?

	http://sourceforge.net/projects/selfish/

But that's another story. At this point, I can, more or less, determine
the right "context" in which I want to apply these semantics. With the
hack to eval_code2, I trap bytecode method invocations, and with a
modification to PyObject_CallMethod, I trap C API method invocations.
What I now need to do is pass that contextual "hint" down into the
various flavours of getattr.

What's the most "friendly" way of approaching that? Since I can't use
default arguments, adding another parameter to getattrofunc would mean
I'd have to modify all of the modules to pass the parameter. Yuck. I
can't use a global variable because of thread safety issues (and because
that's ugly and I refuse :) ). Is there some thread state to which I
have access from within a getattrofunc?
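
To illustrate what per-thread state buys over a global, here is a
Python-level analogy using threading.local. It says nothing about how
this would be done inside a C getattrofunc; the names method_lookup and
looking_up_for_call are invented for the sketch.

```python
import threading
from contextlib import contextmanager

_lookup_context = threading.local()   # each thread sees its own attributes


@contextmanager
def method_lookup():
    """Mark the current thread as resolving an attribute for a call."""
    previous = getattr(_lookup_context, "for_call", False)
    _lookup_context.for_call = True
    try:
        yield
    finally:
        _lookup_context.for_call = previous


def looking_up_for_call():
    """What a getattr implementation would consult instead of a parameter."""
    return getattr(_lookup_context, "for_call", False)


seen = []
with method_lookup():
    seen.append(looking_up_for_call())            # hint visible in this thread
    other = threading.Thread(target=lambda: seen.append(looking_up_for_call()))
    other.start()
    other.join()                                  # other thread saw no hint
seen.append(looking_up_for_call())                # hint cleared afterwards
print(seen)
```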

One disgusting possibility that occurred to me was to modify the object
being searched, temporarily replacing its tp_getattro member with a
wrapper that would prepend the method semantics. I've temporarily shot
that one down because it means I have to investigate what type of object
it is, so I can apply the correct prepend (ie: a module might or might
not support the "method" mechanisms I'm proposing). However, I'm willing
to revisit that...

The other alternative is that I'm doing Something I Shouldn't Be
Doing(tm).

-- 
Tripp Lilley * tripp@perspex.com *
http://stargate.eheart.sg505.net/~tlilley/
-----------------------------------------------------------------------------
"This whole textual substitution thing is pissing me off.
 I feel like I'm programming in Tcl."

- Eric Frias, former roommate, hacking partner extraordinaire


From gstein@lyra.org  Mon Nov 27 09:29:44 2000
From: gstein@lyra.org (Greg Stein)
Date: Mon, 27 Nov 2000 01:29:44 -0800
Subject: [Compiler-sig] Ugly hacks: modifying instance_getattr
In-Reply-To: <3A2215DF.4AB8DFF@perspex.com>; from tripp@perspex.com on Mon, Nov 27, 2000 at 03:05:51AM -0500
References: <3A2215DF.4AB8DFF@perspex.com>
Message-ID: <20001127012944.A14107@lyra.org>

Take a look at using Python's "metaclasses" feature:

    http://www.python.org/doc/essays/metaclasses/

While there is a lot there to get your head around, take a look at some of
the examples. That could give you a quick peek into whether/how to do your
"hook into a method call" gimmick.

And you don't even have to modify the interpreter :-)

Cheers,
-g


-- 
Greg Stein, http://www.lyra.org/


From echuck@mindspring.com  Mon Nov 27 16:03:31 2000
From: echuck@mindspring.com (echuck@mindspring.com)
Date: Mon, 27 Nov 2000 11:03:31 -0500
Subject: [Compiler-sig] Re: [selfish-devel] Ugly hacks: modifying instance_getattr
Message-ID: <Springmail.105.975341011.0.48947100@www.springmail.com>

You have lost me. Why do you want to hack on "obj.foo()"?





From tlilley@perspex.com  Mon Nov 27 20:58:57 2000
From: tlilley@perspex.com (Tripp Lilley)
Date: Mon, 27 Nov 2000 20:58:57 +0000 (/etc/localtime)
Subject: [Compiler-sig] Re: [selfish-devel] Ugly hacks: modifying instance_getattr
In-Reply-To: <Springmail.105.975341011.0.48947100@www.springmail.com>
Message-ID: <Pine.LNX.3.96.1001127204710.8405E-100000@mail.perspex.com>

On Mon, 27 Nov 2000 echuck@mindspring.com wrote:

> You have lost me. Why do you want to hack on "obj.foo()"?

I refer you to our earlier discussion on selfish-devel, but to summarize
for others' benefit (or anguish, as the case may be):

	I have two slots, "scalar" and "method", which are, respectively,
	a scalar value and a method (bound using the "new" hacks). I want
	to access those slots according to the conventions established in
	the Self language, namely without regard to whether they're
	implemented as simple attributes or as methods. Thus:

		# retrieve

		o.scalar
		o.scalar( )

		o.method
		o.method( )


		# set

		o.scalar = 42
		o.scalar( 42 )

		o.method = 25
		o.method( 25 )


Taking first the case of the "scalar" slot:

	I want to define two attributes, one a simple value, one a method.
	When the slot "scalar" is retrieved as a simple attribute (ie:
	o.scalar), it will magically pop out of __dict__['scalar'] per
	normal Python getattr rules.

	On the other hand, when it's retrieved just prior to a
	CALL_FUNCTION bytecode (or by the PyObject_CallMethod call), it
	will try magically returning __dict__['__meth_scalar__'], a
	wrapper which handles "method" semantics for the slot. If that
	fails, it will try calling __getmethod__, and if that throws an
	AttributeError, it will fall back to normal semantics.

All of this is predicated on the setting of __use_getmethod__ or somesuch
in globals.

Now, the case of the "method" slot:

	When called as a simple scalar (ie: o.method), this would be
	trapped by either __getattr__, or by the new __attr_method__ hook
	proposed in PEP 213. Either of those would simply execute the
	method in-place, returning the resolved value.

	When called as a method, the contents of
	__dict__['__meth_method__'] would be returned, which happen to be
	a normal, old-fashioned method object, which is called as normal.

So, basically, it allows me to do away with the incredibly ugly
slot-wrapper crap I'm using right now to implement attr/method opacity.
For certain general cases, it's quite efficient. For other general cases,
the inefficiency is masked by other inefficiencies that aren't avoidable.
At least as far as I know right now :)

--
   Joy-Loving * Tripp Lilley  *  http://stargate.eheart.sg505.net/~tlilley/
------------------------------------------------------------------------------
   "There were other lonely singers / in a world turned deaf and blind
    Who were crucified for what they tried to show.
    Their voices have been scattered by the swirling winds of time,
    'Cause the truth remains that no one wants to know."

   - Kris Kristofferson, "To Beat the Devil"




From guido@python.org  Mon Nov 27 20:51:35 2000
From: guido@python.org (Guido van Rossum)
Date: Mon, 27 Nov 2000 15:51:35 -0500
Subject: [Compiler-sig] Re: [selfish-devel] Ugly hacks: modifying instance_getattr
In-Reply-To: Your message of "Mon, 27 Nov 2000 20:58:57 GMT."
 <Pine.LNX.3.96.1001127204710.8405E-100000@mail.perspex.com>
References: <Pine.LNX.3.96.1001127204710.8405E-100000@mail.perspex.com>
Message-ID: <200011272051.PAA26292@cj20424-a.reston1.va.home.com>

> 	I have two slots, "scalar" and "method", which are, respectively,
> 	a scalar value and a method (bound using the "new" hacks). I want
> 	to access those slots according to the conventions established in
> 	the Self language, namely without regard to whether they're
> 	implemented as simple attributes or as methods. Thus:
> 
> 		# retrieve
> 
> 		o.scalar
> 		o.scalar( )
> 
> 		o.method
> 		o.method( )
> 
> 
> 		# set
> 
> 		o.scalar = 42
> 		o.scalar( 42 )
> 
> 		o.method = 25
> 		o.method( 25 )

For "set", this is possible using the __setattr__ hook.

But for "retrieve" it is impossible, and I strongly recommend against
it.

In Python bound methods are first-class objects and can be passed
around just like function pointers.  For example:

	l = [0,1,2,3]
	a = l.append
	a(4)
	a(5)
	print l		# [0,1,2,3,4,5]

Your hack would break this, and I object to calling the resulting
language "Python".

Instead, you can use __getattr__ to redirect any reference to o.scalar
to a method call, so that you can use what you call scalar notation
for method implementation.  In my eyes, this is better than what you
want!
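
A small sketch of the two hooks mentioned above, written for current
Python. The get_<name>/set_<name> naming convention and the class names
are assumptions made for this example, not anything established in the
thread. Attribute notation is routed to methods, while bound methods
remain ordinary first-class objects.

```python
class Slots:
    """Route o.name and o.name = value to get_name()/set_name() if defined."""

    def __getattr__(self, name):
        # Only called when normal lookup fails, so real attributes win.
        try:
            getter = object.__getattribute__(self, "get_" + name)
        except AttributeError:
            raise AttributeError(name) from None
        return getter()

    def __setattr__(self, name, value):
        setter = getattr(type(self), "set_" + name, None)
        if setter is not None:
            setter(self, value)
        else:
            object.__setattr__(self, name, value)


class Temperature(Slots):
    def __init__(self):
        self._celsius = 0.0

    def get_fahrenheit(self):
        return self._celsius * 9 / 5 + 32

    def set_fahrenheit(self, value):
        self._celsius = (value - 32) * 5 / 9


t = Temperature()
t.fahrenheit = 212          # goes through set_fahrenheit
reader = t.get_fahrenheit   # still a normal bound method, can be passed around
print(t.fahrenheit, reader())
```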

(Also note that the Python compiler-sig is really intended for
discussions of new ways of compiling Python, not for discussions of
the existing Python compiler.)

--Guido van Rossum (home page: http://www.python.org/~guido/)