From ldl@LDL.HealthPartners.COM Wed Nov 1 01:23:47 2000
From: ldl@LDL.HealthPartners.COM (LD Landis)
Date: Tue, 31 Oct 2000 19:23:47 -0600 (CST)
Subject: [Compiler-sig] Parser Options
In-Reply-To: <20001031170146.4F9471CFE3@dinsdale.python.org> from "compiler-sig-request@python.org" at Oct 31, 2000 12:01:46 PM
Message-ID: <200011010123.eA11NmR27092@LDL.HealthPartners.COM>

Hi,

I am working on another project that has the need for (preferably) a
grammar driven parser. I've looked at the approaches the CPython,
JPython and Python.net have taken to this problem, and am thinking that
there should be a way to generate some 'target language independent'
scheme for parser generation. Also, I find something more along the
lines of an Earley algorithm quite interesting (recently found the
Accent Compiler Compiler), so am looking at how that all works, in
general.

So, my question is sort of along the lines: Guido, have you thought
about, have pointers to ideas, etc, a way to unify the Python language
grammar over the now-three implementations? The current approaches seem
to be lacking nice connectivity between AST generation and rule "reduce"
actions... It seems that it could be possible to generate some sort of
an intermediary level that would handle (abstractly) the "action code"
specification... so that a separate "back end" could generate low-level
(C, Python, Net-C-variant) code (bindings?).

I am somewhat interested/motivated to look at this issue, but am not
interested in starting a duplicate path... would rather hitch up with
others that have thought longer about the problems. I have no real
experience in anything beyond serious yacc/lex hacking (no attribute
grammar usage... only read about them)... I want the "tool" to be useful
in my other interest, which has some keywordish/context sensitivity
too... and would like to see the Python compiler world benefit as well.
For example, ideas on how to allow parsing of a "[[ ]]+" language, where
things are context dependent:

    FOR I=0:1:31,127 SET X="WRITE "_I XECUTE X IF I%5=0 WRITE !

which can equivalently be written:

    F I=0:1:31,127 S X="WRITE "_I X X I I%5=0 W !

In [ ] ([C O] below) pairing is:

     F I=0:1:31,127   S X="WRITE "_I   X X   I I%5=0   W !
    [C OOOOOOOOOOOO] [C OOOOOOOOOOOO] [C O] [C OOOOO] [C O]

TIA for any discussion/pointers/ideas!

Cheers,
--ldl

From skip@mojam.com (Skip Montanaro) Wed Nov 1 03:11:56 2000
From: skip@mojam.com (Skip Montanaro) (Skip Montanaro)
Date: Tue, 31 Oct 2000 21:11:56 -0600 (CST)
Subject: [Compiler-sig] Parser Options
In-Reply-To: <200011010123.eA11NmR27092@LDL.HealthPartners.COM>
References: <20001031170146.4F9471CFE3@dinsdale.python.org> <200011010123.eA11NmR27092@LDL.HealthPartners.COM>
Message-ID: <14847.35324.488461.119829@beluga.mojam.com>

    ldl> Also, I find something more along the lines of an Earley
    ldl> algorithm quite interesting (recently found the Accent Compiler
    ldl> Compiler), so am looking at how that all works, in general.

John Aycock's SPARK toolkit uses an Earley algorithm:

    http://www.csr.UVic.CA/~aycock/python/

--
Skip Montanaro (skip@mojam.com)
http://www.mojam.com/
http://www.musi-cal.com/

From jeremy@alum.mit.edu Wed Nov 1 20:11:19 2000
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Wed, 1 Nov 2000 15:11:19 -0500 (EST)
Subject: [Compiler-sig] changes to ast
Message-ID: <14848.30951.662662.646264@bitdiddle.concentric.net>

I made several checkins last week (or the week before), when SF did not
seem to be sending email with the log messages. I thought I'd update
briefly and do a little hand-wringing about making changes without
consulting anyone.

The key change was to replace the hand-edited ast.py module with an
automatically generated one. The astgen module uses ast.txt -- both
inside the compiler package -- to generate ast.py, which is still under
CVS control. I think that's a relatively safe change.

I'm a little concerned about the interface changes I made to ast nodes.
I eliminated the _children tuple that each node had and eliminated the
support for sequence access (node[0] to get type, node[1] to get first
child node, etc) and updated all the code elsewhere to reflect that
change. I also updated transformer to instantiate Node subclasses
directly instead of going through the Node function to call the
appropriate method.

These changes made the code faster and, I hope, clearer, but I wonder
how many people depended on the old sequence-style access protocol. If
its elimination causes problems for anyone, let me know soon; it can be
restored if it's a serious issue for anyone.

Jeremy

From MarkH@ActiveState.com Wed Nov 1 22:06:09 2000
From: MarkH@ActiveState.com (Mark Hammond)
Date: Thu, 2 Nov 2000 09:06:09 +1100
Subject: [Compiler-sig] changes to ast
In-Reply-To: <14848.30951.662662.646264@bitdiddle.concentric.net>
Message-ID:

> These changes made the code faster

How much faster? Significantly? The speed of this is still my biggest
issue, and may get around to looking at the C implemented compiler tools
announced here (by the Wing people?) not too long ago...

> but I wonder how many people depended on the old
> sequence-style access protocol.

.NET does, but I am happy to move to this new scheme - the code is
definitely clearer without using node indexing...

Mark.

From jeremy@alum.mit.edu Wed Nov 1 22:28:44 2000
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Wed, 1 Nov 2000 17:28:44 -0500 (EST)
Subject: [Compiler-sig] changes to ast
In-Reply-To:
References: <14848.30951.662662.646264@bitdiddle.concentric.net>
Message-ID: <14848.39196.736684.639089@bitdiddle.concentric.net>

>>>>> "MH" == Mark Hammond writes:

  >> These changes made the code faster

  MH> How much faster? Significantly? The speed of this is still my
  MH> biggest issue, and may get around to looking at the C
  MH> implemented compiler tools announced here (by the Wing people?)
  MH> not too long ago...

I don't know if it is significantly faster or not. I wasn't guided by a
specific goal; I just spent an afternoon seeing if the profiler showed
anything obvious. There were a couple of hot spots that I fixed, but
nothing dramatic. I actually haven't compared old times and new times.

Do you have specific performance goals that could be addressed?

What kind of tree does the Wing IDE code produce? I saw the
announcement, but haven't had time to look at it.

  >> but I wonder how many people depended on the old sequence-style
  >> access protocol.

  MH> .NET does, but I am happy to move to this new scheme - the code
  MH> is definitely clearer without using node indexing...

Ok.

Jeremy

From MarkH@ActiveState.com Wed Nov 1 22:50:25 2000
From: MarkH@ActiveState.com (Mark Hammond)
Date: Thu, 2 Nov 2000 09:50:25 +1100
Subject: [Compiler-sig] changes to ast
In-Reply-To: <14848.39196.736684.639089@bitdiddle.concentric.net>
Message-ID:

[Jeremy]
> Do you have specific performance goals that could be addressed?

Nothing too specific ATM. However, my compiler is woefully slow. The
profiler shows that roughly 1/2 the time is spent in COM (talking to
.NET) and the other half is in the AST transformation code. I identified
some hotspots in my code, but was still left with these rough ratios. I
didn't really look into the AST code, just noted the fact I should ;-)

Some of the time spent in COM will be Python's fault, but much of it
will be .NET doing its thing, creating the DLL, doing its "assembly"
step, etc - so the AST code would appear to offer the most potential.

> What kind of tree does the Wing IDE code produce? I saw the
> announcement, but haven't had time to look at it.

I am in the exact same boat - I have no idea either. However, I should
mention the big advantage to me of keeping the compiler in .py code is
the potential for the compiler to compile itself. Once this happens, we
can support eval, exec, etc. The more .py code the better for this goal.

Mark.
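
A minimal sketch of the interface change discussed in this thread:
sequence-style access (node[0] for the type, node[1] for the first
child) versus children reached as named attributes. Everything here is
an assumption made for illustration. The classes SeqNode, Add and Const
and their field names are hypothetical, the code is written for a
current Python 3, and it is not the code that astgen generates.

```python
# Hypothetical node classes for illustration only; these are not the
# classes generated by astgen, and the field names are assumptions.


class SeqNode:
    """Old style: a node is read like a sequence (type first, then children)."""

    def __init__(self, node_type, *children):
        self._children = (node_type,) + children

    def __getitem__(self, index):
        return self._children[index]

    def __len__(self):
        return len(self._children)


class Const:
    """New style: the value is a named attribute."""

    def __init__(self, value):
        self.value = value


class Add:
    """New style: each child is a named attribute."""

    def __init__(self, left, right):
        self.left = left
        self.right = right


def eval_seq(node):
    """Evaluate a SeqNode tree; the caller has to remember slot numbers."""
    if node[0] == "const":
        return node[1]
    if node[0] == "add":
        return eval_seq(node[1]) + eval_seq(node[2])
    raise ValueError("unknown node type: %r" % (node[0],))


def eval_attr(node):
    """Evaluate an Add/Const tree; children are reached by name."""
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Add):
        return eval_attr(node.left) + eval_attr(node.right)
    raise TypeError("unknown node class: %r" % (node,))


old_tree = SeqNode("add", SeqNode("const", 1), SeqNode("const", 2))
new_tree = Add(Const(1), Const(2))
print(eval_seq(old_tree), eval_attr(new_tree))
```

Both evaluators walk the same expression. The attribute version avoids
the per-node index lookups and reads closer to the grammar, which is the
clarity argument made in the messages above; whether it is measurably
faster is exactly what the thread leaves open.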

From jpe@archaeopteryx.com Wed Nov 1 23:06:01 2000
From: jpe@archaeopteryx.com (John Ehresman)
Date: Wed, 1 Nov 2000 18:06:01 -0500 (EST)
Subject: [Compiler-sig] changes to ast
In-Reply-To: <14848.39196.736684.639089@bitdiddle.concentric.net>
Message-ID:

On Wed, 1 Nov 2000, Jeremy Hylton wrote:
> What kind of tree does the Wing IDE code produce? I saw the
> announcement, but haven't had time to look at it.

The tree produced by the nodetransformer module in the parsetools
package is similar, but not identical, to the tree produced by the
transformer module. It can either be generated as a series of nested
tuples, where the 1st item is the name of the node's class, or it can be
generated as a series of class instances if the ast module is available.
There are also a number of other, more substantial, differences, such as
how assignments are encoded; rather than multiple variants of assigns
for tuples, attribs, & so forth, there is a single assign class with
right-hand-side & left-hand-side children. At one point, I was going to
propose changes to the ast, but now I wonder if it would be simpler to
mimic the Python transformer exactly -- because it's easier for me to
document and justify the behavior ;).

If I recall correctly, the 2 main hotspots in the python transformer
were converting the parser tree to a tuple and transforming a test parse
node in the common case where it's a simple atom. You might get a
noticeable improvement with a loop in the test node transform to call
directly to the atom transform if all in between nodes have only one
child. Converting the parse tree is harder to optimize; I did create a
new node wrapper class which only converts to Python data types as
needed, but if all transformation is done in Python, the entire tree
will need to be converted anyway.

BTW: The Wing IDE does use this transformer internally, but there's no
way to get at the parse tree directly with the IDE.
The transformer and supporting modules are packaged up as the parsetools
package at ftp://archaeopteryx.com/pub/parsetools/.

John

From tripp@perspex.com Mon Nov 27 08:05:51 2000
From: tripp@perspex.com (Tripp Lilley)
Date: Mon, 27 Nov 2000 03:05:51 -0500
Subject: [Compiler-sig] Ugly hacks: modifying instance_getattr
Message-ID: <3A2215DF.4AB8DFF@perspex.com>

This isn't exactly a compilation issue, but it's somewhat related, and I
didn't see an obvious "ugly hacks" or "interpreter developer" list
mentioned anywhere out front.

I have modified my interpreter so that, in the LOAD_ATTR case statement,
it peeks ahead in the code and looks to see if the next opcode is
"CALL_FUNCTION". If so, I'd like to use slightly different getattr steps
to resolve the attribute reference.

For resolution of the call o.XXX( ), I'd like my getattr to use these
steps:

  - if an attribute called __meth_XXX__ exists, return it.
  - if an attribute called __getmethod__ exists, call it to allow
    it to resolve the attribute. If it returns "None", continue
    looking for the attribute.
  - continue with "normal" attribute resolution semantics

Basically, the idea is to be able to trap attribute accesses that are
going to be immediately used as method invocations. Why?

  http://sourceforge.net/projects/selfish/

But that's another story. At this point, I can, more or less, determine
the right "context" in which I want to apply these semantics. With the
hack to eval_code2, I trap bytecode method invocations, and with a
modification to PyObject_CallMethod, I trap C API method invocations.
What I now need to do is pass that contextual "hint" down into the
various flavours of getattr.

What's the most "friendly" way of approaching that? Since I can't use
default arguments, adding another parameter to getattrofunc would mean
I'd have to modify all of the modules to pass the parameter. Yuck. I
can't use a global variable because of thread safety issues (and because
that's ugly and I refuse :) ).
Is there some thread state to which I have access from within a
getattrofunc?

One disgusting possibility that occurred to me was to modify the object
being searched, temporarily replacing its tp_getattro member with a
wrapper that would prepend the method semantics. I've temporarily shot
that one down because it means I have to investigate what type of object
it is, so I can apply the correct prepend (ie: a module might or might
not support the "method" mechanisms I'm proposing). However, I'm willing
to revisit that...

The other alternative is that I'm doing Something I Shouldn't Be
Doing(tm).

--
   Tripp Lilley * tripp@perspex.com * http://stargate.eheart.sg505.net/~tlilley/
-----------------------------------------------------------------------------
  "This whole textual substitution thing is pissing me off.
   I feel like I'm programming in Tcl."

   - Eric Frias, former roommate, hacking partner extraordinaire

From gstein@lyra.org Mon Nov 27 09:29:44 2000
From: gstein@lyra.org (Greg Stein)
Date: Mon, 27 Nov 2000 01:29:44 -0800
Subject: [Compiler-sig] Ugly hacks: modifying instance_getattr
In-Reply-To: <3A2215DF.4AB8DFF@perspex.com>; from tripp@perspex.com on Mon, Nov 27, 2000 at 03:05:51AM -0500
References: <3A2215DF.4AB8DFF@perspex.com>
Message-ID: <20001127012944.A14107@lyra.org>

Take a look at using Python's "metaclasses" feature:

    http://www.python.org/doc/essays/metaclasses/

While there is a lot there to get your head around, take a look at some
of the examples. That could give you a quick peek into whether/how to do
your "hook into a method call" gimmick.

And you don't even have to modify the interpreter :-)

Cheers,
-g

On Mon, Nov 27, 2000 at 03:05:51AM -0500, Tripp Lilley wrote:
> This isn't exactly a compilation issue, but it's somewhat related, and I
> didn't see an obvious "ugly hacks" or "interpreter developer" list
> mentioned anywhere out front.
>
> I have modified my interpreter so that, in the LOAD_ATTR case statement,
> it peeks ahead in the code and looks to see if the next opcode is
> "CALL_FUNCTION". If so, I'd like to use slightly different getattr steps
> to resolve the attribute reference.
>
> For resolution of the call o.XXX( ), I'd like my getattr to use these
> steps:
>
>   - if an attribute called __meth_XXX__ exists, return it.
>   - if an attribute called __getmethod__ exists, call it to allow
>     it to resolve the attribute. If it returns "None", continue
>     looking for the attribute.
>   - continue with "normal" attribute resolution semantics
>
> Basically, the idea is to be able to trap attribute accesses that are
> going to be immediately used as method invocations. Why?
>
>   http://sourceforge.net/projects/selfish/
>
> But that's another story. At this point, I can, more or less, determine
> the right "context" in which I want to apply these semantics. With the
> hack to eval_code2, I trap bytecode method invocations, and with a
> modification to PyObject_CallMethod, I trap C API method invocations.
> What I now need to do is pass that contextual "hint" down into the
> various flavours of getattr.
>
> What's the most "friendly" way of approaching that? Since I can't use
> default arguments, adding another parameter to getattrofunc would mean
> I'd have to modify all of the modules to pass the parameter. Yuck. I
> can't use a global variable because of thread safety issues (and because
> that's ugly and I refuse :) ). Is there some thread state to which I
> have access from within a getattrofunc?
>
> One disgusting possibility that occurred to me was to modify the object
> being searched, temporarily replacing its tp_getattro member with a
> wrapper that would prepend the method semantics.
> I've temporarily shot
> that one down because it means I have to investigate what type of object
> it is, so I can apply the correct prepend (ie: a module might or might
> not support the "method" mechanisms I'm proposing). However, I'm willing
> to revisit that...
>
> The other alternative is that I'm doing Something I Shouldn't Be
> Doing(tm).
>
> --
> Tripp Lilley * tripp@perspex.com *
> http://stargate.eheart.sg505.net/~tlilley/
> -----------------------------------------------------------------------------
> "This whole textual substitution thing is pissing me off.
> I feel like I'm programming in Tcl."
>
> - Eric Frias, former roommate, hacking partner extraordinaire
>
> _______________________________________________
> Compiler-sig mailing list
> Compiler-sig@python.org
> http://www.python.org/mailman/listinfo/compiler-sig

--
Greg Stein, http://www.lyra.org/

From echuck@mindspring.com Mon Nov 27 16:03:31 2000
From: echuck@mindspring.com (echuck@mindspring.com)
Date: Mon, 27 Nov 2000 11:03:31 -0500
Subject: [Compiler-sig] Re: [selfish-devel] Ugly hacks: modifying instance_getattr
Message-ID:

You have lost me. Why do you want to hack on "obj.foo()"?

Tripp Lilley wrote:

> This isn't exactly a compilation issue, but it's somewhat related, and
I didn't see an obvious "ugly hacks" or "interpreter developer" list
mentioned anywhere out front.

I have modified my interpreter so that, in the LOAD_ATTR case statement,
it peeks ahead in the code and looks to see if the next opcode is
"CALL_FUNCTION". If so, I'd like to use slightly different getattr steps
to resolve the attribute reference.

For resolution of the call o.XXX( ), I'd like my getattr to use these
steps:

  - if an attribute called __meth_XXX__ exists, return it.
  - if an attribute called __getmethod__ exists, call it to allow
    it to resolve the attribute. If it returns "None", continue
    looking for the attribute.
  - continue with "normal" attribute resolution semantics

Basically, the idea is to be able to trap attribute accesses that are
going to be immediately used as method invocations. Why?

  http://sourceforge.net/projects/selfish/

But that's another story. At this point, I can, more or less, determine
the right "context" in which I want to apply these semantics. With the
hack to eval_code2, I trap bytecode method invocations, and with a
modification to PyObject_CallMethod, I trap C API method invocations.
What I now need to do is pass that contextual "hint" down into the
various flavours of getattr.

What's the most "friendly" way of approaching that? Since I can't use
default arguments, adding another parameter to getattrofunc would mean
I'd have to modify all of the modules to pass the parameter. Yuck. I
can't use a global variable because of thread safety issues (and because
that's ugly and I refuse :) ). Is there some thread state to which I
have access from within a getattrofunc?

One disgusting possibility that occurred to me was to modify the object
being searched, temporarily replacing its tp_getattro member with a
wrapper that would prepend the method semantics. I've temporarily shot
that one down because it means I have to investigate what type of object
it is, so I can apply the correct prepend (ie: a module might or might
not support the "method" mechanisms I'm proposing). However, I'm willing
to revisit that...

The other alternative is that I'm doing Something I Shouldn't Be
Doing(tm).

--
Tripp Lilley * tripp@perspex.com * http://stargate.eheart.sg505.net/~tlilley/
-----------------------------------------------------------------------------
"This whole textual substitution thing is pissing me off.
I feel like I'm programming in Tcl."

- Eric Frias, former roommate, hacking partner extraordinaire

_______________________________________________
selfish-devel mailing list
selfish-devel@lists.sourceforge.net
http://lists.sourceforge.net/mailman/listinfo/selfish-devel

From tlilley@perspex.com Mon Nov 27 20:58:57 2000
From: tlilley@perspex.com (Tripp Lilley)
Date: Mon, 27 Nov 2000 20:58:57 +0000 (/etc/localtime)
Subject: [Compiler-sig] Re: [selfish-devel] Ugly hacks: modifying instance_getattr
In-Reply-To:
Message-ID:

On Mon, 27 Nov 2000 echuck@mindspring.com wrote:

> You have lost me. Why do you want to hack on "obj.foo()"?

I refer you to our earlier discussion on selfish-devel, but to summarize
for others' benefit (or anguish, as the case may be):

I have two slots, "scalar" and "method", which are, respectively, a
scalar value and a method (bound using the "new" hacks). I want to
access those slots according to the conventions established in the Self
language, namely without regard to whether they're implemented as simple
attributes or as methods. Thus:

    # retrieve

    o.scalar
    o.scalar( )

    o.method
    o.method( )


    # set

    o.scalar = 42
    o.scalar( 42 )

    o.method = 25
    o.method( 25 )

Taking first the case of the "scalar" slot: I want to define two
attributes, one a simple value, one a method. When the slot "scalar" is
retrieved as a simple attribute (ie: o.scalar), it will magically pop
out of __dict__['scalar'] per normal Python getattr rules. On the other
hand, when it's retrieved just prior to a CALL_FUNCTION bytecode (or by
the PyObject_CallMethod call), it will try magically returning
__dict__['__meth_scalar__'], a wrapper which handles "method" semantics
for the slot. If that fails, it will try calling __getmethod__, and if
that throws an AttributeError, it will fall back to normal semantics.
All of this is predicated on the setting of __use_getmethod__ or
somesuch in globals.
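
The lookup order described so far for a call such as o.XXX( ), first
__meth_XXX__, then the __getmethod__ hook, then normal attribute
resolution, can be sketched in pure Python. This is only an illustration
under stated assumptions: resolve_for_call and the Demo class are
hypothetical names, the change discussed in the thread would live in the
interpreter's C code rather than in a helper function, and the sketch
treats both a None return and an AttributeError from __getmethod__ as
"keep looking", since the thread mentions both. It is written for a
current Python 3.

```python
def resolve_for_call(obj, name):
    """Resolve obj.<name> when the result is about to be called.

    Hypothetical helper mirroring the three steps from the thread.
    """
    # Step 1: a dedicated method wrapper named __meth_<name>__ wins.
    try:
        return getattr(obj, "__meth_%s__" % name)
    except AttributeError:
        pass

    # Step 2: give a __getmethod__ hook the chance to resolve the name.
    hook = getattr(obj, "__getmethod__", None)
    if hook is not None:
        try:
            found = hook(name)
        except AttributeError:
            found = None
        if found is not None:
            return found

    # Step 3: fall back to ordinary attribute lookup.
    return getattr(obj, name)


class Demo:
    """Hypothetical object with one slot of each kind."""

    def __init__(self):
        self.scalar = 42

    def __meth_scalar__(self, *args):
        # Called with no argument: read the slot; with one: set it.
        if args:
            self.scalar = args[0]
        return self.scalar

    def __getmethod__(self, name):
        if name == "dynamic":
            return lambda: "resolved by __getmethod__"
        raise AttributeError(name)

    def plain(self):
        return "normal lookup"


o = Demo()
print(resolve_for_call(o, "scalar")())    # read through the wrapper
print(resolve_for_call(o, "scalar")(7))   # set through the wrapper
print(resolve_for_call(o, "dynamic")())   # handled by the hook
print(resolve_for_call(o, "plain")())     # ordinary method
```

Because the helper is called explicitly, plain reads such as o.scalar
are untouched here; the hard part raised in the thread, telling getattr
that a call is coming next, is not something this sketch addresses.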

Now, the case of the "method" slot: When called as a simple scalar (ie:
o.method), this would be trapped by either __getattr__, or by the new
__attr_method__ hook proposed in PEP 213. Either of those would simply
execute the method in-place, returning the resolved value. When called
as a method, the contents of __dict__['__meth_method__'] would be
returned, which happen to be a normal, old-fashioned method object,
which is called as normal.

So, basically, it allows me to do away with the incredibly ugly
slot-wrapper crap I'm using right now to implement attr/method opacity.
For certain general cases, it's quite efficient. For other general
cases, the inefficiency is masked by other inefficiencies that aren't
avoidable. At least as far as I know right now :)

--
   Joy-Loving * Tripp Lilley * http://stargate.eheart.sg505.net/~tlilley/
------------------------------------------------------------------------------
  "There were other lonely singers / in a world turned deaf and blind
   Who were crucified for what they tried to show.
   Their voices have been scattered by the swirling winds of time,
   'Cause the truth remains that no one wants to know."

   - Kris Kristofferson, "To Beat the Devil"

From guido@python.org Mon Nov 27 20:51:35 2000
From: guido@python.org (Guido van Rossum)
Date: Mon, 27 Nov 2000 15:51:35 -0500
Subject: [Compiler-sig] Re: [selfish-devel] Ugly hacks: modifying instance_getattr
In-Reply-To: Your message of "Mon, 27 Nov 2000 20:58:57 GMT."
References:
Message-ID: <200011272051.PAA26292@cj20424-a.reston1.va.home.com>

> I have two slots, "scalar" and "method", which are, respectively,
> a scalar value and a method (bound using the "new" hacks). I want
> to access those slots according to the conventions established in
> the Self language, namely without regard to whether they're
> implemented as simple attributes or as methods.
> Thus:
>
>     # retrieve
>
>     o.scalar
>     o.scalar( )
>
>     o.method
>     o.method( )
>
>
>     # set
>
>     o.scalar = 42
>     o.scalar( 42 )
>
>     o.method = 25
>     o.method( 25 )

For "set", this is possible using the __setattr__ hook.

But for "retrieve" it is impossible, and I strongly recommend against
it. In Python bound methods are first-class objects and can be passed
around just like function pointers. For example:

    l = [0,1,2,3]
    a = l.append
    a(4)
    a(5)
    print l    # [0,1,2,3,4,5]

Your hack would break this, and I object against calling the resulting
language "Python".

Instead, you can use __getattr__ to redirect any reference to o.scalar
to a method call, so that you can use what you call scalar notation for
method implementation. In my eyes, this is better than what you want!

(Also note that the Python compiler-sig is really intended for
discussions of new ways of compiling Python, not for discussions of the
existing Python compiler.)

--Guido van Rossum (home page: http://www.python.org/~guido/)
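
The __getattr__ redirection suggested in the last message can be
sketched as follows. This is one possible reading, not code from the
thread: the class name Slots and the get_<name> naming convention are
assumptions, and the example is written for a current Python 3 (so
print is a function, unlike in the message above).

```python
# Illustration only: the class name Slots and the get_<name> naming
# convention are assumptions, not part of any proposal in the thread.


class Slots:
    """Redirect plain attribute reads to getter methods via __getattr__."""

    def __init__(self):
        self._scalar = 42

    def get_scalar(self):
        return self._scalar

    def __getattr__(self, name):
        # __getattr__ is only called when normal lookup fails, so real
        # attributes and methods are unaffected.
        if name.startswith("get_"):
            # No getter with that name exists; stop here to avoid recursion.
            raise AttributeError(name)
        try:
            getter = getattr(self, "get_" + name)
        except AttributeError:
            raise AttributeError(name) from None
        return getter()


o = Slots()
print(o.scalar)          # scalar notation, implemented by a method call

getter = o.get_scalar    # bound methods remain first-class objects
print(getter())

l = [0, 1, 2, 3]
a = l.append             # the bound-method example from the message
a(4)
a(5)
print(l)
```

Reads of o.scalar go through get_scalar, while o.get_scalar and
l.append still hand back ordinary bound methods that can be stored and
called later, which is the property the message argues must not be
broken.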