Modifying Grammar/grammar and other foul acts

Python-devs,

I'm writing to you for some help in understanding the Python grammar. As an excuse to deep dive into Python's tokenizer / grammar, I decided (as a hideous, hideous joke) to allow braces where colons are allowed (as flow control).

Starting from PEP 306 (and branch r311), I hacked on Grammar/Grammar.

As a first example:

    funcdef: ('def' NAME parameters ['->' test] ':' suite |
              'def' NAME parameters ['->' test] '{' suite '}' )

I reran Parser/pgen and the dfa changes, but python (3.1), when recompiled, throws errors on things like:

    def a() { None }

Strangely enough:

    lambdef: ( 'lambda' [varargslist] ':' test |
               'lambda' [varargslist] '{' test '}' )

works fine! Is this simply some difference between "test" and "suite"?

I have tried tackling this with gdb; looking at err_input clearly isn't enough.

    (gdb) break err_input
    (gdb) break PyParser_ASTFromString

    import sys
    b = compile("def a() {pass}","sys.stdout","single")
    # yet a simple grammar fix is enough for this!
    c = compile("lambda x {None}","sys.stdout","single")

I'm in over my head! Any insights / help would be appreciated. Full-on flaming is also appropriate, but would be less appreciated.

Specific questions

1.) I assume the Grammar/grammar is read top to bottom. Confirm?
2.) More help figuring out how to debug what python *thinks* it's seeing when it sees "def a() {pass}". It's not getting to the ast construction stage, as near as I can tell. What additional breakpoints can I set to see where it's failing?

Gregg L.

1.) I assume the Grammar/grammar is read top to bottom. Confirm?
Confirm - but this is not surprising: *any* source file is typically read from top to bottom. Random-access reading is typically done for binary files only. So you must be asking something else, but I can't guess what that might be.
2.) More help figuring out how to debug what python *thinks* it's seeing when it see "def a() {pass}". It's not getting to the ast construction stage, as near as I can tell. What additional breakpoints can I set to see where it's failing.
Enable the D() debugging in parser.c (configure with --with-pydebug, and set PYTHONDEBUG).

Regards,
Martin

Sorry, re: question one, forgive the ill-formed question. I meant more: are the parser rules applied "first matching"? Essentially, I am trying to confirm whether the parser is "top down" or "bottom up", or whether it even matters.

Thanks for the tip -- it seems to be exactly what I want. To make it explicit, this seems to be a fuller (unix) recipe for how to make this style of debugging happen:

    $ ./configure --with-pydebug
    $ make
    $ export PYTHONDEBUG=1
    $ ./python -d   # then this shows the parsing info

On Sat, Mar 6, 2010 at 10:26 AM, Gregg Lind <gregg.lind@gmail.com> wrote:
Sorry, re: question one, forgive the ill-formed question. I meant more, are the parser rules applied "first matching". Essentially trying to confirm that the parser is "top down" or "bottom up" or whether or not it even matters.
That's not how it works at all. I can't explain it in a few words -- but any text on LL(1) parsing should clarify this. The parser uses no backtracking and a 1-token lookahead. The only unusual thing is that individual rules use a regex-like notation, but that is all converted to a DFA. If one token is not enough to know which path to take through the DFA (this may invoke another rule -- but you always know which one) you're hosed.

I suspect you've introduced ambiguities, though I don't immediately see where (they could be in the combination of different rules).

Another possibility is that you may be running into problems where the parser expects a newline at the end of a suite.

(FWIW since you're not proposing a language change, this is technically off-topic for python-dev. :-)

--Guido
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
-- --Guido van Rossum (python.org/~guido)
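
The newline point in Guido's reply can be explored from a stock interpreter. The sketch below uses the standard library's tokenize module rather than the C tokenizer discussed in this thread, so it assumes the two agree on one behaviour: newlines inside (), [] or {} are not reported as logical NEWLINE tokens, and no INDENT/DEDENT is produced there. The helper name token_names is made up for illustration.

```python
import io
import tokenize


def token_names(source):
    """Return the names of the token types produced for `source`."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]


# The braced form from the thread, spread over several lines.
braced = "def a() {\n    pass\n}\n"
# The ordinary colon-and-indentation form.
indented = "def a():\n    pass\n"

# Inside the braces the line breaks show up as NL (non-logical newline),
# so a `suite` that needs NEWLINE / INDENT never sees those tokens.
print(token_names(braced))
print(token_names(indented))
```

If that assumption holds for the modified 3.1 build, a suite wrapped in braces can never end its simple_stmt with the NEWLINE the grammar asks for, which would explain why "test" works inside braces while "suite" does not.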

Gregg Lind wrote:
Sorry, re: question one, forgive the ill-formed question. I meant more, are the parser rules applied "first matching". Essentially trying to confirm that the parser is "top down" or "bottom up" or whether or not it even matters.
I think pgen would complain if you had two rules that could both match at the same time, so the question doesn't apply. BTW, be careful with terminology here -- the terms "top down" and "bottom up" have a different meaning in parsing theory (they refer to the way the parse tree is built, not the order of matching in a list of rules). -- Greg
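
A rough sketch of the kind of check Greg describes: with one token of lookahead, two alternatives at the same point conflict when the sets of tokens that can start them (their FIRST sets) overlap. This is not pgen's code; the function name, the hand-written and heavily abbreviated FIRST sets, and the hypothetical "brace_block" alternative are all assumptions made for illustration.

```python
from itertools import combinations


def first_set_conflicts(first_sets):
    """Return {(alt_a, alt_b): shared_tokens} for overlapping FIRST sets."""
    conflicts = {}
    for a, b in combinations(sorted(first_sets), 2):
        shared = first_sets[a] & first_sets[b]
        if shared:
            conflicts[(a, b)] = shared
    return conflicts


# Abbreviated FIRST sets for the two stock alternatives of `suite`:
#   suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
STOCK_SUITE = {
    "simple_stmt": {"NAME", "NUMBER", "STRING", "(", "[", "{", "pass"},
    "indented_block": {"NEWLINE"},
}

# Hypothetical third alternative `'{' stmt+ '}'` (not a rule from this
# thread): '{' can already start a simple_stmt (a dict display), so one
# token of lookahead cannot tell the two apart.
BRACED_SUITE = dict(STOCK_SUITE, brace_block={"{"})

print(first_set_conflicts(STOCK_SUITE))
print(first_set_conflicts(BRACED_SUITE))
```

Putting the braces in funcdef, as in the original post, avoids this particular overlap because the '{' follows the parameters rather than competing with the start of a statement.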

On Sat, Mar 6, 2010 at 11:27 AM, Gregg Lind <gregg.lind@gmail.com> wrote:
I'm in over my head!
You don't say what errors occur when you try to compile strings in your new language. You may have changed the Grammar, which allows you to tokenize the input. That isn't enough to get the input to compile. You also need to change the compiler to understand the new tokens.

Jeremy
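
One way to collect the details Jeremy asks for is to catch the SyntaxError and report its fields. This is a minimal sketch; the helper name describe_syntax_error and the "<whython-test>" filename are made up here. On a stock interpreter both braced snippets fail; on a modified build the interesting part is which message and offset come back.

```python
def describe_syntax_error(source, mode="exec"):
    """Compile `source`; return the SyntaxError details, or None on success."""
    try:
        compile(source, "<whython-test>", mode)
    except SyntaxError as exc:
        return {
            "msg": exc.msg,        # human-readable message
            "lineno": exc.lineno,  # 1-based line of the failure
            "offset": exc.offset,  # column where the parser gave up
            "text": exc.text,      # offending source line, if available
        }
    return None


for snippet in ("def a() {pass}", "lambda x {None}", "def a(): pass"):
    print(repr(snippet), "->", describe_syntax_error(snippet))
```

The offset shows which token the parser rejected, which helps narrow down whether the failure happens at the '{' or later in the suite.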

On 09.03.2010 14:42, Jeremy Hylton wrote:
You don't say what errors occur when you try to compile strings in your new language. You may have changed the Grammar, which allows you to tokenize the input. That isn't enough to get the input to compile. You also need to change the compiler to understand the new tokens.
In particular, many AST creation functions check for specific counts of children on many nodes. I haven't checked, but in the case of the "funcdef" rule, it may check for either 7 or 5 children to determine whether the optional return annotation ['->' test] is present.

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
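
Georg's point about child counts can be illustrated with a toy model. The lists below stand in for the children of a funcdef parse node under the modified rule from the original post; the two helper functions are hypothetical and are not taken from CPython's ast.c. The sketch only shows why logic keyed on "5 or 7 children" stops working once the brace form adds a closing '}' child.

```python
# Children of a funcdef node for each form of the modified rule.
COLON_FORM = ["def", "NAME", "parameters", ":", "suite"]
COLON_FORM_ANNOTATED = ["def", "NAME", "parameters", "->", "test", ":", "suite"]
BRACE_FORM = ["def", "NAME", "parameters", "{", "suite", "}"]
BRACE_FORM_ANNOTATED = ["def", "NAME", "parameters", "->", "test", "{", "suite", "}"]


def has_annotation_by_count(children):
    """Count-based check of the kind Georg describes (assumed, not verified)."""
    n = len(children)
    if n == 5:
        return False
    if n == 7:
        return True
    raise ValueError(f"unexpected number of funcdef children: {n}")


def has_annotation_by_token(children):
    """Position-based check that does not depend on the total count."""
    return len(children) > 3 and children[3] == "->"


for form in (COLON_FORM, COLON_FORM_ANNOTATED, BRACE_FORM, BRACE_FORM_ANNOTATED):
    try:
        by_count = has_annotation_by_count(form)
    except ValueError as exc:
        by_count = f"error: {exc}"
    print(len(form), by_count, has_annotation_by_token(form))
```

The brace forms have 6 and 8 children, so any consumer that assumes 5 or 7 would need updating along with the grammar.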

Thank you for the advice, everyone. This seed has finally borne (rotten) fruit at:

http://writeonly.wordpress.com/2010/04/01/whython-python-for-people-who-hate...
http://bitbucket.org/gregglind/python-whython3k/

Gregg Lind wrote:
Thank you for the advice, everyone. This seed has finally borne (rotten) fruit at:
http://writeonly.wordpress.com/2010/04/01/whython-python-for-people-who-hate...
http://bitbucket.org/gregglind/python-whython3k/

Nicely done :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
participants (7)
- "Martin v. Löwis"
- Georg Brandl
- Greg Ewing
- Gregg Lind
- Guido van Rossum
- Jeremy Hylton
- Nick Coghlan