New Javascript parser in the works
I started development of the JavaScript parser, based on the pythongrammar.txt file in the parsing test dir. The thing is still in its infancy, but I just wanted to say that I am working on it and would love any help. It is living in users/santagada/newjsparser for now... will move to js when it is at least kind of working.

About the parsing lib I have some questions (I should be asking cfbolz, but I can't seem to find him on IRC lately):

* ParsingError should print something like lineno: self.lineno+1 column: self.column+1 (I know there is a pretty error function, but still)
* IGNORE and friends, how do I make them work?
* I don't know why, when it recognizes something (or when I think it should), it just gives me a list index out of bounds error (I will check what is happening, but if someone knows why, that would be better).

About the grammar:

* based on the Python grammar, so automatic semicolon insertion is OFF. You have optional semicolons at the end of lines though
* No regex literals (ahh, someday maybe, I'm not really interested in them)
* Lots of work to be done still :)

-- Leonardo Santagada santagada@gmail.com
Hi Leonardo!
Leonardo Santagada wrote:
I started development of the javascript parser, based on the pythongrammar.txt file on parsing test dir.
This is the point that does not really make sense to me already. You should base your grammar on the official (Mozilla) one, otherwise you will end up chasing and understanding strange corner cases. Also it makes updating the grammar much easier for future versions of the language. If you don't want to (or can't, for some reason) use the official grammar directly, you should use at least parts of it.
The thing is still in its infancy, but I just wanted to say that I am working on it, and would love any help. It is living in users/santagada/newjsparser for now... will move to js when it is at least kind of working.
I looked quickly at the directory, my comment is that you should really write tests. Otherwise you will get regressions when you change the grammar. As you know, PyPy is very strict about untested code, and an untested grammar is not really a good plan either. Also, the tests would have told me what you expect to work, in the current state it is completely worthless to me since I have no clue what should work and what not.
About the parsing lib I have some questions (should be asking cfbolz but I don't seem to find him on irc lately):
Sorry, I am enjoying my free time too much currently and don't want to be online consistently. Mail is fine for such a technical discussion anyway (and it has the additional benefit that I can answer you during this boring lecture I am sitting in now).
* ParsingError should print something like lineno: self.lineno+1 column: self.column+1 (I know there is a pretty error function, but still)
There is no class called ParsingError, and ParseError (which exists) does not have a lineno attribute, so what exactly do you mean? If it is only a cosmetic change, just fix it yourself and I will look at it. Note that the line and column numbers of parse errors are only guesses; there is no way to pinpoint the exact error location with PEGs.
* IGNORE and friends, how do I make them work?
What friends? IGNORE itself is a bit documented, so read the docs (and the tests). The idea is that IGNORE is a regular expression of things that should be ignored in the input (probably comments and whitespace).
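To illustrate the idea independently of the parsing library, here is a minimal sketch in plain Python using only the standard `re` module. The pattern and the `skip_ignored` helper are hypothetical names chosen for this example and are not part of the library's API; the pattern assumes that whitespace, `//` line comments and `/* */` block comments are what should be ignored.

```python
import re

# Hypothetical pattern: whitespace, // line comments and /* block comments */.
IGNORE = re.compile(r"(?:\s+|//[^\n]*|/\*.*?\*/)+", re.DOTALL)

def skip_ignored(source, pos):
    """Return the first position at or after pos that is not ignored input."""
    match = IGNORE.match(source, pos)
    return match.end() if match else pos
```

A tokenizer built this way would call `skip_ignored` before trying to match each real token, so ignored text never reaches the grammar rules.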
* I don't know why, when it recognizes something (or when I think it should), it just gives me a list index out of bounds error (I will check what is happening, but if someone knows why, that would be better).
Check in a failing test (skipped or not, it's your directory anyway), otherwise I have no clue what you are talking about.
About the grammar:
* based on the Python grammar, so automatic semicolon insertion is OFF. You have optional semicolons at the end of lines though
* No regex literals (ahh, someday maybe, I'm not really interested in them)
Regex literals will only be possible after refactoring the parser generator, I fear.
* Lots of work to be done still :)
Yip.
Cheers,
Carl Friedrich
On 24/04/2007, at 05:58, Carl Friedrich Bolz wrote:
Hi Leonardo!
Leonardo Santagada wrote:
I started development of the javascript parser, based on the pythongrammar.txt file on parsing test dir.
This is the point that does not really make sense to me already. You should base your grammar on the official (Mozilla) one, otherwise you will end up chasing and understanding strange corner cases. Also it makes updating the grammar much easier for future versions of the language. If you don't want to (or can't, for some reason) use the official grammar directly, you should use at least parts of it.
I've tried to start: I converted the number literals, but somehow they are not working. Can you try it and help me understand how to find and fix this kind of bug? (I'm a fast learner; Antonio showed me how to find them when translating RPython code, and now I can find and fix such bugs myself.)
The thing is still in its infancy, but I just wanted to say that I am working on it, and would love any help. It is living in users/santagada/newjsparser for now... will move to js when it is at least kind of working.
I looked quickly at the directory, my comment is that you should really write tests. Otherwise you will get regressions when you change the grammar. As you know, PyPy is very strict about untested code, and an untested grammar is not really a good plan either. Also, the tests would have told me what you expect to work, in the current state it is completely worthless to me since I have no clue what should work and what not.
Done man, now I will be writing tons of tests, I just need to get the first ones to work. They are in test_parser.py
About the parsing lib I have some questions (should be asking cfbolz but I don't seem to find him on irc lately):
Sorry, I am enjoying my free time too much currently and don't want to be online consistently. Mail is fine for such a technical discussion anyway (and it has the additional benefit that I can answer you during this boring lecture I am sitting in now).
perfect.
* ParsingError should print something like lineno: self.lineno+1 column: self.column+1 (I know there is a pretty error function, but still)
There is no class called ParsingError, and ParseError (which exists) does not have a lineno attribute, so what exactly do you mean? If it is only a cosmetic change, just fix it yourself and I will look at it. Note that the line and column numbers of parse errors are only guesses; there is no way to pinpoint the exact error location with PEGs.
It has a sourcepos and that has a lineno... but it is all working ok on test_parse.py
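The kind of message being asked for can be sketched as follows. This is only an illustration: `Pos` is a hypothetical stand-in for a source position object, and the assumption that its fields are 0-based (hence the `+1`) comes from the suggestion above, not from the library's documentation.

```python
from collections import namedtuple

# Hypothetical stand-in for a source position with 0-based fields.
Pos = namedtuple("Pos", ["lineno", "columnno"])

def format_position(pos):
    """Render a 0-based position as 1-based line and column numbers."""
    return "line %d, column %d" % (pos.lineno + 1, pos.columnno + 1)
```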
* IGNORE and friends, how do I make them work?
What friends? IGNORE itself is a bit documented, so read the docs (and the tests). The idea is that IGNORE is a regular expression of things that should be ignored in the input (probably comments and whitespace).
I read the documentation. I think there is not much point in separating pure regexes from the other actions (at least, if I understand correctly, real PEGs have no such separation). But I am living with them, and man, great docs!
* I don't know why, when it recognizes something (or when I think it should), it just gives me a list index out of bounds error (I will check what is happening, but if someone knows why, that would be better).
Check in a failing test (skipped or not, it's your directory anyway), otherwise I have no clue what you are talking about.
There are some failing tests now.
About the grammar:
* based on the Python grammar, so automatic semicolon insertion is OFF. You have optional semicolons at the end of lines though
* No regex literals (ahh, someday maybe, I'm not really interested in them)
Regex literals will only be possible after refactoring the parser generator, I fear.
I don't like them anyway, so that's OK.
* Lots of work to be done still :)
Yip.
Cheers,
Carl Friedrich
Thanks and thanks in advance, -- Leonardo Santagada santagada@gmail.com
Hi Leonardo!
Leonardo Santagada wrote:
Done man, now I will be writing tons of tests, I just need to get the first ones to work. They are in test_parser.py
I just looked at the current state of affairs, and while the grammar seems to be getting more complete, there are still only a couple of tests, and those that are there are failing. OK, it's your time, but I can't say I like this approach.

Just looking at the grammar made me notice the following problem. Something like the following is wrong:

    additiveexpression
        : multiplicativeexpression "+" additiveexpression
        | multiplicativeexpression "-" additiveexpression
        | multiplicativeexpression
        ;

The problem is that something like "5 - 3 - 3" with this grammar will give you a parse tree that essentially looks like the following:

    additiveexpression
           /|\
          / | \
         5  -  additiveexpression
                      /|\
                     / | \
                    3  -  3

Evaluating this tree gives 5 - (3 - 3) = 5, instead of the correct (5 - 3) - 3 = -1. Unfortunately, solving this is not trivial. In theory the grammar rule should look like this:

    additiveexpression
        : additiveexpression "+" multiplicativeexpression
        | additiveexpression "-" multiplicativeexpression
        | multiplicativeexpression
        ;

which does not work, since then you have a left recursion, which is not supported. In theory there could be a nice grammar transformer that removes the left recursion for you (which is simple and always possible) together with a nice tree transformer that makes the tree look correct (which is not so simple). Until then, you can use the following workaround:

    additiveexpression
        : multiplicativeexpression "+" >additiveexpression<
        | multiplicativeexpression "-" >additiveexpression<
        | <multiplicativeexpression>
        ;

which will lead to a wrong tree originally, but the transformer will transform the tree into this:

    additiveexpression
      /  |  |  |  \
     /   |  |  |   \
    5    -  3  -    3

which is not as nice, since additiveexpression now has more than three children, but at least it's easy to find out what is correct. You then have to make sure that you do the right thing when you transform the tree into the nodes you use for interpretation.

Oh, another small thing: would you mind not using tabs in the grammar? I know it is not a Python file, but somehow the number of spaces per tab is different between your editor and mine, which makes it a bit annoying for me to look at. Do you have local changes? Otherwise I could do a tab-removing checkin.

Cheers,
Carl Friedrich
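To make "doing the right thing" with the flattened tree concrete, here is a small Python sketch. It assumes the children of additiveexpression have already been turned into a plain list such as `[5, "-", 3, "-", 3]`; the function names and the `OPS` table are made up for this illustration and are not part of the parsing library.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub}

def eval_flat(children):
    """Evaluate [operand, op, operand, op, ...] left to right."""
    result = children[0]
    for i in range(1, len(children), 2):
        result = OPS[children[i]](result, children[i + 1])
    return result

def eval_right_nested(children):
    """Evaluate the same list the way the right-recursive tree groups it."""
    if len(children) == 1:
        return children[0]
    return OPS[children[1]](children[0], eval_right_nested(children[2:]))
```

For `[5, "-", 3, "-", 3]`, `eval_flat` folds from the left and computes (5 - 3) - 3, while `eval_right_nested` reproduces the wrong grouping 5 - (3 - 3).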
Carl Friedrich Bolz wrote:
Just looking at the grammar made me note the following problem: [snip]
Hm, now that I found this problem, which grammar are you using exactly? The one at http://www.mozilla.org/js/language/js20/formal/parser-grammar.html gives the rules for addition correctly:

    AdditiveExpression ==>
        MultiplicativeExpression
      | AdditiveExpression + MultiplicativeExpression
      | AdditiveExpression - MultiplicativeExpression

Or did you just rewrite it incorrectly?

Cheers,
Carl Friedrich
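For comparison, a left-recursive rule like the official one is commonly handled in a hand-written parser by replacing the recursion with a loop, i.e. operand (("+" | "-") operand)*. The sketch below is a standalone illustration, not how the parser generator works; `parse_additive` is a hypothetical name, and the operands in the input list stand in for already-parsed multiplicative expressions.

```python
def parse_additive(tokens):
    """Parse [operand, op, operand, ...] into a left-nested tuple tree.

    Each loop iteration wraps the tree built so far as the left child,
    which is what the left-recursive rule expresses.
    """
    pos = 0
    tree = tokens[pos]
    pos += 1
    while pos + 1 < len(tokens) and tokens[pos] in ("+", "-"):
        tree = (tokens[pos], tree, tokens[pos + 1])
        pos += 2
    return tree
```

With the input `[5, "-", 3, "-", 3]` the inner node is `("-", 5, 3)`, so the subtraction associates to the left as the official grammar requires.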
participants (2): Carl Friedrich Bolz, Leonardo Santagada