On Sat, 28 Apr 2007 02:50:38 +0200, Leonardo Santagada <santagada@gmail.com> wrote:
Now about semicolons: how should I deal with them? The grammar in the spec doesn't deal with them, and I don't see how the mozilla one does it either. Since we have established that automatic semicolon insertion is not possible with the parsing module as it works today, can we do "forced semicolon presence" as in C and Java? (A lightbulb just lit up here: maybe I should look at the grammar of one of those two languages.)
Florian Schulze wrote:
This is the biggest and hardest problem in js parsing. The spec does define how to handle it, though I'm not sure how the grammar reflects that. Looking at C doesn't help, because there the semicolon always needs to be present and can't be replaced with newlines. The problem with a js parser is that newlines aren't really whitespace, just like in python. But the rules are weird, because newlines are only sometimes relevant, not every time. A js parser which doesn't handle this correctly is in my opinion just wrong. You couldn't parse any real world javascript with it.
The part of the spec that describes automatic semicolon insertion is completely silly, in my opinion. It goes something like this:

    When, as the program is parsed from left to right, a token (called the offending token) is encountered that is not allowed by any production of the grammar, then a semicolon is automatically inserted before the offending token if one or more of the following conditions is true: ...

This is completely crazy, because it effectively forces you to write a parser that uses a left-to-right parsing technique and works with exactly one token of lookahead. The second part is what makes packrat parsing fail: it uses arbitrarily many tokens of lookahead, so you cannot really determine what an "offending token" is, since you cannot distinguish it from normal backtracking. I don't see how you can fix that, really.

Now you have basically two choices. You can require all semicolons to be present, which makes most code out there fail to parse. The other one is more of a brainstorm; it does not work as I describe it, but maybe someone has an idea for how to get it to work: you could change the grammar to be very lenient with semicolons (at least for a packrat parser this might be easy), meaning that it will accept programs as valid that existing Javascript engines reject. Something like this:

    a = b c = d

would be valid. This opens its own set of problems, such as:

    a = b
    ++ c

which would most likely be parsed as equivalent to:

    a = b++; c;

whereas with the spec it is:

    a = b; ++c;

No clue how to fix that, yet.

Cheers,

Carl Friedrich
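For illustration, here is a minimal Python sketch of the "offending token" rule on a toy grammar (identifiers, `=`, `+`, `++`, `;`). It is a hand-written recursive-descent parser with one token of lookahead, not a packrat parser, so it sidesteps the backtracking problem rather than solving it. The names `tokenize`, `Parser` and `parse` are made up for this example, and only the "preceded by a newline" and "end of input" conditions of the rule are modelled.

```python
import re

TOKEN = re.compile(r"\s*(\+\+|[A-Za-z_]\w*|[=+;])")

def tokenize(src):
    """Return (text, newline_before) pairs, terminated by an "<eof>" marker."""
    tokens, pos = [], 0
    src = src.rstrip()
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        skipped = src[pos:m.start(1)]  # whitespace in front of the token
        tokens.append((m.group(1), "\n" in skipped))
        pos = m.end()
    tokens.append(("<eof>", False))
    return tokens

class Parser:
    """Toy left-to-right parser that only ever looks at the next token."""

    def __init__(self, src):
        self.tokens = tokenize(src)
        self.i = 0

    def peek(self):
        return self.tokens[self.i]

    def advance(self):
        text, _ = self.tokens[self.i]
        self.i += 1
        return text

    def program(self):
        statements = []
        while self.peek()[0] != "<eof>":
            statements.append(self.assignment() + ";")
            self.end_statement()
        return statements

    def end_statement(self):
        text, newline_before = self.peek()
        if text == ";":
            self.advance()
        elif text == "<eof>" or newline_before:
            pass  # offending token: behave as if ";" had been written
        else:
            raise SyntaxError(f"missing ';' before {text!r}")

    def assignment(self):
        left = self.additive()  # toy: the assignment target is not validated
        if self.peek()[0] == "=":
            self.advance()
            return f"{left} = {self.assignment()}"
        return left

    def additive(self):
        left = self.unary()
        while self.peek()[0] == "+":
            self.advance()
            left = f"{left} + {self.unary()}"
        return left

    def unary(self):
        if self.peek()[0] == "++":
            self.advance()
            return "++" + self.unary()
        return self.postfix()

    def postfix(self):
        name = self.advance()
        if not re.fullmatch(r"[A-Za-z_]\w*", name):
            raise SyntaxError(f"expected identifier, got {name!r}")
        text, newline_before = self.peek()
        # Restricted production: no line break allowed before a postfix "++".
        if text == "++" and not newline_before:
            self.advance()
            return name + "++"
        return name

def parse(src):
    return Parser(src).program()
```

With this sketch, `parse("a = b\n++c")` yields the two statements `a = b;` and `++c;`, `parse("a = b\n+ c")` stays a single statement because `+` is not an offending token, and `parse("a = b c = d")` raises `SyntaxError` because the offending `c` is not preceded by a newline. The decision is only possible because the parser commits at each token; with backtracking there is no single point at which a token can be called "offending".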