Is pyparsing really a recursive descent parser?
Neil Cerutti
horpner at yahoo.com
Sat Nov 3 15:00:36 EDT 2007
On 2007-11-03, Paul McGuire <ptmcg at austin.rr.com> wrote:
> On Nov 3, 12:33 am, "Just Another Victim of the Ambient Morality"
><ihates... at hotmail.com> wrote:
>> It has recursion in it but that's not sufficient to call it a recursive
>> descent parser any more than having a recursive implementation of the
>> factorial function. The important part is what it recurses through...
>
><snip>
>
>> In my opinion, the rule set I mentioned in my original post:
>>
>> grammar = OneOrMore(Word(alphas)) + Literal('end')
>>
>> ...should be translated (internally) to something like this:
>>
>> word = Word(alphas)
>> grammar = Forward()
>> grammar << ((word + grammar) | (word + Literal(end)))
>>
>> This allows a recursive search to find the string correct without any
>> need for "backtracking," if I understand what you mean by that. Couldn't
>> pyparsing have done something like this?
>>
>
> Dear JAVotAM -
>
> This was a great post! (Although I'm not sure the comment in the
> first paragraph was entirely fair - I know the difference between
> recursive string parsing and recursive multiplication.)
>
> You really got me thinking more about how this recursion actually
> behaves, especially with respect to elements such as OneOrMore. Your
> original question is really quite common, and up until now, my stock
> answer has been to use negative lookahead. The expression you posted
> is the first time I've seen this solution, and it *does* work.
>
> I was all set to write a cocky response on why your expression
> wouldn't work. I've seen it many times before, where people (usually
> coming from EBNF experience) implement listOfXs = OneOrMore(x) as:
>
> listOfXs = Forward()
> listOfXs << ( x + listOfXs | x )
>
> Actually, what they usually write is:
>
> listOfXs << ( listOfXs + x )
>
> but this sends pyparsing into a recursive tailspin.
>
> So I fired up SciTE and copy/pasted your code into the editor and ran
> it, and it worked just fine - this was a shock! I studied this for a
> few minutes, and then realized what was happening. First of all, I
> misread what you posted. You posted this:
>
> grammar << ((word + grammar) | (word + Literal(end)))
>
> which works. I *thought* you posted this:
>
> grammar << ((word + grammar) | word) + Literal(end)
>
> which doesn't work. In fact this behaves the same as your original
> post, except it iterates over the input string's words recursively,
> vs. repetition ins a for-loop, as is done by OneOrMore.
>
> So going back to your working version, I had to see why it works.
> Initially, the first term in the MatchFirst (the '|' operator creates
> MatchFirst instances) is just the same, and by grammar referencing
> itself, it just goes word by word through the input trying to find a
> match. I'll try to visualize the steps:
>
> level "First Second Third end"
> 1 word grammar
> 2 word grammar
> 3 word grammar
> 4 word grammar <- fails!
> 4 word end <- fails!
> (therefore level 3 grammar fails!)
> 3 word end <-- success!!!
>
> grammar has 2 options: to match a word followed by a grammar, or to
> match a word followed by 'end'. At 4 levels deep into the Forward's
> recursion, the first option fails, because after parsing "end" as the
> initial word, there is no more text to try to match against grammar.
> Level 4's Forward then also tries to match a word followed by 'end',
> but this fails for the same reason. So at this point, the 4th level
> Forward fails to match either of its options, so it throws its
> exception back up to level 3, indicating that the first alternative,
> word followed by grammar, failed. Level 3 then moves on to see if
> word followed by the literal 'end' matches, and it does - success!
>
> Here's where I am stuck now. In the original grammar that you posted,
> which you want to render into this behavior, grammar is defined as:
>
> grammar = OneOrMore(Word(alphas)) + Literal('end')
Is there not an ambiguity in the grammar?
In EBNF:
goal --> WORD { WORD } END
WORD is '[a-zA-Z]+'
END is 'end'
I think it is fine that PyParsing can't guess what the composer
of that grammar meant.
--
Neil Cerutti
More information about the Python-list
mailing list