Is pyparsing really a recursive descent parser?

Just Another Victim of the Ambient Morality ihatespam at hotmail.com
Sat Nov 3 20:41:28 EDT 2007


"Neil Cerutti" <horpner at yahoo.com> wrote in message 
news:oz3Xi.39812$G23.23308 at newsreading01.news.tds.net...
> On 2007-11-03, Paul McGuire <ptmcg at austin.rr.com> wrote:
>> On Nov 3, 12:33 am, "Just Another Victim of the Ambient Morality"
>><ihates... at hotmail.com> wrote:
>>>     It has recursion in it but that's not sufficient to call it a 
>>> recursive
>>> descent parser any more than having a recursive implementation of the
>>> factorial function.  The important part is what it recurses through...
>>
>><snip>
>>
>>> In my opinion, the rule set I mentioned in my original post:
>>>
>>> grammar = OneOrMore(Word(alphas)) + Literal('end')
>>>
>>>     ...should be translated (internally) to something like this:
>>>
>>> word = Word(alphas)
>>> grammar = Forward()
>>> grammar << ((word + grammar) | (word + Literal(end)))
>>>
>>>     This allows a recursive search to find the string correct without 
>>> any
>>> need for "backtracking," if I understand what you mean by that. 
>>> Couldn't
>>> pyparsing have done something like this?
>>>
>>
>> Dear JAVotAM -
>>
>> This was a great post!  (Although I'm not sure the comment in the
>> first paragraph was entirely fair - I know the difference between
>> recursive string parsing and recursive multiplication.)
>>
>> You really got me thinking more about how this recursion actually
>> behaves, especially with respect to elements such as OneOrMore.  Your
>> original question is really quite common, and up until now, my stock
>> answer has been to use negative lookahead.  The expression you posted
>> is the first time I've seen this solution, and it *does* work.
>
> Is there not an ambiguity in the grammar?
>
> In EBNF:
>
>  goal --> WORD { WORD } END
>
>  WORD is '[a-zA-Z]+'
>  END is 'end'
>
> I think it is fine that PyParsing can't guess what the composer
> of that grammar meant.

    First, I don't know if that constitutes an ambiguity in the grammar. 
'end' is a word but it's unambiguous that this grammar must end in a literal 
'end'.  You could interpret the input as just a sequence of words or you 
could interpret it as a sequence of words terminated by the word 'end'.  One 
interpretation conforms to the grammar while the other doesn't.  You would 
assume that the interpretation that agrees with the grammar would be the 
preferable choice and so should the program...
    Secondly, even if it is an ambiguity... so what?  pyparsing's current 
behaviour is to return a parse error, pretending that the string can't be 
parsed.  Ideally, perhaps it should alert you to the ambiguity but, surely, 
it's better to return _a_ valid parsing than to pretend that the string 
can't be parsed at all...








More information about the Python-list mailing list