Is pyparsing really a recursive descent parser?

Mon Nov 5 06:04:18 EST 2007

On 2007-11-05, Just Another Victim of the Ambient Morality
<ihatespam at hotmail.com> wrote:
> "Neil Cerutti" <horpner at yahoo.com> wrote in message 
> news:6xvXi.39855$G23.18679 at newsreading01.news.tds.net...
>> There are different kinds of recursion. Compare:
>
>     While interesting, none of this actually addresses the
> point I was making.  I wasn't saying that there was no
> recursion (at least, not in this paragraph), I was saying that
> it wasn't recursing through what I thought it should be
> recursing through.  It recurses through a set of rules without
> any regard to how these rules interact with each other.  That's
> why it fails to parse valid strings.  In my opinion, it should
> recurse through appropriate combinations of rules to determine
> validity, rather than by arbitrary 
> categorization...

OK. I thought you were saying that, without backtracking there's
no recursion. Never mind!

>     I guess that all the And and Or class in pyparsing call
> methods of each other from each other, even if they are doing
> so from different instantiations.  I still say they're not
> recursing through the right things...

Backtracking seems orthoganal to recursion, to me.

>>>> I'm not sure one needs to start again with a naive approach just to
>>>> avoid any parser theory. For a user of a parser it is quite important
>>>> whether she has to wait 50 seconds for a parse to run or 50
>>>> milliseconds. I don't like to compromise speed for implementation
>>>> simplicity here.
>>>
>>>     Finally, I can't believe you complain about potential speed
>>> problems. First, depending on the size of the string, it's
>>> likely to be the difference between 2ms and 200ms.  Secondly,
>>> if speed were an issue, you wouldn't go with a recursive
>>> descent parser.  You'd go with LALR or the many other parsing
>>> techniques available.  Recursive descent parsing is for those
>>> situations where you need correctness, regardless of execution
>>> time.  These situations happen...
>>
>> RDP is plenty fast; speed has never been one of it's
>> disadvantages, as far as I know. Today there are many
>> excellent parser generators and compiler builders that compose an
>> RDP under the hood, e.g., Antlr and Gentle.
>
>     I think I read somewhere that LALR was O(n) while RDP was
> O(e^n).  Most people would consider that, at least, slower...

To my knowledge, most RDPs are LL(1), which is O(n). As you
increase the amount of lookahead (a backtracking RDP is, I
suppose, be a brute-force way of getting infinite lookahead), the
complexity increases.

>     I think your examples may exemplify how little speed
> matters rather than how fast RDPs are...
>
>     Also, it should probably be noted that bubble sort is very
> fast for nearly sorted lists; much faster than quicksort.  So,
> while it shouldn't be the default sorting algorithm, you could
> have it somewhere in the library...

Yes, bubble-sort runs fast in certain circumstances; you wouldn't
want to be bereft of it completely. Backtracking parsers do
probably have their place in the pantheon. I don't want
PyParsing to do backtracking by default, though it might be a
useful option.

-- 
Neil Cerutti