Is pyparsing really a recursive descent parser?
Neil Cerutti
horpner at yahoo.com
Mon Nov 5 06:04:18 EST 2007
On 2007-11-05, Just Another Victim of the Ambient Morality
<ihatespam at hotmail.com> wrote:
> "Neil Cerutti" <horpner at yahoo.com> wrote in message
> news:6xvXi.39855$G23.18679 at newsreading01.news.tds.net...
>> There are different kinds of recursion. Compare:
>
> While interesting, none of this actually addresses the
> point I was making. I wasn't saying that there was no
> recursion (at least, not in this paragraph), I was saying that
> it wasn't recursing through what I thought it should be
> recursing through. It recurses through a set of rules without
> any regard to how these rules interact with each other. That's
> why it fails to parse valid strings. In my opinion, it should
> recurse through appropriate combinations of rules to determine
> validity, rather than by arbitrary
> categorization...
OK. I thought you were saying that, without backtracking there's
no recursion. Never mind!
> I guess that all the And and Or class in pyparsing call
> methods of each other from each other, even if they are doing
> so from different instantiations. I still say they're not
> recursing through the right things...
Backtracking seems orthoganal to recursion, to me.
>>>> I'm not sure one needs to start again with a naive approach just to
>>>> avoid any parser theory. For a user of a parser it is quite important
>>>> whether she has to wait 50 seconds for a parse to run or 50
>>>> milliseconds. I don't like to compromise speed for implementation
>>>> simplicity here.
>>>
>>> Finally, I can't believe you complain about potential speed
>>> problems. First, depending on the size of the string, it's
>>> likely to be the difference between 2ms and 200ms. Secondly,
>>> if speed were an issue, you wouldn't go with a recursive
>>> descent parser. You'd go with LALR or the many other parsing
>>> techniques available. Recursive descent parsing is for those
>>> situations where you need correctness, regardless of execution
>>> time. These situations happen...
>>
>> RDP is plenty fast; speed has never been one of it's
>> disadvantages, as far as I know. Today there are many
>> excellent parser generators and compiler builders that compose an
>> RDP under the hood, e.g., Antlr and Gentle.
>
> I think I read somewhere that LALR was O(n) while RDP was
> O(e^n). Most people would consider that, at least, slower...
To my knowledge, most RDPs are LL(1), which is O(n). As you
increase the amount of lookahead (a backtracking RDP is, I
suppose, be a brute-force way of getting infinite lookahead), the
complexity increases.
> I think your examples may exemplify how little speed
> matters rather than how fast RDPs are...
>
> Also, it should probably be noted that bubble sort is very
> fast for nearly sorted lists; much faster than quicksort. So,
> while it shouldn't be the default sorting algorithm, you could
> have it somewhere in the library...
Yes, bubble-sort runs fast in certain circumstances; you wouldn't
want to be bereft of it completely. Backtracking parsers do
probably have their place in the pantheon. I don't want
PyParsing to do backtracking by default, though it might be a
useful option.
--
Neil Cerutti
More information about the Python-list
mailing list