Is pyparsing really a recursive descent parser?

Sun Nov 4 21:49:38 EST 2007

On 2007-11-05, Just Another Victim of the Ambient Morality <ihatespam at hotmail.com> wrote:
>
> "Kay Schluehr" <kay.schluehr at gmx.net> wrote in message 
> news:1194223837.104424.242360 at o3g2000hsb.googlegroups.com...
>> On Nov 4, 10:44 pm, "Just Another Victim of the Ambient Morality"
>> <ihates... at hotmail.com>
>>
>>> I believe there is a cure and it's called recursive descent parsing.
>>> It's slow, obviously, but it's correct and, sometimes (arguably, often),
>>> that's more important the execution speed.
>>
>> Recursive decendent parsing is not necessarily slow but from your
>> remarks above I infer you want a general RD parser with backtracking:
>> when one rule doesn't match, try another one to derive the current
>> symbol in the input stream.
>
>     I think I've just discovered a major hurdle in my understand of the 
> problem.
>     You keep saying "with backtracking."  Why?  Isn't "backtracking" 
> inherent in recursion?  So, why can't these alleged "recursive descent 
> parsers" find valid parsings?  How are they not already backtracking?  What 
> was the point of being recursive if not to take advantage of the inherent 
> backtracking in it?
>     Obviously, these parsers aren't recursing through what I think they 
> should be recursing.  The question is "why not?"

There are different kinds of recursion. Compare:

  def fac1(x, y=1):
    """ Compute factorials with a recursive function (it calls
    itself), but the stack is not actually used for storing
    anything important, i.e., it is tail-recursive. """
    if x < 0:
      raise ValueError('non-negative integer')
    elif x == 0:
      return y 
    else:
      return fac1(x-1, y*x)

to

  def fac2(x):
    """ Computes factorials with a recursive process, keeping
    the state of the calculation on the stack. """
    if x < 0:
      raise ValueError('non-negative integer')
    if x == 0:
      return 1
    else:
      return fac2(x-1) * x

to

  def Ack(x, y):
    """ The Ackermann function. Creates a humongous mess even
    with quite tiny numbers. """
    if x < 0 or y < 0:
      raise ValueError('non-negative integer')
    elif x == 0:
      return y + 1
    elif y == 0:
      return foo3(x-1, 1)
    else:
      return foo3(x-1, foo3(x, y-1))

There's probably a word for the type of recursive process built
by fac2; the RDP's I'm most familiar with create a fac2 sort of
process, which stores valuable info on the stack.

And even though fac1 defines an iterative process, the code
itself is recursive, and you can call it a recursive function if
you wish (and in Python you might as well).

>     Correct me if I'm wrong but I'm beginning to think that
> pyparsing doesn't typically use recursion, at all.  It only
> employs it if you create one, using the Forward class.
> Otherwise, it does everything iteratively, hence the lack of
> "backtracking."

It's recursive because each production rule calls other
production rules to define itself. A rule regularly ends up
calling itself. Consider the Parser class I built earlier.
list_tail keeps calling itself to continue consuming characters
in an ab_list. The stack is used to keep track of where we are in
the grammar; at any time you can look up the stack and see how
you got where you are--you 'descend' down from the topmost
productions to the most primitive productions, and then back up
once everything has been sorted out. Take another look
at the exception raised in my Parsing class example for an
illustrative traceback.

>> I'm not sure one needs to start again with a naive approach just to
>> avoid any parser theory. For a user of a parser it is quite important
>> whether she has to wait 50 seconds for a parse to run or 50
>> milliseconds. I don't like to compromise speed for implementation
>> simplicity here.
>
>     Finally, I can't believe you complain about potential speed
> problems. First, depending on the size of the string, it's
> likely to be the difference between 2ms and 200ms.  Secondly,
> if speed were an issue, you wouldn't go with a recursive
> descent parser.  You'd go with LALR or the many other parsing
> techniques available.  Recursive descent parsing is for those
> situations where you need correctness, regardless of execution
> time.  These situations happen...

RDP is plenty fast; speed has never been one of it's
disadvantages, as far as I know. Today there are many
excellent parser generators and compiler builders that compose an
RDP under the hood, e.g., Antlr and Gentle.

>     I've said this before, albeit for a different language, but
> it applies to Python just as well.  I don't use Python to write
> fast code, I use it to write code fast.
>     If _you_ "don't like to compromise speed for implementation
> simplicity" then you have a plethora choices available to you.
> What about the guy who needs to parse correctly and is
> unconcerned about speed?

You have to be concerned about speed when something runs so
slowly in common circumstances compared to other well-known
algotithms that you can't practically wait for an answer. Would
you consider bubble-sort a suitable general-purpose sorting
algorithm for Python?

-- 
Neil Cerutti