Is pyparsing really a recursive descent parser?

Thu Nov 8 00:07:00 EST 2007

"Chris Mellon" <arkanes at gmail.com> wrote in message 
news:mailman.948.1194480097.13605.python-list at python.org...
> On Nov 7, 2007 5:15 PM, Just Another Victim of the Ambient Morality
> <ihatespam at hotmail.com> wrote:
>>
>> "Chris Mellon" <arkanes at gmail.com> wrote in message
>> news:mailman.945.1194471797.13605.python-list at python.org...
>> > On Nov 7, 2007 3:15 PM, Just Another Victim of the Ambient Morality
>> > <ihatespam at hotmail.com> wrote:
>> >
>> >> > In short, it hasn't really evovled into a user-friendly package
>> >> > yet.
>> >>
>> >>     Thank you.
>> >>     How is it that I seem to be the only one in the market for a 
>> >> correct
>> >> parser?  Earley has a runtine of O(n^3) in the worst case and O(n^2)
>> >> typically.  I have trouble believing that everyone else in the world 
>> >> has
>> >> such intense run-time requirements that they're willing to forego
>> >> correctness.  Why can't I find a pyparsing-esque library with this
>> >> implementation?  I'm tempted to roll my own except that it's a fairly
>> >> complicated algorithm and I don't really understand how it's any more
>> >> efficient than the naive approach...
>> >
>> > You have an unusual definition of correctness. Many people would say
>> > that an ambiguous grammar is a bug, not something to support.
>>
>>     I don't think I do.
>
> There are an enormous variety of parsing tools, and it's the subject
> of much research. And in all those tools, not one meets your
> definition of correctness? You don't think that might make it unusual?

    It doesn't appear to be common, I'll grant you that!
    However, there is some research.  For instance, the Earley parser 
appears to be what I want (in conjunction with a parse tree builder).  A CYK 
parser would probably do, too.  The algorithms are out there yet no one has 
chosen to use any of them.  At the same time, there are several LALR 
parsers.  Why did anyone need to write the second one after the first one 
was written?!
    In fact, in a sense, my problem is solved.  There exists a solution to 
my problem.  It's just that no one has implemented that solution.  I guess 
you're right in that it really does appear to be an unusual problem but I 
don't understand how...


>> Besides, you assume too much...
>>     First off, we've already established that there are unambiguous 
>> grammars
>> for which pyparsing will fail to parse.  One might consider that a bug in
>> pyparsing...
>
> You might. Or you might not, since it's well known that there are lots
> of types of parsers that can't parse all possible grammars, but that
> doesn't make those parsers useless.

    No one said they were useless.  I only said that a correct parser is 
useful.  Many people in this thread seem to disagree and I find this 
incredible...


>>     Secondly, I get the impression you want to consider ambiguous 
>> grammars,
>> in some sense, "wrong."  They are not.
>
> Sure they are, at least in many contexts. I understand that you want
> support for them, but it's by far more common to want one and only one
> set of results from parsing a particular document.

    Okay, in some contexts, an ambiguous grammar may be considered 
erroneous.  However, in many other contexts, it's merely a fact of life. 
How is it that there are no tools to address this?  If nothing else, 
pyparsing throws the same error it does when there is no valid parsing of 
the string.  Having no solution and having several solutions are not the 
same thing...


>>Even if they were, if you are
>> parsing something for which you are not the creator and that something
>> employs an ambiguous grammar, what choice do you have?
>
> You either disambiguate, or you don't accept ambiguous input. The
> third option seems to be what you want, which is to find all possible
> solutions and return all of them (and wouldn't this be NP-hard in the
> general case?) but that's not a satisfactory result in most
> applications.

    What do you mean by "disambiguate?"  Do you mean disambiguate the 
grammar?  One of the conditions of the problem is that you have no control 
over the grammar, so that's really not an option.  Also, an implicit 
condition of solving a problem is that the problem be... solved, so not 
accepting the input is not an option, either.
    While there are many applications that can't deal with multiple 
solutions, surely there are some?  Again, perhaps you can pick one solution 
over the other through the context of the parse results?  That's kind of 
hard to do if your parser refuses to return any results...
    Just because a grammar is ambiguous doesn't mean the input string is. 
It can easily be the case that most or all the input you're expecting will, 
in practice, only produce one correct parse tree.  In this case, it would be 
useful for your parser to return a correct solution, even at random!
    Finally, who cares if something is NP-Hard?  Okay, in some situations, 
you'd care.  In many others, you don't.  For instance, suppose your input 
length has an upper bound?  Unless that's a really high bound or a really 
complex grammar, its runtime is not likely relevant...


>> Furthermore, given a
>> set of possible parsings, you might be able to decide which one you 
>> favour
>> given the context of what was parsed!  There's a plethora of applications
>> for parsing ambiguous grammars yet there are no tools for doing so?
>
> If you can do this, isn't this really a sign that your grammar is
> context sensitive?

    I don't think so.  I'm using the word "context" colloquially, not 
grammatically.  If you know what kind of output you're expecting and only 
one of several parsings produces that kind of output, then you know which 
one to run with.  That doesn't mean the grammar is context sensitive; just 
that the data is...
    As an aside, do you know what tools are available for parsing context 
senstive grammars?


>> > In fact, I often use pyparsing precisely in order to disambiguate
>> > (according to specific rules, which are embodied by the parser)
>> > ambiguous input, like bizarre hand-entered datetime value.
>>
>>     What do you mean?  How do you use pyparsing to disambiguate:
>>
>>     01-01-08
>
> The same way a human would - given an ambiguous date such as this, I
> (arbitrarily) decided what it would mean. The alternative is to refuse
> the input.

    You use pyparsing to "arbitrarily decide what it would mean?"
    You said you "often use pyparsing" to "disambiguate ambiguous input" and 
I was wondering what you meant by this.
    On one hand, it's not exactly a fair question since that date string is 
not grammatically ambiguous.  On the other hand, you're the one who brought 
up "bizarre hand-entered datetime value," and I was only trying to give an 
example of what you meant.  So, what do you mean?