Need help parsing with pyparsing...

Just Another Victim of the Ambient Morality ihatespam at hotmail.com
Mon Oct 22 23:29:34 CEST 2007


"Paul McGuire" <ptmcg at austin.rr.com> wrote in message 
news:1193070904.326690.49360 at i38g2000prf.googlegroups.com...
> On Oct 22, 4:18 am, "Just Another Victim of the Ambient Morality"
> <ihates... at hotmail.com> wrote:
>>     I'm trying to parse with pyparsing but the grammar I'm using is
>> somewhat unorthodox.  I need to be able to parse something like the
>> following:
>>
>> UPPER CASE WORDS And Title Like Words
>>
>>     ...into two sentences:
>>
>> UPPER CASE WORDS
>> And Title Like Words
>>
>>     I'm finding this surprisingly hard to do.  The problem is that
>> pyparsing implicitly treats whitespace as ignorable and is (perhaps
>> necessarily) greedy with its term matching.  All attempts to do the
>> described parsing either fail to parse or parse incorrectly, like so:
>>
>> UPPER CASE WORDS A
>> nd Title Like Words
>>
>>     Frankly, I'm stuck.  I don't know how to parse this grammar with
>> pyparsing.
>>     Does anyone know how to accomplish what I'm trying to do?
>>     Thank you...
>
> By the way, are these possible data lines?:
>
> A Line With No Upper Case Words
> A LINE WITH NO TITLE CASE WORDS
> SOME UPPER CASE WORDS A Title That Begins With A One Letter Word

    Thank you for your kind help!
    Unfortunately, there are some ambiguities but, hopefully, they'll be 
very rare.  There will always be an uppercase section followed by a 
non-uppercase section.  So, your examples will parse like so:

A
Line With No Upper Case Words

    ...the second example will result in a parse error...

SOME UPPER CASE WORDS A
Title That Begins With A One Letter Word
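
    The rule above (an uppercase section followed by a non-uppercase 
section) can be sketched in plain Python.  This is only an illustration of 
the rule, not a pyparsing grammar; the name split_title and the test "a word 
with no lowercase letters counts as uppercase" are assumptions made for the 
sketch.

```python
def split_title(line):
    """Split a line into (uppercase section, remaining section).

    Hypothetical helper: a word counts as "uppercase" when it contains
    no lowercase letters, so one-letter words like "A" and bare numbers
    are taken into the uppercase section.
    """
    words = line.split()

    # Length of the leading run of words without lowercase letters.
    n = 0
    while n < len(words) and words[n] == words[n].upper():
        n += 1

    # Both sections must be non-empty, otherwise it is a parse error.
    if n == 0 or n == len(words):
        raise ValueError("no uppercase/non-uppercase split in %r" % line)

    return " ".join(words[:n]), " ".join(words[n:])


print(split_title("UPPER CASE WORDS And Title Like Words"))
# ('UPPER CASE WORDS', 'And Title Like Words')
print(split_title("A Line With No Upper Case Words"))
# ('A', 'Line With No Upper Case Words')
```

    Working on whole words avoids the "UPPER CASE WORDS A" / "nd Title Like 
Words" split, and the all-uppercase line raises ValueError, matching the 
parse error described above.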

    Occasional errors can be tolerated.  The failure in my original post, 
though, happened every time which, of course, is not tolerable.  The 
ambiguities you bring up, especially the last one, are interesting and I'm 
not sure how to deal with them without an English grammatical analysis, 
which is too much, especially if I'm to integrate it with pyparsing.
    Another problem involves the ambiguity of numbers.  Some more examples, 
if you're interested:

FAHRENHEIT 451 2000 Copies Sold
1984 Book Of The Year

    The last example is actually okay but the first one is genuinely 
ambiguous.
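
    One way to see the difference between the two lines is to list every 
split that the rule allows when a number may belong to either section.  The 
sketch below is plain Python rather than pyparsing, and both the name 
candidate_splits and the assumption that trailing numbers may move into the 
second section are mine, for illustration only.

```python
def candidate_splits(line):
    """Return every (uppercase section, rest) split the rule permits.

    Assumption: numbers at the end of the uppercase run could equally
    well begin the second section, so each such boundary is a candidate.
    """
    words = line.split()

    # End of the leading run of words without lowercase letters.
    end = 0
    while end < len(words) and words[end] == words[end].upper():
        end += 1

    # Walk back over trailing numbers, keeping at least one word in front.
    start = end
    while start > 1 and words[start - 1].isdigit():
        start -= 1

    # Keep only splits that leave both sections non-empty.
    return [(" ".join(words[:i]), " ".join(words[i:]))
            for i in range(start, end + 1) if 0 < i < len(words)]


for text in ("FAHRENHEIT 451 2000 Copies Sold", "1984 Book Of The Year"):
    print(len(candidate_splits(text)), candidate_splits(text))
```

    Under these assumptions the first line has three candidate splits while 
the second has only one, because its uppercase section would otherwise be 
empty.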
    Thanks again...





