Writing a parser the right way?

Paul McGuire ptmcg at austin.rr._bogus_.com
Wed Sep 21 21:17:08 CEST 2005


"beza1e1" <andreas.zwinkau at googlemail.com> wrote in message
news:1127300661.440587.287950 at g47g2000cwa.googlegroups.com...
> I'm writing a parser for the English language. This is a simple function
> to identify what kind of sentence we have. Do you think this class
> wrapping is the right way to represent the result of the function? Further
> parsing then checks isinstance(text, Declarative).
>
> -------------------
> class Sentence(str): pass
> class Declarative(Sentence): pass
> class Question(Sentence): pass
> class Command(Sentence): pass
>
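> def identify_sentence(text):
>     text = text.strip()
>     # endswith() is safe on an empty string, unlike text[-1]
>     if text.endswith('.'):
>         return Declarative(text)
>     elif text.endswith('!'):
>         return Command(text)
>     elif text.endswith('?'):
>         return Question(text)
>     # no recognized terminator: fall back to the base class so that
>     # callers always receive a Sentence
>     return Sentence(text)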
> -------------------
>
> At first I just returned the class, then I decided to derive Sentence
> from str, so I can insert the text as well.
>
Andreas -

Are you trying to parse any English sentence, or just a limited subset of
them?  Parsing *any* English sentence (or question or interjection or
command) is a ***huge*** undertaking - Google for "natural language" and you
will find many efforts (with substantial time, money, and manpower) working
on this problem.  Applications range from automated language translation to
automated helpdesk analysis.  I really suggest you do a bit of research on
this topic, just to get an idea of how big this job is.  Here's a Wikipedia
link:
http://en.wikipedia.org/wiki/Natural_language_processing

Here are some simple examples that quickly go beyond
subject-predicate-object:

I drive a truck.
I drive a red truck.
I drive a red truck to work.
I drive a red truck to the shop to work on it.
I drive a red truck to the shop to have some work done on it.
I drive a red truck very fast.
I drive a red truck through a red light.

Then factor in other sentence forms (past and future tenses, past and future
perfect tenses, metaphors) and parsing general English is a major job.  The
favorite test case of the natural language folks is "Time flies like an
arrow," which early auto-translation software reportedly converted to
"Temporal insects enjoy a pointed projectile."
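
To make the ambiguity concrete, here is a minimal Python sketch that simply
counts the ways the words of that sentence could be tagged.  The LEXICON
table and the possible_taggings function are made up for this illustration
(they are not from any NLP library), and the parts of speech listed are a
deliberately small, hand-picked set:

```python
from itertools import product

# Hypothetical toy lexicon (an assumption for illustration only):
# each word maps to the parts of speech it might take.
LEXICON = {
    "time": ["noun", "verb"],         # "time passes" / "time the race"
    "flies": ["noun", "verb"],        # "the flies" / "it flies"
    "like": ["verb", "preposition"],  # "I like it" / "like a bird"
    "an": ["determiner"],
    "arrow": ["noun"],
}


def possible_taggings(sentence):
    """Return every combination of part-of-speech tags for the sentence."""
    words = sentence.lower().rstrip(".!?").split()
    options = [LEXICON[word] for word in words]
    return [list(zip(words, tags)) for tags in product(*options)]


if __name__ == "__main__":
    taggings = possible_taggings("Time flies like an arrow.")
    print(len(taggings), "candidate taggings")
    for tagging in taggings:
        print(tagging)
```

Even with five words and a tiny lexicon there are eight candidate taggings
(2 * 2 * 2 * 1 * 1), and a real parser still has to decide which of them
form a grammatical, sensible sentence.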

On the other hand, if you plan to limit the type and/or content of the
sentences being parsed (such as computer system commands or adventure game
inputs, or descriptions of physical objects), then you can scope out a
reasonable capability by choosing a vocabulary of known verbs and objects,
and avoiding ambiguities (such as "set", as in "I set the set of glasses
next to the TV set," or "lead" as in "Lead me to the store that sells lead
pencils.").
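
As a rough sketch of the restricted approach, assuming an adventure-game
style input of the form "verb [filler] object", something like the following
might be a starting point.  The word sets and the parse_command name are
hypothetical, chosen only for this example:

```python
# Hypothetical vocabulary for a small adventure-game command parser.
# Every name here is an assumption made for illustration.
VERBS = {"take", "drop", "open", "go"}
OBJECTS = {"lamp", "door", "key", "north"}
FILLER = {"the", "a", "an", "to"}


def parse_command(text):
    """Return (verb, object) for a recognized command, otherwise None."""
    # Normalize case, drop surrounding punctuation, and ignore filler words.
    words = [w for w in text.lower().strip(" .!?").split() if w not in FILLER]
    if len(words) != 2:
        return None
    verb, obj = words
    if verb in VERBS and obj in OBJECTS:
        return verb, obj
    return None


if __name__ == "__main__":
    for line in ["Open the door.", "Take the lamp", "Lead me to the store"]:
        print(repr(line), "->", parse_command(line))
```

Because each word appears in exactly one role, the "set" and "lead" kind of
ambiguity never comes up; anything outside the vocabulary is rejected
instead of being guessed at.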

Hope this sheds some light on your task,
-- Paul




