[Tutor] Learning natural language processing and Python? [why NLP?]

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Wed, 18 Sep 2002 15:04:51 -0700 (PDT)


On Wed, 18 Sep 2002, Stephen Harris wrote:

> As meaning becomes more abstract, there is a corresponding increase in
> ambiguity which means the filter tree of inference rules can become
> confused. So poetry that is translated from English to Russian and then
> back, is pretty garbled. I think it would be worse from English to
> Russian to French.
>
[some text cut]

Thankfully, the text documents I'm thinking about processing have very
little poetry.  *grin*

The problem that I really want to work on is to automatically "categorize"
technical documents, where the language is hopefully less ambiguous than
free verse.  Automatic document classification appears to be slightly less
hard than AI.


A few posts back, someone mentioned the 'spambayes' classifier as a program
that detects spam.  Spam has a specific "scent" that we can pick out.

But why stop at spam?  Ultimately, the idea I have is to put automatic
classification to a more constructive use: I'd like to categorize
Python-Tutor postings so that messages can be searched by topic.
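Here is one way "categorize postings by topic" could look in code: a minimal sketch of a multinomial naive Bayes classifier, written for modern Python 3. It is a textbook technique and not spambayes itself, which combines per-word probabilities with its own scoring scheme. The class name, the tokenizer, and the tiny training set of Tutor-style questions are all hypothetical, made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9_']+", text.lower())


class NaiveBayesClassifier:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing.

    Illustrative only; this is not how spambayes scores messages.
    """

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # topic -> word -> count
        self.doc_counts = Counter()              # topic -> number of training messages
        self.vocab = set()                       # every word seen in training

    def train(self, text, topic):
        words = tokenize(text)
        self.word_counts[topic].update(words)
        self.doc_counts[topic] += 1
        self.vocab.update(words)

    def score(self, text, topic):
        """Log of P(topic) * P(words | topic); larger means a better fit."""
        total_docs = sum(self.doc_counts.values())
        log_prob = math.log(self.doc_counts[topic] / total_docs)
        counts = self.word_counts[topic]
        denominator = sum(counts.values()) + len(self.vocab)
        for word in tokenize(text):
            # Add-one smoothing keeps unseen words from zeroing out the score.
            log_prob += math.log((counts[word] + 1) / denominator)
        return log_prob

    def classify(self, text):
        """Return the training topic with the highest score."""
        return max(self.doc_counts, key=lambda topic: self.score(text, topic))


# Hypothetical training data: two tiny "topics" of Tutor-style questions.
clf = NaiveBayesClassifier()
clf.train("how do I open a file and read lines", "files")
clf.train("writing to a text file with open", "files")
clf.train("my for loop never ends", "loops")
clf.train("while loop with break and continue", "loops")
print(clf.classify("reading a file line by line"))  # files
```

Working in log space avoids the underflow you would get from multiplying many small probabilities together.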


> Anyway, I wrote because you seemed to have a mildly dabbling attitude
> about a project that I think would take a lot of time to create anything
> useful. The CyC webpage has more on theory.

Yes, I do dabble a lot.  *grin*

Don't worry: I know I should try to avoid reinventing the wheel.  I'm just
trying to build up my own general knowledge, just enough so I can
understand 'spambayes' and other classification systems.  I don't plan to
do anything serious.

(Perhaps the easiest thing to try is to run multiple copies of spambayes,
and just give each copy different training sets!  Hmmm...)
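The "several copies of spambayes" idea resembles a one-vs-rest setup: every copy sees the same messages, but each is told that only its own topic counts as a match, and a new message goes to the topic whose filter is most confident. A rough sketch, assuming a simple two-class naive Bayes filter stands in for each spambayes copy (the real program scores messages differently); `BinaryFilter`, `train_filters`, `categorize`, and the sample messages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9_']+", text.lower())


class BinaryFilter:
    """Hypothetical two-class (in-topic vs. not) naive Bayes filter,
    standing in for one 'copy' of a spam-style classifier."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}  # label -> word counts
        self.docs = Counter()                              # label -> message count

    def train(self, text, is_member):
        self.counts[is_member].update(tokenize(text))
        self.docs[is_member] += 1

    def log_odds(self, text):
        """Approximate log P(member | text) - log P(non-member | text),
        with add-one smoothing on both document and word counts."""
        vocab = set(self.counts[True]) | set(self.counts[False])
        total = {label: sum(c.values()) + len(vocab)
                 for label, c in self.counts.items()}
        odds = math.log((self.docs[True] + 1) / (self.docs[False] + 1))
        for word in tokenize(text):
            p_in = (self.counts[True][word] + 1) / total[True]
            p_out = (self.counts[False][word] + 1) / total[False]
            odds += math.log(p_in / p_out)
        return odds


def train_filters(labelled_messages):
    """Build one filter per topic; each sees the same messages, labelled differently."""
    topics = {topic for _, topic in labelled_messages}
    filters = {topic: BinaryFilter() for topic in topics}
    for text, topic in labelled_messages:
        for name, f in filters.items():
            f.train(text, name == topic)
    return filters


def categorize(filters, text):
    """Pick the topic whose filter gives the highest log odds."""
    return max(filters, key=lambda topic: filters[topic].log_odds(text))


# Hypothetical labelled postings.
messages = [
    ("how do I open a file and read lines", "files"),
    ("writing to a text file with open", "files"),
    ("my for loop never ends", "loops"),
    ("while loop with break and continue", "loops"),
    ("importing a module from another directory", "modules"),
    ("reloading a module after editing it", "modules"),
]
filters = train_filters(messages)
print(categorize(filters, "reading a file line by line"))  # files
print(categorize(filters, "reloading a module"))           # modules
```

Comparing raw log odds across separately trained filters is a heuristic: the scores are not calibrated against each other, so a real setup might need a threshold or an "unsure" bucket.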



Good luck to you!