[Tutor] [part-of-speech tagging / montytagger / penn treebank]
Danny Yoo
dyoo@hkn.eecs.berkeley.edu
Sat Dec 7 18:05:02 2002
> For example: could the split command [split(s[, sep[, maxsplit]])] be
> modified to accept more than one 'sep' argument? That odd suggestion
> reflects my goal (generating an index for my log files): I don't see any
> simple software method of distinguishing nouns & adjectives in my logs -
> but splitting on the basis of connectives & articles (to, the, in, etc.)
> might leave the noun - adjective relationship intact (more meaningful
> index entries I hope).
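For the literal question: split() only accepts a single separator string, but re.split() from the standard re module takes a regular expression, so several separators can be combined into one pattern. Here's a minimal sketch. It assumes a hand-picked stop-word list ("to", "the", "in" come from the question; the other words are my own guesses), and split_on_stop_words is just a made-up helper name:

```python
import re

# Hypothetical stop-word list: "to", "the", "in" are from the question,
# the rest are illustrative guesses.
STOP_WORDS = ["to", "the", "in", "a", "an", "of", "and", "on", "at"]

# \b ... \b only matches whole words, so "the" won't split "these"
# and "in" won't split "things".
STOP_PATTERN = re.compile(r"\b(?:%s)\b" % "|".join(STOP_WORDS), re.IGNORECASE)

def split_on_stop_words(line):
    """Split a log line on any stop word; return the non-empty pieces."""
    pieces = STOP_PATTERN.split(line)
    return [piece.strip() for piece in pieces if piece.strip()]

print(split_on_stop_words("Moved the backup tapes to the fireproof safe in room 12"))
# → ['Moved', 'backup tapes', 'fireproof safe', 'room 12']
```

Because the group is non-capturing, (?:...), re.split() drops the stop words themselves instead of returning them as extra list items.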
Hmmm... I just did a quick check, and ran into the following:
http://web.media.mit.edu/~hugo/research/montytagger.html
In Natural Language Processing (NLP), a common task is to take a sentence
and attach a part-of-speech tag to each word.
Here's a brief run through the program:
###
dyoo@coffeetable:~/montytagger-1.0/python$ python MontyTagger.py
***** INITIALIZING ******
Lexicon OK!
LexicalRuleParser OK!
ContextualRuleParser OK!
*************************
MontyTagger v1.0
--send bug reports to hugo@media.mit.edu--
> This is a test of the emergency broadcast system
This/DT is/VBZ a/DT test/NN of/IN the/DT emergency/NN broadcast/NN system/NN
-- monty took 0.02 seconds. --
> In a hole, there lived a hobbit.
In/IN a/DT hole/NN ,/, there/EX lived/VBD a/DT hobbit/NN ./.
-- monty took 0.19 seconds.
###
Wow! This is pretty neat!
This program takes a sentence and tries its best to attach a part-of-speech
tag to each word. Here are the meanings of some of those tags:
DT --> determiner
IN --> preposition or subordinating conjunction
VBZ --> verb, 3rd person singular present
NN --> noun, singular or mass
EX --> existential there
VBD --> verb, past tense
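Since the tagger's output is plain text in word/TAG form, it's easy to pull apart and annotate. Here's a small sketch; parse_tagged and TAG_MEANINGS are names I made up for illustration, and the glossary only covers the tags listed above:

```python
# Glossary copied from the tag list above (Penn Treebank tag set).
TAG_MEANINGS = {
    "DT": "determiner",
    "IN": "preposition or subordinating conjunction",
    "VBZ": "verb, 3rd person singular present",
    "NN": "noun, singular or mass",
    "EX": "existential there",
    "VBD": "verb, past tense",
}

def parse_tagged(text):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the *last* slash, so a word that itself
        # contains a slash (e.g. "and/or") keeps it.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs

tagged = "In/IN a/DT hole/NN ,/, there/EX lived/VBD a/DT hobbit/NN ./."
for word, tag in parse_tagged(tagged):
    meaning = TAG_MEANINGS.get(tag, "(not in the list above)")
    print("%-8s %-4s %s" % (word, tag, meaning))
```

Punctuation gets tagged as itself (",/," and "./."), which is why those fall through to the default in the lookup.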
I don't know all of these tags yet. *grin* But there is a good list of
them in the Penn Treebank Project:
http://www.cis.upenn.edu/~treebank/
ftp://ftp.cis.upenn.edu/pub/treebank/doc/tagguide.ps.gz
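Coming back to the original goal of index entries: once a log line has been tagged, one simple approach is to keep runs of adjectives followed by nouns, so the adjective/noun relationship stays intact. This is a crude form of what NLP folks call noun-phrase chunking, my own suggestion rather than something MontyTagger does for you. The sketch assumes Penn Treebank tags, where JJ, JJR, JJS mark adjectives and NN, NNS, NNP, NNPS mark nouns; index_phrases is a hypothetical helper name:

```python
def index_phrases(tagged_text):
    """Collect 'adjective* noun+' runs from 'word/TAG ...' text.

    Assumes Penn Treebank tags: JJ* = adjective, NN* = noun.
    """
    phrases = []
    current = []       # words in the run being built
    has_noun = False   # has the current run reached its noun part yet?
    for token in tagged_text.split():
        word, _, tag = token.rpartition("/")
        if tag.startswith("NN"):
            current.append(word)
            has_noun = True
        elif tag.startswith("JJ"):
            if has_noun:
                # An adjective after a noun starts a new phrase.
                phrases.append(" ".join(current))
                current, has_noun = [], False
            current.append(word)
        else:
            # Any other tag ends the run; keep it only if it had a noun.
            if has_noun:
                phrases.append(" ".join(current))
            current, has_noun = [], False
    if has_noun:
        phrases.append(" ".join(current))
    return phrases

tagged = ("This/DT is/VBZ a/DT test/NN of/IN the/DT "
          "emergency/NN broadcast/NN system/NN")
print(index_phrases(tagged))
# → ['test', 'emergency broadcast system']
```

Adjectives that never reach a noun (as in "the dog is big") are simply dropped, which seems like the right behavior for index entries.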
Good luck to you!