python/regex question... hope someone can help

charonzen your.master at gmail.com
Sun Dec 9 14:45:53 EST 2007


> Another suggestion is to ensure that the job specification is not
> overly simplified. How did you parse the text into "words" in the
> prior exercise that produced the list of bigrams? Won't you need to
> use the same parsing method in the current exercise of tagging the
> bigrams with an underscore?
>
> Cheers,
> John

Thank you John, that definitely puts things in perspective!  I'm very
new to both Python and text parsing, and I often feel that I can't see
the forest for the trees.  If you're asking, I'm working on a project
that utilizes Church's mutual information score.  I tokenize my text,
split it into a list, derive some unigram and bigram dictionaries, and
then calculate a pmi dictionary based on x,y from the bigrams and
unigrams.  The bigrams that pass my threshold then get put into my
list of x_y strings, and you know the rest.  By modifying the original
text file, I can view 'x_y', z pairs as x,y and iterate it until I
have some collocations that are worth playing with.  So I think that
covers the question the same parsing method.  I'm sure there are more
pythonic ways to do it, but I'm on deadline :)

Thanks again!

Brandon



More information about the Python-list mailing list