Text mining in Python

mk mrkafk at gmail.com
Wed Mar 10 19:58:07 CET 2010

Hello everyone,

I need to do the following:

(0. transform words in a document into word roots)

1. analyze a set of documents to see which words are highly frequent

2. detect clusters of those highly frequent words

3. map the clusters to some "special" keywords

4. rank the documents by clusters and by the "top n" most frequent words

5. provide search that would rank documents according to whether the search 
words are "special" cluster keywords or frequent words
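For what it's worth, steps 0, 1 and 4 are small enough to sketch in plain Python before choosing an engine. This is only a rough sketch under stated assumptions. The suffix stripper and the tiny stopword list are hypothetical stand-ins (a real stemmer such as NLTK's would replace them). "Ranking" here simply means counting how often each document uses the top-n roots.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "in", "on", "of", "and", "with", "to", "for", "is"}

# Hypothetical, deliberately crude suffix stripper standing in for a real
# stemmer (step 0). It is NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")

def crude_stem(word):
    for suffix in SUFFIXES:
        # Strip the first matching suffix, but only if a root of >= 3 letters remains.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lowercase, split on non-letters, drop stopwords, reduce words to crude roots."""
    return [crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS]

def top_n_words(docs, n):
    """Step 1: the n most frequent roots across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return [word for word, _ in counts.most_common(n)]

def rank_documents(docs, top_words):
    """Step 4: document indices ordered by total occurrences of the top roots."""
    top = set(top_words)
    scores = [(sum(1 for w in tokenize(doc) if w in top), i)
              for i, doc in enumerate(docs)]
    # Highest score first; ties keep the original document order.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scores]
```

With docs = ["The cat sat on the mat", "Python parses text; Python counts words", "Counting words in text with Python"], top_n_words(docs, 3) gives ["python", "text", "count"] and rank_documents ranks the documents as [1, 2, 0]. Counter.most_common does the heavy lifting; the hard part in practice is the quality of step 0.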
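Steps 2, 3 and 5 can also be prototyped without an engine. This is only a sketch under my own assumptions. Words are "clustered" when the sets of documents they occur in overlap enough, meaning a Jaccard similarity at or above a hypothetical threshold of 0.5, grouped single-link with union-find. That is one simple technique among many, not necessarily the right one for this task. The "special" keywords are assigned by hand through hypothetical seed words. Search weights a cluster-keyword hit (hypothetical weight 2.0) above a plain frequent-word hit (1.0).

```python
from itertools import combinations

def cluster_words(docs, words, threshold=0.5):
    """Step 2: single-link groups of words whose document sets have Jaccard >= threshold.

    docs is a list of already-normalized token lists; words are the frequent words.
    """
    words = sorted(words)
    postings = {w: {i for i, doc in enumerate(docs) if w in doc} for w in words}
    parent = {w: w for w in words}

    def find(w):
        # Union-find root lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        union = postings[a] | postings[b]
        if union and len(postings[a] & postings[b]) / len(union) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for w in words:
        clusters.setdefault(find(w), set()).add(w)
    return list(clusters.values())

def label_clusters(clusters, seed_labels):
    """Step 3: map a hand-chosen 'special' keyword to each cluster containing its seed word."""
    keyword_clusters = {}
    for cluster in clusters:
        for seed, keyword in seed_labels.items():
            if seed in cluster:
                keyword_clusters[keyword] = set(cluster)
    return keyword_clusters

def search(docs, query, keyword_clusters, frequent_words,
           cluster_weight=2.0, word_weight=1.0):
    """Step 5: (doc index, score) pairs, best first, for documents scoring above zero."""
    scores = []
    for i, doc in enumerate(docs):
        score = 0.0
        for term in query:
            if term in keyword_clusters:
                # A special keyword matches every word in its cluster.
                score += cluster_weight * sum(doc.count(w) for w in keyword_clusters[term])
            elif term in frequent_words:
                score += word_weight * doc.count(term)
        scores.append((score, i))
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, s) for s, i in scores if s > 0]
```

For example, with docs = [["python", "code", "python"], ["python", "code", "test"], ["cake", "oven"], ["cake", "oven", "sugar"]] and frequent words {"python", "code", "cake", "oven"}, the clusters come out as {"cake", "oven"} and {"code", "python"}. After labelling them with {"python": "programming", "cake": "baking"}, search(docs, ["programming"], ...) returns [(0, 6.0), (1, 4.0)], while a plain frequent word such as "oven" gives [(2, 1.0), (3, 1.0)].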

Is there some good open source engine out there that would be suitable 
for the task at hand? Does anybody have experience with one?

Now, I do know about NLTK and the Python bindings to UIMA. The thing is, I 
do not know whether those are suitable for the above task. If anybody has 
experience with those or others and can say whether they're good for 
this, please post.

