Text mining in Python

Robert Kern robert.kern at gmail.com
Wed Mar 10 20:05:38 CET 2010

On 2010-03-10 12:58 PM, mk wrote:
> Hello everyone,
> I need to do the following:
> (0. transform words in a document into word roots)
> 1. analyze a set of documents to see which words are highly frequent
> 2. detect clusters of those highly frequent words
> 3. map the clusters to some "special" keywords
> 4. rank the documents on clusters and "top n" most frequent words
> 5. provide search that would rank documents according to whether search
> words were "special" cluster keywords or frequent words
> Is there a good open source engine out there that would be suitable
> for the task at hand? Does anybody have experience with such engines?

You can probably do most of this with Whoosh:

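To give a rough sense of what steps 0, 1 and 4 in the list involve, independent of any particular engine, here is a minimal sketch using only the standard library. It is not Whoosh code and not a recommendation of method: the `stem` function is a crude suffix stripper standing in for a real stemmer, and every name in it (`stem`, `tokenize`, `top_terms`, `rank_documents`, the sample `docs`) is hypothetical.

```python
import re
from collections import Counter


def stem(word):
    # Crude stand-in for a real stemmer: strip a few common English suffixes,
    # but only if at least three letters remain.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    # Step 0: lowercase, keep runs of letters, reduce each word to a rough root.
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def top_terms(docs, n):
    # Step 1: count stemmed words across the whole collection.
    counts = Counter()
    for text in docs.values():
        counts.update(tokenize(text))
    return [term for term, _ in counts.most_common(n)]


def rank_documents(docs, terms):
    # Step 4: score each document by how many of its tokens are in `terms`.
    wanted = set(terms)
    scores = {
        name: sum(1 for t in tokenize(text) if t in wanted)
        for name, text in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Made-up sample collection, for illustration only.
docs = {
    "a.txt": "Indexing documents and searching indexed documents",
    "b.txt": "Clusters of frequent words",
    "c.txt": "Searching for words in documents",
}
frequent = top_terms(docs, 3)
ranking = rank_documents(docs, frequent)
```

A real setup would also drop stop words and use a proper stemmer before counting. Clustering (steps 2 and 3) is not covered by this sketch.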

Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco

More information about the Python-list mailing list