Text mining in Python

Robert Kern robert.kern at gmail.com
Wed Mar 10 20:05:38 CET 2010


On 2010-03-10 12:58 PM, mk wrote:
> Hello everyone,
>
> I need to do the following:
>
> (0. transform words in a document into word roots)
>
> 1. analyze a set of documents to see which words are highly frequent
>
> 2. detect clusters of those highly frequent words
>
> 3. map the clusters to some "special" keywords
>
> 4. rank the documents on clusters and "top n" most frequent words
>
> 5. provide search that would rank documents according to whether search
> words were "special" cluster keywords or frequent words
>
> Is there a good open source engine out there that would be suitable
> for the task at hand? Does anybody have experience with one?

You can probably do most of this with Whoosh:

   http://whoosh.ca/
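
To make steps 0, 1 and the frequency part of step 4 concrete, here is a
minimal standard-library sketch. It does not use Whoosh or its API. The
names crude_stem, tokenize, top_terms and rank_documents are hypothetical,
the sample documents are made up, and the suffix stripper is only a crude
stand-in for a real stemmer such as Porter's.

```python
import re
from collections import Counter


def crude_stem(word):
    # Assumption: a naive suffix stripper, used only to illustrate step 0.
    # A real stemmer handles far more cases than this.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    # Lowercase, keep alphabetic runs only, then reduce each word to its stem.
    return [crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def top_terms(documents, n=10):
    # Step 1: the n most frequent stems across the whole collection.
    counts = Counter()
    for text in documents.values():
        counts.update(tokenize(text))
    return [term for term, _ in counts.most_common(n)]


def rank_documents(documents, stems):
    # Step 4 (frequency part): score each document by how many of its
    # tokens belong to the given set of stems, highest score first.
    wanted = set(stems)
    scores = {
        name: sum(1 for token in tokenize(text) if token in wanted)
        for name, text in documents.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up sample data, for illustration only.
docs = {
    "a.txt": "Parsing mail archives and parsing logs",
    "b.txt": "Clustering words from parsed documents",
    "c.txt": "Cooking with garlic",
}

frequent = top_terms(docs, n=3)
ranking = rank_documents(docs, frequent)
```

Clustering the frequent words (step 2) and mapping clusters to keywords
(step 3) are not covered by this sketch; a real stemmer and a stop-word
list would also be needed before the counts mean much.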

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco



