Text mining in Python
Robert Kern
robert.kern at gmail.com
Wed Mar 10 14:05:38 EST 2010
On 2010-03-10 12:58 PM, mk wrote:
> Hello everyone,
>
> I need to do the following:
>
> (0. transform words in a document into word roots)
>
> 1. analyze a set of documents to see which words are highly frequent
>
> 2. detect clusters of those highly frequent words
>
> 3. map the clusters to some "special" keywords
>
> 4. rank the documents on clusters and "top n" most frequent words
>
> 5. provide search that would rank documents according to whether search
> words were "special" cluster keywords or frequent words
>
> Is there a good open source engine out there that would be suitable for
> the task at hand? Does anybody have experience with one?

You can probably do most of this with Whoosh:
http://whoosh.ca/
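For a feel of what steps 0, 1 and 5 involve before reaching for a library, here is a minimal standard-library sketch. It is an illustration under stated assumptions, not Whoosh's API: `crude_stem`, `tokenize`, `top_terms` and `rank` are hypothetical helpers, the suffix list is a toy stand-in for a real stemmer such as Porter's, documents are ranked by raw counts of the query stems, and the clustering in steps 2 and 3 is not attempted.

```python
import re
from collections import Counter

# Hypothetical, deliberately crude suffix list; a real stemmer handles far more cases.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Step 0 (toy version): strip the first matching suffix, keeping a root of 3+ letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text, split it into alphabetic words and stem each one."""
    return [crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def top_terms(docs, n):
    """Step 1: return the n most frequent stems across all documents."""
    counts = Counter()
    for text in docs.values():
        counts.update(tokenize(text))
    return [term for term, _ in counts.most_common(n)]


def rank(docs, query_terms):
    """Step 5 (simplified): order document ids by total occurrences of the query stems."""
    stems = {crude_stem(t.lower()) for t in query_terms}
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        scores[doc_id] = sum(counts[s] for s in stems)
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "a": "Indexing documents and searching indexed documents",
    "b": "Clustering words",
    "c": "Searching words in documents",
}
print(top_terms(docs, 1))                       # ['document']
print(rank(docs, ["searching", "documents"]))   # ['a', 'c', 'b']
```

An indexing library replaces each of these pieces with something sturdier (real analyzers, an inverted index, a proper scoring function such as BM25), which is why it is usually the better starting point.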
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco