Any module or library for full-text indexing?

Adam Ruth aruth at intercation.com
Wed May 10 09:24:54 EDT 2000


I use Swish++, http://www.best.com/~pjl/software/swish/.  It's got the
perfect combination of speed and ease of use for me.  Plus, while it can be used
for the web (that's where I use it), it's not web-centric and can therefore
be used in any kind of application.

--
Adam Ruth
InterCation, Inc.
www.intercation.com


"Russell Turpin" <noone at do.not.use> wrote in message
news:391854D9.89285A9B at do.not.use...
> I'm looking for a Python module that does full-text
> indexing, i.e., that extracts a set of significant words
> from a text document and searches for a candidate word
> in a list of words so extracted. The module should
> solve the following problems:
>
> COMMON WORD MANAGEMENT. No one wants to index on common
> words such as "the," "of," and "what." Ideally, a module
> that does full-text indexing would have some tool for
> managing the set of words that are defined as "common."
> Words not commonly in a dictionary, such as "Noam" and
> "Chomsky," are significant and should be indexed.
>
> COGNATES. The module should have some way of identifying
> variations of the same word when searching the index,
> e.g., "goose" would also match on "geese," "mouse" on
> "mice," and "456" on "four-hundred fifty-six." This
> requires the module to have or make use of a language
> dictionary in some form. (I would be more than happy
> with noun cognates. Yeah, the number example is hard
> and not required.)
>
> The package does not need to implement a persistence
> mechanism, nor manage the indices and their referents. In
> other words, the core functions I am looking for are:
>
>    extract_significant: text -> word_list
>    find: word, word_list -> set of hits
>
> These would be trivial functions if not for the
> linguistic aspects as described above, and it is
> precisely these problems for which I'm hoping to find a
> solution. Of course, if the module goes further, that
> is great.
>
> If there is no existing Python module for this, I would
> be interested in any C package that could be adapted
> toward this end. In this case, I would try to wrap the
> C package as a Python module, and make it available for
> other Python programmers.
>
> If there is no C package, I'll consider anything that
> can run on a Linux box.
>
> If there is no package that does this, I'll go out
> on the glacier and eat ice worms.
>
> Thanks!
>
> Russell
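
For reference, the extract_significant function Russell describes, together with the common-word management he asks for, might look roughly like the sketch below. It is a minimal illustration, not any existing module's API: the stop-word list DEFAULT_COMMON_WORDS and the word pattern WORD_RE are hypothetical choices, and the common-word set is managed simply by passing a different set per call rather than by consulting a dictionary of word frequencies.

```python
import re

# Hypothetical starter list of common ("stop") words; a real list would be
# longer and would probably live in a file so it can be edited and managed.
DEFAULT_COMMON_WORDS = frozenset(
    {"a", "an", "and", "in", "is", "of", "or", "the", "to", "what"}
)

# Assumed tokenization rule: a word is a run of lowercase letters,
# digits, or apostrophes (text is lowercased before matching).
WORD_RE = re.compile(r"[a-z0-9']+")


def extract_significant(text, common_words=DEFAULT_COMMON_WORDS):
    """Return the distinct non-common words of text, lowercased, in first-seen order."""
    words = (w.strip("'") for w in WORD_RE.findall(text.lower()))
    # dict.fromkeys removes duplicates but keeps insertion order (Python 3.7+).
    return list(dict.fromkeys(w for w in words if w and w not in common_words))


if __name__ == "__main__":
    print(extract_significant("What is the theory of Noam Chomsky?"))
    # ['theory', 'noam', 'chomsky']

    # Managing the common-word set: pass an extended (or reduced) set per call.
    print(extract_significant("Chomsky said the theory...", DEFAULT_COMMON_WORDS | {"said"}))
    # ['chomsky', 'theory']
```

Because everything is lowercased, "Noam" is stored as "noam"; a version that wanted to treat capitalized non-dictionary words specially would need to look at case before folding it.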
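
The find function, with the "goose"/"geese" style matching Russell asks for, can be sketched by reducing both the query and each indexed word to a base form and comparing those. This is a crude stand-in, not a real stemmer or dictionary-based lemmatizer: IRREGULAR_PLURALS and normalize are hypothetical names, the exception table is tiny, the two suffix rules are assumptions covering only simple English noun plurals, and number matching ("456" vs. "four-hundred fifty-six") is not attempted.

```python
# Hypothetical exception table for irregular noun plurals; a real module
# would need a much larger table or a proper lemmatizer backed by a dictionary.
IRREGULAR_PLURALS = {
    "children": "child",
    "feet": "foot",
    "geese": "goose",
    "men": "man",
    "mice": "mouse",
    "teeth": "tooth",
    "women": "woman",
}


def normalize(word):
    """Map a word to a crude base form: irregular table first, then suffix rules."""
    word = word.lower()
    if word in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[word]
    if len(word) > 4 and word.endswith("ies"):  # ponies -> pony
        return word[:-3] + "y"
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):  # cats -> cat
        return word[:-1]
    return word


def find(word, word_list):
    """Return the set of entries in word_list that share word's normalized form."""
    target = normalize(word)
    return {w for w in word_list if normalize(w) == target}


if __name__ == "__main__":
    print(find("goose", ["geese", "gander", "mice"]))  # {'geese'}
    print(find("mouse", ["Mice", "mouse", "mousse"]))
```

Since both sides are normalized, the match is symmetric: "geese" finds "goose" just as "goose" finds "geese". The suffix rules will misfire on words such as "news" (reduced to "new"), which is where a real language dictionary, as Russell suggests, or an established stemming algorithm such as Porter's would have to take over.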




