similar words index?

John Machin sjmachin at lexicon.net
Fri Jan 2 04:32:51 EST 2009


On Jan 2, 8:07 pm, robert <no-s... at no-spam.invalid> wrote:
> how can one index (text documents) for efficient similar word search?
> existing modules?
> what principles are used by search engines therefore?

Only your second question is on-topic for this newsgroup. Try this:

http://pylucene.osafoundation.org/

The site for Lucene itself should have references to the various
technologies they use; those, plus some (definitely recommended)
googling, should give you some clues about your other questions.
Some computer science topics to look up: Burkhard-Keller tree,
Voronoi diagram/tree, permuted lexicon. Do bear in mind that what
real-world search engines like Google actually use may be rather
difficult to find out; Google sure ain't open source, not any
more.
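
To give a flavour of the first of those topics, here is a minimal
sketch of a Burkhard-Keller tree, assuming Levenshtein edit distance
as the metric. The names levenshtein and BKTree are made up for
illustration; they are not taken from Lucene, PyLucene or any other
package, and this is not a claim about how any search engine does it.

```python
def levenshtein(a, b):
    """Edit distance between two strings (dynamic programming, two rows)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


class BKTree:
    """Illustrative BK-tree; each node is a (word, {distance: child}) pair."""

    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already in the tree
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, word, max_dist):
        """Return sorted (distance, word) pairs within max_dist of word."""
        results = []
        if self.root is None:
            return results
        stack = [self.root]
        while stack:
            candidate, children = stack.pop()
            d = self.distance(word, candidate)
            if d <= max_dist:
                results.append((d, candidate))
            # Triangle inequality: a match can only live under an edge
            # whose label lies in [d - max_dist, d + max_dist].
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for w in ["book", "books", "cake", "boo", "cape", "boon", "cook", "cart"]:
    tree.add(w)
matches = tree.search("bok", 1)
```

The point of the structure is the pruning step in search(): whole
subtrees are skipped without computing a distance for the words in
them, which is what makes it cheaper than comparing the query against
every word in the lexicon.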

HTH,
John


