Text Search Engine that works with Python

Ype Kingma ykingma at accessforall.nl
Mon Mar 4 13:55:20 EST 2002


Doug Farrell wrote:
> 
> Hi all,
> 
> I'm wondering if anyone knows of a text search engine that works with
> Python? What I'm looking for specifically is something that will compress
> the text and still allow searches and retrievals that can be exact matches
> or proximity based. The text I want to compress and search is huge (70 megs)
> and should compress down to half, not including any index files that might
> be required by the search engine. Anyone know of anything like this or any
> ideas?

If you can use Jython as your Python implementation, have a look
at Lucene: http://jakarta.apache.org/lucene/docs/index.html

You'll have to do the compression yourself, but you can store any field
with a document, including one that is filtered through a zip output stream
from the standard Java libraries. Alternatively, you might store only a
reference to a file containing the compressed text of each document.
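
A rough sketch of the "store only a reference" idea, assuming plain Python
with the standard zlib module rather than the Java zip streams you would
use under Jython. The function names and the ".z" file suffix are made up
for illustration; the returned path is what you would store as a field in
the index instead of the text itself.

```python
import os
import tempfile
import zlib


def store_compressed(text, directory, doc_id):
    """Compress text with zlib, write it to a file, and return the file path."""
    path = os.path.join(directory, "%s.z" % doc_id)  # hypothetical naming scheme
    with open(path, "wb") as handle:
        handle.write(zlib.compress(text.encode("utf-8"), 9))
    return path


def load_compressed(path):
    """Read a file written by store_compressed and return the original text."""
    with open(path, "rb") as handle:
        return zlib.decompress(handle.read()).decode("utf-8")


if __name__ == "__main__":
    directory = tempfile.mkdtemp()
    sample = "text search engine that works with Python " * 100
    reference = store_compressed(sample, directory, "doc1")
    # The index would keep only `reference`; the text is fetched on demand.
    print(reference, os.path.getsize(reference), len(sample))
    assert load_compressed(reference) == sample
```

How much you save depends entirely on the text; the halving mentioned in
the question is plausible for ordinary prose but is not guaranteed.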

Lucene searches very fast. For 500 MB of indexes in 15 Lucene databases,
typical query time is less than a second for all databases together on a
400 MHz machine. Typical index size is around one third of the original text.
The 15 databases are my own choice; Lucene could easily handle everything
in a single one.

Apart from exact matches and proximity you can also use prefix terms
and required terms. Lucene is optimized to retrieve only the best matches
to a query, but you can also use its API in boolean mode.
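
To make the four query types concrete, here is a toy positional index in
plain Python. It is only an illustration of the concepts and not how
Lucene implements them; all function names and the sample documents are
invented for this sketch, and it does no scoring or ranking at all.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return index


def phrase(index, first, second):
    """Exact match: documents where `second` directly follows `first`."""
    hits = set()
    for doc_id, positions in index.get(first, {}).items():
        following = set(index.get(second, {}).get(doc_id, []))
        if any(p + 1 in following for p in positions):
            hits.add(doc_id)
    return hits


def near(index, first, second, max_gap):
    """Proximity: documents where the terms are at most max_gap words apart."""
    hits = set()
    for doc_id, positions in index.get(first, {}).items():
        others = index.get(second, {}).get(doc_id, [])
        if any(abs(p - q) <= max_gap for p in positions for q in others):
            hits.add(doc_id)
    return hits


def prefix(index, stem):
    """Prefix term: documents containing any term that starts with `stem`."""
    hits = set()
    for term, postings in index.items():
        if term.startswith(stem):
            hits.update(postings)
    return hits


def required(index, *terms):
    """Required terms: documents that contain every one of `terms`."""
    doc_sets = [set(index.get(term, {})) for term in terms]
    return set.intersection(*doc_sets) if doc_sets else set()


docs = {
    1: "python text search engine",
    2: "search the compressed text with python",
    3: "compression of large text files",
}
index = build_index(docs)
```

With this index, phrase(index, "text", "search") finds only the document
where the words are adjacent, while near(index, "search", "text", 3) also
accepts the one where they are three words apart.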

Recommended, especially together with the lucene-users list.

Ype


