DocIndexer is a document indexing toolkit that uses the PyLucene search engine to index and search document files. It includes command-line utilities, Python index and search classes, and a Win32 COM server that can be used to integrate indexing and searching into application software. The current version includes parsers for Microsoft Word, HTML, PDF and plain text documents.
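As a rough illustration of the index-then-search workflow that a toolkit like this wraps, here is a minimal sketch of an in-memory inverted index in plain Python. It is not DocIndexer's API and does not use PyLucene: the `ToyIndex` class, the `tokenize` helper and the file names are all hypothetical, and the code targets a current Python 3 interpreter rather than Python 2.5.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndex:
    """Hypothetical in-memory inverted index: token -> set of document names."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, name, text):
        """Record every token of an already-extracted document text."""
        for token in tokenize(text):
            self.postings[token].add(name)

    def search(self, query):
        """Return the sorted names of documents containing every query token."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            hits &= self.postings.get(token, set())
        return sorted(hits)


# Illustrative documents; a real toolkit would obtain the text from its parsers.
index = ToyIndex()
index.add("report.txt", "Quarterly sales report")
index.add("notes.txt", "Meeting notes about sales targets")

print(index.search("sales"))
print(index.search("sales report"))
```

A full-text engine such as Lucene adds on-disk storage, ranking, analyzers and query parsing on top of this basic idea; the sketch only shows why the text of each document has to be extracted before it can be searched.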
Version 0.9 is the long-overdue rewrite of 0.7. The Lupy search library has been replaced with PyLucene, and the release adds many new features along with significant performance improvements.
Win32: None (compiled binary distribution). Linux: Python 2.5, PyLucene 2.