ANN: NUCULAR B3 Full text indexing (now on Win32 too)

Paul Rubin http
Thu Feb 14 03:50:35 EST 2008


Jarek Zgoda <jzgoda at o2.usun.pl> writes:
> > I don't know what Sphinx is.
> http://www.sphinxsearch.com/

Thanks, looks interesting, maybe not so good for what I'm doing, but
worth looking into.  There is also Xapian, which again I haven't
looked at much, but which has fancier PIR (probabilistic information
retrieval) capabilities than Lucene or the version of Nucular that I
looked at.

The main thing killing most of the search apps that I'm involved with
is disk latency.  If Aaron is listening, I might suggest offering a
config option to redundantly record the stored search fields with
every search term in the index.  That will bloat the indexes by a
nontrivial constant factor (maybe 5x-10x) but even terabyte disks are
dirt cheap these days, so you could still index a lot of data, and
present large result sets without having to do a disk seek for every
result in the set.  I've been meaning to crunch some numbers to see
if this actually makes sense.
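
To make the suggestion concrete, here is a rough Python sketch using a
toy in-memory index.  Everything in it (the sample `docs`, the function
names, counting dictionary lookups as stand-ins for disk seeks) is made
up for illustration and has nothing to do with Nucular's actual API or
on-disk format.  The plain index keeps only document ids per term, so
displaying results costs one record fetch per hit; the "fat" index
copies the stored fields into every posting, so reading the posting
list is enough.

```python
from collections import defaultdict

# Hypothetical sample data: doc id -> fields.  "title" is the stored
# field shown in result lists, "body" is the text that gets indexed.
docs = {
    1: {"title": "Nucular release notes", "body": "full text indexing in python"},
    2: {"title": "Xapian overview", "body": "probabilistic text retrieval"},
    3: {"title": "Lucene intro", "body": "text indexing in java"},
}

def build_plain_index(docs):
    """term -> sorted list of doc ids; stored fields live elsewhere."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for term in fields["body"].split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def build_fat_index(docs, stored=("title",)):
    """term -> list of (doc id, copy of the stored fields)."""
    index = defaultdict(dict)
    for doc_id, fields in docs.items():
        copy = {name: fields[name] for name in stored}
        for term in fields["body"].split():
            index[term][doc_id] = copy
    return {term: sorted(p.items(), key=lambda kv: kv[0])
            for term, p in index.items()}

def search_plain(index, docs, term):
    """Return (titles, number of per-result record fetches)."""
    ids = index.get(term, [])
    # Each docs[i] lookup stands in for one random disk seek.
    titles = [docs[i]["title"] for i in ids]
    return titles, len(ids)

def search_fat(index, term):
    """Return (titles, number of per-result record fetches)."""
    postings = index.get(term, [])
    # The titles come straight out of the posting list: no extra fetches.
    titles = [fields["title"] for _, fields in postings]
    return titles, 0

plain = build_plain_index(docs)
fat = build_fat_index(docs)
print(search_plain(plain, docs, "text"))
print(search_fat(fat, "text"))
```

Both searches return the same titles; the difference is only in how
many separate record fetches are needed, which is the cost the fat
index trades disk space to avoid.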

Unfortunately, the concept of the large add-on memory card seems to
have vanished.  It would be very useful to have a cheap x86 box with a
buttload of RAM (say 64 GB), using commodity desktop memory and extra
modules for ECC.  It would be OK if the extra memory went over some
slow interface so that it was 10x slower than regular RAM.  That's
still 100x faster than a flash disk and 1000x faster than a hard disk.
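
A quick back-of-the-envelope version of those ratios in Python.  The
100 ns figure for regular RAM is an assumed baseline, not a
measurement, and the function names are hypothetical; only the 10x,
100x and 1000x ratios come from the paragraph above.

```python
RAM_NS = 100  # assumed access time for regular RAM, in nanoseconds

def latencies_ns(ram_ns=RAM_NS):
    """Per-access latency for each tier, derived from the stated ratios."""
    slow_ram = ram_ns * 10           # add-on memory: 10x slower than RAM
    return {
        "ram": ram_ns,
        "slow_ram": slow_ram,
        "flash": slow_ram * 100,     # slow RAM is 100x faster than flash
        "disk": slow_ram * 1000,     # slow RAM is 1000x faster than disk
    }

def fetch_time_ms(n_results, latency_ns):
    """Time to fetch n_results records at one random access each."""
    return n_results * latency_ns / 1e6

lat = latencies_ns()
for tier in ("ram", "slow_ram", "flash", "disk"):
    print(tier, fetch_time_ms(1000, lat[tier]), "ms per 1000 results")
```

Under these assumptions a 1000-result page costs about 1 ms from the
slow add-on memory versus about 1 s from a hard disk at one seek per
result.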



More information about the Python-list mailing list