ANN: NUCULAR B3 Full text indexing (now on Win32 too)
Fri Feb 22 23:31:06 CET 2008
Aaron Watters <aaron.watters at gmail.com> writes:
> [apologies to the list: I would have done this offline,
> but I can't figure out Paul's email address.]
> 1) Paul please forward your email address
Will send it privately. I don't have a public email address any more
(death to spam!!!). My general purpose online contact point is
http://paulrubin.com which currently has an expired certificate that
I'll get around to renewing someday. Meanwhile you have to click
"accept" to connect using the expired cert.
> 3) Since you seem to know about these things: I was thinking
> of adding an optional feature to Nucular which would allow
> a look-up like "given a word find all attributes that contain
> that word anywhere and give a count of the number of times it
> is found in that attribute as well as the entry id for an example
> instance (arbitrarily chosen)". I was thinking about calling
> this "inverted faceting", but you probably know a
> better/standard name, yes? What is it please? Thanks!
> Answers from anyone else welcomed also.
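For concreteness, the lookup described above could be sketched roughly
as follows. This is only an illustration in plain Python, not Nucular's
API: the names build_word_index and attributes_containing, the
dict-of-dicts entry layout, and the whitespace tokenizer are all
assumptions made for the example.

```python
from collections import defaultdict


def build_word_index(entries):
    """Build word -> attribute -> [occurrence count, example entry id].

    `entries` is assumed to map entry id -> {attribute name: text}.
    """
    index = defaultdict(dict)
    for entry_id, attributes in entries.items():
        for attribute, text in attributes.items():
            # Naive tokenizer (assumption): lowercase, split on whitespace.
            for word in text.lower().split():
                # The first entry seen becomes the arbitrary example.
                slot = index[word].setdefault(attribute, [0, entry_id])
                slot[0] += 1
    return index


def attributes_containing(index, word):
    """Return {attribute: (count, example entry id)} for one word."""
    found = index.get(word.lower(), {})
    return {attr: (count, example) for attr, (count, example) in found.items()}


entries = {
    "e1": {"title": "Python indexing", "body": "full text indexing in python"},
    "e2": {"title": "Solr notes", "body": "python client for solr"},
}
index = build_word_index(entries)
result = attributes_containing(index, "python")
```

Here result maps "title" to (1, "e1") and "body" to (2, "e1"). A real
index would keep these counts on disk rather than in a dict.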
In Solr this is called the DisMax (disjunction maximum) handler, I
think. I tried it and it doesn't work very well, so I ended up using
a script written by a co-worker that expands such queries into more
complex queries that put user-supplied weights on each field. It is a
somewhat messy problem. Otis Gospodnetic and Erik Hatcher's book
"Lucene in Action" talks about it some, I believe. Manning, Raghavan,
and Schütze are working on a new book at
http://informationretrieval.org that discusses fancier
methods. I think these are worth looking into, but I haven't had the
bandwidth to spend time on it so far.
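
To give a rough idea of the query expansion mentioned above, here is a
minimal sketch. It is not the co-worker's script, which isn't shown
here. The function name expand_query and the field names are made up
for the example, and it assumes Lucene-style field:term^boost syntax
with no escaping of special characters in the user's input.

```python
def expand_query(user_query, field_weights):
    """Expand each term into a weighted OR across all fields.

    `field_weights` maps field name -> boost (hypothetical fields).
    Terms are split on whitespace and are NOT escaped (assumption).
    """
    clauses = []
    for term in user_query.split():
        per_field = " OR ".join(
            f"{field}:{term}^{weight}"
            for field, weight in field_weights.items()
        )
        clauses.append(f"({per_field})")
    # Require every term to match in at least one field.
    return " AND ".join(clauses)


expanded = expand_query("nucular index", {"title": 3, "body": 1})
# "(title:nucular^3 OR body:nucular^1) AND (title:index^3 OR body:index^1)"
```

Picking good weights for each field is where the messiness comes in.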