[Mailman-Developers] Google Summer of Code: Integration of Search Code

Stephen J. Turnbull stephen at xemacs.org
Fri Mar 30 07:18:45 CEST 2012


On Fri, Mar 30, 2012 at 12:24 PM, Shayan Md <mdoshayan at gmail.com> wrote:
> On Fri, Mar 30, 2012 at 5:05 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:

>> And (2) search and retrieval may
>> do a *lot* of message access, for example if you want to do data
>> mining (see Ana from Spain's thread).

> Isn't it the purpose of index?

Yes, of course, but it may not be good enough.  An index works well when
you know in advance which "features" (e.g., author) you want to index:
then you can use an online algorithm, indexing messages as they come in.
However, many data mining methods are adaptive, meaning that they
discover features of the corpus over time (e.g., through "Bayesian"
algorithms) and then want to go back and reindex or cross-reference
previously examined messages based on more accurate feature
specifications.
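To make the distinction concrete, here is a minimal sketch in Python (the language Mailman is written in). The message dicts, `FeatureIndex`, and `reindex` are hypothetical names for illustration only, not part of Mailman or of any search library. Indexing by a feature fixed in advance (author) costs one pass per message as it arrives; once an adaptive method settles on a new feature (here, crudely, body words), every stored message has to be read again.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical message representation: a dict with "id", "author", "body".
Message = Dict[str, str]
# A feature function maps a message to the feature values it should be indexed under.
FeatureFn = Callable[[Message], Iterable[str]]


class FeatureIndex:
    """Inverted index from feature value to message ids (illustrative sketch)."""

    def __init__(self, feature: FeatureFn) -> None:
        self.feature = feature
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, msg: Message) -> None:
        # Online step: each message is indexed once, when it arrives.
        for value in self.feature(msg):
            self.postings[value].add(msg["id"])

    def lookup(self, value: str) -> Set[str]:
        # .get() avoids inserting empty entries into the defaultdict.
        return set(self.postings.get(value, set()))


def reindex(corpus: List[Message], feature: FeatureFn) -> FeatureIndex:
    # Adaptive step: a newly discovered feature forces a full pass
    # over every previously stored message.
    index = FeatureIndex(feature)
    for msg in corpus:
        index.add(msg)
    return index


corpus: List[Message] = [
    {"id": "1", "author": "alice", "body": "storm archive search"},
    {"id": "2", "author": "bob", "body": "search index design"},
]

# Feature known up front: index by author as messages come in.
by_author = FeatureIndex(lambda m: [m["author"]])
for m in corpus:
    by_author.add(m)
print(sorted(by_author.lookup("alice")))  # ['1']

# Later, a "discovered" feature: words in the body. Requires rereading the corpus.
by_word = reindex(corpus, lambda m: m["body"].split())
print(sorted(by_word.lookup("search")))  # ['1', '2']
```

The `reindex` pass is where the heavy message access comes from: it touches the whole archive, not just new arrivals.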

As I say, if that's not your purpose, then you don't have to worry
about it.  But in some cases it will matter (e.g., pretend you're
Google, not just a GSoC student!).

