Re: [Mailman-Developers] Google Summer of Code: Integration of Search Code
On Fri, Mar 30, 2012 at 12:24 PM, Shayan Md <mdoshayan@gmail.com> wrote:
On Fri, Mar 30, 2012 at 5:05 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
And (2) search and retrieval may involve a *lot* of message access, for example if you want to do data mining (see Ana from Spain's thread).
Isn't that the purpose of an index?
Yes, of course, but possibly it's not good enough. An index works well when you know in advance which "features" (eg, author) you want to index: then you can use an online algorithm, indexing messages as they come in. However, many data mining methods are adaptive, meaning that they discover features of the corpus over time (eg, through "Bayesian" algorithms) and then want to go back and reindex or cross-reference previously examined messages based on more accurate feature specifications.
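The difference between the two cases can be sketched in a few lines of Python. This is a toy illustration, not Mailman code: the message format (dicts with "from" and "body"), the FeatureIndex class, and its accesses counter are all hypothetical. Indexing on a fixed feature such as author is an online operation that reads each message once, as it arrives; changing the feature definition later, as an adaptive method would, forces another pass over every stored message.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical message representation: a plain dict of header/body fields.
Message = Dict[str, str]
# A feature extractor maps a message to the index keys it should be filed under.
Extractor = Callable[[Message], Iterable[str]]


class FeatureIndex:
    """Inverted index from feature keys to message ids (illustrative sketch)."""

    def __init__(self, extractor: Extractor) -> None:
        self.extractor = extractor
        self.store: List[Message] = []                # the archive itself
        self.index: Dict[str, Set[int]] = defaultdict(set)
        self.accesses = 0                             # counts message reads

    def add(self, msg: Message) -> int:
        """Online step: file one incoming message, reading only that message."""
        msg_id = len(self.store)
        self.store.append(msg)
        self._file(msg_id, msg)
        return msg_id

    def _file(self, msg_id: int, msg: Message) -> None:
        self.accesses += 1
        for key in self.extractor(msg):
            self.index[key].add(msg_id)

    def lookup(self, key: str) -> Set[int]:
        # .get() does not trigger the defaultdict factory, so misses stay out of the index.
        return set(self.index.get(key, set()))

    def reindex(self, extractor: Extractor) -> None:
        """Adaptive step: a new feature definition means rereading every stored message."""
        self.extractor = extractor
        self.index = defaultdict(set)
        for msg_id, msg in enumerate(self.store):
            self._file(msg_id, msg)


# Demo: index by author as messages arrive.
idx = FeatureIndex(lambda m: [m["from"]])
for m in [
    {"from": "alice", "body": "archive search"},
    {"from": "bob", "body": "spam filter training"},
    {"from": "alice", "body": "bayesian spam"},
]:
    idx.add(m)
print(idx.lookup("alice"), idx.accesses)   # {0, 2} 3

# Hypothetical "learned" feature set: author plus body terms.
idx.reindex(lambda m: [m["from"], *m["body"].split()])
print(idx.lookup("spam"), idx.accesses)    # {1, 2} 6
```

The second extractor is only a stand-in for features an adaptive (eg, Bayesian) method might discover; the discovery step itself is not shown. The access counter makes the cost visible: the reindex reads all stored messages again, and on a real archive that second pass is a full scan of the corpus.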
As I say, if that's not your purpose, then you don't have to worry about it. But in some cases it will matter (eg, pretend you're Google, not just a GSoC student!)
Stephen J. Turnbull