On Fri, Mar 30, 2012 at 12:24 PM, Shayan Md <mdoshayan(a)gmail.com> wrote:
> On Fri, Mar 30, 2012 at 5:05 AM, Stephen J. Turnbull <stephen(a)xemacs.org> wrote:
>> And (2) search and retrieval may
>> do a *lot* of message access, for example if you want to do data
>> mining (see Ana from Spain's thread).
> Isn't it the purpose of index?
Yes, of course, but an index may not be good enough. It works when you
know in advance what "features" (e.g., author) you want to index. Then
you can use an online algorithm, indexing messages as they come in.
However, many data mining methods are adaptive, meaning that they
discover features of the corpus over time (e.g., through "Bayesian"
algorithms) and then need to go back and reindex or cross-reference
previously examined messages based on more accurate feature
specifications. That means re-reading the stored messages themselves,
not just consulting the index.
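To make the distinction concrete, here is a minimal sketch in Python.
It is not code from any existing archiver; the Archive class, its
"reads" counter, and the by_author/by_topic feature functions are all
hypothetical names invented for illustration. It only shows that
indexing on arrival never touches the store again, while indexing on a
newly discovered feature costs one read per stored message.

```python
from collections import defaultdict


class Archive:
    """Toy message store that counts how often stored messages are read."""

    def __init__(self):
        self.messages = {}             # msg_id -> message dict
        self.index = defaultdict(set)  # feature value -> set of msg_ids
        self.reads = 0                 # number of fetches from the store

    def add(self, msg_id, message, feature):
        # Online step: index the message once, as it arrives.
        # No stored message has to be read back.
        self.messages[msg_id] = message
        self.index[feature(message)].add(msg_id)

    def fetch(self, msg_id):
        # Every access to a stored message is counted.
        self.reads += 1
        return self.messages[msg_id]

    def reindex(self, feature):
        # Adaptive step: a feature discovered later forces a pass
        # over every message already in the store.
        new_index = defaultdict(set)
        for msg_id in list(self.messages):
            new_index[feature(self.fetch(msg_id))].add(msg_id)
        self.index = new_index


def by_author(message):
    # Feature known in advance.
    return message["author"]


def by_topic(message):
    # Stand-in for a feature "discovered" later by some mining method.
    return "gsoc" if "gsoc" in message["body"].lower() else "other"


archive = Archive()
archive.add(1, {"author": "stephen", "body": "Indexing for GSoC"}, by_author)
archive.add(2, {"author": "shayan", "body": "Storage question"}, by_author)
archive.add(3, {"author": "stephen", "body": "More on storage"}, by_author)

reads_after_online_indexing = archive.reads   # still 0
archive.reindex(by_topic)
reads_after_reindexing = archive.reads        # one read per stored message
```

With three messages that is only three reads, but the cost grows with
the size of the archive and is paid again each time the feature
definition changes.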
As I said, if that's not your purpose, then you don't have to worry
about it. But in some cases it will matter (e.g., pretend you're
Google, not just a GSoC student!).