[Spambayes] train on error - to exhaustion?
David Relson
relson at osagesoftware.com
Tue Dec 3 16:57:58 2002
At 11:27 AM 12/3/02, Greg Louis wrote:
>Doesn't look as though pure training-on-error is particularly
>advantageous with the Robinson-Fisher (chi) calculation method. It may
>still be useful in maintaining the effectiveness of an established
>training base.
Greg,
That makes sense. By definition, training-on-error puts only some of the
training corpus (the messages the filter gets wrong) into the word lists.
The obvious result is smaller word lists. Beyond list size, the effects
are less clear. On the one hand, incoming messages will have fewer "hits"
in the word lists; on the other hand, the hits they do get will be more
"meaningful". With smaller lists, there is also less "breadth of
knowledge" about spam and ham. This could account for the lack of
advantage of training-on-error.
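
To make the list-size point concrete, here is a rough Python sketch that
compares training on everything with training on error, repeated until a
pass produces no errors. It is only an illustration under assumptions:
ToyFilter, its vote-counting score, the helper names, and the six-message
corpus are all made up for this example. It is not the Spambayes code and
it does not implement the Robinson-Fisher (chi) calculation.

```python
from collections import Counter


class ToyFilter:
    """Hypothetical word-list filter; NOT the Spambayes chi-combining code."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of ham messages containing it

    def train(self, tokens, is_spam):
        # Count each token once per message.
        (self.spam if is_spam else self.ham).update(set(tokens))

    def classify(self, tokens):
        # Each known token votes +1 (spammier), -1 (hammier) or 0 (tie).
        score = 0
        for t in set(tokens):
            if self.spam[t] > self.ham[t]:
                score += 1
            elif self.ham[t] > self.spam[t]:
                score -= 1
        if score == 0:
            return None  # unsure; treated as an error during training
        return score > 0


def train_on_everything(corpus):
    f = ToyFilter()
    for tokens, is_spam in corpus:
        f.train(tokens, is_spam)
    return f


def train_on_error(corpus, max_passes=10):
    # Train only on messages the filter gets wrong (or is unsure about),
    # repeating until a full pass has no errors or max_passes is reached.
    f = ToyFilter()
    for _ in range(max_passes):
        errors = 0
        for tokens, is_spam in corpus:
            if f.classify(tokens) != is_spam:
                f.train(tokens, is_spam)
                errors += 1
        if errors == 0:
            break
    return f


def vocabulary_size(f):
    return len(set(f.spam) | set(f.ham))


def hits(f, tokens):
    # Number of distinct tokens in the message that appear in either list.
    return sum(1 for t in set(tokens) if t in f.spam or t in f.ham)


# Made-up corpus: (tokens, is_spam)
corpus = [
    ("cheap pills online".split(), True),
    ("meeting agenda attached".split(), False),
    ("cheap watches online".split(), True),
    ("agenda for meeting tomorrow".split(), False),
    ("buy cheap pills".split(), True),
    ("lunch meeting tomorrow".split(), False),
]

full = train_on_everything(corpus)
selective = train_on_error(corpus)
new_message = "buy cheap watches".split()

print("word list size:", vocabulary_size(full), "vs", vocabulary_size(selective))
print("hits on new message:", hits(full, new_message), "vs", hits(selective, new_message))
```

On this toy corpus the train-on-error filter ends up with a smaller word
list and gets fewer hits on the new message, which is the trade-off
described above. Whether the remaining hits are really more "meaningful"
is not something a sketch this small can show.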
David