[Spambayes] incremental training strategies

Tim Peters tim.one@comcast.net
Mon Oct 28 17:24:46 2002


[Skip Montanaro]
> I am now running hammie.py from my procmailrc file, but not yet doing any
> filtering based on the results.  I trained it on my current setup (7000
> hams, 5000 spams).  Should I:
>
>     * train it on every message which passes through my inbox
>
>     * only train it on messages which it incorrectly classifies
>
>     * some other scheme
>
> ?  Or is that not yet known?

Experiment <wink>.  Note that chi-combining has a very real middle ground,
and you're not used to that yet:  you should certainly train it on msgs it
says it's unsure about.
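
To make the "middle ground" concrete, here is a minimal sketch of three-way
classification from a combined spam score.  The function name and the two
cutoff values are illustrative assumptions, not settings taken from this
message; tune them against your own mail.

```python
# Hypothetical cutoffs, for illustration only: scores below HAM_CUTOFF are
# called ham, scores above SPAM_CUTOFF are called spam, and everything in
# between lands in the Unsure middle ground.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def classify(score, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Map a spam score in [0.0, 1.0] to 'ham', 'unsure', or 'spam'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score < ham_cutoff:
        return "ham"
    if score > spam_cutoff:
        return "spam"
    return "unsure"


if __name__ == "__main__":
    for s in (0.02, 0.55, 0.99):
        print(s, classify(s))
```

Moving the two cutoffs apart widens the Unsure band, which trades fewer
outright mistakes for more msgs to review by hand.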

For my personal email, I've trained on about 1,000 ham and 1,500 spam.  As
an experiment, I'm going to stop training now, except for Unsure msgs and
mistakes; however, the only mistake I've seen so far is one spam that
python.org let thru (python.org let thru more than that one, but all the rest
wound up in my Unsure folder despite the "I've been thru python.org" ham
clues; the one that fooled both of us is hopeless).
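
The "train only on Unsure msgs and mistakes" experiment can be sketched as a
small selection rule.  This is an assumption-laden illustration, not code
from hammie.py: should_train, select_for_training, and the tuple layout are
hypothetical names, and the verdict strings are assumed to be 'ham', 'spam',
or 'unsure'.

```python
def should_train(verdict, actual):
    """Return True if a msg should be fed back to the trainer.

    verdict: what the classifier said ('ham', 'spam', or 'unsure').
    actual:  what the msg really is, as judged by a human ('ham' or 'spam').
    """
    if verdict not in ("ham", "spam", "unsure"):
        raise ValueError("unknown verdict: %r" % (verdict,))
    if actual not in ("ham", "spam"):
        raise ValueError("unknown actual label: %r" % (actual,))
    # Train on anything the classifier was unsure about, and on mistakes.
    return verdict == "unsure" or verdict != actual


def select_for_training(results):
    """Pick (msg_id, actual) pairs to train on.

    results: iterable of (msg_id, verdict, actual) tuples.
    """
    return [(msg_id, actual)
            for msg_id, verdict, actual in results
            if should_train(verdict, actual)]


if __name__ == "__main__":
    sample = [
        ("m1", "ham", "ham"),      # correct: skip
        ("m2", "unsure", "spam"),  # unsure: train as spam
        ("m3", "ham", "spam"),     # false negative: train as spam
        ("m4", "spam", "spam"),    # correct: skip
    ]
    print(select_for_training(sample))
```

Training on every msg instead would amount to having should_train always
return True, which makes it easy to compare the two schemes on the same mail.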