[Spambayes] Need help getting started

Tim Peters tim.one@comcast.net
Thu, 19 Sep 2002 13:08:35 -0400


[Skip Montanaro]
> Dunno if it's worth mentioning or not, but ...  When you discover ham in
> your spam (or vice versa) you should move such messages to the correct
> reservoir instead of simply deleting them.  No sense wasting good data.

Even more, I suggest moving them into the correct category of your "live"
test data.  Random sampling is supposed to give you a "slice of life" in every
respect, and messages that some other system misclassified (it doesn't matter
whether that system is SpamAssassin or your own eyeballs) should appear in the
training data in the same proportion as they occur in real life.
Systematically moving misclassifications into the normally passive
reservoirs distorts real life, just as feeding the system *only*
misclassifications would.
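A minimal sketch of "move it into the correct live category" follows. It assumes a hypothetical directory layout (per-category Set1..SetN live test directories next to a reservoir directory), and the function name `file_corrected_message` is illustrative too. Neither is taken from the post or from SpamBayes's own tools. The corrected message goes into a uniformly random live set of its true category and never into the reservoir.

```python
import os
import random
import shutil


def file_corrected_message(path, true_label, data_root, rng=random):
    """Move a hand-corrected message into a randomly chosen live test set.

    Hypothetical layout (an assumption for illustration only):
        data_root/Ham/Set1 ... data_root/Ham/SetN   (live test data)
        data_root/Ham/reservoir                     (passive reservoir)
        data_root/Spam/Set1 ... and so on
    """
    category_dir = os.path.join(data_root, true_label)
    # Only the live "Set*" directories are candidates; the reservoir is skipped
    # so corrections are not systematically parked there.
    live_sets = sorted(
        d for d in os.listdir(category_dir)
        if d.startswith("Set") and os.path.isdir(os.path.join(category_dir, d))
    )
    if not live_sets:
        raise ValueError(f"no live Set* directories under {category_dir}")
    # Pick a set at random so corrections spread out like any other message.
    target_dir = os.path.join(category_dir, rng.choice(live_sets))
    # shutil.move returns the final path of the moved file.
    return shutil.move(path, target_dir)
```

Picking the set at random, rather than always the same one, keeps any single test set from becoming a dumping ground for hard cases.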