[Spambayes] Need help getting started
Skip Montanaro
skip@pobox.com
Thu, 19 Sep 2002 14:27:57 -0500
Tim> Even more, I suggest moving them into the correct category of your
Tim> "live" test data. Randomness means "slice of life" in all possible
Tim> respects, and things that some other system misclassified (whether
Tim> that system be SpamAssassin or your own eyeballs doesn't matter)
Tim> should appear in the training data in the same proportion as they
Tim> occur in real life. Systematically moving misclassifications into
Tim> the normally passive reservoirs is a distortion of real life, just
Tim> as feeding the system *only* misclassifications would be.
Good point. I was thinking about the potential loss of data, but moving the
misclassified message to the right Set? directory and running rebal.py would
be better.
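
For what it's worth, here is a rough sketch of the "move it to the right Set? directory" step. It assumes a layout like Data/Ham/Set1, Data/Spam/Set1 and so on; the function name move_to_correct_category and the paths in the usage note are made up for illustration and are not part of the spambayes code. It only moves the file, and rebal.py would still be run separately afterwards.

```python
import random
import shutil
from pathlib import Path


def move_to_correct_category(msg_path, correct_root, rng=random):
    """Move one misclassified message into a randomly chosen Set? directory.

    msg_path     -- file currently sitting in the wrong category
    correct_root -- hypothetical root of the correct category, e.g. Data/Ham
    rng          -- object with a choice() method (the random module by default)
    """
    msg_path = Path(msg_path)
    # "Set?" matches Set1 ... Set9, the same glob used in the message above.
    set_dirs = sorted(d for d in Path(correct_root).glob("Set?") if d.is_dir())
    if not set_dirs:
        raise ValueError(f"no Set? directories under {correct_root}")

    # Choose the destination at random so the move itself adds no bias.
    target = rng.choice(set_dirs) / msg_path.name
    if target.exists():
        # Refuse to overwrite, so no message is silently lost.
        raise FileExistsError(f"{target} already exists")

    shutil.move(str(msg_path), str(target))
    return target
```

Usage would be something like move_to_correct_category("Data/Spam/Set3/msg0042.txt", "Data/Ham"), with both paths being hypothetical. The message ends up in the correct category instead of being dropped or set aside, so misclassified mail stays in the data in its natural proportion.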
Skip