[spambayes-dev] train to exhaustion?

Kenny Pitt kennypitt at hotmail.com
Fri Feb 13 11:51:24 EST 2004


Skip Montanaro wrote:
> I think you could probably approximate that closely enough by
> requiring that the number of misses drops from round to round.  (A
> "miss" in this case is a message that doesn't score within its proper
> zone.) 

In most cases that's probably true.  However, here's an example from one
of the test runs of tte.py that Tony posted to his web site:

round:  3, msgs: 1312, ham misses:   2, spam misses:   0
round:  4, msgs: 1312, ham misses:   0, spam misses:   2
round:  5, msgs: 1312, ham misses:   0, spam misses:   1
round:  6, msgs: 1312, ham misses:   0, spam misses:   0

The total number of misses did not decrease between rounds 3 and 4 (it
stayed at 2: two ham misses, then two spam misses), but further rounds
did reduce the misses to zero.

I guess you could correct for that by stopping if the total number of
misses increases or if both ham misses and spam misses stay the same, but that
doesn't feel quite right either.  If nothing else, it fails to account
for Tony's original question:  "if one message was still a
false-positive, but moved from 0.8 to 0.7, is that improving?"
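
As a rough sketch only (this is not code from tte.py, and the function
names are made up for illustration), the two stopping rules can be
compared on the four rounds quoted above. It assumes each round is
summarized as a (ham misses, spam misses) pair:

```python
# Hypothetical helpers for illustration; none of these names come from tte.py.

# (ham misses, spam misses) for rounds 3 through 6 of the run quoted above.
ROUNDS = [(2, 0), (0, 2), (0, 1), (0, 0)]


def stop_unless_total_drops(prev, curr):
    """Stop as soon as the total number of misses fails to drop."""
    return sum(curr) >= sum(prev)


def stop_on_increase_or_stall(prev, curr):
    """Stop if the total rises, or if both ham and spam misses are unchanged."""
    return sum(curr) > sum(prev) or curr == prev


def rounds_run(rounds, should_stop):
    """Return how many of the recorded rounds run before the rule stops."""
    for i in range(1, len(rounds)):
        if should_stop(rounds[i - 1], rounds[i]):
            return i + 1  # the round that triggered the stop was still run
    return len(rounds)


print(rounds_run(ROUNDS, stop_unless_total_drops))    # stops after round 4
print(rounds_run(ROUNDS, stop_on_increase_or_stall))  # runs through round 6
```

On this data the first rule gives up after round 4, while the second
runs all four recorded rounds.  Neither one looks at the scores
themselves, so a message that moves from 0.8 to 0.7 without leaving its
zone is invisible to both.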

-- 
Kenny Pitt



