[Spambayes] Question about the score of spam suspects

Thu May 13 03:51:47 EDT 2004

> If this is the case, then how come that a simple 'delete as 
> spam' changed a message's score from a number below 80% (my
> spam threshold) to 100%? 

SpamBayes is often strongly hapax-based, i.e. the score leans heavily on
tokens that have been seen only once in training.  If you look at the clues
for the message now, you'll probably see a lot of "0 1" clues (seen in 0 ham
and 1 spam).  With a well-balanced corpus, these will all be quite strong, and
so form the majority of the clues used to score the message, pushing out any
weak ham clues that it may have previously had.
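
A rough sketch of the effect, not the actual SpamBayes code: it assumes
Robinson-style smoothing with s = 0.45 and x = 0.5 (assumed values here) and
chi-squared combining of the clue probabilities.  The function names and the
example clue lists are made up for illustration.

```python
import math

def smoothed_prob(ham_count, spam_count, nham, nspam, s=0.45, x=0.5):
    """Smoothed spam probability for a token seen at least once.

    s (strength of the prior) and x (prior probability for an unknown
    token) are assumed values, not read from any real configuration.
    """
    ham_ratio = ham_count / nham
    spam_ratio = spam_count / nspam
    raw = spam_ratio / (ham_ratio + spam_ratio)
    n = ham_count + spam_count
    return (s * x + n * raw) / (s + n)

def chi2q(x2, dof):
    """Probability that a chi-squared variate with even dof exceeds x2."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Combine per-clue probabilities into one score between 0 and 1."""
    n = len(probs)
    spam_ev = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    ham_ev = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (spam_ev - ham_ev + 1.0) / 2.0

# A "0 1" clue in a balanced corpus: about 0.845 from a single sighting.
hapax = smoothed_prob(0, 1, nham=300, nspam=300)

# Hypothetical message: a few weak clues either way before training...
before = combine([0.3] * 5 + [0.7] * 5)
# ...and the same message once training on it has added 30 hapax clues.
after = combine([0.3] * 5 + [0.7] * 5 + [hapax] * 30)
```

With these made-up numbers the message sits in the unsure range before
training and lands very close to 100% afterwards, purely because of tokens
it contributed itself.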

(In any case, looking at the clues for a message before you "delete as spam"
it, and then again afterwards, would answer the question.  It's often
interesting (to some of us, at least!) to do this - to try and see why the
message wasn't scored correctly in the first place.)

This is pretty typical - only 3 of the 300 spams I've trained on (all either
previously unsure or false negatives) didn't end up with a 100% score after
being trained on.  I don't keep track of the ham, but I'm fairly sure the ham
scores would behave similarly (i.e. they almost all go to 0%).

This is why you don't test your system on the same data you've trained it
on, after all.  Well, unless you want to inflate your results <wink>.
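
A minimal sketch of keeping the two apart, assuming your messages are just
items in a list; `split_corpus` and its parameters are hypothetical names,
not part of SpamBayes.

```python
import random

def split_corpus(messages, test_fraction=0.2, seed=0):
    """Shuffle and split so no message is both trained on and scored."""
    shuffled = list(messages)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]

# Train only on the first part, and report scores only for the second.
train_set, test_set = split_corpus(range(10))
```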

=Tony Meyer

---
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.