[spambayes-dev] imbalance within ham or spam training sets?

T. Alexander Popiel popiel at wolfskeep.com
Mon Nov 3 17:07:57 EST 2003


In message:  <LNBBLJKPBEHFEDALKOLCMEJMGOAB.tim.one at comcast.net>
             "Tim Peters" <tim.one at comcast.net> writes:
>[T. Alexander Popiel]
>> No.  Training on other mail which does not contain the word does not
>> affect the score for a word at all ...
>
>It's a bit curious that this is true only so long as the word has appeared
>in only one kind of training data (only in spam, or only in ham).  As soon
>as a word appears in at least one of each, training on msgs that don't
>contain the word can change the word's score.

Yarg.  I stand corrected.

Perhaps it's time to test a variation where the prob is based on
hamcount and spamcount instead of hamratio and spamratio.  Hrm.
*tap, tap, tap*  I'll be back in a few hours...
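
For concreteness, here is a minimal sketch of the two estimates being compared. It assumes the unadjusted word probability is spamratio / (hamratio + spamratio), with hamratio = hamcount / nham and spamratio = spamcount / nspam, and it leaves out any Bayesian adjustment toward a prior. The function names ratio_prob and count_prob are made up for illustration and are not taken from the spambayes code.

```python
def ratio_prob(hamcount, spamcount, nham, nspam):
    """Ratio-based estimate (assumed form): depends on training set sizes."""
    hamratio = hamcount / nham      # fraction of ham msgs containing the word
    spamratio = spamcount / nspam   # fraction of spam msgs containing the word
    return spamratio / (hamratio + spamratio)


def count_prob(hamcount, spamcount):
    """Count-based variation (hypothetical): ignores nham and nspam entirely."""
    return spamcount / (hamcount + spamcount)


if __name__ == "__main__":
    # Word seen only in spam: extra ham that lacks the word changes nothing,
    # because hamratio stays 0 and the estimate stays pinned at 1.0.
    print(ratio_prob(0, 3, nham=100, nspam=100))
    print(ratio_prob(0, 3, nham=200, nspam=100))

    # Word seen in both: the same extra ham shrinks hamratio, so the
    # ratio-based estimate moves even though the word's counts did not.
    print(ratio_prob(1, 3, nham=100, nspam=100))
    print(ratio_prob(1, 3, nham=200, nspam=100))

    # The count-based variation gives the same answer in both situations.
    print(count_prob(1, 3))
```

Under these assumptions the count-based form cannot be moved by messages that lack the word, but it also gives up the correction for unequal ham and spam training set sizes that the ratios provide.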

- Alex


