[spambayes-dev] imbalance within ham or spam training sets?
T. Alexander Popiel
popiel at wolfskeep.com
Mon Nov 3 17:07:57 EST 2003
In message: <LNBBLJKPBEHFEDALKOLCMEJMGOAB.tim.one at comcast.net>
"Tim Peters" <tim.one at comcast.net> writes:
>[T. Alexander Popiel]
>> No. Training on other mail which does not contain the word does not
>> affect the score for a word at all ...
>
>It's a bit curious that this is true only so long as the word has appeared
>in only one kind of training data (only in spam, or only in ham). As soon
>as a word appears in at least one of each, training on msgs that don't
>contain the word can change the word's score.
Yarg. I stand corrected.
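To see why Tim is right, here is a simplified sketch. It assumes the word probability is spamratio / (hamratio + spamratio), with spamratio = spamcount / nspam and hamratio = hamcount / nham. It leaves out the Robinson adjustment toward an unknown-word probability that the real classifier also applies. The helper name ratio_prob is hypothetical. When hamcount is 0, the ratio collapses to 1.0 whatever the training totals are. Once the word has a nonzero count in both classes, training more hams that lack the word shrinks hamratio, and that moves the score.

```python
def ratio_prob(spamcount: int, hamcount: int, nspam: int, nham: int) -> float:
    """Hypothetical helper: word spam probability from per-class ratios."""
    spamratio = spamcount / nspam if nspam else 0.0
    hamratio = hamcount / nham if nham else 0.0
    total = spamratio + hamratio
    # A word with no evidence in either class gets a neutral 0.5.
    return spamratio / total if total else 0.5


# Word seen only in spam: the probability stays at 1.0 no matter
# how many extra hams are trained.
only_spam_before = ratio_prob(spamcount=3, hamcount=0, nspam=100, nham=100)
only_spam_after = ratio_prob(spamcount=3, hamcount=0, nspam=100, nham=500)

# Word seen in both: training 400 more hams that do not contain it
# shrinks hamratio and pushes the probability toward spam.
both_before = ratio_prob(spamcount=3, hamcount=3, nspam=100, nham=100)
both_after = ratio_prob(spamcount=3, hamcount=3, nspam=100, nham=500)

print(only_spam_before, only_spam_after)  # 1.0 1.0
print(both_before, both_after)
```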
Perhaps it's time to test a variation where the prob is based on
hamcount and spamcount instead of hamratio and spamratio. Hrm.
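The proposed variation can be sketched under the same simplified assumptions: no Robinson adjustment, and hypothetical helper names. Computing the probability from raw counts makes a word's score independent of messages that do not contain it. The trade-off is that raw counts no longer correct for unequal ham and spam training set sizes. If far more ham than spam is trained, every shared word drifts hamward. That kind of imbalance is what the ratios were there to normalize away.

```python
def ratio_prob(spamcount: int, hamcount: int, nspam: int, nham: int) -> float:
    """Hypothetical helper: current style, counts normalized by class size."""
    spamratio = spamcount / nspam if nspam else 0.0
    hamratio = hamcount / nham if nham else 0.0
    total = spamratio + hamratio
    return spamratio / total if total else 0.5


def count_prob(spamcount: int, hamcount: int) -> float:
    """Hypothetical helper: proposed variation using raw counts only."""
    total = spamcount + hamcount
    return spamcount / total if total else 0.5


# A word seen in 3 spams and 3 hams, before and after training
# 400 extra hams that do not contain it.
for nham in (100, 500):
    print(nham, ratio_prob(3, 3, 100, nham), count_prob(3, 3))
```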
*tap, tap, tap* I'll be back in a few hours...
- Alex