[Spambayes] Re: Training oddity/confusion

Mathew Hendry TJLWBECGSGWU at spammotel.com
Thu Jan 13 14:34:42 CET 2005


On Thu, 13 Jan 2005 18:00:42 +1300, "Tony Meyer" <tameyer at ihug.co.nz> wrote:

>> I was doing a kind of manual "train to exhaustion", and the 
>> other thing I noticed was that the spam took a lot more 
>> training to make classification accurate (currently 82 ham : 
>> 409 spam, out of a total training set of 644 : 1414). I guess 
>> this simply means that my spam is a lot less consistent than my ham.
>
>With 'classic' train to exhaustion, the database is kept exactly balanced, I
>believe.  How well is your system working for you?

Erm, not all that well. :|

My incoming mail is very unbalanced - 17:1 spam:ham since I started the
training - which can't help, but so far I have 18% unsure spam and 3% false
negatives. No mistakes on ham though; none scored higher than 0.5%. Given
that, I suppose I could simply mess with the thresholds.

-- Mat.
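
To make the threshold idea above concrete, here is a minimal sketch of how moving the two cutoffs changes what lands in "unsure". Everything in it is an assumption for illustration: the scores are made up (not taken from the mailbox described above), the function names are hypothetical and not part of SpamBayes, the starting cutoffs of 0.20 and 0.90 are assumed, and the boundary handling (below the ham cutoff is ham, at or above the spam cutoff is spam) is a convention chosen for the sketch.

```python
def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Map a score in [0, 1] to 'ham', 'unsure' or 'spam' (assumed convention)."""
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


def tally(scores, ham_cutoff, spam_cutoff):
    """Count how many scores fall into each category for the given cutoffs."""
    counts = {"ham": 0, "unsure": 0, "spam": 0}
    for s in scores:
        counts[classify(s, ham_cutoff, spam_cutoff)] += 1
    return counts


# Hypothetical scores: the ham all sits at or below 0.005 (0.5%),
# while the spam is spread out, with one low enough to be a false negative.
ham_scores = [0.000, 0.001, 0.003, 0.005]
spam_scores = [0.15, 0.45, 0.60, 0.75, 0.92, 0.97, 0.99, 1.00]

# Assumed starting cutoffs.
before_spam = tally(spam_scores, 0.20, 0.90)

# Tighter cutoffs: safe for this ham only because it scores so low.
after_spam = tally(spam_scores, 0.05, 0.50)
after_ham = tally(ham_scores, 0.05, 0.50)

print("spam, before:", before_spam)
print("spam, after: ", after_spam)
print("ham,  after: ", after_ham)
```

With these made-up numbers, lowering the spam cutoff moves two of the unsure spam into the spam category, and lowering the ham cutoff turns the false negative into an unsure, while all of the ham stays ham. Whether that trade holds for a real mailbox depends on how far the real ham scores sit below the new ham cutoff.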



