[Spambayes] Re: Training oddity/confusion
Mathew Hendry
TJLWBECGSGWU at spammotel.com
Thu Jan 13 14:34:42 CET 2005
On Thu, 13 Jan 2005 18:00:42 +1300, "Tony Meyer" <tameyer at ihug.co.nz> wrote:
>> I was doing a kind of manual "train to exhaustion", and the
>> other thing I noticed was that the spam took a lot more
>> training to make classification accurate (currently 82 ham :
>> 409 spam, out of a total training set of 644 : 1414). I guess
>> this simply means that my spam is a lot less consistent than my ham.
>
>With 'classic' train to exhaustion, the database is kept exactly balanced, I
>believe. How well is your system working for you?
Erm, not all that well. :|
My incoming mail is very unbalanced - 17:1 spam:ham since I started the
training - which can't help, but so far I have 18% unsure spam and 3% false
negatives. No mistakes on ham though; none scored higher than 0.5%. Given
that, I suppose I could simply mess with the thresholds.
-- Mat.
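
As a rough illustration of the threshold idea raised above: a score-based filter typically maps each message score to ham, unsure, or spam using two cutoffs, so if no ham ever scores near the unsure band, the cutoffs can be lowered to catch more spam. The sketch below is not SpamBayes code. The function name, the sample scores, and the cutoff values (0.20 and 0.90, lowered to 0.05 and 0.50) are assumptions chosen only to show the effect.

```python
# Hypothetical three-way classifier using two cutoffs (not the SpamBayes API).
def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Map a spam probability in [0, 1] to 'ham', 'unsure', or 'spam'."""
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


# Made-up scores: one ham-like message (0.005, i.e. 0.5%) and three spam-like ones.
scores = [0.005, 0.12, 0.62, 0.95]

# Classification with the assumed starting cutoffs.
before = [classify(s) for s in scores]

# Lowering both cutoffs: safe for ham only if no ham scores at or above 0.05.
after = [classify(s, ham_cutoff=0.05, spam_cutoff=0.50) for s in scores]

for s, b, a in zip(scores, before, after):
    print(f"{s:.3f}: {b} -> {a}")
```

With ham scoring this low, lowering ham_cutoff moves low-scoring spam out of the false-negative range into unsure, and lowering spam_cutoff moves mid-scoring spam from unsure to spam, in both cases without touching the ham.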