[Spambayes] Re: Training oddity/confusion
tameyer at ihug.co.nz
Thu Jan 13 21:19:37 CET 2005
> >With 'classic' train to exhaustion, the database is kept exactly
> >balanced, I believe. How well is your system working for you?
> Erm, not all that well. :|
:( I'm trying to rearrange things a little for 1.1 so that it's easier
to try out different training regimes (including train to exhaustion) with
the various apps; hopefully that will help.
> My incoming mail is very unbalanced - 17:1 spam:ham since I
> started the training - which can't help, but so far I have
> 18% unsure spam and 3% false negatives. No mistakes on ham
> though; none scored higher than 0.5%. Given that, I suppose I
> could simply mess with the thresholds.
I've read reports of people who have done that (in an extreme way, setting
the cutoffs to something like 5% and 10%). It seems pretty risky to me,
though: a message that contains nothing that has been seen before scores
0.5 (50%), and under those cutoffs it would be classified as spam...
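To make that risk concrete, here is a minimal sketch of how a score is mapped to ham, unsure or spam using two cutoffs. The `classify` helper and its boundary handling (strict `<` for ham, `>=` for spam) are hypothetical, not SpamBayes' own code; the 0.20/0.90 pair is only an illustrative moderate setting, while 0.05/0.10 is the extreme setting mentioned above.

```python
def classify(score: float, ham_cutoff: float, spam_cutoff: float) -> str:
    """Map a spam probability in [0, 1] to 'ham', 'unsure' or 'spam'.

    Hypothetical helper (assumed boundary rules): scores below ham_cutoff
    are ham, scores at or above spam_cutoff are spam, anything in between
    is unsure.
    """
    if not 0.0 <= ham_cutoff <= spam_cutoff <= 1.0:
        raise ValueError("need 0 <= ham_cutoff <= spam_cutoff <= 1")
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


# A message made entirely of never-before-seen tokens scores 0.5.
UNSEEN_SCORE = 0.5

# Illustrative moderate cutoffs (assumed) vs. the extreme 5%/10% setting.
for ham_cutoff, spam_cutoff in [(0.20, 0.90), (0.05, 0.10)]:
    print(ham_cutoff, spam_cutoff, classify(UNSEEN_SCORE, ham_cutoff, spam_cutoff))
```

With the moderate pair the unknown message lands in unsure, where a human will look at it; with the 5%/10% pair the same message is filed straight into spam, which is exactly the failure mode for a brand-new kind of ham.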
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.