[Spambayes] Still getting tons of false positives...

Tony Meyer tameyer at ihug.co.nz
Fri Apr 8 06:06:21 CEST 2005


> I have read several notes (and the web page) indicating that 
> one should strive for 1:1 ham / spam in training.

Approximately, yes.  1::1 is ideal, but you should be fine with 2::1, 1::3,
etc.  People with ratios like 65::1 or 431::1 tend to get erratic results.
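As a rough illustration, the ratio advice could be turned into a quick sanity check on the number of trained ham and spam. This is a minimal sketch, not anything SpamBayes itself provides. The function name `ham_spam_ratio_ok` and the maximum skew of 3 (taken loosely from the 2::1 and 1::3 examples) are assumptions.

```python
def ham_spam_ratio_ok(n_ham: int, n_spam: int, max_skew: float = 3.0) -> bool:
    """Return True if the larger class is at most max_skew times the smaller.

    Hypothetical helper; max_skew=3.0 is an assumed cutoff based loosely on
    the 2::1 and 1::3 examples, not a SpamBayes setting.
    """
    if n_ham == 0 or n_spam == 0:
        return False  # with an empty class there is no usable ratio
    larger, smaller = max(n_ham, n_spam), min(n_ham, n_spam)
    return larger / smaller <= max_skew


for ham, spam in [(100, 100), (200, 100), (100, 300), (6500, 100)]:
    print(f"{ham}::{spam} ->", ham_spam_ratio_ok(ham, spam))
```

Here 100::100, 200::100 and 100::300 pass, while 6500::100 (65::1) fails.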

The golden rule is that if things are working, then you shouldn't need to
change them.

> Given that I use the web interface for training, I am not 
> clear on how I would go about doing that.

Indeed, it is a somewhat difficult task.  Training only on mistakes and
unsures tends to help, in my experience.  Adjusting the thresholds can help,
too, particularly the spam threshold, which can probably be lowered fairly
safely to around 80% once the classifier is trained.  If you get a lot of
duplicate spam, training on only one copy of each duplicate will help
(assuming the imbalance is too much spam).  You could train on false
positives, false negatives and unsures, plus a few ham every now and then,
to bring the ham side of the balance back up.
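The training strategy described here (train on mistakes and unsures, only one copy of duplicates, plus occasional ham) can be sketched as a selection policy over scored messages. This is a hypothetical illustration, not the SpamBayes web interface or its API. The `Message` class, `select_for_training`, the 0.20 ham cutoff and the "every 10th correct ham" rule are all assumptions; only the 0.80 spam cutoff comes from the suggestion above.

```python
from dataclasses import dataclass
from typing import Iterable, List

HAM_CUTOFF = 0.20   # assumed ham threshold, for illustration only
SPAM_CUTOFF = 0.80  # spam threshold lowered to around 80%, as suggested


def classify(score: float) -> str:
    """Map a classifier score in [0, 1] to a verdict."""
    if score >= SPAM_CUTOFF:
        return "spam"
    if score <= HAM_CUTOFF:
        return "ham"
    return "unsure"


@dataclass
class Message:
    body: str       # message text, used here to detect duplicates
    score: float    # hypothetical classifier score in [0, 1]
    is_spam: bool   # true label, as supplied by the user when training


def select_for_training(messages: Iterable[Message], ham_every: int = 10) -> List[Message]:
    """Pick fp, fn and unsures, one copy of each duplicate, and every
    ham_every-th correctly classified ham to keep the ratio balanced."""
    seen_bodies = set()
    selected = []
    correct_ham = 0
    for msg in messages:
        if msg.body in seen_bodies:
            continue  # skip later copies of duplicate messages
        seen_bodies.add(msg.body)
        verdict = classify(msg.score)
        truth = "spam" if msg.is_spam else "ham"
        if verdict == "unsure" or verdict != truth:
            selected.append(msg)  # false positive, false negative or unsure
        elif truth == "ham":
            correct_ham += 1
            if correct_ham % ham_every == 0:
                selected.append(msg)  # occasional correct ham for balance
    return selected


msgs = [
    Message("cheap pills", 0.95, True),    # correct spam: not trained
    Message("cheap pills", 0.95, True),    # duplicate: skipped
    Message("meeting at 3", 0.85, False),  # false positive: trained
    Message("hello friend", 0.50, True),   # unsure: trained
    Message("buy now", 0.10, True),        # false negative: trained
]
print([m.body for m in select_for_training(msgs)])
```

Correctly classified spam is never added, so the trained set leans toward ham over time, which is the point when the problem is too much spam.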

Is that any help?

> Also, thank you very much for getting the 1.0.4 fix out.  
> That is a big help.

You're welcome.

=Tony.Meyer

-- 
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.