[Spambayes] Hand tuning the database?
tameyer at ihug.co.nz
Mon Feb 9 18:54:08 EST 2004
> What do you guys think of the idea of being able
> to mark certain terms in the database as being
> "not interesting"? (Of course, we would need a tool
> or tool-set to be able to do this, but....)
> The reason I ask is that, if the classifier is only
> going to consider 100 terms, I'd like it to be considering
> good ones, as opposed to things that are in every mail
> message that I get, spam and ham alike.
1. SpamBayes doesn't use any tokens that have a current spamprob between
0.4 and 0.6 (you can change these values if you like). So 0.62 is just
outside that range, and so it does appear to have a little bit of value
(indicating that mail is just a wee bit more likely to be spam). IOW, it's
basically doing what you've asked, but automatically, rather than via some
2. The 150 'strongest' (furthest from 0.5) tokens are used, by default.
Early testing showed that this was a good number, but if you like, you can
change this, too - if you set it high enough, then every token will be used,
no matter what score it has.
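
To make the two points above concrete, here is a rough sketch of that selection step. It is not the actual SpamBayes code: the function name, the parameter names, and the sample probabilities are all made up for illustration, and the handling of tokens sitting exactly at 0.4 or 0.6 is an assumption.

```python
def select_tokens(spamprobs, min_strength=0.1, max_tokens=150):
    """Pick the tokens that would be used for scoring a message.

    spamprobs    -- dict mapping token -> spam probability (0.0 to 1.0)
    min_strength -- hypothetical name: minimum distance from 0.5 for a
                    token to count (0.1 gives the 0.4 to 0.6 dead zone)
    max_tokens   -- hypothetical name: how many of the strongest tokens
                    to keep (150 is the default mentioned above)
    """
    candidates = []
    for token, prob in spamprobs.items():
        strength = abs(prob - 0.5)
        # Tokens too close to 0.5 say almost nothing either way, so skip them.
        if strength >= min_strength:
            candidates.append((strength, token, prob))
    # Strongest first, then keep at most max_tokens of them.
    candidates.sort(reverse=True)
    return [(token, prob) for _, token, prob in candidates[:max_tokens]]


# Made-up probabilities: "the" shows up in everything, so it hovers near 0.5.
probs = {"the": 0.51, "viagra": 0.99, "python": 0.02, "click": 0.62}
used = select_tokens(probs)
```

With these sample values, "the" is dropped automatically, while "click" at 0.62 is kept as a weak spam clue. Passing a very large max_tokens together with min_strength=0.0 corresponds to using every token, no matter what score it has.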
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.