[Spambayes] Hand tuning the database?

Webb Scales scales at zko.dec.com
Mon Feb 9 19:05:32 EST 2004

Tony Meyer wrote:

> 1.  SpamBayes doesn't use any tokens that have a current spamprob between
> 0.4 and 0.6 (you can change these values if you like).  So 0.62 is just
> outside that range, and so it does appear to have a little bit of value
> (indicating that mail is just a wee bit more likely to be spam).

OK, that makes sense.  So, (other than ignoring the problem ;-) I could either
move the "goalposts", or find some ham that came through that mail gateway and
do some more training.
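
To make the "goalposts" concrete, here is a minimal sketch of the cutoff as described above: a token counts as evidence only if its spamprob is at least some distance from 0.5. The function name `is_used` and the parameter `min_strength` are made up for illustration and are not taken from the SpamBayes source. The 0.1 default simply mirrors the 0.4 to 0.6 range quoted above.

```python
def is_used(spamprob, min_strength=0.1):
    """Return True if a token's spamprob is far enough from 0.5 to count.

    min_strength is a hypothetical name for half the width of the
    ignored band; 0.1 corresponds to ignoring roughly 0.4 through 0.6.
    """
    return abs(spamprob - 0.5) >= min_strength


if __name__ == "__main__":
    # 0.62 sits just outside the default band, so it is used as evidence.
    for p in (0.45, 0.55, 0.62, 0.97):
        print(p, is_used(p))

    # "Moving the goalposts": widen the ignored band to 0.35 through 0.65
    # and the 0.62 token drops out.
    print(0.62, is_used(0.62, min_strength=0.15))
```

Under these assumptions, widening the band is enough to make the 0.62 token drop out, without touching the training data.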

> it's
> basically doing what you've asked, but automatically, rather than via some
> manual tool.

That's cool.

I wasn't clear on how the classifier selected its evidence (nor how the
individual terms are weighted).

> 2.  The 150 'strongest' (furthest from 0.5) tokens are used, by default.
> Early testing showed that this was a good number, but if you like, you can
> change this, too - if you set it high enough, then every token will be used,
> no matter what score it has.

Well, I only counted about 80 in the mail header, but, uh, I wasn't exactly
counting carefully.  Perhaps my training corpus was too small to completely cover
this piece of spam?
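
One plausible reading of the count, sketched under assumptions: if the 150 is an upper limit on how many tokens are used, then a message with fewer qualifying tokens would simply show fewer. The sketch below ranks tokens by distance from 0.5 and keeps at most `max_tokens` of them. The names `strongest_tokens`, `max_tokens` and `min_strength`, and the example tokens and probabilities, are all hypothetical and not taken from the SpamBayes source.

```python
def strongest_tokens(spamprobs, max_tokens=150, min_strength=0.1):
    """Pick up to max_tokens tokens, ranked by distance from 0.5.

    spamprobs maps token -> spamprob. Tokens closer to 0.5 than
    min_strength are dropped before ranking.
    """
    candidates = [
        (abs(p - 0.5), tok, p)
        for tok, p in spamprobs.items()
        if abs(p - 0.5) >= min_strength
    ]
    # Largest distance first, i.e. the strongest evidence first.
    candidates.sort(reverse=True)
    return [(tok, p) for _, tok, p in candidates[:max_tokens]]


if __name__ == "__main__":
    # Made-up probabilities, for illustration only.
    probs = {"viagra": 0.99, "python": 0.02, "gateway": 0.62, "the": 0.51}

    # With a limit of 2, only the two strongest tokens survive.
    print(strongest_tokens(probs, max_tokens=2))

    # With the default limit, every qualifying token is returned, which
    # here is 3 tokens, well short of 150.
    print(len(strongest_tokens(probs)))
```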

Anyway, I was wondering why there's no hand-tuning option, and I think you
answered the question.  So, I'm just going to ignore the evidence (which is how
any good logician proceeds ;-).



Webb Scales                                Hewlett-Packard Company
scales at zko.dec.com                         110 Spit Brook Rd, ZKO2-3/N30
Voice: 603.884.2196, FAX: 603.884.0120     Nashua, NH 03062-2711
Someone who thinks logically provides a nice contrast to the real world.
