[Spambayes] Enhancement Suggestion

Tony Meyer tameyer at ihug.co.nz
Thu Jan 29 23:52:44 EST 2004


> To be honest, I haven't gone into the deep detail. What I do 
> know, is that a specific person I do not know is often in the 
> to: field in my spam. Whenever he is there the mail is ALWAYS 
> spam. Bayes logic should, one would hope, identify spam and 
> that person with a very high probability.

Well, not really.  A Bayesian-style classifier works by adding up lots of
different clues.  Having a single person always in spam (and never in ham)
will only generate a single token, although that token should have a
spamprob very close to 1.  It still gets combined with whatever other
tokens are in the message, though, which may move the score closer to 0.
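
To make that concrete, here is a rough sketch of how one very spammy token
can be outvoted.  It uses a simple naive-Bayes-style product to combine
the clues, which is a simplification for illustration and not the
combining scheme SpamBayes itself uses.  The function name and all of the
probabilities below are made up.

```python
from math import prod


def combine(probs):
    """Combine per-token spam probabilities with a naive product rule.

    Illustrative only: a simplified stand-in, not SpamBayes's own combiner.
    """
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


# Hypothetical spamprobs: one strong spam clue (the stranger's address in
# the To: header) and three tokens that usually show up in ham.
strong_clue = 0.99
hammy_clues = [0.05, 0.10, 0.10]

score_alone = combine([strong_clue])                # the address by itself
score_mixed = combine([strong_clue] + hammy_clues)  # address plus hammy text

print(f"address only:     {score_alone:.3f}")
print(f"with hammy words: {score_mixed:.3f}")
```

With these invented numbers the address on its own scores as spam, but
three moderately hammy tokens are enough to pull the combined score below
0.5.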

> I guess my point was that using a word in the textual part of 
> an email to generate a probability should not carry the same 
> weight as a name or address. The word VIAGRA should not imply 
> spam as strongly as this guy's address. The weightings should 
> be different.

Well, that would work for you, at least in this case, but what about when
you get spam from a known address, for example?  A heavily weighted address
token would then make that message more likely to be a false negative.  It
doesn't seem all that likely that producing additional tokens, or weighting
tokens more heavily, just because they are in the headers, would give
better results.
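
As a rough illustration of that risk, the sketch below counts a header
token several times over.  The weighting scheme, the function name, and the
numbers are all hypothetical, and the product rule is a simplification
rather than what SpamBayes does.

```python
def weighted_score(tokens):
    """Combine (spamprob, weight) pairs; a weight of w counts the token w times.

    Illustrative only: a hypothetical weighting scheme, not SpamBayes code.
    """
    spam = 1.0
    ham = 1.0
    for p, w in tokens:
        spam *= p ** w
        ham *= (1.0 - p) ** w
    return spam / (spam + ham)


# Hypothetical spam message that arrives from an address normally seen in ham.
known_sender = 0.02            # spamprob of the familiar address token
body = [0.90, 0.90, 0.80]      # spamprobs of spammy words in the body

# Every token counted once.
unweighted = weighted_score([(known_sender, 1)] + [(p, 1) for p in body])

# The header token counted three times, body tokens once.
header_heavy = weighted_score([(known_sender, 3)] + [(p, 1) for p in body])

print(f"equal weights:       {unweighted:.3f}")
print(f"header weighted x3:  {header_heavy:.3f}")
```

With equal weights the spammy body still wins; once the address is weighted
more heavily, the same message scores as ham, which is the false negative
described above.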

That said, almost anything's worth trying.  If you really feel strongly that
this would help, you can either try it yourself (if you can modify the
code), or submit a feature request, and someone will get around to testing
it out sooner or later.

You haven't actually said how well these messages are scoring, though. That
sort of information would really help if you do open a feature request,
since the vast majority of messages should be correctly scored without this,
anyway.

=Tony Meyer

---
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.



