[Spambayes] Re: Hello + Problems

Mathew Hendry TJLWBECGSGWU at spammotel.com
Sat Jul 17 22:36:48 CEST 2004


On Sat, 17 Jul 2004 08:51:41 -0700, "Steven J. Hodgen" <steven at twitch.net>
wrote:

>My suspicion is that spammers are getting much cleverer in their use of
>things like:
>
>V iag r.a
>
>And permuting this sort of thing to such an extent that Spambayes can't
>latch on to it.  I'm a programmer, and I can see that this would be a
>difficult problem to solve.  One question, it is my understanding that
>Spambayes uses "words" as the basic unit for scoring.  If so, is a space the
>only character used as a break?  Any thoughts on this?  I'm certain that
>this business has been rehashed many times, since Spambayes has been around
>for a while now and has an active user base, but, well, the spam is driving
>me nuts.

One theory of mine is that the accuracy of SpamBayes depends less on the
spam you receive than on your ham. Like most people, I guess, I have two main
mail streams: home and work. I run the SpamBayes Outlook addin separately on
each. The home stream gets about the same amount and proportion of spam as
work, but the ham content is *much* more heterogeneous. I receive mail from
nearly a dozen different accounts, on all sorts of topics. Many of them are
quite spammy - including mailing lists with messages containing quoted spam.
SpamBayes issues "unsures" quite a lot, and they are usually ham. At work on
the other hand, mail content is relatively consistent, from relatively few
sources, on only one account. The spam is the same, but the ham isn't.
SpamBayes runs nearly flawlessly on this account - I get maybe a couple of
unsures a day, and they're usually carefully crafted spam.

At home then, perhaps it would help to have separate databases for each
account? The problem then would be to identify the accounts, because all my
home mail (including hotmail and yahoo accounts) is collected externally by
the SpamCop Mail service, and I download it all from a SpamCop POP3 mailbox.
(I don't have SpamCop Mail set up to block suspected spam directly, because
it issues too many false positives for my liking. I have customized
SpamBayes to examine the custom headers SpamCop Mail adds, though).

It could be done by sniffing the headers, but that would be hard to do in a
general way. Even if I were collecting my mail from the accounts directly, I
don't think Outlook provides any way of finding out which account a
particular mail came from.
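
For what it's worth, here is a rough sketch of what header sniffing might
look like in plain Python, outside Outlook. It is not part of SpamBayes. The
account addresses, the labels, and the list of headers to check are all
assumptions for illustration; which headers actually survive collection by
an external service would have to be checked against real messages.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical account addresses and labels; substitute your own.
ACCOUNTS = {
    "me@hotmail.example": "hotmail",
    "me@yahoo.example": "yahoo",
}

# Headers that may carry the original recipient. Which of these are
# present after forwarding depends on the services involved (assumption).
CANDIDATE_HEADERS = ("Delivered-To", "X-Original-To", "To", "Cc")


def guess_account(raw_message, default="unknown"):
    """Return the label of the first known account address found in the headers."""
    msg = message_from_string(raw_message)
    for header in CANDIDATE_HEADERS:
        # get_all returns every occurrence of the header, or [] if absent.
        values = msg.get_all(header, [])
        for _name, addr in getaddresses(values):
            account = ACCOUNTS.get(addr.lower())
            if account:
                return account
    return default
```

The returned label could then be used to choose which database to train and
score against. Mail sent to a list address or a Bcc would fall through to
the default, which is part of why this is hard to do in a general way.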

-- Mat.



