Spammers have found a workaround?
I've been using spambayes for quite a while now with remarkably good results: almost 0 spam for many months. Recently, though, a few messages have been sneaking through in the 0% to 9% range. Do you think that spammers are reacting to spambayes and other good intelligent spam filters by crafting spam especially to get through? Is anyone else noticing this change? Will spambayes always be able to learn these new techniques?

Regards ..... Jim Mumm
-- Certified Organic seeds for sprouting -- MUMM'S SPROUTING SEED LTD., Box 80, Parkside, SK, Canada S0J 2A0 mumms@sprouting.com ph 306 747 2935 fx 306 747 3618 www.sprouting.com
[Mumm's]
I've been using spambayes for quite a while now with remarkably good results: almost 0 spam for many months. Recently, though, a few messages have been sneaking through in the 0% to 9% range. Do you think that spammers are reacting to spambayes and other good intelligent spam filters by crafting spam especially to get through?
Spammers have been working as hard as they can to evade filters of all kinds for years, but I doubt anyone is targeting SB specifically. We're too small a target, and this kind of classifier is difficult to sidestep reliably because it's so personalized. Spam is a business, and spammers get much higher return on investment by learning to fool "one filter to rule them all" systems used by large organizations (corporations, universities, ISPs).

Oversimplified, say you're a well-heeled spammer (as some are). You *buy* one of these systems, and then send spam to yourself until you find a way to fool it. To a first approximation, then, that spam has a decent chance of fooling many installations of the same system. You send out millions of spam then as fast as you can, before the filter vendor has a chance to change the system to plug the hole you found.

Now you can do the same thing with your own trained SpamBayes, but it won't do you much good: the training you do leaves you with a different database than the training I do, and unless you spend real money and effort to investigate me, you'll have no idea how to make your spam look hammy to *my* classifier. But if you could afford to do intelligent targeted marketing, you'd get a higher return by moving to a more traditional form of targeted advertising, trying to sell me high-ticket legitimate items instead of assorted bottom-feeding scams. You'd be out of the spam business then.

That's why SB is hard to beat on a large scale. It's not trying to identify spam, it's trying to separate ham from spam according to an individual's tastes.
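To make the personalization point concrete, here is a toy sketch. It is not SpamBayes's actual tokenizer or scoring math; it is just a smoothed per-word log-odds score, and the `PersonalFilter` class, the two users, and all the training messages are invented for illustration. It shows that the same message can look spammy to one person's database and hammy to another's.

```python
import math
from collections import Counter


class PersonalFilter:
    """Toy per-user word-frequency classifier (hypothetical, not SpamBayes)."""

    def __init__(self):
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.messages = {"ham": 0, "spam": 0}

    def train(self, text, label):
        # Count each distinct word once per trained message.
        self.counts[label].update(set(text.lower().split()))
        self.messages[label] += 1

    def spam_probability(self, text):
        # Sum per-word log-likelihood ratios (add-one smoothing),
        # then squash the total into the range 0..1.
        log_odds = 0.0
        for word in set(text.lower().split()):
            p_spam = (self.counts["spam"][word] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][word] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


# Two users train on different mail, so they end up with different databases.
seed_seller = PersonalFilter()
seed_seller.train("organic sprouting seeds order shipped", "ham")
seed_seller.train("wholesale seeds price list", "ham")
seed_seller.train("cheap rolex replica watches", "spam")

watch_dealer = PersonalFilter()
watch_dealer.train("rolex service invoice attached", "ham")
watch_dealer.train("vintage watches auction catalogue", "ham")
watch_dealer.train("cheap organic seeds miracle cure", "spam")

# The same message is scored against each user's own database.
message = "rolex watches price list"
score_seller = seed_seller.spam_probability(message)
score_dealer = watch_dealer.spam_probability(message)
print(f"seed seller:  {score_seller:.2f}")
print(f"watch dealer: {score_dealer:.2f}")
```

With this made-up training data the message scores above 0.5 for the seed seller and below 0.5 for the watch dealer, so a spammer who tunes a message against one database learns little about the other.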
Anyone else noticing this change?
I notice that spam changes all the time. For example, "Rolex" spam has become very heavy over the past two weeks in my mix. A few of those were Unsure for me at first. I trained on 2, and haven't seen another one rate unsure since then.

BTW, I throw away my database a few times each year and start over from scratch. I know that this is fun, and slashes the database size. I suspect it helps recognition accuracy too, but don't know that for a fact. If you feel like you're in a rut, try it! One common cause for deteriorating accuracy is training a message into the wrong category (ham as spam, or vice versa), and that's very hard to detect after the fact. As the months wear on, that's simply *going* to happen sooner or later. Starting over is often the easiest way to recover from that.
Will spambayes always be able to learn these new techniques?
Answering that requires perfect foreknowledge, so, yes, of course <wink>. We haven't made significant changes to the classifier or the tokenizer in many months, but I haven't noticed any decrease in effectiveness, even though both the content and the form of spam keep changing.

A good development is the increasing number of email clients that refuse to download images in HTML email automatically. I wish that were universal. So long as the spammer has to put stuff *in the msg* itself that's visible to you, they have to make their sales pitch and their URL visible to classifiers too, and then it can be analyzed. When all they give is a URL that automatically downloads a .gif or .jpg containing an image of the sales message, that's very hard to analyze. But if email clients stop downloading that stuff, the response rate on spam using that trick will fall to 0, and spammers will stop doing that. It's easy to forget that their goal isn't to irritate you, it's to extract money from you. To do that, they have to make a visible sales pitch.
Tim Peters wrote:
When all they give is a URL that automatically downloads a .gif or .jpg containing an image of the sales message, that's very hard to analyze.
one thing that spamassassin has that i really like is the SURBL checking mechanism. it scans an email for any URL-looking strings and then checks the domains in those URLs against a central database of known spammer URLs. so even in the case of "message with nothing but a jpeg URL" you'd still get a spam warning if the SURBL database has seen that particular item. checking spamassassin on my local server i see that 90% of my spams trip either the Bayes filter or the SURBL check, making them the most effective spam fighting tools in my arsenal. see http://www.surbl.org/ for more info. -jsd-
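A rough sketch of the kind of check described above, not spamassassin's actual code. It assumes the blocklist is queried over DNS by prepending the host to a zone name, and that `multi.surbl.org` is the zone to use; the function names, the regular expression, and the sample hosts are all made up for illustration. Real clients also reduce each host to its registered domain before querying, which this sketch skips.

```python
import re
import socket

# Matches the host part of URL-looking strings such as http://host/path.
URL_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def extract_hosts(text):
    """Return the distinct, lowercased hosts of URL-looking strings in text."""
    return sorted({m.group(1).lower().rstrip(".") for m in URL_HOST.finditer(text)})


def dns_lookup(name):
    """Return True if name resolves, i.e. the blocklist zone has an entry for it."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        return False


def listed_hosts(text, zone="multi.surbl.org", lookup=dns_lookup):
    """Return the hosts from text that the blocklist reports as listed.

    The zone name is an assumption; lookup can be swapped out for testing.
    """
    return [host for host in extract_hosts(text) if lookup(f"{host}.{zone}")]


# Offline demonstration with a stub lookup instead of real DNS queries.
def stub_lookup(name):
    return name == "bad.example.multi.surbl.org"


body = "Click http://bad.example/offer.jpg today. Unsubscribe: http://good.example/"
print(listed_hosts(body, lookup=stub_lookup))
```

Passing the lookup function in as a parameter keeps the URL extraction testable without network access; leaving it at the default performs real DNS queries.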
Participants (3): Jon Drukman, Mumm's, Tim Peters