Hi folks,

I'm currently writing a column in PC Authority magazine (www.pcauthority.com.au <http://www.pcauthority.com.au/> ) on the new wave of spam that use randomised or semi-randomised words to confound Bayesian filters.

I'm looking for a developer for SpamBayes who would be willing to help me understand the issue and who can make a few comments on the impact of this kind of spam on filters such as SpamBayes, and how spam is evolving in general.

Any information, comments or quotes would be greatly appreciated. You can contact me through this email address: tdean at abbrev.com.au.

Best regards,
Tim Dean
Freelance journalist
w. www.timstechguide.com.au
e. tdean@abbrev.com.au
p. (02) 9518 3481
m. 0412 560 365
> I'm currently writing a column in PC Authority magazine (www.pcauthority.com.au <http://www.pcauthority.com.au/> ) on the new wave of spam that use randomised or semi-randomised words to confound Bayesian filters.

I can take this, cos it's in my timezone, if no-one else wants to.

I figure the key point about the random word spam is that it's just trying to overwhelm the bayesian filters. Personally, I'm finding them _slightly_ effective (2 or 3 a day slip through if they hit the right words) but not significantly more than that. Fundamentally, they still have to put words in that sell a product, and that screws them over.

Anthony
>> I'm currently writing a column in PC Authority magazine (www.pcauthority.com.au <http://www.pcauthority.com.au/> ) on the new wave of spam that use randomised or semi-randomised words to confound Bayesian filters.

> I can take this, cos it's in my timezone,

Plus you speak the local language ;)

> if no-one else wants to.

Sounds good to me :)

> I figure the key point about the random word spam is that it's just trying to overwhelm the bayesian filters. Personally, I'm finding them _slightly_ effective (2 or 3 a day slip through if they hit the right words) but not significantly more than that. Fundamentally, they still have to put words in that sell a product, and that screws them over.

I think that people have shown that random words are pretty ineffective (e.g. John Graham-Cumming at 2004's MIT Spam Conference). Random paragraphs (those news clippings and the like) are a bit more effective. I think that image-based spam is clearly far superior to any sort of random-word technique, though (although some of the image-spam also has the random words - I'm not sure that really helps the spammer, though).

=Tony.Meyer
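To illustrate the point made in the messages above, here is a minimal sketch of how a Bayesian-style filter might combine per-token evidence. It is not SpamBayes code: the function name, the token probabilities, and the two constants (a neutral 0.5 for unseen words and a 0.1 cut-off for uninformative ones) are assumptions chosen for illustration, and the combining rule is a plain naive Bayes product rather than whatever a production filter uses.

```python
import math

UNKNOWN_PROB = 0.5   # assumed: a never-seen token counts as neutral
MIN_STRENGTH = 0.1   # assumed: skip tokens within 0.1 of neutral


def spam_score(tokens, token_probs):
    """Combine per-token spam probabilities with a naive Bayes product."""
    log_spam = 0.0
    log_ham = 0.0
    for token in set(tokens):
        p = token_probs.get(token, UNKNOWN_PROB)
        if abs(p - 0.5) < MIN_STRENGTH:
            continue  # uninformative token, contributes nothing
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    # Equivalent to spam / (spam + ham), computed from the log sums.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


# Hypothetical trained probabilities, made up for this example.
token_probs = {"viagra": 0.99, "cheap": 0.90, "meeting": 0.05, "python": 0.02}

pitch = ["cheap", "viagra"]
random_words = ["zebra", "quorum", "lantern"]   # never seen in training
lucky_words = ["meeting", "python"]             # happen to look like ham

score_pitch = spam_score(pitch, token_probs)
score_random = spam_score(pitch + random_words, token_probs)
score_lucky = spam_score(pitch + lucky_words, token_probs)
```

In this toy model the unseen random words leave the score unchanged, because they carry no evidence either way. Padding only helps the spammer when it happens to include words the recipient's own mail has trained as strongly ham ("if they hit the right words"), and the sales words still count against the message.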
On Friday 20 October 2006 18:55, Tony Meyer wrote:
>>> I'm currently writing a column in PC Authority magazine (www.pcauthority.com.au <http://www.pcauthority.com.au/> ) on the new wave of spam that use randomised or semi-randomised words to confound Bayesian filters.

>> I can take this, cos it's in my timezone,

> Plus you speak the local language ;)

Does babelfish not have a Kiwi-ese to English translator? Pah.

> I think that people have shown that random words are pretty ineffective (e.g. John Graham-Cumming at 2004's MIT Spam Conference). Random paragraphs (those news clippings and the like) are a bit more effective. I think that image-based spam is clearly far superior to any sort of random-word technique, though (although some of the image-spam also has the random words - I'm not sure that really helps the spammer, though).
The stuff that slips through for me tends to have a lot of individual lines from various news articles, smashed together randomly. My favourite one (this didn't get through - I just noticed it when emptying my spam box) was one that had something like

[%RANDOM_LINE_1%]
[%RANDOM_LINE_2%]
[%RANDOM_LINE_3%]
[%RANDOM_LINE_4%]

Ah spammers - clearly they're the best and the brightest. :)

The nasty one which I've only seen occasionally would be one that spammed by replying to an email you'd already sent (either from a public mailing list archive or from the mailbox of a compromised PC). Fortunately, the cost to individualise spams like this is much much higher than mass random blasting, so I've seen very very little of it. The ones I have seen seem to be manually entered - someone will reply to a post with "Have you heard about XYZspamproduct" with a link.

Image spam could be more of a problem, except that the less text in the message, the more header clues come into play, as well. While SB doesn't do a massive amount of, for instance, RBL checking, defense in depth (spamassassin+graylisting on the server, SB on the client) seems pretty effective.

I'll email the guy back.

Anthony

--
Anthony Baxter <anthony@interlink.com.au>
It's never too late to have a happy childhood.
Anthony> Image spam could be more of a problem, except that the less
Anthony> text in the message, the more header clues come into play, as
Anthony> well. While SB doesn't do a massive amount of, for instance,
Anthony> RBL checking, defense in depth (spamassassin+graylisting on the
Anthony> server, SB on the client) seems pretty effective.

Have the spammers still not figured out how to defeat greylisting? (I suppose they may just not have the time to wait for the timeout on a compromised machine.) I've run postgrey for a couple years. Maybe that's one reason I don't see as much junk.

Oh, another thing. I read my mail through XEmacs+VM and very rarely get legitimate email containing GIFs. When I do, legitimate or not, it's clear that the message has an image attached. I noticed with email clients like Thunderbird, that's not always the case. The GIF images might look like plain (though often colored, blinking) text when rendered.

This became obvious to me when a guy at work showed me such a spam. He couldn't figure out why the spam filter at work hadn't caught it because it obviously had lots of spammy text. I explained that was actually a GIF image being displayed. He has a PhD in Computer Science and is an extremely bright guy, so I'm sure on casual glance lots of people focus on the random text and don't realize the sales pitch is embedded in an image, and conclude it must be the gibberish that's defeating the spam filter. It's just that the spam filter can't see what you see.

Skip
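For readers unfamiliar with greylisting, the idea discussed above can be sketched roughly as follows. This is a simplified illustration, not postgrey's implementation: the class name, the method signature, and the 300-second delay are assumptions, and a real greylister would also expire old entries and keep whitelists.

```python
GREYLIST_DELAY = 300  # seconds; assumed value for illustration


class Greylist:
    """Defer the first delivery attempt for each unseen triplet."""

    def __init__(self, delay=GREYLIST_DELAY):
        self.delay = delay
        # (client_ip, sender, recipient) -> time of first attempt
        self.first_seen = {}

    def check(self, client_ip, sender, recipient, now):
        """Return 'accept' or 'defer' for one delivery attempt at time `now`."""
        triplet = (client_ip, sender, recipient)
        first = self.first_seen.setdefault(triplet, now)
        if now - first >= self.delay:
            return "accept"  # the sender came back after waiting
        return "defer"       # answer with a temporary failure


greylist = Greylist()
first_try = greylist.check("192.0.2.1", "a@example.com", "b@example.org", now=0)
early_retry = greylist.check("192.0.2.1", "a@example.com", "b@example.org", now=60)
late_retry = greylist.check("192.0.2.1", "a@example.com", "b@example.org", now=301)
```

A standards-following mail server queues the message and retries later, so it eventually gets through. A spam sender that fires once and moves on, as Skip speculates a compromised machine might, never reaches the "accept" branch.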
>> I'm currently writing a column in PC Authority magazine
>> (www.pcauthority.com.au <http://www.pcauthority.com.au/> ) on the new
>> wave of spam that use randomised or semi-randomised words to confound
>> Bayesian filters.

Anthony> I can take this, cos it's in my timezone, if no-one else wants
Anthony> to. I figure the key point about the random word spam is that
Anthony> it's just trying to overwhelm the bayesian filters. Personally,
Anthony> I'm finding them _slightly_ effective (2 or 3 a day slip
Anthony> through if they hit the right words) but not significantly more
Anthony> than that. Fundamentally, they still have to put words in that
Anthony> sell a product, and that screws them over.

The random word spam generally relies on the fact that the actual message is in the attached GIF image. It's not so much that the random words are defeating the filter. It's more that not enough useful tokens are being extracted from the image. Hence the move toward OCR approaches.

Spammers evolve. Spam filters evolve.

Skip
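The point about token extraction can be shown with a small sketch using Python's standard `email` package. The message contents, the file name, and the `visible_tokens` helper are made up for illustration, and the tokenizer is deliberately naive: it looks only at text/plain body parts and ignores headers, which a real filter would also mine for clues.

```python
from email.message import EmailMessage


def visible_tokens(msg):
    """Collect whitespace-separated tokens from text/plain parts only."""
    tokens = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            tokens.extend(part.get_content().split())
    return tokens


msg = EmailMessage()
msg["Subject"] = "hello"
msg.set_content("lantern quorum zebra")  # the random-word padding
# Stand-in bytes rather than a real GIF; in image spam the sales pitch
# would be drawn inside this attachment.
msg.add_attachment(b"GIF89a...", maintype="image", subtype="gif",
                   filename="offer.gif")

tokens = visible_tokens(msg)
part_types = [part.get_content_type() for part in msg.walk()]
```

Here `tokens` holds only the three padding words. Whatever the reader sees rendered from the image contributes nothing to the text tokens unless the filter adds a step such as OCR.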
participants (4)
- Anthony Baxter
- skip@pobox.com
- Tim Dean
- Tony Meyer