[Spambayes] One potential problem with this filter approach

Tim Peters tim.one at comcast.net
Tue Apr 29 22:28:17 EDT 2003

[John Clegg]
> I am really impressed with this implementation of a spam filter.

So are we <wink>.  Thanks!

> Like everyone else I (and my company) have been plagued by spam. I was
> thinking about the way the spambayes works, and I think I have thought
> of a way spammers could get around it. A devious spammer could use
> images instead of text.  So the email would just contain an HTML
> table. It's something you guys should think about: how your filter
> will operate on these types of emails.

Asian spam has been doing this for a long time.  I believe it's not because
they're trying to fool filters, but because they can't rely on email clients
rendering their character sets correctly.  So they put a jpeg of the spam
out on the web, and just send URLs in the email (embedded in suitable HTML
to get the image(s) rendered).  The same trick showed up in less exotic spam
much later, and there I believe the point is to fool filters.

As has already been mentioned, spambayes doesn't do any sort of semantic
content analysis, yet this kind of spam usually gets caught anyway.  Clues
they don't manage to hide this way include "funny stuff" in the email
headers, and the URLs themselves.  Indeed, just finding the character
strings ".jpg" or ".gif" in a URL turns out to be a strong spam clue, and if
the message doesn't have any hammish text then spambayes pays a lot of
attention to the handful of spam clues it finds.
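To make that concrete, here is a minimal sketch of the idea. It is not spambayes's actual tokenizer or classifier, whose real code is considerably more elaborate. URLs are cracked into `url:` pieces so that ".jpg" becomes the token `url:jpg`. The most extreme per-token probabilities are then merged with Fisher/Robinson-style chi-squared combining, and near-neutral tokens are ignored. The token names, every probability in `TOKEN_SPAMPROB`, and the cutoffs `MIN_DISTANCE` and `MAX_CLUES` are hypothetical, invented for illustration.

```python
import math
import re

# Hypothetical per-token spam probabilities, standing in for what a trained
# classifier might learn.  These numbers are invented for illustration.
TOKEN_SPAMPROB = {
    "url:jpg": 0.95,
    "url:gif": 0.93,
    "url:www": 0.55,
    "url:com": 0.52,
    "agenda": 0.03,
    "meeting": 0.02,
}

UNKNOWN_PROB = 0.5   # tokens never seen in training are treated as neutral
MIN_DISTANCE = 0.1   # hypothetical: ignore clues closer than this to 0.5
MAX_CLUES = 150      # hypothetical: use at most this many extreme clues

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def tokenize(text):
    """Return a set of 'url:' pieces plus ordinary lowercase words."""
    tokens = set()
    for url in URL_RE.findall(text):
        # Drop the scheme, then break the rest on '.', '/', '?', etc.
        rest = url.split("://", 1)[1].lower()
        for piece in re.split(r"[^a-z0-9]+", rest):
            if piece:
                tokens.add("url:" + piece)
    # Words of three or more letters from the text outside the URLs.
    for word in re.findall(r"[a-z]{3,}", URL_RE.sub(" ", text.lower())):
        tokens.add(word)
    return tokens


def chi2Q(x2, v):
    """P(chi-squared with v degrees of freedom >= x2), for even v."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def score(tokens):
    """Combine the strongest clues into a score in [0, 1]; 1 means spam."""
    clues = [TOKEN_SPAMPROB.get(t, UNKNOWN_PROB) for t in tokens]
    clues = [p for p in clues if abs(p - 0.5) >= MIN_DISTANCE]
    clues.sort(key=lambda p: abs(p - 0.5), reverse=True)
    clues = clues[:MAX_CLUES]
    if not clues:
        return 0.5  # no evidence either way
    n = len(clues)
    # S is near 1 when the clues lean spammy, H near 1 when they lean hammy.
    S = 1.0 - chi2Q(-2.0 * sum(math.log(1.0 - p) for p in clues), 2 * n)
    H = 1.0 - chi2Q(-2.0 * sum(math.log(p) for p in clues), 2 * n)
    return (S - H + 1.0) / 2.0


if __name__ == "__main__":
    spam = ('<img src="http://www.example.com/offer.jpg">'
            '<img src="http://www.example.com/banner.gif">')
    ham = ('Agenda for the meeting tomorrow '
           '<img src="http://www.example.com/chart.jpg">')
    print(score(tokenize(spam)), score(tokenize(ham)))
```

With only image URLs, the two strong `url:` clues dominate and the first message scores close to 1. Adding a couple of hammish words to the second message pulls its score below 0.5, even though it carries the same kind of image link.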

> FYI: I am former CTO of Baazee.com from India and I have designed
> email delivery systems for the company.

Next time you'll know enough to code them in Python <wink>.

