[Spambayes] Spam in Images

Alan Arndt aga at jlw.com
Wed Aug 2 01:59:23 CEST 2006


Skip,

Thanks.  I did look through the last three months of the forum archive, but
I didn't see this specific problem addressed.  I did notice some comments
about porn images, etc.

I don't think image size will work.  I just saved about 20 of my most recent
spam images, and while about half of them (10) are pushing a stock and most
of those are pretty similar in size, they aren't all the same.  The other 10
images were all of widely varying sizes; even 3 with exactly the same text
had been deliberately made quite different in size.
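For anyone who wants to test the size idea on their own mail, here is a
minimal Python sketch that turns each image part's decoded byte count into a
coarse power-of-two token, roughly the way a token-based filter turns other
message features into tokens.  The function name `image_size_tokens` and the
`image-size:2**N` token format are hypothetical, not part of SpamBayes.
Bucketing only absorbs small amounts of padding; images deliberately resized
as described above would still land in different buckets.

```python
import email
from email.message import Message


def image_size_tokens(msg: Message) -> list[str]:
    """Return one coarse size token per image part (hypothetical token format)."""
    tokens = []
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue
        payload = part.get_payload(decode=True) or b""
        # floor(log2(size)): sizes within the same power of two share a bucket,
        # so a few bytes of random padding usually leave the token unchanged.
        bucket = len(payload).bit_length() - 1 if payload else 0
        tokens.append(f"image-size:2**{bucket}")
    return tokens


# Toy message: one text part and one 1000-byte stand-in "image" part.
raw = "\n".join([
    "MIME-Version: 1.0",
    'Content-Type: multipart/mixed; boundary="B"',
    "",
    "--B",
    "Content-Type: text/plain",
    "",
    "hi",
    "--B",
    "Content-Type: image/gif",
    "",
    "x" * 1000,
    "--B--",
    "",
])
print(image_size_tokens(email.message_from_string(raw)))  # ['image-size:2**9']
```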

I was amazed that they had gone to the extent of adding random bits to each
of the images, but I guess they knew someone would try to compare them.
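A likely reason for the random bits (an assumption about the spammers'
intent) is to defeat exact comparison.  As a minimal illustration, an exact
checksum such as SHA-1 of the raw image bytes matches only byte-identical
files, so flipping a single bit yields a completely different digest while
the file size stays the same.  `exact_fingerprint` and `with_random_noise`
are hypothetical helpers written for this example.

```python
import hashlib
import random


def exact_fingerprint(image_bytes: bytes) -> str:
    """SHA-1 hex digest of the raw bytes; equal only for byte-identical images."""
    return hashlib.sha1(image_bytes).hexdigest()


def with_random_noise(image_bytes: bytes, seed: int) -> bytes:
    """Hypothetical stand-in for a spammer flipping one 'random bit' per copy."""
    rng = random.Random(seed)
    data = bytearray(image_bytes)
    i = rng.randrange(len(data))
    data[i] ^= 0x01  # flip the lowest bit of one randomly chosen byte
    return bytes(data)


original = b"GIF89a" + bytes(range(256))  # placeholder bytes, not a real image
copy = with_random_noise(original, seed=42)
print(len(copy) == len(original))  # True: same size
print(exact_fingerprint(copy) == exact_fingerprint(original))  # False: digests differ
```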

I am not an expert at reading the raw data of an e-mail.  I can only hope
that there is something in the way these images are referenced that differs
from images sent to me by friends.  But I'm not optimistic that one can tell
them apart by the attributes of the image or by the rest of the message.
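One concrete thing to look at in the raw data is how each image is attached.
A minimal sketch, assuming the images arrive as MIME parts: count the image
parts and check which of them an HTML part references through a `cid:` URL
(RFC 2392), i.e. displayed inline, versus merely attached.  Whether this
actually differs between spam and mail from friends is untested here; the
function `image_reference_tokens` and its token names are hypothetical.

```python
import email
import re
from email.message import Message

# Matches src="cid:..." in HTML; a cid: URL points at a MIME part's Content-ID.
CID_REF = re.compile(r"""src\s*=\s*["']?cid:([^"'\s>]+)""", re.IGNORECASE)


def image_reference_tokens(msg: Message) -> list[str]:
    """Count image parts shown inline from HTML vs. merely attached (hypothetical tokens)."""
    n_images = 0
    image_ids = set()
    html_refs = set()
    for part in msg.walk():
        if part.get_content_maintype() == "image":
            n_images += 1
            cid = part.get("Content-ID", "").strip().strip("<>")
            if cid:
                image_ids.add(cid)
        elif part.get_content_type() == "text/html":
            charset = part.get_content_charset() or "latin-1"
            html = (part.get_payload(decode=True) or b"").decode(charset, "replace")
            html_refs.update(CID_REF.findall(html))
    inline = len(image_ids & html_refs)
    return [
        f"image-count:{n_images}",
        f"image-inline-html:{inline}",
        f"image-attached-only:{n_images - inline}",
    ]


# Toy message: one image shown inline from HTML, one plain attachment.
raw = "\n".join([
    "MIME-Version: 1.0",
    'Content-Type: multipart/related; boundary="B"',
    "",
    "--B",
    "Content-Type: text/html",
    "",
    '<html><body><img src="cid:img1@example"></body></html>',
    "--B",
    "Content-Type: image/gif",
    "Content-ID: <img1@example>",
    "",
    "GIF89a...",
    "--B",
    "Content-Type: image/png",
    "",
    "PNG...",
    "--B--",
    "",
])
print(image_reference_tokens(email.message_from_string(raw)))
# ['image-count:2', 'image-inline-html:1', 'image-attached-only:1']
```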

So that leads to examining the image itself.  The main difference is
immediately obvious: it's not a picture, it's just some formatted text
rendered into an image.  Given that, one could hope that some simple image
analysis could quickly classify these images as different, or that this could
even become a learning process like the rest of the spam filtering.  The big
downside is that image analysis is expensive in both computation and time, not
to mention all the various image formats the tool would need to handle.
Perhaps that set is constrained by what e-mail programs will actually render.
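As a rough illustration of the "simple image analysis" idea: rendered text
tends to use a handful of colors (background plus ink), while a photograph
spreads its pixels over many.  The sketch below measures what fraction of
pixels fall into the most common colors.  It assumes the image has already
been decoded into RGB tuples (for example with an imaging library such as
Pillow, not shown), and the threshold of 0.9 and `top_k=2` are guesses, not
tuned values.  Anti-aliasing and JPEG artifacts would lower the share for real
text images, so colors would probably need to be quantized first.

```python
from collections import Counter
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), assumed already decoded from GIF/JPEG/PNG


def dominant_color_share(pixels: Iterable[Pixel], top_k: int = 2) -> float:
    """Fraction of all pixels that use one of the top_k most common colors."""
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(top_k)) / total


def looks_like_rendered_text(pixels: Iterable[Pixel], threshold: float = 0.9,
                             top_k: int = 2) -> bool:
    """Heuristic guess: few dominant colors suggests text on a plain background."""
    return dominant_color_share(pixels, top_k) >= threshold


# Toy inputs: 90 white + 8 black pixels plus 2 "noise" pixels vs. a 100-shade gradient.
text_like = [(255, 255, 255)] * 90 + [(0, 0, 0)] * 8 + [(10, 20, 30), (40, 50, 60)]
photo_like = [(i, i, i) for i in range(100)]
print(looks_like_rendered_text(text_like))   # True  (share 0.98)
print(looks_like_rendered_text(photo_like))  # False (share 0.02)
```

The counting itself is a single pass over the pixels; decoding the various
image formats is likely the expensive part.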

All in all, not a good outlook it seems.

-Alan

-----Original Message-----
From: skip at pobox.com [mailto:skip at pobox.com]
Sent: Tuesday, August 01, 2006 4:29 PM
To: Alan Arndt
Cc: spambayes at python.org
Subject: Re: [Spambayes] Spam in Images


    Alan> I haven't thought of a decent way to filter these types of things.
    Alan> I hope someone else can and that it can get implemented into
    Alan> SpamBayes....

    Alan> Does anyone have any good suggestions?

This topic has come up several times in the past.  There is, as yet, no
perfect way to identify these sorts of spams.  The last time it was discussed
(maybe a month ago), optical character recognition (OCR) was suggested as a
possible means of getting at the text.  Unfortunately, the available open
source tools fall far short in terms of accuracy.

Perhaps image size would be a helpful clue.  I don't know if anyone has
tried that before.

Skip