[Spambayes] picture ads
tameyer at ihug.co.nz
Mon Feb 14 09:35:16 CET 2005
> SpamBayes has worked great for me, except for one thing:
> most of the spam I'm getting now in my inbox are picture
> ads disguised with innocuous text or no text (usually drug
> ads). I'm concerned that SpamBayes only trains on the text,
> and if I keep "deleting as spam" these ads, I'll train
> SpamBayes into false positives.
No text is fine, because there are lots of tokens in the headers that will
be used. Innocuous text might be a problem - it really depends how often
those words appear in spam compared to ham. If they're just random words,
then it's probably still fine.
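To make the "how often those words appear in spam compared to ham" point concrete, here is a minimal sketch of a per-token spam probability with a Robinson-style adjustment towards a neutral prior. The function name, the argument names and the constants s and x are assumptions for illustration, not a copy of the SpamBayes code.

```python
def spam_prob(spam_count, ham_count, n_spam, n_ham, s=0.45, x=0.5):
    """Estimate how spammy a token is from its training counts.

    spam_count / ham_count: messages of each kind containing the token.
    n_spam / n_ham: total messages trained as spam / ham.
    s, x: assumed strength and value of the prior for unseen tokens.
    """
    n = spam_count + ham_count
    if n == 0:
        return x  # never seen: fall back to the neutral prior
    spam_ratio = spam_count / n_spam if n_spam else 0.0
    ham_ratio = ham_count / n_ham if n_ham else 0.0
    if spam_ratio + ham_ratio == 0:
        return x
    p = spam_ratio / (spam_ratio + ham_ratio)
    # Pull the raw estimate towards x when there is little evidence.
    return (s * x + n * p) / (s + n)


if __name__ == "__main__":
    # A word seen equally often in ham and spam stays neutral.
    print(spam_prob(5, 5, 100, 100))
    # A word seen only in spam scores close to 1.
    print(spam_prob(10, 0, 100, 100))
```

A random innocuous word that turns up about as often in ham as in spam ends up near 0.5 and carries little weight, which is why such text is probably not a problem.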
The best thing to do, IMO, would be to keep training as usual, and see if
things improve. If you do end up with more good mail in your unsure folder
(possibly, though I suspect not) then we'll have to figure something out.
Countering this type of spam is difficult, because no-one has a way of
converting a picture to something like "lots of white pills", which we could
use. OTOH, many mailers don't show images by default any more, which means
that this type of spam isn't effective for the spammer, either.
I wonder whether generating tokens from the image would work (simple ones
that aren't particularly time consuming). This sort of thing is used in
some image algorithms (e.g. cascades of haar-like classifiers for face
detection), so it's feasible that it would both work and be fast enough - it
really depends whether suitable classifiers can be found that differentiate
between ham images and spam ones.
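As a rough illustration of the idea, the sketch below turns an already decoded greyscale image (a list of rows of 0-255 values) into a handful of cheap tokens, including two Haar-like two-rectangle comparisons computed from a summed-area table. All names, token strings and thresholds are hypothetical; nothing here exists in SpamBayes, and whether such tokens actually separate ham images from spam ones is untested.

```python
def integral_image(pixels):
    """Summed-area table with an extra zero row and column."""
    h, w = len(pixels), len(pixels[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += pixels[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def rect_mean(sat, x0, y0, x1, y1):
    """Mean pixel value over the rectangle [x0, x1) x [y0, y1)."""
    total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
    return total / ((x1 - x0) * (y1 - y0))


def compare(a, b, first, second, threshold=16):
    """Label which of two regions is brighter (threshold is a guess)."""
    if a - b > threshold:
        return first + "-brighter"
    if b - a > threshold:
        return second + "-brighter"
    return "flat"


def image_tokens(pixels):
    """Generate a few coarse tokens from a greyscale pixel grid."""
    h, w = len(pixels), len(pixels[0])
    sat = integral_image(pixels)
    tokens = [
        # Bucket dimensions by power of two so similar sizes share a token.
        "image:width:2**%d" % (w.bit_length() - 1),
        "image:height:2**%d" % (h.bit_length() - 1),
        # Overall brightness in 8 buckets.
        "image:brightness:%d" % int(rect_mean(sat, 0, 0, w, h) // 32),
    ]
    if w >= 2 and h >= 2:
        # Haar-like two-rectangle features: left/right and top/bottom.
        left = rect_mean(sat, 0, 0, w // 2, h)
        right = rect_mean(sat, w // 2, 0, w, h)
        top = rect_mean(sat, 0, 0, w, h // 2)
        bottom = rect_mean(sat, 0, h // 2, w, h)
        tokens.append("image:haar-lr:" + compare(left, right, "left", "right"))
        tokens.append("image:haar-tb:" + compare(top, bottom, "top", "bottom"))
    return tokens


if __name__ == "__main__":
    # 4x4 test image: bright left half, dark right half.
    print(image_tokens([[255, 255, 0, 0]] * 4))
```

The summed-area table makes each rectangle lookup constant time, so adding more rectangle features costs very little per image; decoding the attachment into pixels is left out and would be the expensive part.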
I'd quite like to do some research into this, but (a) I don't have the time
right at the moment, and (b) either I get just about no spam like this, or
if I do, it's all classified correctly and I don't notice it.
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.