[Spambayes] Why I added src=cid: etc

Tim Peters tim.one@comcast.net
Mon Nov 4 16:17:56 2002


[Tim]
> This is typical of the kind of email I'm getting a lot of
> lately.  Without mining the HTML, there's almost nothing to
> look at, not even a word in the Subject line.  (Of course, if we
> weren't throwing the HTML tags away, the classifier would have
> learned this stuff on its own.)

[Matt Sergeant]
> It's a virus though. Why don't you just get a gateway scanner (like the
> one I wrote [1] for qpsmtpd [2] which plugs into qmail and bounces
> viruses with a 5xx return code) which uses clamav[3]?

Because <wink>, like *most* of the world, I'm just running "the email stuff"
that came with my Windows box here.  Not one user in a thousand knows beans
beyond that.

> It's optimised for catching viruses, so you can focus on just catching
> spam (let's face it, the techniques are slightly different).

Yes.  Greg Ward and Neil Schemenauer here have each written their own virus
detectors too, and Greg's stops essentially all viruses from getting beyond
python.org.  The ones I'm getting come from other accounts, but somewhere
along the line the actual virus payload has been stripped out, leaving just
the little HTML trigger.
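
To make "mining the HTML" concrete, here is a minimal sketch of pulling a
src=cid: clue out of the tags before they get thrown away.  This is only an
illustration, not the project's actual tokenizer: the names html_clues,
CID_SRC_RE and TAG_RE, and the choice of "src=cid:" as the token spelling,
are all assumptions made up for the example.

```python
import re

# Hypothetical pattern: a src= attribute whose value starts with "cid:",
# quoted or not.  Not taken from the Spambayes tokenizer.
CID_SRC_RE = re.compile(r"""\bsrc\s*=\s*["']?\s*cid:""", re.IGNORECASE)

# Hypothetical, deliberately crude pattern for "anything that looks like a tag".
TAG_RE = re.compile(r"<[^>]*>")


def html_clues(html):
    """Yield a 'src=cid:' token per matching tag, then lowercased body words."""
    # Look inside each tag for the trigger before the tag is discarded.
    for tag in TAG_RE.findall(html):
        if CID_SRC_RE.search(tag):
            yield "src=cid:"
    # Strip the tags and tokenize whatever visible text is left.
    text = TAG_RE.sub(" ", html)
    for word in text.split():
        yield word.lower()


if __name__ == "__main__":
    sample = '<html><body><iframe src="cid:abc123" height=0 width=0></iframe></body></html>'
    print(list(html_clues(sample)))
```

On a message like the sample, stripping the tags leaves no words at all, so
the synthesized token is the only evidence the classifier gets to see.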

I wouldn't recommend this project's code for virus/worm detection, although
anecdotal reports here (not controlled experiments) have been that it works
for that purpose too.