[Spambayes] Why I added src=cid: etc
Matt Sergeant
msergeant@startechgroup.co.uk
Mon Nov 4 16:35:36 2002
Tim Peters said the following on 04/11/02 16:17:
> [Tim]
>
>>This is typical of the kind of email I'm getting a lot of
>>lately. Without mining the HTML, there's almost nothing to
>>look at, not even a word in the Subject line. (Of course, if we
>>weren't throwing the HTML tags away, the classifier would have
>>learned this stuff on its own.)
>
> [Matt Sergeant]
>
>>It's a virus though. Why don't you just get a gateway scanner (like the
>>one I wrote [1] for qpsmtpd [2] which plugs into qmail and bounces
>>viruses with a 5xx return code) which uses clamav[3]?
>
> Because <wink>, like *most* of the world, I'm just running "the email stuff"
> that came with my Windows box here. Not one user in a thousand knows beans
> beyond that.
Ah Windows eh. I didn't realise anyone still used that. ;-) <sympathy/>
>>It's optimised for catching viruses, so you can focus on just catching
>>spam (let's face it, the techniques are slightly different).
>
> Yes. Greg Ward and Neil Schemenauer here have each written their own virus
> detectors too, and Greg's stops essentially all viruses from getting beyond
> python.org. The ones I'm getting come from other accounts, but somewhere
> along the line the actual virus payload has been stripped out, leaving just
> the little HTML trigger.
>
> I wouldn't recommend this project's code for virus/worm detection, although
> anecdotal reports here (not controlled experiments) have been that it works
> for that purpose too.
Yeah, I've got some neat results just from classifying file extensions.
The double extension ones are especially good ;-)
Matt.
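
A rough illustration of the file-extension idea mentioned above, as a minimal sketch only: it assumes attachment filenames have already been pulled out of the message, and the function name `extension_tokens` and the token prefixes `fname-ext:` and `fname-double-ext:` are made up for this example. None of this is taken from the Spambayes tokenizer or from the scanner discussed in the thread.

```python
import os


def extension_tokens(filename):
    """Yield hypothetical classifier tokens for an attachment filename.

    One token is emitted for the final extension, and a second token
    when the name carries two extensions (e.g. "photo.jpg.pif").
    """
    name = filename.strip().lower()
    stem, last = os.path.splitext(name)
    if not last:
        # No extension at all, so there is nothing to report.
        return
    yield "fname-ext:" + last
    # Look for a second extension hiding in front of the last one.
    _, inner = os.path.splitext(stem)
    if inner:
        yield "fname-double-ext:" + inner + last


if __name__ == "__main__":
    for sample in ("photo.jpg.pif", "report.doc", "README"):
        print(sample, list(extension_tokens(sample)))
```

Legitimate names such as "backup.tar.gz" also produce a double-extension token here; the sketch leaves it to the trained classifier to decide which combinations actually correlate with unwanted mail.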