[Spambayes] Why I added src=cid: etc

Matt Sergeant msergeant@startechgroup.co.uk
Mon Nov 4 16:35:36 2002


Tim Peters said the following on 04/11/02 16:17:
> [Tim]
> 
>>This is typical of the kind of email I'm getting a lot of
>>lately.  Without mining the HTML, there's almost nothing to
>>look at, not even a word in the Subject line.  (Of course, if we
>>weren't throwing the HTML tags away, the classifier would have
>>learned this stuff on its own.)
> 
> [Matt Sergeant]
> 
>>It's a virus though. Why don't you just get a gateway scanner (like the
>>one I wrote [1] for qpsmtpd [2] which plugs into qmail and bounces
>>viruses with a 5xx return code) which uses clamav[3]?
> 
> Because <wink>, like *most* of the world, I'm just running "the email stuff"
> that came with my Windows box here.  Not one user in a thousand knows beans
> beyond that.

Ah, Windows, eh? I didn't realise anyone still used that. ;-) <sympathy/>

>>It's optimised for catching viruses, so you can focus on just catching
spam (let's face it, the techniques are slightly different).
> 
> Yes.  Greg Ward and Neil Schemenauer here have each written their own virus
> detectors too, and Greg's stops essentially all viruses from getting beyond
> python.org.  The ones I'm getting come from other accounts, but somewhere
> along the line the actual virus payload has been stripped out, leaving just
> the little HTML trigger.
> 
> I wouldn't recommend this project's code for virus/worm detection, although
> anecdotal reports here (not controlled experiments) have been that it works
> for that purpose too.

Yeah, I've got some neat results just from classifying file extensions. 
The double extension ones are especially good ;-)
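
A rough sketch of what I mean by classifying on extensions, in Python. This is illustrative only, not the code I actually run: the function name extension_tokens, the token strings, and the EXECUTABLE_EXTS list are all made up for the example.

```python
import os

# Hypothetical set of extensions treated as executable (an assumption for
# this sketch, not a list taken from any real scanner).
EXECUTABLE_EXTS = {".exe", ".scr", ".pif", ".bat", ".com", ".vbs"}


def extension_tokens(filename):
    """Return classifier tokens describing an attachment's extension(s)."""
    name = filename.strip().lower()
    root, last = os.path.splitext(name)   # "photo.jpg.pif" -> ("photo.jpg", ".pif")
    _, prev = os.path.splitext(root)      # "photo.jpg"     -> ("photo", ".jpg")

    tokens = []
    if last:
        tokens.append("ext:" + last)
    if last and prev:
        # Two extensions in a row, e.g. ".jpg.pif" or ".tar.gz".
        tokens.append("double-ext:" + prev + last)
        if last in EXECUTABLE_EXTS:
            # A harmless-looking extension hiding an executable one.
            tokens.append("double-ext:executable")
    return tokens
```

Feed those tokens to the classifier alongside the usual word tokens; a name like "photo.jpg.pif" produces an extra token that a plain "report.doc" never does.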

Matt.