[Spambayes] Bayesian virus detection?

Mark Hammond mhammond at skippinet.com.au
Fri Jan 31 22:12:31 EST 2003

> I've "accidentally" captured a handful of VB-script-type viruses
> lately, since my spam training corpus apparently (luckily!)
> contained a few of these nasties. Got me thinking... I'd like to
> try training my classifier with a corpus of viral material, plus
> add a "virus" classification category into the mix, see what
> happens.
> However, I haven't a clue as to how to go about deliberately
> collecting such crud (and apparently, neither does Google)...

I have a Python script that collects lots of "klez" and other "iframe
vulnerability" variants.  IIRC, these carry their payload in some illegal
HTML inside an iframe tag.  My Outlook (i.e., later, patched versions) discards
the illegal HTML, so the payload is lost.  Lots of other useful stuff is
also lost, simply because we are talking about Outlook <wink>.  In general,
though, these mails tend to have very standard or empty bodies, so our
Spambayes Outlook filter tends to put them in the "unsure" category.  If I
train on enough of them, the filter does eventually classify them correctly,
but I found that a fairly trivial, stand-alone Python-based filter works fine.
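The script itself isn't included in the post. As an illustration only, here is a minimal sketch of a stand-alone heuristic filter of that kind. It assumes (an assumption, not something stated above) that the iframe variants embed their payload as a non-text MIME part and load it through an `<iframe src="cid:...">` tag. The function name `looks_like_iframe_virus` and the regular expression are hypothetical; only the standard-library `email` and `re` modules are used.

```python
import email
import re

# Assumption (not from the original script): the iframe variants reference an
# embedded non-text MIME part through an <iframe src="cid:..."> tag.
IFRAME_CID = re.compile(
    r"""<iframe[^>]*?\bsrc\s*=\s*["']?cid:([^"'\s>]+)""", re.IGNORECASE
)


def looks_like_iframe_virus(raw: bytes) -> bool:
    """Return True if an HTML part loads a non-text MIME part via <iframe src=cid:...>."""
    msg = email.message_from_bytes(raw)
    iframe_refs = set()   # cid values referenced by <iframe> tags
    binary_cids = set()   # Content-IDs of non-text parts
    for part in msg.walk():
        if part.is_multipart():
            continue
        ctype = part.get_content_type()
        if ctype == "text/html":
            # get_payload(decode=True) undoes quoted-printable/base64, so
            # "src=3Dcid:" becomes "src=cid:"; latin-1 never fails to decode,
            # which is enough for matching ASCII tags.
            html = (part.get_payload(decode=True) or b"").decode("latin-1")
            iframe_refs.update(ref.lower() for ref in IFRAME_CID.findall(html))
        elif not ctype.startswith("text/"):
            cid = str(part.get("Content-ID", "")).strip().strip("<>").lower()
            if cid:
                binary_cids.add(cid)
    # Flag the message only when an iframe actually loads one of those parts.
    return bool(iframe_refs & binary_cids)
```

The check deliberately ignores the ordinary body text, which is exactly the part these mails leave standard or empty and therefore uninformative to a Bayesian classifier; it looks only at the MIME structure.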

I collect around 150 of these a day, though, if you want them.

