[Spambayes] Bayesian virus detection?
mhammond at skippinet.com.au
Fri Jan 31 22:12:31 EST 2003
> I've "accidentally" captured a handful of VB-script-type virii
> lately, since my spam training corpus apparently (luckily!)
> contained a few of these nasties. Got me thinking... I'd like to
> try training my classifier with a corpus of viral material, plus
> add a "virus" classification category into the mix, see what
> However, I haven't a clue as to how to go about deliberately
> collecting such crud (and apparently, neither does Google)...
I have a Python script that collects lots of "klez" and other "iframe
vulnerability" variants. IIRC, these carry their payload in some illegal
HTML inside an iframe tag. My Outlook (i.e., recent/patched versions)
discards the illegal HTML, so the payload is lost. Lots of other useful
stuff is also lost, simply because we are talking about Outlook <wink>. In
general, though, these mails tend to have very standard or empty bodies, so
our spambayes Outlook filter tends to put them in the "unsure" category. If
I train on enough of them, the filter does eventually get it right, but I
found that a fairly trivial, stand-alone Python-based filter works fine.
I collect around 150 of these a day, though, if you want them.
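
For anyone curious what a trivial stand-alone filter of this kind might look
like, here is a rough sketch. It is not the script mentioned above, just an
illustration written against the modern Python 3 "email" package. The
heuristic is an assumption: flag a mail when an HTML part contains an iframe
whose src is a "cid:" reference and the mail also carries an attachment with
an executable-looking extension. The function name, the regex and the
extension list are all made up for the example.

```python
import email
import re

# Assumption: the exploit shows up as an iframe whose src points at an
# embedded MIME part via a "cid:" URL.
IFRAME_CID_RE = re.compile(r"<iframe[^>]*\bsrc\s*=\s*[\"']?cid:", re.IGNORECASE)

# Assumption: attachment extensions we treat as executable.
EXECUTABLE_EXTS = (".exe", ".scr", ".pif", ".bat", ".com")


def looks_like_iframe_virus(raw_message: bytes) -> bool:
    """Return True if the raw RFC 2822 message matches the heuristic."""
    msg = email.message_from_bytes(raw_message)
    has_iframe = False
    has_executable = False
    for part in msg.walk():
        if part.is_multipart():
            continue
        # get_filename() checks Content-Disposition, then Content-Type "name".
        filename = part.get_filename() or ""
        if filename.lower().endswith(EXECUTABLE_EXTS):
            has_executable = True
        if part.get_content_type() == "text/html":
            # decode=True undoes quoted-printable/base64 transfer encoding.
            payload = part.get_payload(decode=True) or b""
            html = payload.decode("latin-1", errors="replace")
            if IFRAME_CID_RE.search(html):
                has_iframe = True
    return has_iframe and has_executable
```

To use it, read a message file in binary mode and pass the bytes to
looks_like_iframe_virus(). Requiring both signals keeps ordinary HTML mail
with inline images from being flagged, at the cost of missing variants that
do not fit this pattern.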