[Spambayes] Ideas for an MSc project please...
wsy at merl.com
Wed Feb 4 16:22:03 EST 2004
From: "Ryan Malayter" <rmalayter at bai.org>
I have a few ideas for you:
1) Using Bayesian-like statistics to evaluate code for virus-like
behavior. I have no idea whether it could work, but I would really like to
see something that could stop a new worm before the anti-virus vendors
have a chance to update their signatures.
2) Evaluate different multi-gram strategies and sliding windows,
like http://crm114.sourceforge.net/. Also evaluate alternative parsing
strategies or tricks, or even come up with new ones (say, tokenizing
the class-C subnet a message comes from). Tell us which
strategies actually work best, with rigorous, *general-case* statistical
evidence. www.spamarchive.com may provide a source of material here, or
you may be able to partner with your university's mail admins to get a
diverse email mix from lots of users. Privacy issues be damned ;-).
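Neither message says how such features would actually be generated, so here is a minimal sketch of the two ideas in item 2, under stated assumptions. `skip_bigrams` is a simplified sliding-window word-pair scheme in the spirit of CRM114's multi-word features; it is not CRM114's actual hashing scheme. `subnet_tokens` turns each IPv4 address found in the Received headers (assumed to be available as plain strings) into a /24, or "class C", token. Both function names and the token formats are hypothetical illustrations, not part of SpamBayes or CRM114.

```python
import ipaddress
import re

# Hypothetical helper regex: dotted-quad candidates; validity is checked later.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def skip_bigrams(text, window=4):
    """Pair each word with each of the up-to (window - 1) words before it.

    The number of skipped words is encoded in the token, so "buy now" and
    "buy <one word> now" become distinct features.
    """
    words = text.lower().split()
    features = []
    for i, word in enumerate(words):
        for gap in range(1, window):
            j = i - gap
            if j < 0:
                break
            features.append(f"{words[j]}|{gap - 1}|{word}")
    return features


def subnet_tokens(received_headers):
    """Emit one 'subnet24:' token per valid IPv4 address in the headers."""
    tokens = []
    for header in received_headers:
        for addr in IPV4_RE.findall(header):
            try:
                # strict=False masks off the host bits: 192.0.2.17 -> 192.0.2.0/24
                net = ipaddress.ip_network(f"{addr}/24", strict=False)
            except ValueError:
                continue  # e.g. 300.1.2.3 is not a valid address
            tokens.append(f"subnet24:{net}")
    return tokens
```

For example, `skip_bigrams("buy cheap meds now", window=3)` yields `buy|0|cheap`, `cheap|0|meds`, `buy|1|meds`, `meds|0|now` and `cheap|1|now`. Tokens like these would simply be fed to the classifier alongside single words; measuring accuracy with and without each feature family, across many users' mail, is the kind of general-case comparison item 2 asks for.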
I am currently finishing up the written version of the paper I gave at
MIT Spam Filtering 2004; I've got a first pass on some of that data.
I will post to you when it's on the web page.