[Spambayes] Ideas for an MSc project please...

Bill Yerazunis wsy at merl.com
Wed Feb 4 16:22:03 EST 2004

   From: "Ryan Malayter" <rmalayter at bai.org>

   I have a few ideas for you:

   1) using Bayesian-like statistics to evaluate code for virus-like
   behavior. I have no idea if it could work, but I would really like to
   see something that could stop a new worm before the anti-virus vendors
   have a chance to update their signatures. 

   2) Evaluate using different multi-gram strategies and sliding windows,
   like http://crm114.sourceforge.net/. Also, evaluate alternative parsing
   strategies or tricks, even coming up with new strategies (say, like
   evaluating what class-C subnet a message comes from). Tell us what
   strategies are actually best, with rigorous, *general-case* statistical
   evidence. www.spamarchive.com may provide a source of material here, or
   you may be able to partner with your university's mail admins to get a
   diverse email mix from lots of users. Privacy issues be damned ;-).
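
To make idea 2 above concrete, here is a rough sketch of two kinds of features it mentions: multi-word tokens taken from a sliding window, and a token for the sender's class-C (/24) subnet. This is only an illustration under stated assumptions. The function names `sliding_ngrams` and `class_c_token`, the `max_n` parameter, and the `subnet:` prefix are all made up for this example; it is not the feature scheme CRM114 or SpamBayes actually uses, just plain contiguous n-grams.

```python
import ipaddress


def sliding_ngrams(tokens, max_n=3):
    """Return every contiguous run of 1..max_n tokens, in message order.

    Hypothetical helper: a plain n-gram window, not CRM114's own scheme.
    """
    features = []
    for i in range(len(tokens)):
        for n in range(1, max_n + 1):
            # Skip windows that would run past the end of the message.
            if i + n <= len(tokens):
                features.append(" ".join(tokens[i:i + n]))
    return features


def class_c_token(ip):
    """Map an IPv4 address to a token naming its /24 ("class C") network.

    Hypothetical helper: the "subnet:" prefix is an arbitrary choice.
    """
    # strict=False lets us pass a host address instead of a network address.
    net = ipaddress.ip_network(f"{ip}/24", strict=False)
    return f"subnet:{net.network_address}/24"


if __name__ == "__main__":
    words = "buy cheap pills now".split()
    print(sliding_ngrams(words, max_n=2))
    print(class_c_token("192.0.2.77"))
```

Each feature string would then be fed to the classifier as one more token, so strategies can be compared simply by swapping the tokenizer and rerunning the same test corpus.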


I am currently finishing up the written version of the paper I gave at
MIT Spam Filtering 2004; I've got a first pass on some of that data.

I will post to you when it's on the web page.

  -Bill Yerazunis

