[Python-Dev] The first trustworthy <wink> GBayes results

Skip Montanaro skip@pobox.com
Wed, 28 Aug 2002 22:15:11 -0500


[ lots of interesting stuff elided ]

    Tim> What's an acceptable false positive rate?  What do we get from
    Tim> SpamAssassin?  I expect we can end up below 0.1% here, and with a
    Tim> generous meaning for "not spam", but I think *some* of these
    Tim> examples show that the only way to get a 0% false-positive rate is
    Tim> to recode spamprob like so:

I don't know what an acceptable false positive rate is.  I guess it depends
on how important those falsies are. ;-)

One thing I think would be worthwhile would be to run GBayes first, then
only run stuff it thought was spam through SpamAssassin.  Only messages that
both systems categorized as spam would drop into the spam folder.  This has
a couple of benefits over running one or the other in isolation:

    * The training set for GBayes probably doesn't need to be as big.

    * The two systems use substantially different approaches to identifying
      spam, so I suspect your false positive rate would go way down.  False
      negatives would go up, but only testing can suggest by how much.

    * SA is dog slow most of the time, so SA users get a big speedup: a
      substantially smaller fraction of their messages gets run through
      it.
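
To put rough numbers on the false positive/false negative tradeoff, here is
a back-of-the-envelope sketch.  It assumes the two filters make their
mistakes independently, which is an idealization (real filters will
correlate somewhat), and the rates in the usage note are made up for
illustration, not measurements of GBayes or SA.  The function name
chained_rates is hypothetical.

```python
def chained_rates(fp_a, fn_a, fp_b, fn_b):
    """Error rates when mail is filed as spam only if BOTH filters agree.

    ASSUMPTION: the two filters err independently of each other.
    fp_* are false positive rates, fn_* are false negative rates,
    all expressed as fractions between 0 and 1.
    """
    # A ham message is misfiled only if both filters wrongly flag it.
    fp = fp_a * fp_b
    # A spam message gets through if either filter misses it.
    fn = 1.0 - (1.0 - fn_a) * (1.0 - fn_b)
    return fp, fn
```

With invented rates of 1% and 2% false positives, and 5% and 10% false
negatives, chained_rates(0.01, 0.05, 0.02, 0.10) gives a combined false
positive rate of about 0.02% and a false negative rate of about 14.5%.
Only testing on real mail can say how far actual numbers depart from that.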

This sort of chaining is pretty trivial to set up with procmail.  Dunno what
the Windows set will do, though.
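
For anyone without procmail, the chaining logic itself is small enough to
sketch in Python.  This is only an illustration: sort_mail, fast_is_spam and
slow_is_spam are hypothetical names, and the two classifiers are passed in
as plain callables rather than being real GBayes or SA interfaces.

```python
def sort_mail(messages, fast_is_spam, slow_is_spam):
    """Split messages into (spam, ham), consulting the slow filter only
    for messages the fast filter already flagged.

    fast_is_spam and slow_is_spam are hypothetical callables that take a
    message and return True if they consider it spam.
    Also returns how many times the slow filter was actually run.
    """
    spam, ham = [], []
    slow_calls = 0
    for msg in messages:
        if not fast_is_spam(msg):
            # The fast filter says ham: never bother the slow filter.
            ham.append(msg)
            continue
        slow_calls += 1
        if slow_is_spam(msg):
            # Both filters agree, so the message goes to the spam folder.
            spam.append(msg)
        else:
            ham.append(msg)
    return spam, ham, slow_calls
```

The slow_calls count is there to make the speedup visible: it should be a
small fraction of len(messages) whenever most mail is ham.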

Skip