[ lots of interesting stuff elided ]
Tim> What's an acceptable false positive rate? What do we get from
Tim> SpamAssassin? I expect we can end up below 0.1% here, and with a
Tim> generous meaning for "not spam", but I think *some* of these
Tim> examples show that the only way to get a 0% false-positive rate is
Tim> to recode spamprob like so:
I don't know what an acceptable false positive rate is. I guess it depends on how important those falsies are. ;-)
One thing I think would be worthwhile is to run GBayes first, then run only the messages it thought were spam through SpamAssassin. Only messages that both systems categorized as spam would drop into the spam folder. This has a couple of benefits over running one or the other in isolation:
* The training set for GBayes probably doesn't need to be as big
* The two systems use substantially different approaches to identifying spam, so I suspect your false positive rate would go way down. False negatives would go up, but only testing can suggest by how much.
* Since SA is dog slow most of the time, SA users get a big speedup, because a substantially smaller fraction of their messages gets run through it.
This sort of chaining is pretty trivial to set up with procmail. Dunno what the Windows set will do though.
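
For anyone who'd rather see the chaining logic outside of procmail, here is a rough Python sketch of the idea. Everything in it is made up for illustration: make_chain, toy_gbayes and toy_spamassassin are hypothetical names, and the two toy functions are trivial stand-ins, not the real GBayes or SpamAssassin interfaces. The only point is that the slow filter runs only on what the fast filter already flagged, and a message counts as spam only if both agree.

```python
from typing import Callable, Dict

# A classifier takes the raw message text and returns True for "spam".
Classifier = Callable[[str], bool]


def make_chain(cheap: Classifier, expensive: Classifier) -> Classifier:
    """Build a classifier that calls a message spam only if both filters agree."""
    def is_spam(message: str) -> bool:
        # `and` short-circuits, so the expensive filter is skipped
        # whenever the cheap filter already says "not spam".
        return cheap(message) and expensive(message)
    return is_spam


# Call counters, only here to show how often each stand-in filter runs.
calls: Dict[str, int] = {"cheap": 0, "expensive": 0}


def toy_gbayes(message: str) -> bool:
    """Hypothetical stand-in for the fast statistical filter (NOT the real GBayes)."""
    calls["cheap"] += 1
    return "viagra" in message.lower()


def toy_spamassassin(message: str) -> bool:
    """Hypothetical stand-in for the slow rule-based filter (NOT the real SA)."""
    calls["expensive"] += 1
    return message.isupper()


chained = make_chain(toy_gbayes, toy_spamassassin)
```

Calling chained(text) on each incoming message would then run the second filter only for the fraction the first one flags, which is where the speedup for SA users would come from. How much the false negative rate rises still has to be measured on real mail.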