What's an acceptable false positive rate?
Speaking as one of the people who review suspected spam for python.org and rescue false positives, I would say that the more relevant figure is: how much suspected spam do I have to review every morning? < 10 messages would be peachy; right now it's around 5-20 messages per day.
I must be missing something. I would *hope* that you review *all* messages claimed to be spam, in which case the number of msgs to be reviewed would, in a perfectly accurate system, be equal to the number of spams received.
OTOH, the false positive rate doesn't have anything to do with the number of spams received, it has to do with the number of non-spams received.
Currently there are probably 1-3 FPs per day, although on a bad day there can be 5-10. (E.g., on 2002-08-21, six mailman-users posts from the same guy were all caught, mainly because his ISP added X-AntiAbuse, and his messages were multipart/alternative with unwrapped plain text. This is a perfect example of SpamAssassin screwing up royally.) 1-3 FPs/day I can live with, but the real burden is the manual review: I'd much rather have 5 FPs in a pool of 10 suspects than 1 FP out of 100 suspects.
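To make the two measures concrete, here is a rough sketch that keeps the false positive rate (which depends only on non-spam volume) separate from the size of the pool a human has to review. The `DayCounts` class and every number in it are made up for illustration, including the assumed 1000 non-spam messages per day; none of them are real python.org figures.

```python
from dataclasses import dataclass


@dataclass
class DayCounts:
    """Hypothetical one-day tally; all numbers used below are invented."""
    ham: int              # non-spam messages received
    flagged_spam: int     # real spam the filter flagged
    false_positives: int  # non-spam the filter flagged by mistake

    @property
    def false_positive_rate(self) -> float:
        # Depends only on non-spam volume, not on how much spam arrived.
        return self.false_positives / self.ham

    @property
    def review_pool(self) -> int:
        # Everything a human has to look through each morning.
        return self.flagged_spam + self.false_positives


# The two scenarios from the text, with an assumed 1000 non-spam messages.
small_pool = DayCounts(ham=1000, flagged_spam=5, false_positives=5)
large_pool = DayCounts(ham=1000, flagged_spam=99, false_positives=1)

for name, day in [("5 FPs in 10 suspects", small_pool),
                  ("1 FP in 100 suspects", large_pool)]:
    print(f"{name}: FP rate {day.false_positive_rate:.2%}, "
          f"review pool {day.review_pool}")
```

Under these assumed counts the second scenario has the lower false positive rate but ten times as much to review, which is the trade-off being argued about.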
Maybe you don't want this kind of approach at all. The classifier doesn't have "gray areas" in practice: it tends to give probabilities near 1, or near 0, and there's very little in between -- a msg either has a preponderance of spam indicators, or a preponderance of non-spam indicators. You're simply not going to get a batch of "hmm, I'm not really sure about these" out of it. You would in a conventional Bayesian classifier, but Graham's ignores almost all of the words, judging on only the most extreme words present; when only extremes are fed in, the final result also tends to be extreme (the only cases where that doesn't obtain are those where the most extreme words it finds aren't extreme at all; e.g., a msg consisting entirely of "the", "and" and "it" would get rated as 0.5).
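A minimal sketch of why the scores pile up at the ends, assuming the combining rule from Graham's "A Plan for Spam" (keep the words farthest from 0.5, then combine them as prod(p) / (prod(p) + prod(1 - p))). The function name `graham_score`, the 15-word cutoff as a default argument, and the per-word probabilities are illustrative assumptions, not the actual code under discussion.

```python
import math


def graham_score(word_probs, max_words=15):
    """Combine per-word spam probabilities using only the most extreme ones.

    Assumes each probability is already clamped away from 0 and 1
    (e.g. to [0.01, 0.99]) so the denominator cannot be zero.
    """
    # Keep the words whose probability is farthest from the neutral 0.5.
    extremes = sorted(word_probs, key=lambda p: abs(p - 0.5), reverse=True)
    extremes = extremes[:max_words]
    prod_spam = math.prod(extremes)
    prod_ham = math.prod(1.0 - p for p in extremes)
    return prod_spam / (prod_spam + prod_ham)


# Made-up per-word probabilities for three hypothetical messages.
neutral = graham_score([0.5, 0.5, 0.5])  # only "the", "and", "it"
spammy = graham_score([0.99, 0.98, 0.97, 0.6, 0.5, 0.45, 0.3])
hammy = graham_score([0.01, 0.02, 0.05, 0.5, 0.6])

print(f"neutral={neutral:.6f} spammy={spammy:.6f} hammy={hammy:.6f}")
```

With a handful of strong indicators on either side, the products differ by several orders of magnitude, so the combined score lands very close to 0 or 1; only the all-neutral message stays at 0.5.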
What do we get from SpamAssassin?
Recall the stats I posted this morning; the bulk of spam is in Chinese or Korean, and I have things set up so SpamAssassin never even sees it. I think the only way to meaningfully answer this question is to stash *everything* mail.python.org receives for a day or 10, spam and otherwise, and run it all through SA.
It would be good to have such a corpus regardless.