[Python-Dev] The first trustworthy <wink> GBayes results

Greg Ward gward@mems-exchange.org
Wed, 28 Aug 2002 15:42:48 -0400

On 28 August 2002, Tim Peters said:
> What's an acceptable false positive rate?

Speaking as one of the people who reviews suspected spam for python.org
and rescues false positives, I would say that the more relevant figure
is: how much suspected spam do I have to review every morning?  < 10
messages would be peachy; right now it's around 5-20 messages per day.

Currently there are probably 1-3 FPs per day, although on a bad day
there can be 5-10.  (E.g., on 2002-08-21, six mailman-users posts from the
same guy were all caught, mainly because his ISP added X-AntiAbuse, and
his messages were multipart/alternative with unwrapped plain text.  This
is a perfect example of SpamAssassin screwing up royally.)  1-3 FPs/day
I can live with, but the real burden is the manual review: I'd much
rather have 5 FPs in a pool of 10 suspects than 1 FP out of 100.
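
To make that trade-off concrete, here is a minimal sketch (Python 3) that
compares the two scenarios above.  The function name review_stats and its
return shape are made up for illustration; the only inputs are the numbers
already quoted (5 FPs in 10 suspects versus 1 FP in 100).

```python
def review_stats(suspects, false_positives):
    """Return (messages to review by hand, fraction of the pool that is legit mail).

    Hypothetical helper for illustration only: the review burden is simply
    the size of the suspect pool, independent of how many FPs it contains.
    """
    if not 0 <= false_positives <= suspects:
        raise ValueError("false_positives must be between 0 and suspects")
    if suspects == 0:
        return 0, 0.0
    return suspects, false_positives / suspects


# Scenario A: 5 FPs in a pool of 10 suspects.
small_pool = review_stats(10, 5)

# Scenario B: 1 FP in a pool of 100 suspects.
large_pool = review_stats(100, 1)

# Scenario B has the lower FP fraction but ten times the reading.
print("A: review %d messages, %.0f%% legit" % (small_pool[0], small_pool[1] * 100))
print("B: review %d messages, %.0f%% legit" % (large_pool[0], large_pool[1] * 100))
```

The point of the sketch is only that the first number in each pair, not
the second, is what the reviewer pays for every morning.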

> What do we get from SpamAssassin?

Recall the stats I posted this morning; the bulk of spam is in Chinese
or Korean, and I have things set up so SpamAssassin never even sees it.
I think the only way to meaningfully answer this question is to stash
*everything* mail.python.org receives for a day or 10, spam and
otherwise, and run it all through SA.