Re: [Python-Dev] The first trustworthy <wink> GBayes results

3 Sep 2002

[Tim, last week]
> ...
> What's an acceptable false positive rate?

[my response]
> ...
> Speaking as one of the people who reviews suspected spam for python.org
> and rescues false positives, I would say that the more relevant figure
> is: how much suspected spam do I have to review every morning?  < 10
> messages would be peachy; right now it's around 5-20 messages per day.

[Tim again]
> ...
> I must be missing something.  I would *hope* that you review *all* messages
> claimed to be spam, in which case the number of msgs to be reviewed would,
> in a perfectly accurate system, be equal to the number of spams received.

Good lord, certainly not!  Remember that Exim rejects a couple hundred
messages a day that never get near SpamAssassin -- that's mostly
Chinese/Korean junk that's rejected on the basis of 8-bit chars or
banned charsets in the headers.  Then, probably 50-75% of what SA gets
its hands on scores >= 10.0, so it too is rejected at SMTP time.  Only
messages that score < 10 are accepted, and those that score >= 5.0 are
set aside in /var/mail/spam for review.  That's 10-30 messages/day.
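
The triage just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual mail.python.org setup: the names `Disposition` and `dispose` are hypothetical, Exim's pre-SpamAssassin header checks are collapsed into a single boolean, and the real rules live in Exim and SpamAssassin configuration rather than in Python. Only the 5.0 and 10.0 thresholds come from the text above.

```python
from enum import Enum


class Disposition(Enum):
    """Hypothetical outcomes for an incoming message (names are illustrative)."""
    REJECT_AT_SMTP = "reject"        # refused during the SMTP transaction
    HOLD_FOR_REVIEW = "spam-folder"  # accepted, set aside in /var/mail/spam
    DELIVER = "deliver"              # accepted and delivered normally


# Thresholds as described in the message; everything else is assumed.
REJECT_THRESHOLD = 10.0
REVIEW_THRESHOLD = 5.0


def dispose(bad_header_charset: bool, sa_score: float) -> Disposition:
    """Sketch of the described pipeline.

    bad_header_charset stands in for Exim's checks (8-bit chars or banned
    charsets in the headers); such messages never reach SpamAssassin, so
    sa_score is ignored for them.
    """
    if bad_header_charset:
        return Disposition.REJECT_AT_SMTP
    if sa_score >= REJECT_THRESHOLD:
        return Disposition.REJECT_AT_SMTP
    if sa_score >= REVIEW_THRESHOLD:
        return Disposition.HOLD_FOR_REVIEW
    return Disposition.DELIVER


if __name__ == "__main__":
    for flag, score in [(True, 0.0), (False, 12.3), (False, 7.1), (False, 2.0)]:
        print(flag, score, dispose(flag, score).name)
```

Only the 5.0-10.0 band lands in the review folder, which is why the daily review load is a small fraction of the spam actually received.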

(I do occasionally scan Exim's reject log on mail.python.org to see
what's getting rejected today -- Exim kindly logs the full headers of
every message that is rejected after the DATA command.  I usually make
it to about 11am of a given day's logfile before my eyes glaze over from
the endless stream of spam and viruses.)

Note that we *used* to accept messages before passing them to
SpamAssassin, so we never rejected anything on the basis of its SA score.
Back then, we saved and reviewed probably 50-70 messages/day.  Very,
very, very few (if any) false positives scored >= 10.0, which is why
that's the threshold for SMTP-time rejection.

...
> OTOH, the false positive rate doesn't have anything to do with the number of
> spams received, it has to do with the number of non-spams received.

Err, yeah, good point.  I make a point of talking about "suspected
spam", which is any message that scores between 5.0 and 10.0.  IMHO, the
true nature of those messages can only be determined by manual
inspection.
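
To make Tim's point about the denominator concrete, here is a minimal sketch; the function name and its argument names are hypothetical. The count of spams received is deliberately not a parameter, because it does not enter into the rate.

```python
def false_positive_rate(false_positives: int, non_spam_received: int) -> float:
    """Fraction of legitimate messages wrongly flagged as spam.

    The denominator is the number of non-spam messages received; the
    number of spams received plays no part in it.
    """
    if non_spam_received <= 0:
        raise ValueError("need at least one non-spam message")
    if not 0 <= false_positives <= non_spam_received:
        raise ValueError("false_positives must lie in [0, non_spam_received]")
    return false_positives / non_spam_received
```

For instance, with purely hypothetical counts of 3 false positives among 1,500 legitimate messages, `false_positive_rate(3, 1500)` is 0.002, however much spam arrived alongside them.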

...
> Maybe you don't want this kind of approach at all.  The classifier doesn't
> have "gray areas" in practice:  it tends to give probabilities near 1, or
> near 0, and there's very little in between -- a msg either has a
> preponderance of spam indicators, or a preponderance of non-spam indicators.

That's a great improvement over SpamAssassin, then: with SA, the grey
area (IMHO) is scores from 3 to 10... which is why several python.org
lists now have a little bit of Mailman configuration magic that makes MM
set aside messages with an SA score >= 3 for list admin review.  (It's
probably worth getting the list admin to do a bit more work in order to
avoid sending low-scoring spam to the list.)

However, as long as "very little" != "nothing", we still need to worry a
bit about that grey area.  What do you think we should do with a message
whose spam probability is between (say) 0.1 and 0.9?  Send it on, reject
it, or set it aside?  Just how many messages fall in that grey area
anyway?
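
The question can be made concrete with a small three-way router. This is a sketch under assumptions: the 0.1 and 0.9 cutoffs are the "(say)" values above rather than tuned settings, the band between them is simply set aside (one of the three options listed), and `route` and `grey_area_fraction` are hypothetical names, not part of the GBayes code.

```python
from collections import Counter
from typing import Iterable

# Example cutoffs from the "(say) 0.1 and 0.9" above; not tuned values.
HAM_CUTOFF = 0.1
SPAM_CUTOFF = 0.9


def route(spam_prob: float) -> str:
    """Three-way decision on a classifier's spam probability."""
    if not 0.0 <= spam_prob <= 1.0:
        raise ValueError("spam_prob must be in [0, 1]")
    if spam_prob < HAM_CUTOFF:
        return "send"
    if spam_prob > SPAM_CUTOFF:
        return "reject"
    return "set-aside"  # grey area: held for a human to review


def grey_area_fraction(probs: Iterable[float]) -> float:
    """Fraction of messages whose probability lands in the grey area."""
    counts = Counter(route(p) for p in probs)
    total = sum(counts.values())
    return counts["set-aside"] / total if total else 0.0
```

Running `grey_area_fraction` over the classifier's scores for a day's python.org mail would answer the last question directly; no such figures are given here.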

        Greg
-- 
Greg Ward                          http://www.gerg.ca/
MTV -- get off the air!
    -- Dead Kennedys