[Spambayes] understanding high false negative rate

Jeremy Hylton jeremy@alum.mit.edu
Fri, 6 Sep 2002 16:45:37 -0400


>>>>> "GvR" == Guido van Rossum <guido@python.org> writes:

  GvR> Looks like your ham corpus by and large has To:
  GvR> jeremy@alum.mit.edu in a header while your spam corpus by and
  GvR> large doesn't.  But this one does.

By and large that's true.  Wouldn't it be true of any mailbox?  Most
of your real mail is addressed to you, but only some of the spam is.
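
One way to check this on any mailbox is to measure how often a given
address shows up as a recipient in each corpus.  The following is only a
rough sketch, not anything from the spambayes code: the helper names
addressed_to and fraction_addressed are made up for illustration, and it
assumes each message is available as a raw RFC 2822 string.

```python
from email import message_from_string
from email.utils import getaddresses


def addressed_to(raw_message, address):
    """Return True if `address` appears in the To: or Cc: headers.

    Hypothetical helper for illustration; not part of spambayes.
    """
    msg = message_from_string(raw_message)
    # get_all returns the fallback list when the header is absent.
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = {addr.lower() for _name, addr in getaddresses(headers)}
    return address.lower() in recipients


def fraction_addressed(raw_messages, address):
    """Fraction of messages in a corpus that name `address` as a recipient."""
    if not raw_messages:
        return 0.0
    hits = sum(addressed_to(raw, address) for raw in raw_messages)
    return hits / len(raw_messages)
```

If fraction_addressed gives a much higher number for the ham corpus than
for the spam corpus, a token derived from the To: header will look like
a strong ham clue, and a spam that happens to carry that address can
slip through as a false negative.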

  GvR> Where did you gather your spam corpus?  Could it be a
  GvR> collection of edge cases that SA didn't kill, like Barry's
  GvR> collection of SA false negatives?

A large chunk of my spam collection is from 2000.  The rest is recent,
starting about the same time spambayes did.  None of it was previously
filtered by SA.

Jeremy