[Spambayes] Re: Move closer to Gary's ideal

Guido van Rossum guido@python.org
Sat, 21 Sep 2002 07:52:13 -0400


> So... if we're doing well at classification this way, we will still
> be able to do spread-based things like using sliders for cutoffs, or
> putting some emails into a middle ground "you should look at this
> manually" group.

The "manual review" group only makes sense in a system that actually
*throws away* spam rather than simply labeling it.  We run with this
setup at python.org, and the sysadmins who scan the 30 or so middle
ground messages per day for false positives are my heroes.

But this is a rather unrealistic deployment scheme -- the fear of
false positives will make almost everybody else use a setup where the
system only *labels* spam, and the end user's email program sorts
those into spam and ham folders based upon the labeling.  (In fact it
was argued here that even allowing an *option* to throw away spam
would make us liable -- though that was before Tim's phenomenal
results.)

When all you do is labeling, I see no use for a middle category --
it's easier to merge that into the ham, which the end user will look
at anyway.

I guess what I'm saying is that producing a less bipolar distribution
lets us have a more usable control to weigh the importance of
f.p. against f.n. -- nothing more, nothing less.  And that's
important, so that individual users can decide how big exactly their
fear of f.p.'s is, relative to their hatred for spam.
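
To make the f.p./f.n. trade-off concrete, here is a minimal sketch of
what a per-user cutoff does. The scores below are made up for
illustration, and count_errors is a hypothetical helper, not anything
in the Spambayes code. The point is only that a less bipolar score
distribution leaves scores in the middle, so moving the cutoff actually
changes which kind of error you get.

```python
def count_errors(scored, cutoff):
    """Return (false_positives, false_negatives) for a given cutoff.

    scored is a list of (probability, is_spam) pairs; a message is
    labeled spam when its probability is >= cutoff.
    """
    fp = sum(1 for p, is_spam in scored if p >= cutoff and not is_spam)
    fn = sum(1 for p, is_spam in scored if p < cutoff and is_spam)
    return fp, fn


# Made-up scores, purely illustrative: (spam probability, really spam?)
sample = [(0.02, False), (0.10, False), (0.45, False), (0.62, False),
          (0.55, True), (0.80, True), (0.93, True), (0.99, True)]

for cutoff in (0.5, 0.7, 0.9):
    fp, fn = count_errors(sample, cutoff)
    print("cutoff %.1f: %d f.p., %d f.n." % (cutoff, fp, fn))
```

A user who fears f.p.'s more than spam would pick the higher cutoff and
accept the extra f.n.'s; a user who hates spam more would do the
opposite.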

I guess this would require an X-Spam-Probability: <float> header,
either in addition to or instead of the Yes/No header.  If you also
want to provide a Yes/No header, you'd
need a database of user preferences plus a UI for it.  But a bare
numeric value is probably hard for the average mail program to filter
on...
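
A rough sketch of the labeling side and of what a client-side filter
would have to do with the number, using the standard email package.
The function names, the four-decimal format, and the choice to treat
unlabeled mail as ham are all assumptions for illustration; the
probability itself is assumed to come from the classifier, which is
not shown.

```python
from email import message_from_string


def label(raw_message, probability):
    """Return the message text with an X-Spam-Probability header added."""
    msg = message_from_string(raw_message)
    # Drop any existing value first so a forged header cannot survive.
    del msg["X-Spam-Probability"]
    msg["X-Spam-Probability"] = "%.4f" % probability
    return msg.as_string()


def is_spam_for_user(raw_message, user_cutoff):
    """Client-side decision: compare the header to this user's cutoff."""
    msg = message_from_string(raw_message)
    value = msg.get("X-Spam-Probability")
    if value is None:
        return False  # assumption: unlabeled mail is treated as ham
    try:
        return float(value) >= user_cutoff
    except ValueError:
        return False  # unparsable value: also treated as ham


raw = "From: a@example.com\nSubject: hi\n\nbody\n"
labeled = label(raw, 0.87)
print(is_spam_for_user(labeled, 0.5))   # cautious-about-spam user
print(is_spam_for_user(labeled, 0.9))   # cautious-about-f.p. user
```

The numeric comparison in is_spam_for_user is exactly the step that a
mail program with only string-matching filter rules cannot do, which is
why a Yes/No header may still be wanted alongside the number.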

--Guido van Rossum (home page: http://www.python.org/~guido/)