[Spambayes] Making Tester and TestDriver unsure

Tim Peters tim.one@comcast.net
Thu Oct 17 07:29:13 2002


[T. Alexander Popiel]
> I thought it would be interesting to bring the middle ground
> into the Tester and TestDriver,

Indeed, long overdue.  Thank you!  I checked in a minor variation of this
patch.

Everyone, note that there's a new option ham_cutoff, and the meaning of
spam_cutoff has changed slightly.  There's also a new bool option,
show_unsure.  From the new Options.py:

"""
[TestDriver]
...
# spam_cutoff and ham_cutoff are used in Python slice sense:
#    A msg is considered    ham if its score is in 0:ham_cutoff
#    A msg is considered unsure if its score is in ham_cutoff:spam_cutoff
#    A msg is considered   spam if its score is in spam_cutoff:
#
# So it's unsure iff  ham_cutoff <= score < spam_cutoff.
# For a binary classifier, make ham_cutoff == spam_cutoff.
# ham_cutoff > spam_cutoff doesn't make sense.
#
# The defaults are for the all-default Robinson scheme, which makes a
# binary decision with no middle ground.  The precise value that works
# best is corpus-dependent, and values into the .600's have been known
# to work best on some data.
ham_cutoff:  0.560
spam_cutoff: 0.560

...

show_unsure: False
"""

I should add that cutoffs of 0.05 and 0.95 probably aren't optimal, but may
well be close to optimal, if using chi-combining.
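
To make the slice rule in the Options.py comments concrete, here is a rough
sketch.  It is not the code that was checked into Tester or TestDriver; the
function name classify is made up for illustration, and reading 0.05 as
ham_cutoff and 0.95 as spam_cutoff is an assumption based on the rule that
ham_cutoff > spam_cutoff doesn't make sense.

```python
def classify(score, ham_cutoff=0.560, spam_cutoff=0.560):
    """Return 'ham', 'unsure' or 'spam' for a score.

    Hypothetical helper, not part of the spambayes code base.  It follows
    the slice convention from the Options.py comments:
        ham    if score is in 0:ham_cutoff
        unsure if score is in ham_cutoff:spam_cutoff
        spam   if score is in spam_cutoff:
    """
    if ham_cutoff > spam_cutoff:
        raise ValueError("ham_cutoff > spam_cutoff doesn't make sense")
    if score < ham_cutoff:
        return "ham"
    if score < spam_cutoff:      # so ham_cutoff <= score < spam_cutoff
        return "unsure"
    return "spam"


# Defaults: ham_cutoff == spam_cutoff, so the decision is binary and
# "unsure" can never come back.
binary = [classify(s) for s in (0.10, 0.559, 0.560, 0.90)]

# Assumed chi-combining style cutoffs: a wide middle ground.
three_way = [classify(s, 0.05, 0.95) for s in (0.01, 0.05, 0.50, 0.95)]
```

With equal cutoffs the unsure slice is empty, which is why the defaults give a
binary classifier.  A score exactly equal to spam_cutoff counts as spam, and a
score exactly equal to ham_cutoff counts as unsure (or as spam when the two
cutoffs coincide).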


> in preparation for new comparators (cmp.py and table.py) which grok
> the middle ground.  Only so much I can do in one night, though.

Same here, I'm afraid -- I won't get to your later patch tonight.