[Spambayes] Making Tester and TestDriver unsure
Tim Peters
tim.one@comcast.net
Thu Oct 17 07:29:13 2002
[T. Alexander Popiel]
> I thought it would be interesting to bring the middle ground
> into the Tester and TestDriver,
Indeed, long overdue. Thank you! I checked in a minor variation of this
patch.
Everyone, note that there's a new option, ham_cutoff, and the meaning of
spam_cutoff has changed slightly. There's also a new bool option,
show_unsure. From the new Options.py:
"""
[TestDriver]
...
# spam_cutoff and ham_cutoff are used in Python slice sense:
# A msg is considered ham if its score is in 0:ham_cutoff
# A msg is considered unsure if its score is in ham_cutoff:spam_cutoff
# A msg is considered spam if its score is in spam_cutoff:
#
# So it's unsure iff ham_cutoff <= score < spam_cutoff.
# For a binary classifier, make ham_cutoff == spam_cutoff.
# ham_cutoff > spam_cutoff doesn't make sense.
#
# The defaults are for the all-default Robinson scheme, which makes a
# binary decision with no middle ground. The precise value that works
# best is corpus-dependent, and values into the .600's have been known
# to work best on some data.
ham_cutoff: 0.560
spam_cutoff: 0.560
...
show_unsure: False
"""
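To make the slice reading concrete, here is a minimal sketch of how a score maps to the three categories. The function name classify and its string return values are hypothetical, made up for illustration; this is not the actual Tester/TestDriver code, only the rule the comments above describe.

```python
def classify(score, ham_cutoff=0.560, spam_cutoff=0.560):
    """Hypothetical helper: map a score in [0, 1] to 'ham', 'unsure' or 'spam'.

    Follows the slice reading from the Options.py comments:
      ham    if score is in 0:ham_cutoff
      unsure if score is in ham_cutoff:spam_cutoff
      spam   if score is in spam_cutoff:
    """
    if ham_cutoff > spam_cutoff:
        # The option comments say this combination doesn't make sense.
        raise ValueError("ham_cutoff must not exceed spam_cutoff")
    if score < ham_cutoff:
        return "ham"
    if score < spam_cutoff:
        # Reached only when ham_cutoff <= score < spam_cutoff.
        return "unsure"
    return "spam"


# With equal cutoffs (the defaults) the middle ground is empty,
# so the decision is binary.
print(classify(0.30))              # ham
print(classify(0.56))              # spam

# With a gap between the cutoffs (values picked for illustration),
# scores in the gap come back as unsure.
print(classify(0.50, 0.05, 0.95))  # unsure
```

Note that both boundaries are half-open: a score exactly equal to ham_cutoff is already unsure (or spam, if the cutoffs are equal), and a score exactly equal to spam_cutoff is spam.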
I should add that if you're using chi-combining, ham_cutoff 0.05 and
spam_cutoff 0.95 probably aren't optimal, but may well be close to it.
> in preparation for new comparators (cmp.py and table.py) which grok
> the middle ground. Only so much I can do in one night, though.
Same here, I'm afraid -- I won't get to your later patch tonight.