[Spambayes] optimal max_discriminators for chi2
Rob Hooft
rob@hooft.net
Thu Oct 17 21:49:26 2002
---------------------- multipart/mixed attachment
I did a series of runs:
=========================
[Classifier]
use_chi_squared_combining: True
robinson_minimum_prob_strength = 0.0
robinson_probability_s = 0.45
max_discriminators = XXXXXX
[TestDriver]
spam_cutoff: 0.70
nbuckets: 200
best_cutoff_fp_weight: 10
show_false_positives: True
show_false_negatives: True
show_best_discriminators: 50
show_spam_lo = 0.00
show_spam_hi = 0.80
show_ham_lo = 0.40
show_ham_hi = 1.00
show_charlimit: 5000
============
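The XXXXXX placeholder above can be filled in by a small script that
writes one option file per run. This is only a sketch, not the procedure
actually used for these runs: the function names and the file naming
scheme are hypothetical, only part of the [TestDriver] section is
reproduced, and how each file is handed to the test driver is left out.

```python
from pathlib import Path

# Option template taken from the settings above; only max_discriminators varies.
TEMPLATE = """\
[Classifier]
use_chi_squared_combining: True
robinson_minimum_prob_strength = 0.0
robinson_probability_s = 0.45
max_discriminators = {max_discriminators}

[TestDriver]
spam_cutoff: 0.70
nbuckets: 200
best_cutoff_fp_weight: 10
"""


def render_config(max_discriminators: int) -> str:
    """Return the option text for one value of max_discriminators."""
    return TEMPLATE.format(max_discriminators=max_discriminators)


def write_configs(values, out_dir):
    """Write one option file per value and return the paths (hypothetical naming)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for value in values:
        path = out / f"bayescustomize_{value}.ini"
        path.write_text(render_config(value))
        paths.append(path)
    return paths
```

Calling write_configs([15, 40, 300], "sweep") would produce three files;
those values are illustrative, since the exact values tried are not listed
here.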
I varied XXXXXX between 15 and 300. Attached are plots of the 95th
percentile ham, the 5th percentile spam, and the total cost, each on the
vertical axis against max_discriminators on the horizontal axis. Please
note again that my ham is much tighter than my spam: the vertical scales
run from 0 to 0.16 for ham and from 89 to 100 for spam (almost a factor
of 100!). The cost plot shows "no trend at all", but the variation is
not large.
I'd almost conclude "anything goes", but based on the spam-5% value
I'd like to stick with max_discriminators values over ~40.
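For anyone who wants to compute the same two summary numbers for their
own corpus, here is a rough sketch. It assumes the per-message scores of
one run are available as plain lists of numbers, and it uses the
nearest-rank definition of a percentile, which is an assumption and may
differ slightly from what the plots use. The function names are
hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value such that at least
    pct percent of the data is less than or equal to it."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(ham_scores, spam_scores):
    """Return the two statistics discussed above for one run."""
    return {
        "ham95": percentile(ham_scores, 95),   # tight ham keeps this low
        "spam5": percentile(spam_scores, 5),   # tight spam keeps this high
    }
```

Running summarize() once per max_discriminators value and plotting the two
numbers against that value gives the same kind of picture as the ham and
spam plots.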
--
Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/
---------------------- multipart/mixed attachment
A non-text attachment was scrubbed...
Name: ham95.png
Type: image/png
Size: 6748 bytes
Desc: not available
Url : http://mail.python.org/pipermail-21/spambayes/attachments/20021017/107955fc/ham95.png
---------------------- multipart/mixed attachment
A non-text attachment was scrubbed...
Name: spam5.png
Type: image/png
Size: 6330 bytes
Desc: not available
Url : http://mail.python.org/pipermail-21/spambayes/attachments/20021017/107955fc/spam5.png
---------------------- multipart/mixed attachment
A non-text attachment was scrubbed...
Name: cost.png
Type: image/png
Size: 8545 bytes
Desc: not available
Url : http://mail.python.org/pipermail-21/spambayes/attachments/20021017/107955fc/cost.png
---------------------- multipart/mixed attachment--