[Spambayes] optimal max_discriminators for chi2

Rob Hooft rob@hooft.net
Thu Oct 17 21:49:26 2002


I did a series of runs:
=========================
[Classifier]
use_chi_squared_combining: True
robinson_minimum_prob_strength = 0.0
robinson_probability_s = 0.45
max_discriminators = XXXXXX

[TestDriver]
spam_cutoff: 0.70

nbuckets: 200
best_cutoff_fp_weight: 10

show_false_positives: True
show_false_negatives: True
show_best_discriminators: 50
show_spam_lo = 0.00
show_spam_hi = 0.80
show_ham_lo = 0.40
show_ham_hi = 1.00
show_charlimit: 5000
============
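
A minimal sketch of how the XXXXXX placeholder in the options above could be filled in for a series of runs. The helper names (render_config, parse_config) and the SWEEP_VALUES list are hypothetical and chosen only for illustration; they are not part of the spambayes code base, and the exact values used in the runs are not listed here. The template is abbreviated to the options that matter for the sweep.

```python
import configparser

# Abbreviated copy of the options above; XXXXXX is the placeholder to fill in.
TEMPLATE = """\
[Classifier]
use_chi_squared_combining: True
robinson_minimum_prob_strength = 0.0
robinson_probability_s = 0.45
max_discriminators = XXXXXX

[TestDriver]
spam_cutoff: 0.70
nbuckets: 200
best_cutoff_fp_weight: 10
"""

# Illustrative values only (assumption): any list of integers would do.
SWEEP_VALUES = [15, 30, 50, 100, 150, 300]


def render_config(max_discriminators):
    """Return the option text with XXXXXX replaced by the given value."""
    return TEMPLATE.replace("XXXXXX", str(max_discriminators))


def parse_config(text):
    """Parse option text; configparser accepts both ':' and '=' delimiters."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


if __name__ == "__main__":
    for value in SWEEP_VALUES:
        parsed = parse_config(render_config(value))
        print(value, parsed.getint("Classifier", "max_discriminators"))
```

Each rendered text would then be written to the options file used by the test driver before the corresponding run; that step is left out because it depends on the local setup.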

XXXXXX ranged from 15 to 300. Attached are plots of the 95th-percentile
ham score, the 5th-percentile spam score, and the total cost (vertical
axis) against max_discriminators (horizontal axis). Please note again that
my ham is much tighter than my spam: the vertical scales run from 0 to
0.16 and from 89 to 100, respectively (almost a factor of 100!). The cost
plot shows "no trend at all", but the variation is not large.

I'd almost conclude "anything goes", but based on the 5th-percentile spam
value I'd like to stick with max_discriminators values over ~40.
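
For readers who want to reproduce the two percentile figures from a list of scores, here is a rough sketch. It uses the nearest-rank definition of a percentile, which is an assumption: the definition used for the plots is not stated. The function name percentile and the sample scores are made up for illustration.

```python
def percentile(scores, pct):
    """Nearest-rank percentile: the smallest score such that at least
    pct percent of the scores are less than or equal to it.

    pct must be an integer from 0 to 100; integer arithmetic avoids
    floating-point rounding when computing the rank.
    """
    if not scores:
        raise ValueError("need at least one score")
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    ordered = sorted(scores)
    # Ceiling of pct * n / 100, clamped to at least rank 1.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up scores on a 0..100 scale, for illustration only.
    ham_scores = [0.0, 0.0, 0.01, 0.02, 0.05, 0.1, 0.3, 1.2]
    spam_scores = [88.0, 95.5, 99.0, 99.9, 100.0, 100.0, 100.0, 100.0]
    print("ham 95th percentile:", percentile(ham_scores, 95))
    print("spam 5th percentile:", percentile(spam_scores, 5))
```

A low 95th-percentile ham score and a high 5th-percentile spam score both mean that most messages sit far from the cutoff.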

-- 
Rob W.W. Hooft  ||  rob@hooft.net  ||  http://www.hooft.net/people/rob/

---------------------- multipart/mixed attachment
A non-text attachment was scrubbed...
Name: ham95.png
Type: image/png
Size: 6748 bytes
Desc: not available
Url : http://mail.python.org/pipermail-21/spambayes/attachments/20021017/107955fc/ham95.png

---------------------- multipart/mixed attachment
A non-text attachment was scrubbed...
Name: spam5.png
Type: image/png
Size: 6330 bytes
Desc: not available
Url : http://mail.python.org/pipermail-21/spambayes/attachments/20021017/107955fc/spam5.png

---------------------- multipart/mixed attachment
A non-text attachment was scrubbed...
Name: cost.png
Type: image/png
Size: 8545 bytes
Desc: not available
Url : http://mail.python.org/pipermail-21/spambayes/attachments/20021017/107955fc/cost.png

---------------------- multipart/mixed attachment--