[Spambayes] It gets funnier all the time....
Skip Montanaro
skip at pobox.com
Thu Feb 13 06:16:49 EST 2003
TimP> I'd hate to see the code bloat with gimmicks that don't prove
TimP> themselves via testing
Skip> People asked about decoding stuff that was encoded but didn't have
Skip> a Content-Transfer-Encoding header. I suggested the diff I
Skip> posted. That's as far as it's gone at this point.
TimS> Apparently our test corpora didn't include any mail with this
TimS> problem.
Au contraire. Using my untouched-since-December ham/spam collections I ran
a 10-fold cross-validation last night. The summary results are:
filename:         base       cte
ham:spam:    2000:2000 2000:2000
fp total:            9         9
fp %:             0.45      0.45
fn total:           17        14
fn %:             0.85      0.70
unsure t:           94       100
unsure %:         2.35      2.50
real cost:     $125.80   $124.00
best cost:      $76.20    $77.60
h mean:           1.50      1.56
h sdev:           9.59      9.80
s mean:          98.03     98.14
s sdev:          10.91     10.62
mean diff:       96.53     96.58
k:                4.71      4.73
"base" is an empty ini file. "cte" is
[Tokenizer]
assume_missing_cte: True
so in this case at least the false negatives got slightly better (17 -> 14),
the unsures a bit worse (94 -> 100), and the false positives didn't move (9
either way). I suspect this is typical of what we'll see with most changes at
this stage of the game: somewhat inconclusive results. Whether or not to add
it is going to be a judgement call.
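
For anyone checking the arithmetic, here is a small sketch that recomputes the
percentage and "real cost" rows from the raw counts in the table. The cost
weights ($10 per fp, $1 per fn, $0.20 per unsure) are an assumption: they are
not stated in this message, but they do reproduce both "real cost" figures.
The names summarize, FP_COST, FN_COST and UNSURE_COST are made up for the
example.

```python
# Counts come from the summary table; each run used 2000 ham and 2000 spam.
N_HAM = 2000
N_SPAM = 2000

# Assumed per-message costs in dollars. Not stated in the message; chosen
# because they reproduce the "real cost" row for both columns.
FP_COST = 10.00
FN_COST = 1.00
UNSURE_COST = 0.20


def summarize(fp, fn, unsure):
    """Return (fp %, fn %, unsure %, real cost) for one run."""
    fp_pct = 100.0 * fp / N_HAM                     # fp are ham called spam
    fn_pct = 100.0 * fn / N_SPAM                    # fn are spam called ham
    unsure_pct = 100.0 * unsure / (N_HAM + N_SPAM)  # unsures span both sets
    cost = fp * FP_COST + fn * FN_COST + unsure * UNSURE_COST
    return fp_pct, fn_pct, unsure_pct, cost


base = summarize(fp=9, fn=17, unsure=94)
cte = summarize(fp=9, fn=14, unsure=100)

for name, row in (("base", base), ("cte", cte)):
    print("%-5s fp %%: %.2f  fn %%: %.2f  unsure %%: %.2f  real cost: $%.2f"
          % ((name,) + row))
```

Under those weights the three fewer false negatives save $3.00 and the six
extra unsures cost $1.20, which accounts for the $1.80 difference in real
cost.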
A patch which implements this change is attached for anyone who wants to run
the test.
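
For readers without the attachment, the general idea can be sketched roughly
as follows. This is not the code from sb.diff. It is a hypothetical
illustration that assumes "missing CTE" handling means guessing base64 when a
body has no Content-Transfer-Encoding header but looks base64-encoded. The
helper names looks_like_base64 and body_text, and the 16-character minimum,
are invented for the example; only the assume_missing_cte name comes from the
option above.

```python
import base64
import binascii
import email
import re

# One line of base64: alphabet characters, optionally ending in padding.
_B64_LINE = re.compile(r"^[A-Za-z0-9+/]+={0,2}$")


def looks_like_base64(text, min_chars=16):
    """Heuristic guess (hypothetical): is this body plausibly base64?"""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    total = sum(len(ln) for ln in lines)
    if total < min_chars or total % 4 != 0:
        return False
    return all(_B64_LINE.match(ln) for ln in lines)


def body_text(msg, assume_missing_cte=True):
    """Return the body of a single-part message as text."""
    if msg.is_multipart():
        raise ValueError("this sketch handles single-part messages only")
    if msg.get("Content-Transfer-Encoding") is not None:
        # The encoding is declared, so let the email package decode it.
        return msg.get_payload(decode=True).decode("latin-1")
    payload = msg.get_payload()
    if assume_missing_cte and looks_like_base64(payload):
        try:
            return base64.b64decode(payload).decode("latin-1")
        except binascii.Error:
            pass  # the guess was wrong; fall through to the raw payload
    return payload


encoded = base64.b64encode(b"Buy cheap pills now, limited offer!").decode("ascii")
msg = email.message_from_string("Subject: test\n\n" + encoded + "\n")
print(body_text(msg))
print(body_text(msg, assume_missing_cte=False).strip())
```

A heuristic like this will occasionally misfire on short bodies that happen to
consist only of base64 alphabet characters, which is one reason a change of
this kind has to prove itself in testing rather than be assumed harmless.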
Skip
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sb.diff
Type: application/octet-stream
Size: 2078 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20030213/2dfdb1d1/sb.obj