[spambayes-dev] Deprecated options
Tony Meyer
ta-meyer at ihug.co.nz
Thu Aug 5 04:07:39 CEST 2004
[Tony]
> [Classifier]
> x-experimental_ham_spam_imbalance_adjustment - the code for
> this is gone already; it's just the option that's left.
> [Tokenizer] x-extract_dow
> [Tokenizer] x-generate_time_buckets
[Skip]
> Definitely zap the above.
Done.
[Tony]
> [Classifier] x-use_bigrams - becomes a regular
> option (defaulting to False?)
[Skip]
> False would be best. We already have people complaining
> about the size of their databases.
Unless anyone speaks up in the next couple of days, I'll remove the "x-"
from the option, the "EXPERIMENTAL" from the description, and leave it set
to False by default.
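
For anyone with the experimental name already in a config file, the rename could be handled along these lines. This is only a sketch under assumptions: it uses the standard-library configparser rather than the SpamBayes options code, the helper name use_bigrams_setting is hypothetical, and the fallback to the old "x-" name is an illustration, not something the project is confirmed to do.

```python
import configparser


def use_bigrams_setting(ini_text):
    """Return the bigram setting from an ini-style config string.

    Hypothetical helper: prefers the new option name, falls back to the
    old experimental "x-" name, and defaults to False when neither is set.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section("Classifier"):
        return False
    section = parser["Classifier"]
    for name in ("use_bigrams", "x-use_bigrams"):
        if name in section:
            return section.getboolean(name)
    return False


if __name__ == "__main__":
    old_style = "[Classifier]\nx-use_bigrams: True\n"
    print(use_bigrams_setting(old_style))
```

With no [Classifier] section, or with neither option present, the helper returns False, matching the proposed default.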
[Skip]
> Are the habeas headers a dead-end in the wider world that
> most Spambayes users simply don't use? If they are spoofed
> they should be a fairly good spam clue. I'm not sure I'd
> delete them yet.
I'm not certain. I very rarely see mail with them (I have an Outlook thingy
that puts a little *H* next to mail that has them, so I do notice when one
turns up), with the exception of one source (TidBITS/TidBITS-Talk). For a
while I saw spam with them, too, but even that seems to have stopped. I
wonder whether the experiment failed and they simply don't get used any
more.
I'm happy to leave them for the moment - it would certainly be interesting
to see results from anyone who does get habeas-marked mail (good or bad).
It's a while since I did any testing with it, so I reran it with my current
testing corpora and got an indifferent (exchange) and a loss (ihug):
(first column is all defaults, second is searching for habeas headers, third
is reducing habeas headers to a single token)
-> <stat> tested 280 hams & 131 spams against 1111 hams & 512 spams
[...]
filename:   exchanges  exchange_habeass  exchange_habeas_reduces
ham:spam:   1391:643   1391:643          1391:643
fp total:   0          0                 0
fp %:       0.00       0.00              0.00
fn total:   35         35                35
fn %:       5.44       5.44              5.44
unsure t:   83         82                82
unsure %:   4.08       4.03              4.03
real cost:  $51.60     $51.40            $51.40
best cost:  $33.80     $33.20            $33.20
h mean:     0.10       0.09              0.09
h sdev:     1.72       1.60              1.60
s mean:     89.34      89.33             89.33
s sdev:     25.65      25.64             25.64
mean diff:  89.24      89.24             89.24
k:          3.26       3.28              3.28
-> <stat> tested 4690 hams & 384 spams against 18764 hams & 1539 spams
[...]
filename:   ihugs       ihug_habeass  ihug_habeas_reduces
ham:spam:   23454:1923  23454:1923    23454:1923
fp total:   1           5             5
fp %:       0.00        0.02          0.02
fn total:   23          20            20
fn %:       1.20        1.04          1.04
unsure t:   169         151           154
unsure %:   0.67        0.60          0.61
real cost:  $66.80      $100.20       $100.80
best cost:  $57.00      $84.20        $83.00
h mean:     0.09        0.12          0.12
h sdev:     1.89        2.36          2.38
s mean:     95.86       96.42         96.43
s sdev:     14.99       14.20         14.17
mean diff:  95.77       96.30         96.31
k:          5.67        5.82          5.82
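
To see why the ihug run counts as a loss despite fewer fn and unsures, the "real cost" rows can be reproduced from the counts. The sketch below assumes weights of $10 per fp, $1 per fn and $0.20 per unsure; those weights are not stated in this message, but they do reproduce every "real cost" figure in both tables. The function and variable names are illustrative only.

```python
# Assumed weights (not stated above); they match the "real cost" rows.
FP_COST = 10.00
FN_COST = 1.00
UNSURE_COST = 0.20


def real_cost(fp, fn, unsure):
    """Weighted cost of a run from its fp, fn and unsure counts."""
    return fp * FP_COST + fn * FN_COST + unsure * UNSURE_COST


# (fp total, fn total, unsure t) copied from the ihug table.
ihug_runs = {
    "ihugs": (1, 23, 169),
    "ihug_habeass": (5, 20, 151),
    "ihug_habeas_reduces": (5, 20, 154),
}

if __name__ == "__main__":
    for name, counts in ihug_runs.items():
        print(f"{name:20s} ${real_cost(*counts):.2f}")
```

Under these weights the four extra fp add $40, which outweighs the few dollars saved on fn and unsures.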
=Tony Meyer