[Spambayes] spamBayes is great, thank you all!

Tim Peters tim.one at comcast.net
Tue Sep 30 21:10:56 EDT 2003


[AndreasK]
> What a great program! Many many thanks ...!
>
> SpamBayes works just fine, removes 80%-90% SPAM automatically and
> learns with each manually deleted spam mail. Wow. Great.
> Only 4 non-spam mails (out of hundreds) were wrongly treated as spam
> so far - and they actually looked like Spam, I have to admit.

spambayes "should be" doing better than that.  It's best if you train it on
an approximately equal number of ham and spam.  If you've done so, and have
trained on at least several hundred of each, then I'd expect better
performance than you report here.  If your primary language isn't English,
that could explain it, as *most* developers and testers here use English.
If, for example, your primary email language is German, then the Outlook
addin's

    [Tokenizer]
    replace_nonascii_chars: True

setting may be inappropriate for you, and the default skip_max_word_size
value of 12 may be too small (13-character words like Unterstützung are hurt
by both of those:  first the ü gets replaced by a question mark due to
replace_nonascii_chars, and then the whole word gets replaced by a
synthesized "skip: U 10" token because 13 > 12).
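
To make those two steps concrete, here is a rough sketch of what happens to
that word.  This is not the actual SpamBayes tokenizer code:  the function
names are made up for illustration, the token string just mirrors the
spelling used above (the real tokenizer's exact format may differ), and the
real tokenizer does other things (lowercasing, for example) that are left
out here.

```python
# Simplified stand-in for the two tokenizer steps described above.
# NOT the real SpamBayes code; all names here are hypothetical.

def replace_nonascii(text):
    """Replace every non-ASCII character with a question mark."""
    return "".join(ch if ord(ch) < 128 else "?" for ch in text)


def tokenize_word(word, replace_nonascii_chars=True, skip_max_word_size=12):
    """Return the single token this sketch produces for one word."""
    if replace_nonascii_chars:
        word = replace_nonascii(word)
    n = len(word)
    if n <= skip_max_word_size:
        return word
    # Word is too long:  keep only its first character and its length
    # rounded down to a multiple of 10.  The token spelling is an
    # assumption copied from the message text.
    return "skip: %s %d" % (word[0], n // 10 * 10)


# "Unterstützung", written with an escape so the u-umlaut is guaranteed
# to be one character and the word is exactly 13 characters long.
word = "Unterst\u00fctzung"

print(len(word))                 # 13
print(replace_nonascii(word))    # Unterst?tzung
print(tokenize_word(word))       # skip: U 10

# With the replacement turned off and a larger size limit (20 is an
# arbitrary choice for illustration), the word survives intact.
print(tokenize_word(word, replace_nonascii_chars=False,
                    skip_max_word_size=20))
```

Under those assumptions, the classifier never sees the word itself with the
default settings, only a token saying "some 10-to-19 character word starting
with U", which carries much less evidence.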

We've done almost nothing here on tuning for languages other than English,
so I expect the default settings work best with English.  Still, I
appreciate that it's better than nothing even with the poor performance you
reported <wink>.



