[Spambayes] Looking for code to modify spambayes for 5-gram tokenization
Richard Coleman
rcoleman at criticalmagic.com
Fri Jun 10 17:45:44 CEST 2005
I'm looking to modify spambayes to use 5-grams rather than
splitting on whitespace. We have a few Asian customers and the default
spambayes setup has not been very effective for them. So, we want to
test with 5-grams and see if we can improve the effectiveness.
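For reference, here is a minimal sketch of character 5-gram tokenization. It assumes the goal is to slide a five-character window over each whitespace-separated run of text, so unspaced text such as Japanese or Chinese still yields many overlapping tokens instead of one giant "word". The name `ngram_tokenize` is hypothetical, and hooking it into spambayes's own tokenizer is not shown here.

```python
def ngram_tokenize(text, n=5):
    """Yield overlapping character n-grams (hypothetical helper, not spambayes API).

    Text is first split on whitespace; each chunk longer than n characters
    produces overlapping n-character slices, and shorter chunks are yielded
    unchanged so short words are not dropped.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    for chunk in text.split():
        if len(chunk) <= n:
            # Too short for a full window: keep the whole chunk as one token.
            yield chunk
        else:
            # Slide an n-character window one position at a time.
            for i in range(len(chunk) - n + 1):
                yield chunk[i:i + n]
```

For example, `list(ngram_tokenize("日本語のテキスト"))` gives `["日本語のテ", "本語のテキ", "語のテキス", "のテキスト"]`, whereas plain whitespace splitting would return the whole string as a single token.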
I know that n-grams have been tested several times before. So, if
anyone has an n-gram tokenizer that they can share, I would appreciate a
copy. Otherwise, I'll dive in and write it myself.
Thanks.
Richard Coleman
rcoleman at criticalmagic.com