[Spambayes] Looking for code to modify spambayes for 5-gram tokenization

Richard Coleman rcoleman at criticalmagic.com
Fri Jun 10 17:45:44 CEST 2005


I'm looking to modify spambayes to tokenize with 5-grams rather than
splitting on whitespace.  We have a few Asian customers, and the default
spambayes setup has not been very effective for them.  So, we want to
test with 5-grams and see whether that improves the effectiveness.

I know that n-grams have been tested several times before.  So, if
anyone has an n-gram tokenizer that they can share, I would appreciate a
copy.  Otherwise, I'll dive in and write it myself.
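
For reference, here is a rough sketch of the kind of tokenizer I have in
mind.  It assumes character-level n-grams (not word n-grams) over text
that has already been decoded to a Unicode string.  The function name
char_ngrams is made up for illustration and is not part of spambayes;
hooking it into the spambayes tokenizer is not shown.

```python
def char_ngrams(text, n=5):
    """Yield overlapping character n-grams from text.

    Hypothetical helper for illustration; not a spambayes API.
    """
    # Collapse runs of whitespace so that line wrapping does not
    # produce distinct tokens for otherwise identical text.
    normalized = " ".join(text.split())
    if len(normalized) < n:
        # Too short for a full window: emit the whole string, if any.
        if normalized:
            yield normalized
        return
    # Slide a window of n characters one position at a time.
    for i in range(len(normalized) - n + 1):
        yield normalized[i:i + n]


tokens = list(char_ngrams("spam and eggs"))
```

Because the window slides over characters, this yields tokens even for
text with no spaces between words, which is the case split-on-whitespace
handles poorly.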

Thanks.

Richard Coleman
rcoleman at criticalmagic.com

