[Spambayes] Looking for code to modify spambayes for 5-gram tokenization

Tony Meyer tameyer at ihug.co.nz
Mon Jun 13 05:14:05 CEST 2005


> I'm looking to modify spambayes to use 5-grams rather than 
> split-on-whitespace.  We have a few Asian customers and the default 
> spambayes setup has not been very effective for them.  So, we want to 
> test with 5-grams and see if we can improve the effectiveness.

You might want to look at (if you haven't already):

[ 824651 ] Multibyte (CJK etc.) message support
<https://sourceforge.net/tracker/?func=detail&atid=498105&aid=824651&group_id=61702>
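
For reference, the character 5-gram idea from the question can be sketched
roughly as below. This is only an illustration and is not taken from the
tracker item or from the SpamBayes source: the function name char_ngrams is
made up, it assumes the message text has already been decoded to a Unicode
string (so multibyte characters are not cut in half), and it does not show
how to plug the tokens into SpamBayes itself.

```python
def char_ngrams(text, n=5):
    """Yield overlapping character n-grams from an already-decoded string.

    Hypothetical helper for illustration; not part of SpamBayes.
    """
    # Collapse runs of whitespace so that line wrapping in the message
    # does not produce different tokens for the same text.
    text = " ".join(text.split())
    if not text:
        return
    if len(text) < n:
        # Text shorter than n: emit it whole rather than dropping it.
        yield text
        return
    # Slide a window of n characters across the text, one character at a time.
    for i in range(len(text) - n + 1):
        yield text[i:i + n]


if __name__ == "__main__":
    for token in char_ngrams("abcdefg"):
        print(token)
```

Because the window slides over characters rather than whitespace-separated
words, this kind of tokenizer still produces tokens for text that has no
spaces between words, which is the case the question is concerned with.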

=Tony.Meyer

-- 
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this. 


