[Spambayes] Here's why "generate_long_skips: False" worked...
Tim Peters
tim.one@comcast.net
Mon, 30 Sep 2002 22:22:03 -0400
[Neil Schemenauer]
> I tried generating 2 character-grams when has_highbit_char was true.
In addition to, or in lieu of, generating skip tokens?
> I seem to recall that it worked okay. The bonus would be that there
> would be a limit of 2**16 of these tokens in the DB.
Appreciated. I used to do character 5-grams in this case, and the database
burden was significant. Plus results didn't get worse when I stopped doing
n-grams altogether.
Somebody want to try this on their corpus?
1. Current vs doing character 2-grams when has_highbit_char is true
instead of generating skip tokens.
2. Current vs doing character 2-grams when has_highbit_char is true
in addition to generating skip tokens.
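
For anyone wanting to run either experiment, here's a rough sketch of the two variants. It is not the actual tokenizer code: the has_highbit_char stand-in, the "2g:" token prefix, the skip-token format, and the function names are all assumptions made up for illustration. It works on byte strings, so there can be at most 256*256 = 2**16 distinct 2-grams, which is the cap Neil mentioned.

```python
import re

# Hypothetical stand-in for the tokenizer's has_highbit_char check:
# matches if any byte in the word has its high bit set.
has_highbit_char = re.compile(rb"[\x80-\xff]").search


def char_2grams(word):
    """Return overlapping 2-byte grams, tagged with a made-up '2g:' prefix."""
    return [b"2g:" + word[i:i + 2] for i in range(len(word) - 1)]


def skip_token(word):
    """Assumed skip-token shape: first byte plus length rounded down to a multiple of 10."""
    return b"skip:%s %d" % (word[:1], len(word) // 10 * 10)


def tokens_for_long_word(word, mode="instead"):
    """Tokens for a word already judged too long to keep whole.

    mode="instead": 2-grams replace the skip token (experiment 1).
    mode="also":    2-grams are added to the skip token (experiment 2).
    Words without high-bit bytes get just the skip token either way.
    """
    if not has_highbit_char(word):
        return [skip_token(word)]
    grams = char_2grams(word)
    if mode == "also":
        return [skip_token(word)] + grams
    return grams
```

Calling tokens_for_long_word(word) gives the experiment 1 tokens, and tokens_for_long_word(word, mode="also") gives experiment 2; the baseline is skip_token(word) alone.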