[Spambayes] options.skip_max_word_size.

Alexander G. M. Smith agmsmith@rogers.com
Mon Oct 28 15:45:26 2002


Anthony Baxter wrote:
> I noticed a bunch of really nice ham clues were getting skipped in some
> of my personal email's 'unsure' bucket. They were words like 'interconnection'
> and other longer techie-words. I added an option skip_max_word_size and
> tried boosting it to 20 (from the default of 12). 

I took the naive approach and allowed words up to 50 bytes long.  I picked
that because I saw some uuencoded data with 60 bytes per line.  Also
while looking up the spelling of supercalifragilisticexpialidocious,
I found pneumonoultramicroscopicsilicovolcanoconiosis mentioned as the
longest word in English, according to some web site* which referred back
to the Oxford English Dictionary.  So, 50 seems like a nice safe value.
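
To make the effect of the limit concrete, here is a rough sketch of a
word-length cutoff in a tokenizer.  It is only an illustration, not the
actual Spambayes tokenizer: the function name tokenize_words, the
min_word_size parameter, and the exact form of the "skip:" summary token
are assumptions made for the example.

```python
def tokenize_words(text, max_word_size=12, min_word_size=3):
    """Yield word tokens, replacing over-long words with a summary token.

    Hypothetical sketch: the names and the summary-token format are
    assumptions for illustration, not the real Spambayes code.
    """
    for word in text.lower().split():
        n = len(word)
        if min_word_size <= n <= max_word_size:
            # Normal case: the word itself is the clue.
            yield word
        elif n > max_word_size:
            # Too long: keep only the first character and the length
            # rounded down to a multiple of 10, so the word is lost as a clue.
            yield "skip:%s %d" % (word[0], n // 10 * 10)
        # Words shorter than min_word_size are dropped entirely.


if __name__ == "__main__":
    for limit in (12, 20, 50):
        print(limit, list(tokenize_words("interconnection failure", limit)))
```

With a limit of 12, the 15-letter 'interconnection' is reduced to a
summary token; with 20 or 50 it survives as a clue in its own right.
The 45-letter pneumonoultramicroscopicsilicovolcanoconiosis fits under
50, while a 60-byte line of uuencoded data would still be skipped.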

- Alex

*: http://www.dictionary.com/doctor/faq/l/longestword.html