[Spambayes] (no subject)

Amedee Van Gasse amedee at amedee.be
Thu May 25 16:24:30 CEST 2006


On Wed, May 24, 2006 23:44, Tony Meyer said:
> (As an aside: SpamBayes was created, for the most part, by English
> speakers.  The process should still work in other white-space
> delimited languages, but there may be a few issues.  For example,
> SpamBayes ignores any tokens that are fewer than 3 characters long -
> which includes 'worthless' English words like "a", "be", "to", "my",
> and so on.  However, many of these words are longer in German, so
> perhaps performance would be better with a lower limit of 4 (or maybe
> too much useful information would be lost then).  It would need
> experimentation to know for sure).

This sounds interesting.
The word lengths in Dutch are somewhere between those of English and German.
Is this minimum token length configurable?
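For illustration, here is a minimal sketch of what a configurable minimum token length could look like. It assumes a plain whitespace tokenizer that lowercases words and does not strip punctuation. It is NOT SpamBayes's actual tokenizer, and `tokenize` and `min_len` are hypothetical names. It shows that Dutch short words such as "dit", "een", "van" and "het" survive a limit of 3 but are dropped at a limit of 4:

```python
def tokenize(text, min_len=3):
    """Yield lowercased whitespace-delimited tokens of at least min_len characters.

    Hypothetical sketch: min_len is an illustrative parameter, not a real
    SpamBayes option. Punctuation is not stripped.
    """
    for word in text.lower().split():
        # Skip "worthless" short words below the configured minimum length.
        if len(word) >= min_len:
            yield word


dutch = "Dit is een test van het filter"
print(list(tokenize(dutch)))             # ['dit', 'een', 'test', 'van', 'het', 'filter']
print(list(tokenize(dutch, min_len=4)))  # ['test', 'filter']
```

Whether 3 or 4 works better for Dutch mail would still need the kind of experimentation Tony describes.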

-- 
Amedee


