[Spambayes] (no subject)
Amedee Van Gasse
amedee at amedee.be
Thu May 25 16:24:30 CEST 2006
On Wed, May 24, 2006 23:44, Tony Meyer said:
> (As an aside: SpamBayes was created, for the most part, by English
> speakers. The process should still work in other white-space
> delimited languages, but there may be a few issues. For example,
> SpamBayes ignores any tokens that are fewer than 3 characters long -
> which includes 'worthless' English words like "a", "be", "to", "my",
> and so on. However, many of these words are longer in German, so
> perhaps performance would be better with a lower limit of 4 (or maybe
> too much useful information would be lost then). It would need
> experimentation to know for sure).
This sounds interesting.
The word lengths in Dutch are somewhere between those of English and German.
Is this minimum token length a configurable option?
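
To make the question concrete, here is a minimal sketch of the kind of length cutoff described above. It is not SpamBayes code: the function name `tokenize` and the parameter `min_len` are made up for illustration, and the real tokenizer does considerably more than split on whitespace.

```python
def tokenize(text, min_len=3):
    """Split on whitespace and keep tokens of at least min_len characters.

    Hypothetical helper for illustration only; min_len stands in for the
    lower limit discussed above and is not a real SpamBayes option name.
    """
    return [tok for tok in text.lower().split() if len(tok) >= min_len]


if __name__ == "__main__":
    sample = "der die das und viagra"
    # With the default limit of 3, the short German words are kept.
    print(tokenize(sample))
    # With a limit of 4, only the longer token survives.
    print(tokenize(sample, min_len=4))
```

Running the same training and test messages with different values of `min_len` would be one way to do the experimentation mentioned above.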