[Spambayes] Foreign language spam: bug or feature?

Tim Peters tim.one@comcast.net
Fri Oct 25 17:36:08 2002


[Tim]
> ...
> Unless someone has a strong objection, I expect to introduce a new option:
>
> """
> [Tokenizer]
> # If true, replace high-bit characters (ord(c) >= 128) and
> # control characters with question marks.  This allows
> # non-ASCII character strings to be identified with little
> # training and small database burden.  It's appropriate only
> # if your ham is plain 7-bit ASCII, or nearly so, so that
> # the mere presence of non-ASCII character strings is known
> # in advance to be a strong spam indicator.
> replace_nonascii_chars: False
> """

This has been added, and is False by default.  However, it's True by default
for users of the Outlook 2000 client, since I can't remember the last time
Mark or Sean asked me a question in Korean <wink>.
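
For anyone wondering what the replacement amounts to, here is a rough sketch
in Python.  It is not the code that was checked in, only an illustration of
the idea described in the option comment.  The function name, the regular
expression, and the decision to leave tab, newline and carriage return alone
are all assumptions made for this example.

```python
import re

# Assumption for this sketch: tab, newline and carriage return are kept
# because they separate tokens.  Every other control character, DEL, and
# every character with ord(c) >= 128 is replaced.
_NOT_PLAIN_ASCII = re.compile(r"[^\t\n\r\x20-\x7e]")


def replace_nonascii_chars(text):
    """Return text with high-bit and control characters turned into '?'."""
    return _NOT_PLAIN_ASCII.sub("?", text)


if __name__ == "__main__":
    # Two Korean characters followed by plain ASCII text.
    sample = "\uc548\ub155 buy now\n"
    print(replace_nonascii_chars(sample))
```

The point is that every run of non-ASCII text collapses into a run of
question marks, so the classifier sees a handful of tokens made of '?'
instead of thousands of distinct foreign-language tokens.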