[Mailman-Users] Chinese characters spam filter?

Wed Jul 13 07:38:55 EDT 2016

On 07/13/16 13:15, I wrote:
> On 07/13/16 03:47, Mark Sapiro wrote:
>> On 07/12/2016 12:03 AM, Stephen J. Turnbull wrote:
>>> At an earlier stage, you could also just do a trial re-encoding with
>>> the list preferred codec, set errors = 'strict', catch the Exception,
>>> and re-raise as a Hold (or Discard, according to per-list policy).
>>> (Then discard the output.)  I would prefer this solution, I think, as
>>> creating regexps turns out to be an issue for many list owners.
>>>
>>> People would have to learn not to use emoji in headers, of course, or
>>> suffer moderation delays or even discards.
>>
>> I think this will have too many undesired effects. Not just emoji, but
>> accented latin or CJK characters, etc. in display names would I think be
>> real problems, even on English language lists.
> 
> I suggest using a variable in mm_cfg.py to select the error handler:
> 'replace' (for backward compatibility), 'xmlcharrefreplace', or
> 'backslashreplace'
or 'strict', catching the resulting exception.
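
A minimal sketch of the handler selection described above. The setting
name HEADER_ENCODE_ERRORS and the HoldForModeration exception are
hypothetical stand-ins for illustration, not existing Mailman names; only
the codec error-handler names come from Python itself.

```python
# Hypothetical mm_cfg.py-style setting (not an existing Mailman option).
# One of: 'replace', 'xmlcharrefreplace', 'backslashreplace', 'strict'.
HEADER_ENCODE_ERRORS = 'replace'


class HoldForModeration(Exception):
    """Hypothetical stand-in for a per-list Hold (or Discard) decision."""


def encode_for_list(text, charset, errors=HEADER_ENCODE_ERRORS):
    """Re-encode Unicode text into the list's charset.

    With 'strict', an unencodable character is turned into a hold
    decision instead of being replaced silently.
    """
    try:
        return text.encode(charset, errors)
    except UnicodeEncodeError as exc:
        raise HoldForModeration(str(exc)) from exc


if __name__ == '__main__':
    sample = 'caf\u00e9'
    for handler in ('replace', 'xmlcharrefreplace', 'backslashreplace'):
        print(handler, encode_for_list(sample, 'us-ascii', handler))
    try:
        encode_for_list(sample, 'us-ascii', 'strict')
    except HoldForModeration as exc:
        print('held:', exc)
```

With 'strict' the encoded output can simply be thrown away; the only
thing that matters is whether the exception was raised.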
 
> I think it is better to hold the string attributes of mm_cfg and the
> mlist class as Unicode rather than encoded in the charset of the site
> language or the list's preferred language (though I know that is a lot
> of trouble to do).
Then pattern matching in the message pipeline would be done on Unicode
rather than in the charset of the list's preferred language.
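
A rough sketch of what matching on Unicode could look like, using only
the Python standard library. The helper name header_to_unicode and the
CJK pattern are assumptions made for illustration; this is not how
Mailman's pipeline currently works.

```python
import re
from email.header import Header, decode_header, make_header

# Illustrative pattern: the CJK Unified Ideographs block only.
CJK = re.compile('[\u4e00-\u9fff]')


def header_to_unicode(raw):
    """Decode an RFC 2047 encoded header value into a Unicode string."""
    return str(make_header(decode_header(raw)))


if __name__ == '__main__':
    text = '\u4e2d\u6587'
    # The same Unicode pattern matches whatever charset the sender used.
    for charset in ('utf-8', 'gb2312'):
        raw = Header(text, charset).encode()
        decoded = header_to_unicode(raw)
        print(charset, raw, bool(CJK.search(decoded)))
    print('plain', bool(CJK.search(header_to_unicode('hello'))))
```

Because the pattern is applied after decoding, a list owner would write
one regexp instead of one per sender charset.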
-- 
Yasuhito FUTATSUKI <futatuki at poem.co.jp>