[Mailman-Users] Chinese characters spam filter?

Mark Sapiro mark at msapiro.net
Wed Jul 6 11:56:29 EDT 2016


On 7/5/16 8:19 PM, Greg Lindsay via Mailman-Users wrote:
> Hi,
> 
> I am running Mailman 2.1.20 and have been trying to filter out Chinese spam messages, with no luck. A few typical subjects are below:
> 
> Subject: 为什么总让客户不满意,如何才能提升业绩
> Subject: 没1有1业1绩,怎1么1办?
> Subject: 带问题来,带方案走;如何缩短生产周期
> 
> Under Privacy options/Spam filters, I have created header filters such as the three below, but they aren't working. I was under the impression that [abcd] would discard all mail with a, b, c, or d in the subject line. I've tried including a hundred characters and a single character, but neither works.
> 
> ^Subject:\*[如何解决企业关务管理风险跨境电商国家政策综解读及创新模式非财务人员如何进行财务管理掌握最规范的薪酬设计方法企业相关法律风险控制及用工管理如何加强企业反舞弊及内审如何把准经销商的赢利模式?如何评估供应商及优化采购运作流程如何掌握车间管理的精髓]
> 
> ^Subject:\?utf-8\?B\?[56]
> 
> ^Subject:\*[发杜营全及正了先]
> 
> What am I doing wrong here? Is there something about the character encoding that prevents this filter from working?


There are a couple of things here. Your three regexps above have no
space after 'Subject:'. That notwithstanding, none of them will match
what you're trying to match. The second appears to be an attempt to
match an RFC 2047 encoded word, but the encoded word would begin
'=?utf-8?B?...' and your regexp is missing the '='. In the first and
third, '\*' is a literal asterisk, so they require an actual '*'
immediately after 'Subject:', and I'm not sure what that is meant to do.
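The Python 3 sketch below illustrates those syntax problems with the standard `re` module. It assumes, for illustration only, that each rule is searched against "Name: value" header lines with IGNORECASE and MULTILINE. The character classes are shortened, and the "fixed" rules are hypothetical corrections, not tested Mailman configuration. As the next paragraph explains, fixing the syntax alone is not enough.

```python
import base64
import re

# Assumption: each rule is searched against "Name: value" header lines,
# case-insensitively, with "^" anchoring at the start of any line.
FLAGS = re.IGNORECASE | re.MULTILINE


def matches(rule: str, line: str) -> bool:
    """Return True if the regexp rule is found in the header text."""
    return re.search(rule, line, FLAGS) is not None


# Illustrative header lines: the decoded text and its raw RFC 2047 "B" form.
subject = "如何缩短生产周期"
encoded_word = "=?utf-8?B?" + base64.b64encode(subject.encode("utf-8")).decode("ascii") + "?="
decoded_line = "Subject: " + subject
raw_line = "Subject: " + encoded_word

# Posted rules 1 and 3: "\*" demands a literal "*" right after the colon.
print(matches(r"^Subject:\*[如何]", decoded_line))        # False
# Hypothetical fix: ".*" skips the space and any other text first.
print(matches(r"^Subject:.*[如何]", decoded_line))        # True

# Posted rule 2: no space after the colon and no "=" before "?utf-8?B?".
print(matches(r"^Subject:\?utf-8\?B\?[56]", raw_line))    # False
# Hypothetical fix: add the space and the leading "=".
print(matches(r"^Subject: =\?utf-8\?B\?[56]", raw_line))  # True
```

The "[56]" test works on the raw form because UTF-8 encodings of most common CJK characters start with a byte from 0xE4 to 0xE9, whose top six bits base64-encode as "5" or "6".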

However, this is not the real problem. The real issue is that the
headers matched by the header_filter_rules regexps have already been
RFC 2047 decoded and then re-encoded in Mailman's character set for the
list's preferred language, so the regexps never see the raw encoded words.
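A minimal Python 3 sketch of the decoding step, using the standard `email.header.decode_header()`. This is written for illustration and is not Mailman's own code (Mailman 2.1 runs on Python 2). It shows that even a syntactically corrected encoded-word rule only matches the raw header, because the '=?utf-8?B?' prefix no longer exists after decoding.

```python
import base64
import re
from email.header import decode_header

FLAGS = re.IGNORECASE | re.MULTILINE

# Illustrative subject and its raw RFC 2047 "B" encoded word.
subject = "带问题来带方案走"
encoded_word = "=?utf-8?B?" + base64.b64encode(subject.encode("utf-8")).decode("ascii") + "?="

# RFC 2047 decoding: decode_header() returns (fragment, charset) pairs;
# encoded fragments come back as bytes together with their charset.
decoded_value = "".join(
    frag.decode(charset or "us-ascii", "replace") if isinstance(frag, bytes) else frag
    for frag, charset in decode_header(encoded_word)
)

raw_line = "Subject: " + encoded_word
decoded_line = "Subject: " + decoded_value

# Hypothetical corrected encoded-word rule: it matches only the raw form.
rule = r"^Subject: =\?utf-8\?B\?[56]"
print(decoded_value == subject)                          # True
print(re.search(rule, raw_line, FLAGS) is not None)      # True
print(re.search(rule, decoded_line, FLAGS) is not None)  # False
```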

If the list's preferred language is not one whose character set is utf-8
or a Chinese character set, the Chinese characters cannot be represented
in that character set, and the header the filter sees is probably
Subject: ??????...
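A minimal Python 3 sketch of that consequence. It assumes ISO-8859-1 as an example of a non-Chinese list character set and uses a hypothetical helper, `as_filter_sees_it()`, in place of Mailman's own code. The Chinese characters become '?', so a character-class rule can no longer match. The same text re-encoded as UTF-8 still matches.

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE

# Decoded header line (illustrative) and a shortened character-class rule.
decoded_line = "Subject: 如何缩短生产周期"
rule = r"^Subject:.*[如何]"


def as_filter_sees_it(line: str, list_charset: str) -> str:
    """Hypothetical helper: re-encode in the list's charset with 'replace',
    so each character that charset cannot represent becomes '?'."""
    return line.encode(list_charset, "replace").decode(list_charset)


latin1_view = as_filter_sees_it(decoded_line, "iso-8859-1")
utf8_view = as_filter_sees_it(decoded_line, "utf-8")

print(latin1_view)                                      # Subject: ????????
print(re.search(rule, latin1_view, FLAGS) is not None)  # False
print(re.search(rule, utf8_view, FLAGS) is not None)    # True
```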

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

