[Mailman-Users] Chinese characters spam filter?

Wed Jul 6 13:51:34 EDT 2016

Thanks for the reply. I was going off some examples I found, but should have known better than to use a \* in the regexp. This is likely what is causing the filters to fail. The encoding example was something I tried based on another thread I found, but I've deleted that rule.

I assume the "Spam Filter Regexp" text box attempts to match against all of the text in the headers. Since every message has a "Subject:" header and that is the part I want to filter, I specified "^Subject:". If I drop the backslash so the asterisk is no longer literal, i.e. "^Subject:*", that should take care of the space, right? The mails sometimes come in with mixed Chinese and English characters, so if the subject starts with an English character and my filter requires a space followed by a Chinese character, the filter would fail to catch it. I think what is needed is this:

^Subject:*[list of all Chinese characters here]
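For reference, here is a minimal sketch of how Python's re module reads the two candidate patterns. The sample header and the short character class are made up for illustration, and it assumes the filter is applied with re.search. In regexp syntax ":*" means "zero or more colons", while ".*" means "zero or more of any character", so only the second form skips the space and any leading English text.

```python
import re

# Hypothetical sample header line, used only for illustration.
header = "Subject: Hello 如何提升业绩"

# A small, illustrative character class; a real filter would list more
# characters or use a Unicode range.
chars = "[如何业绩]"

# ':*' repeats the colon, so the class has to match immediately after
# the colon, where this header has a space instead.
colon_star = re.compile(r"^Subject:*" + chars)

# '.*' matches any run of characters, so the class may match anywhere
# later in the subject.
dot_star = re.compile(r"^Subject:.*" + chars)

colon_star_hit = colon_star.search(header) is not None
dot_star_hit = dot_star.search(header) is not None
print(colon_star_hit, dot_star_hit)
```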

I don't understand the use of an equals sign in the regexp. Isn't this implied?

Thanks,
-Greg

-----Original Message-----
From: Mailman-Users [mailto:mailman-users-bounces+greg.lindsay=microsoft.com at python.org] On Behalf Of Mark Sapiro
Sent: Wednesday, July 6, 2016 8:56 AM
To: mailman-users at python.org
Subject: Re: [Mailman-Users] Chinese characters spam filter?

On 7/5/16 8:19 PM, Greg Lindsay via Mailman-Users wrote:
> Hi,
> 
> I am running Mailman v 2.1.20 & have been trying to filter out Chinese spam messages with no luck. A few typical subjects are below:
> 
> Subject: 为什么总让客户不满意，如何才能提升业绩
> Subject: 没1有1业1绩，怎1么1办？
> Subject: 带问题来，带方案走；如何缩短生产周期
> 
> Under Privacy options/Spam filters I have created header filters such as the three below. These aren't working. I was under the impression that [abcd] will discard all mail with a, b, c, or d in the subject line. I've tried including a hundred characters and a single character, but neither works.
> 
> ^Subject:\*[如何解决企业关务管理风险跨境电商国家政策综解读及创新模式非财务人员如何进行财务管理掌握最规范的薪酬设计方法企业相关法律风险控制及用工管理如何加强企业反舞弊及内审如何把准经销商的赢利模式？如何评估供应商及优化采购运作流程如何掌握车间管理的精髓]
> 
> ^Subject:\?utf-8\?B\?[56]
> 
> ^Subject:\*[发杜营全及正了先]
> 
> What am I doing wrong here? Is there something about the character encoding that prevents this filter from working?

There are a couple of things here. Your 3 regexps above have no space after Subject:. That notwithstanding, none of them will match what you're trying to match. The second appears to be an attempt to match an RFC2047 encoded word, but the encoded word would begin '=?utf-8?B?...' and your regexp is missing the '='. I'm not sure what the first and third are doing with the literal asterisk.
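As a rough illustration of these syntax points only, the sketch below runs the patterns through Python's re module. The sample header lines are hypothetical, and it assumes the rules are applied with re.search. It tests against a raw, still-encoded subject, which is not what Mailman actually matches against, for the reason given in the next paragraph.

```python
import re

# Hypothetical raw header line; '5aaC5L2V' is the base64 form of the
# UTF-8 bytes of "如何".
raw = "Subject: =?utf-8?B?5aaC5L2V?="

# The rule as posted: no space after the colon and no '=' before '?utf-8'.
original = re.compile(r"^Subject:\?utf-8\?B\?[56]")

# The same rule with the space and the leading '=' added.
fixed = re.compile(r"^Subject: =\?utf-8\?B\?[56]")

original_hit = original.search(raw) is not None
fixed_hit = fixed.search(raw) is not None

# '\*' matches only a literal asterisk character, so this rule needs a
# '*' directly after the colon in the header itself.
literal_star = re.compile(r"^Subject:\*[如何]")
star_hit = literal_star.search("Subject: 如何") is not None
star_literal_hit = literal_star.search("Subject:*如何") is not None

print(original_hit, fixed_hit, star_hit, star_literal_hit)
```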

However, this is not the real problem. The real issue is that the headers matched by the header_filter_rules regexps have been RFC2047 decoded and then encoded in Mailman's character set for the list's preferred language.

If the list's preferred language is not one whose character set is utf-8 or some Chinese character set, this probably results in

Subject: ??????...
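The effect can be sketched with Python's standard email.header module. This is not Mailman's own code: the sample subject is hypothetical, us-ascii stands in for a list language whose character set cannot represent Chinese, and replacing unrepresentable characters with '?' is an assumption chosen to mirror the result described above.

```python
import re
from email.header import decode_header

# Hypothetical RFC 2047 encoded subject; it decodes to "如何".
raw_subject = "=?utf-8?B?5aaC5L2V?="

# Step 1: decode the encoded word into Unicode text.
text = "".join(
    chunk.decode(charset or "ascii") if isinstance(chunk, bytes) else chunk
    for chunk, charset in decode_header(raw_subject)
)

# Step 2: re-encode in the (assumed) list character set, replacing
# anything that charset cannot represent with '?'.
list_charset = "us-ascii"
as_seen = text.encode(list_charset, "replace").decode(list_charset)
header_line = "Subject: " + as_seen

# A character class of Chinese characters has nothing left to match.
cjk_hit = re.search(r"^Subject:.*[如何]", header_line) is not None

print(header_line, cjk_hit)
```

Under these assumptions the header line seen by the filter contains only question marks where the Chinese characters were, so a rule listing Chinese characters cannot match it.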

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan