[Mailman-Users] Chinese characters spam filter?
Greg.Lindsay at microsoft.com
Wed Jul 6 13:51:34 EDT 2016
Thanks for the reply. I was going off some examples I found, but should have known better than to use a \* in the regexp. This is likely what is causing the filters to fail. The encoding example was something I was trying based off another thread I found, but I've deleted this rule.
I assume the text box that asks for a "Spam Filter Regexp" will attempt to match all text in the header. Since all headers include the text "Subject:" and that is the part of the header I want to filter, this is why "^Subject:" is specified. If I eliminate the literal asterisk and just change this to an asterisk, i.e. "^Subject:*", that should take care of the space, right? Sometimes the mails come in with mixed Chinese and English characters, so if an English character is first in the subject and my filter specifies that it must be a space followed by a Chinese character, then the filter would fail to catch this. I think what is needed is this:
^Subject:*[list of all Chinese characters here]
I don't understand the use of an equals sign in the regexp. Isn't this implied?
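To make the quantifier question concrete, here is a minimal Python sketch of how the two candidate patterns behave. It assumes the regexp is applied to header text that still contains the Chinese characters as Unicode, and it uses the CJK Unified Ideographs range U+4E00 to U+9FFF as a stand-in for "list of all Chinese characters"; both the sample subjects and the variable names are illustrative only, not anything taken from Mailman.

```python
import re

# Sample header lines (illustrative only).
subjects = [
    "Subject: 为什么总让客户不满意，如何才能提升业绩",
    "Subject: Hello 没1有1业1绩，怎1么1办？",
    "Subject: Weekly status report",
]

# ":*" means "zero or more colons", so it does not skip the space
# (or any English text) between "Subject:" and the first Chinese character.
colons_only = re.compile(r"^Subject:*[\u4e00-\u9fff]")

# ".*" means "zero or more of any character", so it skips the space and
# any leading English text before the first Chinese character.
anywhere = re.compile(r"^Subject:.*[\u4e00-\u9fff]")

results_colons = [bool(colons_only.search(s)) for s in subjects]
results_anywhere = [bool(anywhere.search(s)) for s in subjects]

print(results_colons)    # [False, False, False]
print(results_anywhere)  # [True, True, False]
```

In this sketch a bare asterisk only repeats the colon before it, so "^Subject:*" never reaches the Chinese text; a pattern such as "^Subject:.*[...]" covers both the space and mixed English/Chinese subjects.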
From: Mailman-Users [mailto:mailman-users-bounces+greg.lindsay=microsoft.com at python.org] On Behalf Of Mark Sapiro
Sent: Wednesday, July 6, 2016 8:56 AM
To: mailman-users at python.org
Subject: Re: [Mailman-Users] Chinese characters spam filter?
On 7/5/16 8:19 PM, Greg Lindsay via Mailman-Users wrote:
> I am running Mailman v 2.1.20 & have been trying to filter out Chinese spam messages with no luck. A few typical subjects are below:
> Subject: 为什么总让客户不满意，如何才能提升业绩
> Subject: 没1有1业1绩，怎1么1办？
> Subject: 带问题来，带方案走；如何缩短生产周期
> Under Privacy options/Spam filters I have created header filters such as the three below. These aren't working. I was under the impression that [abcd] will discard all mail with a, b, c, or d in the subject line. I've tried including a hundred characters and a single character, but neither works.
> What am I doing wrong here? Is there something about the character encoding that prevents this filter from working?
There are a couple of things here. Your 3 regexps above have no space after Subject:. That notwithstanding, none of them will match what you're trying to match. The second appears to be an attempt to match an RFC2047 encoded word, but the encoded word would begin '=?utf-8?B?...' and your regexp is missing the '='. I'm not sure what the first and third are doing with the literal asterisk.
However, this is not the real problem. The real issue is that the headers matched by the header_filter_rules regexps have been RFC2047 decoded and then encoded in Mailman's character set for the list's preferred language.
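The decode-then-re-encode step can be illustrated with a small Python 3 sketch using the standard email.header module. This is not Mailman's code: the helper decode_subject is hypothetical, and the "replace" error handler used for the re-encoding step is an assumption chosen only to show that the original code points do not survive a charset that cannot represent them. Mailman's actual handling may differ.

```python
import re
from email.header import Header, decode_header

# Same illustrative CJK range as a stand-in for "Chinese characters".
cjk = re.compile(r"^Subject:.*[\u4e00-\u9fff]")

# On the wire, a non-ASCII subject travels as an RFC 2047 encoded word,
# which starts with "=?charset?" (hence the "=" in the pattern discussion).
raw = Header("没有业绩，怎么办？", "utf-8").encode()

def decode_subject(value):
    """Hypothetical helper: RFC 2047 decode a header value to Unicode."""
    parts = []
    for frag, charset in decode_header(value):
        if isinstance(frag, bytes):
            frag = frag.decode(charset or "us-ascii", "replace")
        parts.append(frag)
    return "".join(parts)

decoded = "Subject: " + decode_subject(raw)

# ASSUMPTION: re-encode into a charset without Chinese characters using
# the "replace" handler; the handler is illustrative, not Mailman's.
as_ascii = decoded.encode("us-ascii", "replace").decode("us-ascii")

print(raw.startswith("=?utf-8?"))       # True
print(bool(cjk.search(decoded)))        # True
print(bool(cjk.search(as_ascii)))       # False
```

Under these assumptions, a character-class filter can only match if the list's character set is able to represent the Chinese characters; otherwise the text the regexp sees no longer contains them.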
If the list's preferred language is not one whose character set is utf-8 or some Chinese character set, this probably results in
Mark Sapiro <mark at msapiro.net> The highway is for gamblers,
San Francisco Bay Area, California better use your sense - B. Dylan