[Mailman-Users] privacy options, SPAM, regex

Helmut Schneider jumper99 at gmx.de
Fri Nov 28 16:24:11 CET 2008


From: "Mark Sapiro" <mark at msapiro.net>
>> Mark Sapiro wrote:
>>> Helmut Schneider wrote:
>>>> Interesting, with "^subject:.*Declined.*"
>>>>
>>>> Subject: Declined: [Somelist] Invitation to workshop on 13rd Dec. 2008
>>>>
>>>> matches while
>>>>
>>>> Subject: [Somelist] Declined:  Invitation to workshop on 13rd Dec. 2008
>>>>
>>>> does not. Huh?!
>>>
>>>
>>> It turns out that RFC 2047 encoded headers are not decoded before
>>> matching against the regexps. Is that the issue here? What do the raw
>>> headers look like?
>>>
>>> I think that the headers should be decoded, but I wonder if people are
>>> currently working around this with regexps that match encoded headers
>>> and wouldn't match decoded headers.
>>
>>
>> I have developed a patch for SpamDetect.py which will decode RFC 2047
>> encoded headers. This is somewhat problematic because the decoded
>> headers will presumably contain non-ascii characters, and while the
>> character sets of the headers are known (and there can be different
>> headers or even different parts of a single header encoded in different
>> character sets), the character set of the regexps in header_filter_rules
>> is not known.
>>
>> The patch creates a unicode object containing all the headers unfolded
>> and RFC 2047 decoded with one complete header per line and then encodes
>> it into the character set of the list's preferred_language, and this
>> result is what the regexps will search. As long as the regexps contain
>> only ascii and the raw headers contain no non-ascii characters, this
>> should give expected results. If the regexps contain non-ascii
>> characters or the headers contain non-ascii not RFC 2047 encoded,
>> results may be unexpected.
>>
>> If, in fact, the original issue is due to RFC 2047 encoded headers, try
>> the patch and let us know how it works.
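
For anyone following along, here is a minimal sketch of the idea described
above: unfold the headers, RFC 2047 decode them, put one header per line,
and only then run the regexp. This is not the code from the actual patch.
It is written for Python 3 (Mailman 2.1 itself runs on Python 2), the
function name decode_headers and its charset parameter are made up for
illustration, and I am assuming the rules are searched with re.IGNORECASE
and re.MULTILINE.

```python
import base64
import re
from email.header import decode_header, make_header


def decode_headers(raw_headers, charset="utf-8"):
    """Return all headers unfolded and RFC 2047 decoded, one per line.

    raw_headers is a list of (name, value) pairs. charset stands in for
    the character set of the list's preferred language (an assumption).
    """
    lines = []
    for name, value in raw_headers:
        # Unfold: a continuation line starts with a space or a tab.
        unfolded = re.sub(r"\r?\n[ \t]+", " ", value)
        # Decode any RFC 2047 encoded words into a unicode string.
        decoded = str(make_header(decode_header(unfolded)))
        # Characters the list charset cannot represent become "?".
        decoded = decoded.encode(charset, "replace").decode(charset)
        lines.append(f"{name}: {decoded}")
    return "\n".join(lines)


# The rule from this thread.
rule = re.compile(r"^subject:.*Declined.*", re.IGNORECASE | re.MULTILINE)

# Build a base64 encoded word so that the plain text is hidden in the
# raw header, as it would be for a subject a mail client chose to encode.
text = "[Somelist] Declined: Invitation to workshop"
encoded = "=?utf-8?b?" + base64.b64encode(text.encode("utf-8")).decode("ascii") + "?="

raw_match = rule.search("Subject: " + encoded)
decoded_match = rule.search(decode_headers([("Subject", encoded)]))
```

Here raw_match is None because the raw header only contains the base64
text, while decoded_match is a match object because the regexp sees the
decoded subject. A header that was never encoded passes through unchanged
apart from unfolding.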

As far as I can see, this patch works great. Is it possible that, as a
positive side effect, the patch also affects uncaught bounces? I now receive
lots of uncaught bounces where a spam filter was required before the patch.

Thanks a lot, Helmut 


