[Mailman-Users] privacy options, SPAM, regex

Mark Sapiro mark at msapiro.net
Thu Nov 27 21:20:48 CET 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Mark Sapiro wrote:
> Helmut Schneider wrote:
>> Interesting, with "^subject:.*Declined.*"
>>
>> Subject: Declined: [Somelist] Invitation to workshop on 13rd Dec. 2008
>>
>> matches while
>>
>> Subject: [Somelist] Declined:  Invitation to workshop on 13rd Dec. 2008
>>
>> does not. Huh?!
> 
> 
> It turns out that RFC 2047 encoded headers are not decoded before
> matching against the regexps. Is that the issue here? What do the raw
> headers look like?
> 
> I think that the headers should be decoded, but I wonder if people are
> currently working around this with regexps that match encoded headers
> and wouldn't match decoded headers.


I have developed a patch for SpamDetect.py which will decode RFC 2047
encoded headers. This is somewhat problematic because the decoded
headers will presumably contain non-ascii characters, and while the
character sets of the headers are known (and there can be different
headers or even different parts of a single header encoded in different
character sets), the character set of the regexps in header_filter_rules
is not known.
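
To illustrate the suspected problem, here is a minimal sketch in
Python 3 using only the standard email package. It is not Mailman code.
The Subject value is a made-up example, and the regexp flags are an
assumption about how the rule is applied.

```python
import re
from email.header import decode_header, make_header

# The rule from the original report. The flags are an assumption.
pattern = re.compile(r"^subject:.*Declined.*", re.IGNORECASE | re.MULTILINE)

# Hypothetical raw header: "[Somelist] Declined: Café", base64-encoded
# in utf-8 as an RFC 2047 encoded word.
raw = "Subject: =?utf-8?b?W1NvbWVsaXN0XSBEZWNsaW5lZDogQ2Fmw6k=?="

# Decode only the value part and put the header name back in front.
name, _, value = raw.partition(":")
decoded = name + ": " + str(make_header(decode_header(value.strip())))

print(bool(pattern.search(raw)))      # the encoded form hides "Declined"
print(bool(pattern.search(decoded)))  # the decoded form exposes it
```

The word "Declined" never appears literally in the raw header, so a
regexp written against the readable subject cannot match it.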

The patch creates a unicode object containing all the headers unfolded
and RFC 2047 decoded with one complete header per line and then encodes
it into the character set of the list's preferred_language, and this
result is what the regexps will search. As long as the regexps contain
only ascii and the raw headers contain no non-ascii characters, this
should give the expected results. If the regexps contain non-ascii
characters, or the headers contain non-ascii characters that are not
RFC 2047 encoded, results may be unexpected.
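
The approach described above can be sketched roughly as follows. This
is not the attached patch, only an illustration in Python 3 with the
standard email package. The function name decoded_header_text and its
charset parameter (standing in for the character set of the list's
preferred_language) are hypothetical.

```python
import re
from email import message_from_string
from email.errors import HeaderParseError
from email.header import decode_header, make_header


def decoded_header_text(msg, charset):
    """Return all headers of msg, unfolded and RFC 2047 decoded,
    one complete header per line, limited to what charset can hold."""
    lines = []
    for name, value in msg.items():
        # Unfold: a line break followed by whitespace continues the header.
        unfolded = re.sub(r"\r?\n[ \t]+", " ", str(value))
        try:
            text = str(make_header(decode_header(unfolded)))
        except (HeaderParseError, LookupError, UnicodeError):
            # Malformed encoded word or unknown charset: keep the raw text.
            text = unfolded
        lines.append("%s: %s" % (name, text))
    joined = "\n".join(lines)
    # Round-trip through the target charset. Characters it cannot
    # represent become "?", one way results can differ from expectations.
    return joined.encode(charset, "replace").decode(charset)


if __name__ == "__main__":
    sample = (
        "Subject: =?utf-8?b?W1NvbWVsaXN0XSBEZWNsaW5lZDogQ2Fmw6k=?=\n"
        "X-Note: folded\n value\n"
        "\n"
        "body\n"
    )
    print(decoded_header_text(message_from_string(sample), "iso-8859-1"))
```

A regexp such as "^subject:.*Declined.*" would then be searched against
the returned text in multiline mode rather than against the raw headers.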

If, in fact, the original issue is due to RFC 2047 encoded headers, try
the patch and let us know how it works.

- --
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (MingW32)

iD8DBQFJLwEfVVuXXpU7hpMRArKTAKCiDYtwz3VENF8Qww1tEw3lUMzUnQCgoGNh
K8vySqy57Vn8w0EHpj6LeJM=
=0pk1
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: SpamDetect.patch.txt
URL: <http://mail.python.org/pipermail/mailman-users/attachments/20081127/4332a243/attachment.txt>

