gmail marks mailman confirmation mail as spam...

Hi, do you perhaps have a recipe for making gmail treat confirmation mails as non-spam? It just throws the mail "confirm e8492f19d7c336341050.." into spam.
Kārlis Repsons

Kārlis Repsons wrote:
Confirmations are sent with
Precedence: bulk
which may be part of the problem, but I just tested a confirmation to a gmail.com address and it went to the inbox. As far as I know, I have no special spam whitelisting in effect on this gmail account.
The one thing that might be different is the server I sent this from has
VERP_CONFIRMATIONS = Yes
in mm_cfg.py which changes the subject from
confirm 6e4cfe0ab337729574b1a643a231569ef0ef59ab
to
Your confirmation is required to join the LISTNAME mailing list
and the From: from
LISTNAME-request@example.com
to
LISTNAME-confirm+6e4cfe0ab337729574b1a643a231569ef0ef59ab@example.com
So you might try setting VERP_CONFIRMATIONS = Yes if your MTA can properly deliver to an address such as above. That may help.
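To make the address shape concrete, here is a minimal sketch of how such a VERP-style confirmation address is put together and taken apart again. The helper names build_confirm_address and split_confirm_address are hypothetical and are not Mailman functions; the sketch only assumes the LISTNAME-confirm+COOKIE@host layout shown above. The point is that the MTA has to route the "+COOKIE" form to the same place as the plain LISTNAME-confirm address.

```python
# Illustrative only: these helpers are hypothetical, not part of Mailman.

def build_confirm_address(listname, cookie, host):
    """Return an address of the form LISTNAME-confirm+COOKIE@host."""
    return "%s-confirm+%s@%s" % (listname, cookie, host)


def split_confirm_address(address):
    """Split a confirmation address into (base local part, cookie, host)."""
    local, _, host = address.partition("@")
    base, _, cookie = local.partition("+")
    return base, cookie, host


cookie = "6e4cfe0ab337729574b1a643a231569ef0ef59ab"
address = build_confirm_address("LISTNAME", cookie, "example.com")
print(address)
print(split_confirm_address(address))
```

If the MTA rejects or misroutes the "+" form, confirmations by reply will not reach Mailman, so it is worth sending a test message to such an address before enabling the setting.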
-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

On Friday 12 June 2009 19:45:25 you wrote:
And there is one more thing bothering me. Look here: http://www.trikata.com/pipermail/test/2009-June/thread.html The same word "nogādāt" was posted in every case where those terrible characters appear! Maybe you know what's wrong?
Kārlis Repsons

Kārlis Repsons wrote:
There are various things that could be different. My server publishes SPF records. That may make a difference. My server may have a better reputation with Google/gmail than yours. See the FAQ at <http://wiki.list.org/x/4oA9>. This is something you'll have to pursue with Google/gmail.
The string "=?utf-8?q?_nog=C4=81d=C4=81t?=" is an RFC 2047 encoding of the string " nogādāt".
The actual raw header in your archive (at least for one of these) contains three RFC 2047 encoded pieces. The first is "=?utf-8?q?skatamies=2C_cik_ilgi_google_m=C4=93=C4=A3ina_=3D?=" and decodes to "skatamies, cik ilgi google mēģina =". The second is "=?utf-8?b?P3V0Zi04P3E/X25vZz1DND04MWQ9QzQ9ODF0Pz0sIGthZCBuZXZhci4u?=" and decodes to "?utf-8?q?_nog=C4=81d=C4=81t?=, kad nevar..". The last is "=?utf-8?q?=2E?=" and decodes to "."
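The three decodings above can be reproduced with a short sketch using Python's standard email.header.decode_header(). The helper name decode_pieces is made up for this illustration, and the sketch assumes Python 3, where decode_header() returns bytes for encoded words.

```python
from email.header import decode_header


def decode_pieces(raw):
    """Decode one header fragment to text (hypothetical helper)."""
    text = []
    for data, charset in decode_header(raw):
        if isinstance(data, bytes):
            # Encoded words come back as bytes plus their charset.
            text.append(data.decode(charset or "ascii"))
        else:
            text.append(data)
    return "".join(text)


pieces = [
    "=?utf-8?q?skatamies=2C_cik_ilgi_google_m=C4=93=C4=A3ina_=3D?=",
    "=?utf-8?b?P3V0Zi04P3E/X25vZz1DND04MWQ9QzQ9ODF0Pz0sIGthZCBuZXZhci4u?=",
    "=?utf-8?q?=2E?=",
]
decoded = [decode_pieces(piece) for piece in pieces]
for piece in decoded:
    print(repr(piece))
```

Note that the second piece decodes to text that still looks like an encoded word minus its leading "=", which is what makes the archive display so confusing.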
If I had to guess, I'd say that the original subject got mis-folded by something and the initial "=" of "=?utf-8?q?_nog=C4=81d=C4=81t?=" got separated from the rest by a line continuation, and then the remaining "?utf-8?q?_nog=C4=81d=C4=81t?=" was treated as text rather than an encoded string.
The problem may be with the MUA that composed the mail or it may be with Mailman's adding the subject_prefix. I think I'd need to see the raw message as sent to the list to know for sure.
-- Mark Sapiro <mark@msapiro.net>

Mark Sapiro wrote:
Kārlis forwarded an email to me off list. Its salient feature is the subject header
Subject: skatamies, cik ilgi google =?utf-8?q?m=C4=93=C4=A3ina?= =?utf-8?q?_nog=C4=81d=C4=81t?=, kad nevar...
which is wrapped here but was all one line in the original. I have verified that there is a problem in the underlying Python email package with headers containing multiple RFC 2047 encoded words whether or not they are separated by non-encoded text.
It appears that only the first encoded word is properly decoded, resulting in garbled headers in the archive and digests, and in messages too if the subject is prefixed. I will follow up when I know more.
-- Mark Sapiro <mark@msapiro.net>

Kārlis Repsons wrote:
On Sunday 14 June 2009 17:12:22 you wrote:
Actually, the problem is not multiple encoded words. It is the fact that Python's email.header.decode_header() function doesn't recognize an RFC 2047 encoded word as such if the trailing "?=" is not followed by whitespace or the end of the string - here it is followed by a ",".
I think this is a bug in decode_header(), but I won't have time to look further at this until tomorrow.
-- Mark Sapiro <mark@msapiro.net>

Mark Sapiro wrote:
I think there is a minor bug in decode_header() in that it won't recognize an RFC 2047 encoded word in a comment if the encoded word is not separated by whitespace from the ")" that terminates the comment. However, this is the only place where an encoded word need not be followed by whitespace or the end of the header.
The Subject: header above is non-compliant in two respects. It is too long. RFC 2047 section 2 says in part:
While there is no limit to the length of a multiple-line header field, each line of a header field that contains one or more 'encoded-word's is limited to 76 characters.
However, decode_header will accept it anyway and do the right thing. The real problem is that item (1) in section 5 of the RFC says in part:
Ordinary ASCII text and 'encoded-word's may appear together in the
same header field. However, an 'encoded-word' that appears in a
header field defined as '*text' MUST be separated from any adjacent
'encoded-word' or 'text' by 'linear-white-space'.
The header above does not comply with this. Instead of being
Subject: skatamies, cik ilgi google =?utf-8?q?m=C4=93=C4=A3ina?= =?utf-8?q?_nog=C4=81d=C4=81t?=, kad nevar...
(all on one line), it should be
Subject: skatamies, cik ilgi google =?utf-8?q?m=C4=93=C4=A3ina?=
 =?utf-8?q?_nog=C4=81d=C4=81t,?= kad nevar...
I.e., it should be folded so no part is longer than 76 characters, but more importantly for this, the "," near the end should be part of the encoded word rather than following the "?=" with no intervening whitespace.
This is a problem with the MUA (mail client) that encoded the Subject: header in the first place.
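As a quick check of the compliant form, the following sketch feeds it to decode_header() and make_header() from the standard email.header module. It assumes Python 3 and writes the header value as a single string, with a space where the fold would be.

```python
from email.header import decode_header, make_header

# Compliant form: the "," is inside the second encoded word, and every
# encoded word is followed by whitespace.
compliant = (
    "skatamies, cik ilgi google =?utf-8?q?m=C4=93=C4=A3ina?= "
    "=?utf-8?q?_nog=C4=81d=C4=81t,?= kad nevar..."
)

decoded = str(make_header(decode_header(compliant)))
print(decoded)
```

With the comma moved inside the encoded word, no "=?...?=" debris should remain in the decoded subject.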
-- Mark Sapiro <mark@msapiro.net>

Mark Sapiro writes:
Indeed that's a bug. I gather that you're saying that this bug is not the cause of the OP's problem, though?
As it should, according to the Postel Principle. Anyway, IIRC the length limit is a SHOULD NOT, not a MUST NOT, right?
Agreed, but I think that by default[1] email should try to parse this header as the user intended it. It's not like encoded-words are that easy to confuse with intended text; it's unlikely that changing 'linear-white-space' above to 'linear-white-space or specials' would harm anyone.
This is a problem with the MUA (mail client) that encoded the Subject: header in the first place.
Agreed, but I think following the Postel Principle here is likely to do less harm than adhering strictly to the RFC.
That said, I'm not in a position to contribute code, and this is a pretty invasive change, so the user is unlikely to see a version of Mailman that handles this any time soon. They are likely to have more luck switching clients.
Footnotes: [1] I.e., there should be an option to be strict.

I am trying to move this thread to email-sig@python.org since the underlying issue is in the email package. Further, since as of Mailman 2.1.12 we no longer install a Mailman-specific version of the email package, it really has to be addressed in the email package.
Stephen J. Turnbull wrote:
Correct.
The RFC (8|28|53)22 limits are MUST BE <= 998 and SHOULD BE <= 78. RFC 2047 seems to want to impose stricter limits on encoded words, but unfortunately does not use the defined terms MUST and SHOULD. Section 2 says in part:
An 'encoded-word' may not be more than 75 characters long, including 'charset', 'encoding', 'encoded-text', and delimiters. If it is desirable to encode more text than will fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may be used.
While there is no limit to the length of a multiple-line header field, each line of a header field that contains one or more 'encoded-word's is limited to 76 characters.
so it is not clear whether these are 'recommendations' or 'requirements'. In any case, email.header.decode_header() is not enforcing any limits so we are being generous in what we accept in this respect.
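For anyone who wants to test a header against the two RFC 2047 figures quoted above (75 characters per encoded word, 76 per line containing one), here is a rough sketch. The function length_problems and the ENCODED_WORD pattern are made up for this illustration; nothing like this is enforced by decode_header().

```python
import re

# Loose pattern for an encoded word: =?charset?q-or-b?text?=
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[qQbB]\?[^?\s]*\?=")


def length_problems(header):
    """List lines or encoded words that exceed the RFC 2047 figures."""
    problems = []
    for number, line in enumerate(header.splitlines(), start=1):
        words = ENCODED_WORD.findall(line)
        if words and len(line) > 76:
            problems.append("line %d is %d characters long" % (number, len(line)))
        for word in words:
            if len(word) > 75:
                problems.append("encoded word on line %d is %d characters long"
                                % (number, len(word)))
    return problems


one_line = ("Subject: skatamies, cik ilgi google =?utf-8?q?m=C4=93=C4=A3ina?= "
            "=?utf-8?q?_nog=C4=81d=C4=81t?=, kad nevar...")
folded = ("Subject: skatamies, cik ilgi google =?utf-8?q?m=C4=93=C4=A3ina?=\n"
          " =?utf-8?q?_nog=C4=81d=C4=81t,?= kad nevar...")

print(length_problems(one_line))
print(length_problems(folded))
```

The unfolded header trips the per-line figure, while the folded one passes both checks.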
I fully agree. There is a regexp (ecre) in email/header.py that ends with the lookahead assertion "(?=[ \t]|$)". Even in "strict mode", I think the lookahead needs to accept ")" as well as space and tab, but I think by default, it should just be removed.
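The effect of that lookahead can be seen with two simplified patterns. They are modelled on the description above and are not a copy of the ecre in email/header.py; the names STRICT and RELAXED are made up for this sketch.

```python
import re

# Encoded word that must be followed by space, tab or end of string.
STRICT = re.compile(r"=\?([^?]*?)\?([qb])\?(.*?)\?=(?=[ \t]|$)", re.IGNORECASE)
# Same pattern with the lookahead removed.
RELAXED = re.compile(r"=\?([^?]*?)\?([qb])\?(.*?)\?=", re.IGNORECASE)

subject = ("skatamies, cik ilgi google =?utf-8?q?m=C4=93=C4=A3ina?= "
           "=?utf-8?q?_nog=C4=81d=C4=81t?=, kad nevar...")

strict_hits = [m.group(0) for m in STRICT.finditer(subject)]
relaxed_hits = [m.group(0) for m in RELAXED.finditer(subject)]

print(strict_hits)   # only the word followed by a space
print(relaxed_hits)  # both encoded words
```

The strict pattern skips the second encoded word because its "?=" is followed by "," rather than whitespace, which matches the garbling seen in the archive.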
I agree here too, and note that some MUAs (all three I tried including mutt and Thunderbird) decode the original header as intended.
-- Mark Sapiro <mark@msapiro.net>

participants (3): Kārlis Repsons, Mark Sapiro, Stephen J. Turnbull