Re: [Mailman-Users] diacritics in from header with from_is_list set to munge

On 01/20/2016 04:23 PM, Stephen J. Turnbull wrote:
Mark Sapiro writes:
In any case, the From: I see is
From: valérie via List1 <list1@msapiro.net>
which is technically not valid because it contains a non-ascii character, but even so, the message with that exact From: header is accepted by Yahoo and delivered to my Yahoo address.
Really? It's not that you're looking at this in your MUA and it's being silently MIME-encoded and -decoded at the MTA/MUA boundary? ISTR that Yahoo! is quite sensitive to non-ASCII headers.
Really. In my initial test, I sent from mutt and mutt kindly RFC2047 encoded the name for me, so I switched to testing using Mailman's bin/inject to post the raw message from a file, and I looked at the raw message source of the message I received in my Yahoo inbox and the header is
From: valérie via List1 <list1@msapiro.net>
where é is the utf-8 encoding of e-acute.
One thing I've seen is this:
in MUA UI:   From: auser@example.com (non-ASCII display name)
on the wire: From: auser@example.com =?utf-8?Q?(non-ASCII display name)?=
which gets bounced at many sites because the MIME-word hides the comment delimiters from the receiver's parser, which typically rejects with "= not allowed here" or something like that. Is it possible that Mailman is doing something like that?
Yes, I see where that would be an issue, but as far as Mailman is concerned, this would only be an issue if the incoming post had that From: header.
With a message containing
From: auser@example.com =?utf-8?Q?(non-ASCII_display_name)?=
the code Mailman uses to parse the From: will return 'non-ASCII_display_name' as the display name and 'auser@example.com=?utf-8?Q??=' as the address. This is clearly wrong, but then the encoded header is not RFC 2047 compliant. RFC 2047, sec 5(2) is clear that an encoded-word in a comment does not include the parentheses. It defines comment as
comment = "(" *(ctext / quoted-pair / comment / encoded-word) ")"
Mailman would never create a header like that.
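For comparison, a compliant form of that header keeps the parentheses outside the encoded-word. The following is only a minimal sketch using the Python 3 standard email package, not Mailman's own code; the helper name address_with_comment and the sample name are made up for illustration.

```python
from email.charset import Charset
from email.header import decode_header


def address_with_comment(addr_spec, comment_text):
    """Return 'addr-spec (comment)' with only the comment text encoded.

    Per RFC 2047 sec 5(2) the parentheses stay outside the encoded-word,
    so a receiving parser still sees an ordinary comment.
    (Hypothetical helper; plain-ASCII comments are passed through as-is.)
    """
    try:
        comment_text.encode('ascii')
        encoded = comment_text
    except UnicodeEncodeError:
        # header_encode() returns a single =?utf-8?...?= encoded-word.
        encoded = Charset('utf-8').header_encode(comment_text)
    return '%s (%s)' % (addr_spec, encoded)


value = address_with_comment('auser@example.com', 'Valérie Exemple')
print('From: ' + value)
```

The point of the sketch is the placement of the delimiters: the "(" and ")" are added after encoding, so they are never hidden inside the MIME-word.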
-- 
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan

[Aside: I sent my previous message from the wrong address, and it was rejected. I am *not* resending it, since Mark quoted everything at some point. Nobody except me lost mail, and I deserved to! :-]
Mark Sapiro writes:
On 01/20/2016 04:23 PM, Stephen J. Turnbull wrote:
Really? It's not that you're looking at this in your MUA and it's being silently MIME-encoded and -decoded at the MTA/MUA boundary?
Really.
OK, just checking.
One thing I've seen is this:
in MUA UI:   From: auser@example.com (non-ASCII display name)
on the wire: From: auser@example.com =?utf-8?Q?(non-ASCII display name)?=
Yes, I see where that would be an issue, but as far as Mailman is concerned, this would only be an issue if the incoming post had that From: header.
Mailman would never create a header like that.
Absent a bug in the email package. But yes, I'm suggesting exactly that there's a broken MUA out there sending something that doesn't parse correctly, and the email package is failing to respect the "In the face of ambiguity, refuse to guess" Zen.
We really need to see both the input and the output headers that Mailman sends and receives.
Steve

On Thu, Jan 21, 2016 at 11:44:26AM +0900, Stephen J. Turnbull wrote:
Absent a bug in the email package. But yes, I'm suggesting exactly that there's a broken MUA out there sending something that doesn't parse correctly, and the email package is failing to respect the "In the face of ambiguity, refuse to guess" Zen.
unfortunately i cannot check for the python version as i don't have commandline access to the server.
We really need to see both the input and the output headers that Mailman sends and receives.
i can only send the headers that i receive over the list and those of the messages that get bounced. i don't want to bother the whole list with debugging messages.
so the message of users getting bounced look like (abbreviated):
--===============8546344873151602248==
Content-Type: message/rfc822
MIME-Version: 1.0
Delivered-To: mailman-mylist-bounces@some.server.org
Return-Path: <>
Received: from localhost (localhost [127.0.0.1])
  (ftp://ftp.isi.edu/in-notes/rfc1894.txt)
  by some.server.org with dsn; Wed, 20 Jan 2016 16:27:08 +0100
  id 000000000000002C.00000000569FA74C.00002C9F
From: "server.org postmaster" <postmaster@server.org>
To: mylist-bounces@lists.mydomain.org
Subject: NOTICE: mail delivery status.
Mime-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status;
  boundary="=_courier_0"
Content-Transfer-Encoding: 7bit
Message-ID: <courier.00000000569FA74C.00002C9F@some.server.org>
Date: Wed, 20 Jan 2016 16:27:08 +0100
This is a MIME-formatted message. If you see this text it means that your E-mail software does not support MIME-formatted messages.
--=_courier_0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
This is a delivery status notification from some.server.org, running the Courier mail server, version 0.75.0.
The original message was received on Wed, 20 Jan 2016 16:26:48 +0100 from some.server.org ([::1])
UNDELIVERABLE MAIL
Your message to the following recipients cannot be delivered:
<xxxxxxxxxxxxx@yahoo.fr>: mx-eu.mail.am0.yahoodns.net [188.125.69.79]:
DATA <<< 554 Message not allowed - [299]
[...]
The original message follows as a separate attachment.
Received: from some.server.org ([::1])
  by some.server.org with ESMTP; Wed, 20 Jan 2016 16:26:48 +0100
  id 0000000000000066.00000000569FA738.000029C1
Delivered-To: mailman-mylist@some.server.org
Old-Return-Path: <valerie@mydomain.org>
MIME-Version: 1.0
Date: Wed, 20 Jan 2016 16:26:44 +0100
To: mylist@lists.mydomain.org
In-Reply-To: <7e498749e02d18656fd14393b90cdc38@mydomain.org>
References: <7e498749e02d18656fd14393b90cdc38@mydomain.org>
Message-ID: <a74fc9f373a741532c5f3d25404435ba@mydomain.org>
X-Sender: valerie@mydomain.org
Subject: [mylist] some subject
X-BeenThere: mylist@lists.mydomain.org
X-Mailman-Version: 2.1.20
Precedence: list
From: =?utf-8?q?Val=C3=A9rie/Something_via_mylist_=3Cmyli?=,
  =?utf-8?b?ZW5AbGlzdHMubXRtZWRpYS5vcmc+?=
Reply-To: valerie@mydomain.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
Errors-To: mylist-bounces@lists.mydomain.org
Sender: "mylist" <mylist-bounces@lists.mydomain.org>
so the message on the list finally looks like this:
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
Return-Path: <mylist-bounces@lists.mydomain.org>
Delivered-To: mailman-mylist@some.server.org
Old-Return-Path: <valerie@mydomain.org>
MIME-Version: 1.0
Date: Wed, 20 Jan 2016 16:26:44 +0100
To: mylist@lists.mydomain.org
In-Reply-To: <7e498749e02d18656fd14393b90cdc38@mydomain.org>
References: <7e498749e02d18656fd14393b90cdc38@mydomain.org>
Message-ID: <a74fc9f373a741532c5f3d25404435ba@mydomain.org>
X-Sender: valerie@mydomain.org
Subject: [mylist] some subject
X-BeenThere: mylist@lists.mydomain.org
X-Mailman-Version: 2.1.20
Precedence: list
From:
=?utf-8?q?Val=C3=A9rie/Something_via_mylist_=3Cmyli?=@some.server.org,
=?utf-8?b?ZW5AbGlzdHMubXRtZWRpYS5vcmc+?=@some.server.org
Reply-To: valerie@mydomain.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
Errors-To: mylist-bounces@lists.mydomain.org
Sender: "mylist" <mylist-bounces@lists.mydomain.org>

gabriel writes:
On Thu, Jan 21, 2016 at 11:44:26AM +0900, Stephen J. Turnbull wrote:
Absent a bug in the email package. But yes, I'm suggesting exactly that there's a broken MUA out there sending something that doesn't parse correctly, and the email package is failing to respect the "In the face of ambiguity, refuse to guess" Zen.
unfortunately i cannot check for the python version as i don't have commandline access to the server.
Ah, that probably doesn't matter. That remark was really directed to Mark.
so the message of users getting bounced look like (abbreviated):
This is a delivery status notification from some.server.org, running the Courier mail server, version 0.75.0.
FYI, bounce messages may or may not be useful, as some bounce programs do mess with the mail they forward. I know you probably can't do anything about this, this is the best you can do.
From: =?utf-8?q?Val=C3=A9rie/Something_via_mylist_=3Cmyli?=, =?utf-8?b?ZW5AbGlzdHMubXRtZWRpYS5vcmc+?=
Sender: "mylist" <mylist-bounces@lists.mydomain.org>
So this has already been through Mailman. We really really need to see the mail as it was *before* Mailman handled it (possibly in the mbox file in the archive, if you have it).
And then you've redacted stuff, and that may matter. If you don't want to send unredacted headers to a list with public archives, we understand, but in that case you can and should send them to Mark (and possibly me, but Mark is the real expert if you really want to send it to the fewest people) privately.
I don't think this is a Mailman bug. Mailman would not choose to send using two different transfer encodings (Q in the first line, B in the second). So I suspect Mailman is just forwarding the garbage it receives, or something downstream of Mailman is doing it.

On Fri, Jan 22, 2016 at 01:06:29AM +0900, Stephen J. Turnbull wrote:
From: =?utf-8?q?Val=C3=A9rie/Something_via_mylist_=3Cmyli?=, =?utf-8?b?ZW5AbGlzdHMubXRtZWRpYS5vcmc+?=
Sender: "mylist" <mylist-bounces@lists.mydomain.org>
So this has already been through Mailman. We really really need to see the mail as it was *before* Mailman handled it (possibly in the mbox file in the archive, if you have it).
That may not be so easy. in fact there is no archive of that list, but haven't archived messages not been processed by mailman?
luckily exactly this message has been posted on one other list i have access to:
Return-Path: <otherlist-bounces@someotherserver.org>
X-Original-To: otherlist@someotherserver.org
Delivered-To: otherlist@someotherserver.org
Received: from onedomain.de (mail.onedomain.de [IPv6:2a01:4f8:140:4063::6])
  by someotherserver.org (Postfix) with ESMTP id DA99150142
  for <otherlist@someotherserver.org>; Wed, 20 Jan 2016 16:24:07 +0100 (CET)
Received: from eggmann.mydomain.org (eggmann.mydomain.org [178.63.68.97])
  (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
  (No client certificate requested)
  by onedomain.de (Postfix) with ESMTPS id E51E740036
  for <otherlist@onedomain.de>; Wed, 20 Jan 2016 16:24:07 +0100 (CET)
Received: from roundcube.mydomain.org ([::1])
  by eggmann.mydomain.org with ESMTP; Wed, 20 Jan 2016 16:24:05 +0100
  id 000000000002002A.00000000569FA695.00001D40
MIME-Version: 1.0
Date: Wed, 20 Jan 2016 16:24:05 +0100
From: valerie@mydomain.org
To: otherlist <otherlist@onedomain.de>
Message-ID: <7e498749e02d18656fd14393b90cdc38@mydomain.org>
X-Sender: valerie@mydomain.org
Subject: [otherlist] some subject
X-BeenThere: otherlist@someotherserver.org
X-Mailman-Version: 2.1.15
Precedence: list
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
Errors-To: otherlist-bounces@someotherserver.org
Sender: "otherlist" <otherlist-bounces@someotherserver.org>
what is grabbing my attention here is that there is no text in the From: field, only the bare email address. digging a little further reveals that the text that gets merged originates from the member's name field with which that address is associated in mailman.
And then you've redacted stuff, and that may matter. If you don't want to send unredacted headers to a list with public archives, we understand, but in that case you can and should send them to Mark (and possibly me, but Mark is the real expert if you really want to send it to the fewest people) privately.
thank you very much. i appreciate it. i will send the originals to you when this email doesn't help.
I don't think this is a Mailman bug. Mailman would not choose to send using two different transfer encodings (Q in the first line, B in the second). So I suspect Mailman is just forwarding the garbage it receives, or something downstream of Mailman is doing it.

On 01/21/2016 09:27 AM, gabriel wrote:
That may not be so easy. in fact there is no archive of that list, but haven't archived messages not been processed by mailman?
OK. I sent my last reply before I saw this.
So there are no archives from which to retrieve what I asked for. I'm still interested in seeing the unmunged, raw headers from the message delivered by Mailman, but I'm sure I know what's happening.
luckily exactly this message has been posted on one other list i have access to: ... what is grabbing my attention here is that there is no text in the From: field, only the bare email address. digging a little further reveals that the text that gets merged originates from the member's name field with which that address is associated in mailman.
Yes, Mailman does that.
Here's what I think is happening.
Mailman is munging the From: header from
From: valerie@mydomain.org
to
From: valérie via List <list@example.com>
where valérie comes from the member's name field in the list membership. This is done to try to identify the member without actually including an email address in the display name as that is said to be a red flag to some ISPs.
Mailman should really RFC 2047 encode the resultant display name or at least the non-ascii part of it, but it doesn't. I accept this as a Mailman bug and will work on fixing it.
Anyway, Mailman sends the message with that non-ascii From: header and the outgoing MTA attempts to fix it and in the process makes it even worse.
-- 
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan

On Thu, Jan 21, 2016 at 10:15:33AM -0800, Mark Sapiro wrote:
So there are no archives from which to retrieve what I asked for. I'm still interested in seeing the unmunged, raw headers from the message delivered by Mailman, but I'm sure I know what's happening.
i'm sorry there's really no easy way for me to get the raw headers before they are processed by mailman. but i guess you've already identified the problem correctly anyway.
Mailman should really RFC 2047 encode the resultant display name or at least the non-ascii part of it, but it doesn't. I accept this as a Mailman bug and will work on fixing it.
great, thank you and Stephen very much for your help on this issue. i'm looking forward to seeing it fixed.
cheers, gabriel

On 01/21/2016 10:40 AM, gabriel wrote:
On Thu, Jan 21, 2016 at 10:15:33AM -0800, Mark Sapiro wrote:
So there are no archives from which to retrieve what I asked for. I'm still interested in seeing the unmunged, raw headers from the message delivered by Mailman, but I'm sure I know what's happening.
i'm sorry there's really no easy way for me to get the raw headers before they are processed by mailman. but i guess you've already identified the problem correctly anyway.
I said "I'm still interested in seeing the unmunged, raw headers from the message delivered by Mailman" I.e., the message FROM Mailman, not the message TO Mailman.
The problem is really an MTA bug just triggered by a Mailman bug. I'd like to see that message so I can see what MTA(s) it passed through after Mailman and possibly file a bug report with the MTA.
-- 
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan

On Thu, Jan 21, 2016 at 10:50:55AM -0800, Mark Sapiro wrote:
On 01/21/2016 10:40 AM, gabriel wrote:
On Thu, Jan 21, 2016 at 10:15:33AM -0800, Mark Sapiro wrote:
So there are no archives from which to retrieve what I asked for. I'm still interested in seeing the unmunged, raw headers from the message delivered by Mailman, but I'm sure I know what's happening.
i'm sorry there's really no easy way for me to get the raw headers before they are processed by mailman. but i guess you've already identified the problem correctly anyway.
I said "I'm still interested in seeing the unmunged, raw headers from the message delivered by Mailman" I.e., the message FROM Mailman, not the message TO Mailman.
I don't quite understand. Mailman is configured to munge the From header. When
a message passes Mailman, headers will be munged. how could i get a unmunged,
raw header message delivered by Mailman? If i set the list to from_is_list =
off all my users will get bounced by yahoo. do you want me to enable archives
and wait for another incident?
The problem is really an MTA bug just triggered by a Mailman bug. I'd like to see that message so I can see what MTA(s) it passed through after Mailman and possibly file a bug report with the MTA.
yes sure that makes sense, too.

On 01/21/2016 11:08 AM, gabriel wrote:
On Thu, Jan 21, 2016 at 10:50:55AM -0800, Mark Sapiro wrote:
I said "I'm still interested in seeing the unmunged, raw headers from the message delivered by Mailman" I.e., the message FROM Mailman, not the message TO Mailman.
I don't quite understand. Mailman is configured to munge the From header. When
a message passes Mailman, headers will be munged. how could i get a unmunged, raw header message delivered by Mailman?
Sorry for the confusion.
By unmunged, I'm not referring to the munging done by Mailman's Munge From. I'm referring to the munging done by you when you replace actual names, list names and domains by things like Something, mylist and mydomain.
What I would like to see is the raw message headers as received by you from Mailman without alteration or editing by you. If you don't want to post that to the list, please send it off list to me.
-- 
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan

On 01/21/2016 10:15 AM, Mark Sapiro wrote:
Here's what I think is happening.
Mailman is munging the From: header from
From: valerie@mydomain.org
to
From: valérie via List <list@example.com>
where valérie comes from the member's name field in the list membership. This is done to try to identify the member without actually including an email address in the display name as that is said to be a red flag to some ISPs.
Mailman should really RFC 2047 encode the resultant display name or at least the non-ascii part of it, but it doesn't. I accept this as a Mailman bug and will work on fixing it.
Anyway, Mailman sends the message with that non-ascii From: header and the outgoing MTA attempts to fix it and in the process makes it even worse.
Actually, further testing of this scenario shows that Mailman is likely responsible for the mis-encoding as well. Once I realized the key was in providing the poster's real name from the list's membership, I was able to duplicate the issue including the mis-encoding.
So I don't need to see any further samples, and I have developed a fix. The basic fix is very simple. In Mailman/Handlers/CookHeaders.py around line 155 are the lines:
        change_header('From',
                      formataddr(('%s via %s' % (realname, mlist.real_name),
                                 mlist.GetListEmail())),
                      mlist, msg, msgdata)
Immediately before the change_header insert (indented the same 8 spaces)
        realname = str(uheader(mlist, realname))
which will RFC 2047 encode the realname if it contains non-ascii.
This is now reported as <https://bugs.launchpad.net/mailman/+bug/1536816> and the fix will be committed soon.
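The effect of encoding only the display name can be illustrated outside Mailman. This is a rough sketch using the Python 3 standard library rather than Mailman 2.1's own helpers (uheader and change_header are not used here); the function name munged_from is an assumption made for the example.

```python
from email.header import decode_header, make_header
from email.utils import formataddr, parseaddr


def munged_from(realname, list_name, list_address):
    """Build a 'name via list <address>' From: value (illustrative only).

    formataddr() RFC 2047 encodes the display name when it is not ASCII
    and leaves the addr-spec untouched.
    """
    display = '%s via %s' % (realname, list_name)
    return formataddr((display, list_address), charset='utf-8')


header = munged_from('valérie', 'List1', 'list1@msapiro.net')
print('From: ' + header)

# Round trip: the address is still parseable and the name decodes cleanly.
name, address = parseaddr(header)
print(str(make_header(decode_header(name))), address)
```

An all-ASCII name passes through unencoded, so only members whose real name contains non-ascii characters see any change in the header.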
-- 
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan

Mark Sapiro wrote:
Actually, further testing of this scenario shows that Mailman is likely responsible for the mis-encoding as well.
To summarize and wrap up. It was possible for mailman to create a
From: User_name via list_name <list@example.com>
header with non-ascii in the User_name when creating the new From: for a message with Munge From or the From: for the outer wrapper with Wrap Message. This would occur when the incoming From: had only an address and no display name, so the poster's list membership real name would be used, and that real name contained non-ascii. It could also occur if a post arrived containing an unencoded display name in From: because the sending MUA was non-compliant, but this scenario is unlikely.
Thanks to gabriel for reporting the problem so it could ultimately be found and fixed.
The underlying bug is that Mailman should RFC 2047 encode the User_name at that point, and now it does (bug at <https://bugs.launchpad.net/mailman/+bug/1536816>, fix at <http://bazaar.launchpad.net/~mailman-coders/mailman/2.1/revision/1597>[1]).
This was further exacerbated by the fact that Mailman's SMTPDirect.py module uses the Mailman Message as_string() method to flatten the message object to plain text for sending. Sometimes, but not always, depending on characteristics of the message itself, this method would refold or rewrite certain headers. In so doing it would 'see' the non-ascii in the From: header and RFC 2047 encode the entire header content without regard for the fact that it contained address specs that are not to be encoded.
Debugging this issue was complicated by the fact that the header encoding by as_string() didn't occur with every message and my initial test messages didn't trigger it.
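To make the failure mode concrete: encoding a complete From: value as one unit swallows the address. The sketch below uses the Python 3 email.header.Header class as a stand-in; it is not the Python 2 as_string() code path that Mailman 2.1 actually runs, and the sample values are taken from the earlier test header.

```python
from email.header import Header, decode_header, make_header

original = 'valérie via List1 <list1@msapiro.net>'

# Treating the whole value as text to encode hides the addr-spec
# inside encoded-word(s), which RFC 2047 sec 5(3) does not allow.
whole = Header(original, 'utf-8').encode()
print('From: ' + whole)

# The literal address is no longer visible to an address parser ...
print('<list1@msapiro.net>' in whole)

# ... although decoding the encoded-word(s) still recovers the text.
print(str(make_header(decode_header(whole))))
```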
[1] The only part of the fix at <http://bazaar.launchpad.net/~mailman-coders/mailman/2.1/revision/1597> that is necessary to fix this issue is the addition of
        realname = str(uheader(mlist, realname))
The rest of it is for l10n of the "User_name via list_name" phrase.
-- 
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan

On 01/21/2016 08:06 AM, Stephen J. Turnbull wrote:
gabriel writes:
so the message of users getting bounced look like (abbreviated):
This is a delivery status notification from some.server.org, running the Courier mail server, version 0.75.0.
FYI, bounce messages may or may not be useful, as some bounce programs do mess with the mail they forward. I know you probably can't do anything about this, this is the best you can do.
Agreed. I'm not interested in the bounce at all.
From: =?utf-8?q?Val=C3=A9rie/Something_via_mylist_=3Cmyli?=, =?utf-8?b?ZW5AbGlzdHMubXRtZWRpYS5vcmc+?=
This is an absolute, non-compliant mess.
The first encoded word, if I ignore the comma which is non-compliant, decodes to "Valérie/Something via mylist <myli" and the second encoded word decodes to "st@lists.xxxxx.org>". Thus, if I put them together, I get "Valérie/Something via mylist <mylist@lists.xxxxx.org>"
(I've replaced the last bit of the list name and part of the domain with st@lists.xxxxx since you seem not to want to reveal them, even though you already have, as anyone can decode the RFC 2047 encoding.)
The comma at the end of the first line is wrong because of RFC2047, sec 5(1):
Ordinary ASCII text and 'encoded-word's may appear together in the
same header field. However, an 'encoded-word' that appears in a
header field defined as '*text' MUST be separated from any adjacent
'encoded-word' or 'text' by 'linear-white-space'.
More importantly, RFC2047, sec 5(3) says in part:
- An 'encoded-word' MUST NOT appear in any portion of an 'addr-spec'.
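The decoding described above can be reproduced with the standard library. In this sketch the first encoded-word is copied from the bounced header, while the second is rebuilt from a placeholder address (lists.example.org is an assumption, used so the redacted domain stays redacted); decode_word is a made-up helper name.

```python
import base64
from email.header import decode_header


def decode_word(word):
    """Decode a single RFC 2047 encoded-word to text."""
    (data, charset), = decode_header(word)
    return data.decode(charset)


# Q-encoded word as seen in the thread.
first = '=?utf-8?q?Val=C3=A9rie/Something_via_mylist_=3Cmyli?='
# B-encoded word built from a placeholder, not the real list address.
second = '=?utf-8?b?%s?=' % base64.b64encode(
    b'st@lists.example.org>').decode('ascii')

joined = decode_word(first) + decode_word(second)
print(joined)

# The '@' only shows up after decoding, i.e. the addr-spec was inside
# the encoded-words, which is what sec 5(3) prohibits.
print('@' in first + second, '@' in joined)
```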
Sender: "mylist" <mylist-bounces@lists.mydomain.org>
So this has already been through Mailman. We really really need to see the mail as it was *before* Mailman handled it (possibly in the mbox file in the archive, if you have it).
And then you've redacted stuff, and that may matter. If you don't want to send unredacted headers to a list with public archives, we understand, but in that case you can and should send them to Mark (and possibly me, but Mark is the real expert if you really want to send it to the fewest people) privately.
What I would like to see, unmunged, sent directly to me off list if you don't want to post it, is
- The complete, raw headers from the message as received from the list, and
- Either the complete raw headers of the message from the archive listname.mbox/listname.mbox file[1] or, if that's not possible, from the archive "Downloadable version" .txt (or .txt.gz) file.
I don't think this is a Mailman bug. Mailman would not choose to send using two different transfer encodings (Q in the first line, B in the second). So I suspect Mailman is just forwarding the garbage it receives, or something downstream of Mailman is doing it.
I'm certain Mailman did not create that encoded header. I suspect the outgoing MTA. This might in fact be precipitated by a Mailman bug; i.e., the fact I noted earlier in this thread that the header created by Mailman can contain a non-ascii character. This might be what triggers the outgoing MTA to arbitrarily encode the header without actually parsing it and encoding it correctly, but I'll know more after I see what I've asked for.
[1] You can get the listname.mbox/listname.mbox file via the web UI. There may be a link on the archive table of contents page, but usually there isn't. If there isn't a link, go to the private archive URL (even if the archive is public) - something like http://www.example.com/mailman/private/listname - and log in. Then retrieve http://www.example.com/mailman/private/listname.mbox/listname.mbox
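Once the listname.mbox file has been downloaded, the From: headers stored in it can be listed with a few lines of Python. This is only a sketch using the standard mailbox module; the function name from_headers and the local file path are assumptions, and nothing here is part of Mailman itself.

```python
import mailbox


def from_headers(mbox_path):
    """Return (Message-ID, From) pairs for every message in an mbox file."""
    # create=False: fail instead of silently creating an empty mailbox.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [(msg.get('Message-ID'), str(msg.get('From')))
                for msg in box]
    finally:
        box.close()
```

Calling from_headers('listname.mbox') on the downloaded file gives the headers as the archiver stored them, which can then be compared with what list members received.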
-- 
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
participants (3): gabriel, Mark Sapiro, Stephen J. Turnbull