Mailman introducing spurious References: or In-Reply-To: headers?

hi mailman folks--
over on dns-privacy@ietf.org, one of the participants (Hosnieh Rafiee, cc'ed here) suggests that mailman appears to be introducing spurious References: and In-Reply-To: headers (see the attached message below for some of the discussion).
Can you confirm whether this is a function of mailman (e.g. to try to merge threads between people whose MUAs don't properly set In-Reply-To: or References:)? If so, is there anything we can do to avoid it happening in the future?
I believe the ietf.org mailman installation is running 2.1.15, fwiw.
Regards,
--dkg

On 10/27/2014 06:59 AM, Daniel Kahn Gillmor wrote:
Mailman does not do this. Mailman, at least standard GNU Mailman, does not alter References: or In-Reply-To: headers in any way.
Mailman's pipermail archiver uses these headers to determine threading in its archives, but it neither adds nor removes them, nor anything from them, and anything it does has no effect on delivered messages in any case.
The only thing in Mailman which manipulates these headers is the DMARC mitigation wrap message option introduced in Mailman 2.1.16 which merely copies these if they exist from the wrapped message to the outer wrapping message.
-- Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan

On 10/27/2014 11:33 AM, Mark Sapiro wrote:
I subsequently noticed in the message attached to the original post in this thread
On 10/27/2014 03:12 AM, Hosnieh Rafiee wrote: ...
As I wrote, Mailman makes use of In-Reply-To: and References: headers in determining threading in pipermail archives. It also does some Subject: header matching to augment threading decisions, although nothing nearly as cavalier as "any similar words to the subject line of the previous thread". So unrelated posts can be threaded together in the archives, but this most often, if not always, occurs when something is posted as a reply to an unrelated post and is thus sent to Mailman with In-Reply-To: and/or References: referring to the unrelated thread.
In any case, as I said previously, Mailman does not ever add In-Reply-To: or References: headers to messages which it delivers.
If Hosnieh Rafiee's statement "I have started a completely new message. So the header of my message was quite new and not carry any information for the old thread." is correct and if his delivered post contained In-Reply-To: and References: headers referencing the old thread, I can only imagine that perhaps his own MUA was responsible in some way or possibly some non-Mailman process or Mailman modification is involved on the Mailman host that added them, but I'm certain that standard Mailman does not.
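One way to narrow down where such headers come from is to compare the raw source of the post as it left the sender's MUA (for example, the copy in the Sent folder) with the copy delivered by the list. The following is a minimal sketch using Python's standard `email` package; the function name `threading_headers` and the sample message are hypothetical and only illustrate the check.

```python
from email import message_from_string


def threading_headers(raw_message: str) -> dict:
    """Return the threading-related header fields present in a raw message."""
    msg = message_from_string(raw_message)
    names = ("Message-ID", "In-Reply-To", "References")
    # msg[name] is None when the header is absent.
    return {name: msg[name] for name in names if msg[name] is not None}


# Hypothetical message: a "new" post that still carries In-Reply-To:,
# as happens when a reply is emptied out and given a new subject.
sample = (
    "From: poster@example.org\n"
    "To: list@example.org\n"
    "Subject: A new topic\n"
    "Message-ID: <new-1@example.org>\n"
    "In-Reply-To: <old-thread@example.org>\n"
    "\n"
    "Body text.\n"
)
found = threading_headers(sample)
```

If the sender's own copy already shows In-Reply-To: or References:, the MUA added them; if only the delivered copy does, something between the MUA and the subscriber is responsible.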
-- Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan

On 10/27/14, 6:01 PM, Mark Sapiro wrote:
... pressed "Reply" to get the list address in the To: line, and then deleted everything, and just assumed that by deleting the message and changing the subject, "of course" the system should know it is a new message. I have seen this done many times by people who will claim they started a new message (and many times it still contains the old message, because their system hid the quoted message almost completely from them).
-- Richard Damon

Mark Sapiro writes:
On 10/27/2014 11:33 AM, Mark Sapiro wrote:
Thanks, Mark. You saved me a lot of words.
First I've heard of "misleading misinformation" (and I care a lot about threading; I would notice and follow up).
This appears to be a cut and paste of the archived mail from a browser, so it would be a Pipermail issue (note that the IETF's pipermail seems to be modified, though). Still, I don't see *any* problem there, and there are no threading header fields presented, so there's no way to diagnose.
Mark's description here is somewhat ambiguous.
First, to describe the Subject: header matching specifically: it removes well-known prefixes (Re:, Fwd:, a few others) and list tags/serial numbers when enclosed in square brackets, then trims leading and trailing space. The result must match exactly.
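The trimming just described can be sketched roughly as follows. This is not Pipermail's actual code: the prefix list (`re`, `fwd`, `fw`), the rule that any leading bracketed text counts as a list tag, and the name `normalize_subject` are assumptions made for illustration.

```python
import re

# Assumed prefix list for illustration; the real list may differ.
_PREFIX = re.compile(r"^(?:re|fwd|fw)\s*:\s*", re.IGNORECASE)
# Assumption: any leading [bracketed] text is a list tag or serial number.
_TAG = re.compile(r"^\[[^\]]*\]\s*")


def normalize_subject(subject: str) -> str:
    """Strip reply/forward prefixes and bracketed tags, then trim whitespace."""
    s = subject.strip()
    while True:
        stripped = _TAG.sub("", _PREFIX.sub("", s)).strip()
        if stripped == s:
            return s
        s = stripped


a = normalize_subject("Re: [dns-privacy] Fwd: Privacy draft ")
b = normalize_subject("[dns-privacy] Privacy draft")
c = normalize_subject("privacy considerations")
```

Here `a` and `b` reduce to the same string and would be considered a match, while `c` would not: sharing the word "privacy" is not enough under an exact comparison.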
Second, "augment" does not mean "add". Pipermail does *not* "add the similar subjects to the thread." What it does do is group threads with the same subject (after trimming as above) together, and then sort thread groups by date. Conceptually, each individual thread has a separate root. This behavior is strongly preferred by users precisely because the exact match described above is usually due to a user who cut and pasted headers or whose MUA doesn't add reference headers. It's in http://www.jwz.org/doc/threading.html by Jamie Zawinski, the author of Netscape's threading code, and I believe that algorithm was adopted by RFC 5256 for IMAP.
I suspect that this is what Hosnieh Rafiee is seeing: separate threads grouped by subject, which appear to be a single thread because he expects strict sorting by date, where same-subject threads would appear together only by chance of very close dates.

Hi Mark,
I have even tried to forward and to reply to another message, sending it to myself (changing the sender to myself). The headers did not contain any "reply to" field. Whatever is happening is on the IETF side and not in the MUA on my side.
But in any case, I started a new message. My message did not have any similarity to the other message except, perhaps, the word "privacy" in the subject or content. Besides that there was no similarity.
Best, Hosnieh

participants (5): Daniel Kahn Gillmor, Hosnieh Rafiee, Mark Sapiro, Richard Damon, Stephen J. Turnbull