Blank Characters Removed from "Subject:" Line
I am running Mailman 2.1.9. I have a list where one posting has a "Subject:" line:
Change in Procedure for Computers on list with possible Antivirus Problems
The next posting in the thread has:
Change in Procedure for Computers on list with possible AntivirusProblems
A subsequent posting (not treated in the list archives as the same thread) has
Change in Procedure for Computers on list with possibleAntivirusProblems
The next posting in the thread has:
Change in Procedure for Computers on list withpossibleAntivirusProblems
The final posting in this thread has the same "Subject:" line as immediately above. I am not subscribed to this list and cannot post to it, so I do not know if a subsequent posting to this thread will remove another blank character in the "Subject:" line.
What I see in the list mbox file are these lines (with line numbers):
184232 Subject: Change in Procedure for Computers on list with possible Antivirus
184233 Problems
184331 Subject: RE: Change in Procedure for Computers on list with possible
184332 AntivirusProblems
184456 Subject: Re: Change in Procedure for Computers on list with
184457 possibleAntivirusProblems
184566 Subject: RE: Change in Procedure for Computers on list
184567 withpossibleAntivirusProblems
184735 Subject: RE: Change in Procedure for Computers on list
184736 withpossibleAntivirusProblems
Note that the original subject is split into two lines.
What might be causing this? Is this a problem with Mailman, or is it a problem with the sender's Mail User Agent (probably Outlook), or a problem with the sender's mail system:
X-MimeOLE: Produced By Microsoft Exchange V6.5
As the mbox file has the blanks removed, I have to believe that it is not Mailman that is removing the blanks.
Barry S. Finkel
Computing and Information Systems Division
Argonne National Laboratory          Phone:     +1 (630) 252-7277
9700 South Cass Avenue               Facsimile: +1 (630) 252-4601
Building 222, Room D209              Internet:  BSFinkel@anl.gov
Argonne, IL 60439-4828               IBMMAIL:   I1004994
Barry Finkel writes:
I am running Mailman 2.1.9. I have a list where one posting has a "Subject:" line:
Change in Procedure for Computers on list with possible Antivirus Problems
The next posting in the thread has:
Change in Procedure for Computers on list with possible AntivirusProblems
What is happening, I guess, is that Mailman is folding that header to keep it within some number of characters, maybe 76 or so. RFC 2822 specifies that this may be done by inserting a line break (CRLF) before whitespace. The RFC implies that the right thing to do when unfolding is to remove the CRLF only, but some MUAs also remove a space. I suspect that is what is happening in this case.
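To make the difference concrete, here is a minimal Python sketch of the two unfolding behaviours. The function names are hypothetical, and the "lossy" variant is only a guess at what a misbehaving MUA does; it is not taken from any real client.

```python
import re

def unfold_rfc2822(raw: str) -> str:
    """Unfold as RFC 2822 describes: drop only the line break that is
    immediately followed by whitespace, keeping the whitespace itself."""
    return re.sub(r"\r?\n(?=[ \t])", "", raw)

def unfold_lossy(raw: str) -> str:
    """Hypothetical buggy unfolding: drop the line break AND the
    whitespace that follows it, gluing two words together."""
    return re.sub(r"\r?\n[ \t]+", "", raw)

# A header folded before the last word, as it might appear on the wire.
folded = "Subject: possible Antivirus\r\n Problems"

print(unfold_rfc2822(folded))  # Subject: possible Antivirus Problems
print(unfold_lossy(folded))    # Subject: possible AntivirusProblems
```

The only difference is whether the whitespace after the CRLF survives, which is exactly the blank that goes missing in the subjects quoted above.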
Can you post a copy of the "raw" header as received by Mailman and as sent by Mailman?
Barry Finkel wrote:
I am running Mailman 2.1.9. I have a list where one posting has a "Subject:" line:
Change in Procedure for Computers on list with possible Antivirus Problems
The next posting in the thread has:
Change in Procedure for Computers on list with possible AntivirusProblems
A subsequent posting (not treated in the list archives as the same thread) has
Change in Procedure for Computers on list with possibleAntivirusProblems
First of all, threading in Mailman's pipermail archive is not based on Subject: at all. It is based on In-Reply-To: and References: headers so if a reply is not added to the thread, it is because the replier's MUA didn't add an In-Reply-To: or References: header, or it added one or both of these referencing an off-list post not in the archive.
The Subject: issue you observe has nothing to do with whether or not a reply is properly threaded in the archive.
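As a rough illustration of header-based threading, the sketch below picks a parent for a message using only In-Reply-To: and References:. This is not pipermail's actual code; `parent_id`, the example Message-IDs, and the set of known IDs are all made up for the example.

```python
from email import message_from_string

def parent_id(msg, known_ids):
    """Return the Message-ID of the nearest ancestor present in the
    archive, or None. The Subject: header is never consulted."""
    # Try In-Reply-To first, then References from newest to oldest.
    candidates = msg.get("In-Reply-To", "").split()
    candidates += msg.get("References", "").split()[::-1]
    for candidate in candidates:
        if candidate in known_ids:
            return candidate
    return None

# Hypothetical archive containing one earlier post.
known = {"<a@example.org>"}

reply = message_from_string(
    "Message-ID: <b@example.org>\n"
    "In-Reply-To: <a@example.org>\n"
    "Subject: RE: Change in Procedure\n\nbody\n"
)
orphan = message_from_string(
    "Message-ID: <c@example.org>\n"
    "Subject: RE: Change in Procedure\n\nbody\n"
)

print(parent_id(reply, known))   # <a@example.org>
print(parent_id(orphan, known))  # None
```

The second message has the same subject but no referencing headers, so under this model it would start a new thread, much like the posting described above.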
The next posting in the thread has:
Change in Procedure for Computers on list withpossibleAntivirusProblems
The final posting in this thread has the same "Subject:" line as immediately above. I am not subscribed to this list and cannot post to it, so I do not know if a subsequent posting to this thread will remove another blank character in the "Subject:" line.
It may or may not.
What I see in the list mbox file are these lines (with line numbers):
184232 Subject: Change in Procedure for Computers on list with possible Antivirus
184233 Problems
184331 Subject: RE: Change in Procedure for Computers on list with possible
184332 AntivirusProblems
184456 Subject: Re: Change in Procedure for Computers on list with
184457 possibleAntivirusProblems
184566 Subject: RE: Change in Procedure for Computers on list
184567 withpossibleAntivirusProblems
184735 Subject: RE: Change in Procedure for Computers on list
184736 withpossibleAntivirusProblems
Note that the original subject is split into two lines.
What might be causing this? Is this a problem with Mailman, or is it a problem with the sender's Mail User Agent (probably Outlook), or a problem with the sender's mail system:
All of the above, or at least the first two.
As the mbox file has the blanks removed, I have to believe that it is not Mailman that is removing the blanks.
The basic issue revolves around the rules for folding and unfolding long header lines. The original standard was RFC 822 <http://www.faqs.org/rfcs/rfc822.html>, sec 3.1.1. The current recommendation is RFC 2822 <http://www.faqs.org/rfcs/rfc2822.html>, sec 2.2.3.
While careful reading of these two standards shows they are almost the same with respect to folding and exactly the same with respect to unfolding, the RFC 822 rules can result in the insertion of extra white space (the opposite of what you see here). Further, many MUAs and other mail processing software (such as the Python email library used by Mailman) don't follow the rules exactly, perhaps because in trying to compensate for too much white space they remove too much. Also, the rules really work best with structured headers where white space occurs between syntactic fields, and not so well with free-form text headers like Subject:.
Aside: I just read Stephen's reply in which he says "The RFC implies that the right thing to do in that case is to remove the CRLF only, but some MUAs also remove a space." And, I add "or a tab". This whitespace removal in unfolding is the crux of the issue.
Part of the problem is Mailman will unfold and refold the header in the process of adding the subject_prefix. This process will lengthen the header, perhaps causing it to be folded when it wasn't before or folded in a different place. Also, Mailman tends to fold with <CR><LF><TAB> and MUAs tend to remove the <TAB> in unfolding.
Also, if a subject is folded, and then the white space is removed in unfolding, that makes the joined 'word' long, so the next time it will tend to fold at the <SP> preceding the long 'word', and then that <SP> can be lost in unfolding.
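The cascade can be reproduced with a small simulation. This is a deliberately simplified model, not Mailman's or any MUA's real algorithm: it assumes a fold width of 76 characters, a single fold at the last space that fits (replaced by CRLF plus TAB), and an unfolder that discards the TAB along with the CRLF. The reply prefixes (RE:, Re:) are left out, and all function names are invented for the example.

```python
CRLF_TAB = "\r\n\t"

def fold_once(header: str, width: int = 76) -> str:
    """Fold at the last space at or before `width` (assumed limit),
    replacing that space with CRLF + TAB."""
    if len(header) <= width:
        return header
    cut = header.rfind(" ", 0, width + 1)
    if cut <= 0:  # no space available to fold at
        return header
    return header[:cut] + CRLF_TAB + header[cut + 1:]

def lossy_unfold(header: str) -> str:
    """Model of a non-compliant unfolder: removes the TAB as well."""
    return header.replace(CRLF_TAB, "")

def simulate(subject: str, rounds: int, width: int = 76) -> list:
    """Apply fold + lossy unfold repeatedly, recording each result."""
    history = []
    for _ in range(rounds):
        subject = lossy_unfold(fold_once(subject, width))
        history.append(subject)
    return history

start = ("Subject: Change in Procedure for Computers on list "
         "with possible Antivirus Problems")
for line in simulate(start, 3):
    print(line)
```

Under these assumptions each round trip glues one more word onto the tail, giving "possible AntivirusProblems", then "with possibleAntivirusProblems", then "list withpossibleAntivirusProblems", the same progression seen in the mbox excerpts.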
Mailman could behave in a completely RFC compliant manner (it doesn't), and there would still be the problem because MUAs don't behave in a completely compliant manner.
Also note that the actual removal of whitespace is done by the MUA, not Mailman, but that doesn't let Mailman completely off the hook, because in some cases, Mailman may replace <SP> with <TAB> and the MUA may be more likely to remove <TAB>.
See the (multiple) threads with subject "Subject Lines Wrapped After Commas, (Like This?)" starting at <http://mail.python.org/pipermail/mailman-users/2007-May/057117.html> for a different but related discussion.
--
Mark Sapiro <msapiro@value.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
participants (3)
- Barry Finkel
- Mark Sapiro
- Stephen J. Turnbull