[Mailman-Users] Blank Characters Removed from "Subject:" Line

Mark Sapiro msapiro at value.net
Thu Jun 28 03:36:24 CEST 2007


Barry Finkel wrote:

>I am running Mailman 2.1.9.  I have a list where one posting has a
>"Subject:" line:
>
>     Change in Procedure for Computers on list with possible Antivirus Problems 
>
>The next posting in the thread has:
>
>     Change in Procedure for Computers on list with possible AntivirusProblems 
>
>A subsequent posting (not treated in the list archives as the same
>thread) has
>
>     Change in Procedure for Computers on list with possibleAntivirusProblems 


First of all, threading in Mailman's pipermail archive is not based on
Subject: at all. It is based on In-Reply-To: and References: headers,
so if a reply is not added to the thread, it is because the replier's
MUA didn't add an In-Reply-To: or References: header, or it added one
or both of these referencing an off-list post not in the archive.
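
To make the threading point concrete, here is a minimal sketch in
Python. It is not pipermail's actual code; the function names and the
example Message-IDs are made up for illustration. It only shows the
idea that a reply joins a thread when one of the IDs in its
In-Reply-To: or References: headers is already in the archive, and
that Subject: plays no part.

```python
from email import message_from_string

def parent_candidates(msg):
    """Collect the message IDs named in In-Reply-To: and References:."""
    ids = []
    for name in ("In-Reply-To", "References"):
        # split() also copes with a References: header folded over several lines
        ids.extend(msg.get(name, "").split())
    return ids

def joins_thread(msg, archived_ids):
    """True if any referenced ID is already in the (hypothetical) archive."""
    return any(mid in archived_ids for mid in parent_candidates(msg))

archived = {"<a@example.com>"}  # hypothetical Message-IDs already archived

reply = message_from_string(
    "Message-ID: <b@example.com>\n"
    "In-Reply-To: <a@example.com>\n"
    "Subject: RE: a completely different subject\n"
    "\n"
    "body\n"
)
orphan = message_from_string(
    "Message-ID: <c@example.com>\n"
    "Subject: RE: Change in Procedure\n"
    "\n"
    "body\n"
)

print(joins_thread(reply, archived))   # threaded despite the changed Subject:
print(joins_thread(orphan, archived))  # not threaded, no reference headers
```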

The Subject: issue you observe has nothing to do with whether or not a
reply is properly threaded in the archive.


>The next posting in the thread has:
>
>     Change in Procedure for Computers on list withpossibleAntivirusProblems 
>
>The final posting in this thread has the same "Subject:" line as
>immediately above.  I am not subscribed to this list and cannot
>post to it, so I do not know if a subsequent posting to this thread
>will remove another blank character in the "Subject:" line.


It may or may not.


>What I see in the list mbox file are these lines (with line numbers):
>
>------
>184232 Subject: Change in Procedure for Computers on list with possible Antivirus
>184233         Problems
>------
>184331 Subject: RE: Change in Procedure for Computers on list with possible
>184332         AntivirusProblems
>------
>184456 Subject: Re: Change in Procedure for Computers on list with
>184457         possibleAntivirusProblems
>------
>184566 Subject: RE: Change in Procedure for Computers on list
>184567         withpossibleAntivirusProblems
>------
>184735 Subject: RE: Change in Procedure for Computers on list
>184736         withpossibleAntivirusProblems
>------
>
>Note that the original subject is split into two lines.
>
>What might be causing this?  Is this a problem with Mailman, or is it
>a problem with the sender's Mail User Agent (probably Outlook), or
>a problem with the sender's mail system:


All of the above, or at least the first two.


>As the mbox file has the blanks removed, I have to believe that
>it is not Mailman that is removing the blanks.


The basic issue revolves around the rules for folding and unfolding
long header lines. The original standard was RFC 822
<http://www.faqs.org/rfcs/rfc822.html>, sec 3.1.1. The current
recommendation is RFC 2822 <http://www.faqs.org/rfcs/rfc2822.html>,
sec 2.2.3.

While careful reading of these two standards shows they are almost the
same with respect to folding and exactly the same with respect to
unfolding, the RFC 822 rules can result in the insertion of extra
white space (the opposite of what you see here). Further, many MUAs and
other mail processing software (such as the Python email library used
by Mailman) don't follow the rules exactly, perhaps because in trying
to compensate for too much white space they remove too much. Also, the
rules really work best with structured headers where white space
occurs between syntactic fields, and not so well with free form text
headers like Subject:.

Aside: I just read Stephen's reply in which he says "The RFC implies
that the right thing to do in that case is to remove the CRLF only,
but some MUAs also remove a space." And, I add "or a tab". This
whitespace removal in unfolding is the crux of the issue.
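
As an illustration of the difference, here is a small Python sketch.
Both functions are my own simplifications, not code from Mailman, the
Python email library, or any particular MUA. The first removes only
the line break that precedes white space, as RFC 2822 sec 2.2.3
describes; the second models a hypothetical over-eager unfolder that
also removes the white space.

```python
import re

def unfold_rfc2822(value):
    """Remove only a CRLF (or bare LF) that is followed by SP or TAB.

    The white space character itself is kept.
    """
    return re.sub(r"\r?\n(?=[ \t])", "", value)

def unfold_lossy(value):
    """Hypothetical non-compliant unfolder: drops the white space too."""
    return re.sub(r"\r?\n[ \t]+", "", value)

folded = ("Change in Procedure for Computers on list with possible Antivirus"
          "\r\n Problems")

print(unfold_rfc2822(folded))  # ... possible Antivirus Problems
print(unfold_lossy(folded))    # ... possible AntivirusProblems
```

Note that if the fold was made with a <TAB>, the compliant version
leaves a <TAB> between the two words rather than a <SP>.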

Part of the problem is Mailman will unfold and refold the header in the
process of adding the subject_prefix. This process will lengthen the
header, perhaps causing it to be folded when it wasn't before or
folded in a different place. Also, Mailman tends to fold with
<CR><LF><TAB> and MUAs tend to remove the <TAB> in unfolding.
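
A rough Python sketch of the prefix effect follows. The folding
function is a deliberately simple stand-in, not Mailman's actual
folding code; the 78 character limit and the "[demo] " prefix are
assumptions chosen for the example. It folds at the last <SP> that
fits and replaces that <SP> with <CR><LF><TAB>.

```python
def fold_once(line, limit=78):
    """Fold at the last space at or before `limit`, using CRLF + TAB.

    Simplified stand-in for a real folder: at most one fold, and the
    space at the fold point is replaced by a TAB.
    """
    if len(line) <= limit:
        return line
    idx = line.rfind(" ", 0, limit + 1)
    if idx <= line.find(":") + 1:
        return line  # no usable space after the header name
    return line[:idx] + "\r\n\t" + line[idx + 1:]

def add_prefix(line, prefix):
    """Insert `prefix` in front of the header value, then refold."""
    name, sep, value = line.partition(": ")
    return fold_once(name + sep + prefix + value)

original = ("Subject: Change in Procedure for Computers on list "
            "with possible Antivirus")

print(repr(fold_once(original)))             # fits, so it is not folded
print(repr(add_prefix(original, "[demo] ")))  # now too long, so it is folded
```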

Also, if a subject is folded, and then the white space removed in
unfolding, that makes the joined 'word' long, so the next time it will
tend to fold at the <SP> preceding the long 'word', and then that <SP>
can be lost in unfolding.
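
This cascade can be simulated with a short Python sketch. Again, the
folder and the unfolder here are simplified assumptions (a 78
character limit, a fold made with <CR><LF><TAB>, and an MUA that
removes the <TAB> along with the line break), not the behavior of any
specific software. Each round stands for one reply passing through
the list.

```python
CRLF_TAB = "\r\n\t"

def fold_once(line, limit=78):
    """Fold at the last space at or before `limit`, using CRLF + TAB."""
    if len(line) <= limit:
        return line
    idx = line.rfind(" ", 0, limit + 1)
    if idx <= line.find(":") + 1:
        return line  # no usable space after the header name
    return line[:idx] + CRLF_TAB + line[idx + 1:]

def unfold_lossy(line):
    """Hypothetical MUA that removes the TAB together with the CRLF."""
    return line.replace(CRLF_TAB, "")

def simulate(line, rounds):
    """Fold and lossily unfold `rounds` times, recording each result."""
    history = [line]
    for _ in range(rounds):
        line = unfold_lossy(fold_once(line))
        history.append(line)
    return history

subject = ("Subject: Change in Procedure for Computers on list "
           "with possible Antivirus Problems")

for step in simulate(subject, 3):
    print(step)
```

Each round shortens the header by only one character, so it stays
over the limit and the fold point moves one word to the left each
time.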

Mailman could behave in a completely RFC compliant manner (it doesn't),
and there would still be the problem because MUAs don't behave in a
completely compliant manner.

Also note that the actual removal of whitespace is done by the MUA, not
Mailman, but that doesn't let Mailman completely off the hook, because
in some cases, Mailman may replace <SP> with <TAB> and the MUA may be
more likely to remove <TAB>.

See the (multiple) threads with subject "Subject Lines Wrapped After
Commas, (Like This?)" starting at
<http://mail.python.org/pipermail/mailman-users/2007-May/057117.html>
for a different but related discussion.

-- 
Mark Sapiro <msapiro at value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


