[Mailman-Users] This should not have happened

Mark Sapiro mark at msapiro.net
Sat May 8 23:38:45 CEST 2010


On 5/8/2010 1:05 PM, Lindsay Haisley wrote:
> 
> The poster used an "Approved" pseudo-header.  Mailman found the
> pseudo-header in the text/plain part, removed it, and approved the post
> for distribution.  However in the text/html portion, the pseudo-header
> was mucked up with markup and was apparently unrecognizable to Mailman.
> It shows up in the message source as:
> 
> <p style=3D"margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Arial">Approved: =
> =A0Hon94Bar</p>
> 
> For rather obvious reasons, Mailman didn't find this rendition of the
> pseudo-header, but because it found the Approved pseudo-header in the
> text/plain portion, it distributed the message - with the administrator
> password clearly displayed to the subscriber list for everyone with an
> HTML-capable mail reader to see!  Now this (very technically challenged)
> customer has to change her list admin password and I have to work with
> her to ensure that this won't happen again.
> 
> HTML-ized email is a real PITA, and we've had problems with the
> pseudo-header before.  It seems to me that if a submitted email has both
> a text/plain and a text/html part, Mailman should look _first_ for the
> pseudo-header in the text/html portion, and if it's not found there, the
> post should be rejected at that point even if the pseudo-header is
> clearly present in a text/plain part. These two sections are supposed to
> be identical as far as content goes, or at least we can expect Mailman
> to assume that they are.
> 
> How can this be prevented?  As far as I'm concerned, this is a bug.


It is a bug, <https://bugs.launchpad.net/mailman/+bug/266220>.

My comments in the code say

# MAS: Bug 1181161 - Now try all the text parts in case it's
# multipart/alternative with the approved line in HTML or other
# text part.  We make a pattern from the Approved line and delete
# it from all text/* parts in which we find it.  It would be
# better to just iterate forward, but email compatibility for pre
# Python 2.2 returns a list, not a true iterator.
#
# This will process all the multipart/alternative parts in the
# message as well as all other text parts.  We shouldn't find the
# pattern outside the mp/a parts, but if we do, it is probably
# best to delete it anyway as it does contain the password.
#
# Make a pattern to delete.  We can't just delete a line because a
# line of HTML or other fancy text may include additional message
# text.  This pattern works with HTML.  It may not work with rtf
# or whatever else is possible.


So the question is why this fails in this case. The HTML part is
clearly quoted-printable (QP) encoded, but we decode that, and the
result is

<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Arial">Approved: \xA0Hon94Bar</p>

where \xA0 is the hex representation of the actual character, which
is a no-break space.
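
The decoding step can be reproduced with Python's standard quopri
module. This is only a sketch: the raw bytes are copied from the
message source quoted above, and the variable names are illustrative,
not taken from Mailman.

```python
import quopri

# Raw quoted-printable text of the HTML part, as quoted in the report.
# The "=" at the end of the first line is a QP soft line break, so the
# two lines are joined when decoded.
raw = (b'<p style=3D"margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Arial">'
       b'Approved: =\n'
       b'=A0Hon94Bar</p>')

decoded = quopri.decodestring(raw)

# The ordinary space survives, and =A0 becomes the single byte 0xA0.
print(repr(decoded))
```

After decoding, the password is preceded by an ordinary space and then
the byte 0xA0, not by the entity &nbsp;.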

The issue is that the pattern constructed in this case is

  'Approved:(\s|&nbsp;)*Hon94Bar'

and re.sub(pattern, '', lines), where lines is the message body, does
not treat \xA0 as matching \s, so nothing is removed.
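
The mismatch is easy to see in isolation. The sketch below uses a
Python 3 bytes pattern to stand in for the 8-bit strings Mailman works
with; the shortened body and the ISO-8859-1 charset are assumptions
made for illustration only.

```python
import re

# Shortened stand-in for the decoded HTML part: space, then byte 0xA0.
body = b'<p>Approved: \xa0Hon94Bar</p>'

# The pattern as it was constructed for this message.
pattern = rb'Approved:(\s|&nbsp;)*Hon94Bar'

# For 8-bit patterns, \s covers only ASCII whitespace, so 0xA0 stops
# the match and re.sub leaves the body untouched.
byte_match = re.search(pattern, body)
unchanged = re.sub(pattern, b'', body)

# For contrast: once the bytes are decoded to text (charset assumed to
# be ISO-8859-1 here), \s is Unicode-aware and does match U+00A0.
text_match = re.search(r'Approved:(\s|&nbsp;)*Hon94Bar',
                       body.decode('iso-8859-1'))

print(byte_match, unchanged == body, text_match is not None)
```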

This is clearly a deficiency in the code, but there are two underlying
issues:

1) the user double spaced between the Approved: and the password, and
2) the user's MUA encoded the two spaces as a space followed by a
no-break space for the HTML part, but it represented the no-break space
as a raw character code instead of the HTML entity &nbsp;.

Had either of the above conditions not been true, the Approved: password
would have been removed.

I will modify the code to add \xA0, making the pattern
'Approved:(\xA0|\s|&nbsp;)*Hon94Bar' in this case. That will work for
this message and future ones like it. However, I won't follow your
suggestion to check the HTML first. I think this is unworkable without
implementing an HTML rendering engine, and it would likely be no
different, at least in some cases, from just not checking for the
pseudo-header in the message body at all.
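
As a rough check of the extended pattern, here is a minimal sketch in
Python 3 with bytes. The helper approved_pattern() and the shortened
sample bodies are hypothetical, and escaping the password with
re.escape() is an assumption about how one would build the pattern
safely; none of this is the actual Mailman code.

```python
import re

def approved_pattern(password):
    """Hypothetical helper: build the extended pattern for a password.

    The separator group accepts a raw no-break space byte (0xA0),
    ASCII whitespace, or the HTML entity &nbsp;, in any mix.
    """
    return rb'Approved:(\xA0|\s|&nbsp;)*' + re.escape(password)

pattern = approved_pattern(b'Hon94Bar')

samples = [
    b'<p>Approved: Hon94Bar</p>',        # ordinary space
    b'<p>Approved:&nbsp;Hon94Bar</p>',   # HTML entity
    b'<p>Approved: \xa0Hon94Bar</p>',    # space + raw no-break space
]

# Delete only the matched text, not the whole line, because the line
# may carry other markup or message text.
cleaned = [re.sub(pattern, b'', s) for s in samples]
print(cleaned)
```

All three variants reduce to an empty paragraph, and the surrounding
markup is left alone.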

Note that we have never guaranteed removal of the pseudo-header from
alternative parts, and if asked, I always recommend a true message
header for this purpose.

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


