Chris Malme wrote:
That is correct, I did it manually (or rather, with a quick script I wrote), preceding each message text line that begins with a "From " with a ">". This enabled the mbox to be imported into the Mailman Archive without splitting messages as it did when I first tried it. However, those "From " lines that I manually escaped are showing clearly in the Archive *with* the escape character - i.e. as ">From".
That is the normal way of dealing with messages containing From_ in the message body. It's not just Mailman or pipermail. It's also problematic to unescape them for display: while the escaping is normal, there is no standard for escaping/unescaping, so when you see >From_ in a message, you don't know whether it is an escaped From_, a quoted From_ or a literal >From_.
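To illustrate the ambiguity, here is a minimal sketch of the kind of escaping described above, plus a naive attempt to undo it. The function names are hypothetical and this is not the script Chris used or Mailman's code; it only assumes the usual convention of prefixing body lines that start with "From " with ">".

```python
import re

# "From " at the start of any line, i.e. what an mbox reader may
# mistake for a message separator.
FROM_LINE = re.compile(r"^From ", re.MULTILINE)


def escape_from(body: str) -> str:
    """Prefix every body line that starts with 'From ' with '>'."""
    return FROM_LINE.sub(">From ", body)


def naive_unescape(body: str) -> str:
    """Strip one '>' from every line that starts with '>From '.

    This is lossy: it cannot tell an escaped line from one that
    already began with '>From ' before escaping.
    """
    return re.sub(r"^>From ", "From ", body, flags=re.MULTILINE)


# One line that needs escaping, one that was already quoted.
original = "From the top\n>From a quoted reply\n"
escaped = escape_from(original)
round_trip = naive_unescape(escaped)

print(escaped)
print(round_trip == original)
```

After escaping, both lines start with ">From ", so the naive unescape strips the ">" from the quoted line too and the round trip does not give back the original text.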
Furthermore any new emails to the list that have a newline/From in the message text are archiving incorrectly.
So the question is why is this happening with new messages? Again, what Mailman version is this?
Sorry, I should have stated earlier. This is Mailman version 2.1.9, which is the version that the folk who support my VPS (using Plesk) have rolled out.
The issue with unescaped From_ in the body causing archive corruption was fixed long before Mailman 2.1.9.
To go to a more recent version is not impossible, but not trivial for me (a Linux VPS newbie), so I wanted to see if it was the solution before rolling my sleeves up.
You shouldn't need to. Mailman 2.1.9 should not have this problem.
Also note that escaping From_ by preceding it with '>' is the accepted way to deal with this. Many MUAs will do it before sending the message and MDAs will do it too before delivering a message. It is unusual to be able to pass a From_ through email from end to end without it being escaped to >From_ somewhere between source and destination.
That is what is puzzling me. I am able to send an email from Thunderbird through my VPS's mail server and see it go straight into the archive unescaped, splitting into multiple messages at every occurrence of newline/From. I can also do it from Gmail.
It doesn't happen with every MUA/MTA/MDA, but for example, I just sent such a message from Tbird 3.0.3 via Exim on localhost to a mailbox on localhost, and the From_ was unescaped in Tbird's Sent folder, but it was escaped in the recipient mailbox.
I have a Test list set up that I am happy for anyone to access, if it might shed any light on the matter. The Test list does not have the imported archive, but it demonstrates the same behaviour regarding new messages.
I'm happy to post the list URL if that is appropriate.
I believe you, so I don't think I need to see the test list. The question is why Mailman isn't escaping the From_ when it archives and sends the message.
It actually relies on the Python email library to do this, but Mailman 2.1.9 should install its own version of the email package in Mailman's pythonlib/ directory, and this should always escape From_ lines when converting an email.Message.Message object to text. Why it doesn't is the question.
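As a rough illustration of that behavior, the sketch below uses the email package from a current Python 3 standard library, not the older copy in Mailman 2.1.9's pythonlib/ directory, so details may differ there. The addresses and body text are made up. It asks the generator explicitly to mangle From_ lines when flattening a message to text.

```python
import io
from email.generator import Generator
from email.message import Message

# Build a small message whose body contains a line starting with "From ".
msg = Message()
msg["From"] = "sender@example.com"      # made-up address
msg["Subject"] = "From_ test"
msg.set_payload("text\nFrom here on it matters\n")

# mangle_from_=True asks the generator to turn body lines starting
# with "From " into ">From " while writing the message out.
buf = io.StringIO()
Generator(buf, mangle_from_=True).flatten(msg, unixfrom=False)
flattened = buf.getvalue()

print(flattened)
```

Only the body line is changed; the From: header is left alone. If the flattened text on the affected system still shows a bare "From " line in the body, that would suggest the mangling step is being skipped or a different email package is being picked up.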
Also curious is that I think you said the problem occurs with "text\nFrom " in the body, but not with "text\n\nFrom ". If I understood that correctly, that is really strange.
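To show why that would be strange, here is a small sketch that counts message starts in mbox-style text under two common splitting rules: any line starting with "From ", or only such lines that follow a blank line. This is purely an illustration with a hypothetical helper and made-up data; it is not a claim about how pipermail actually splits.

```python
def count_messages(mbox_text: str, require_blank_line: bool) -> int:
    """Count lines that a simple mbox reader would treat as separators."""
    count = 0
    prev_blank = True  # treat the start of the file like a blank line
    for line in mbox_text.splitlines():
        if line.startswith("From ") and (prev_blank or not require_blank_line):
            count += 1
        prev_blank = (line == "")
    return count


# A made-up separator line followed by a one-message mbox, twice:
# once with "text\nFrom " and once with "text\n\nFrom " in the body.
sep = "From list@example.com Mon Jan  1 00:00:00 2010\n"
single = sep + "Subject: a\n\ntext\nFrom here\n"
double = sep + "Subject: b\n\ntext\n\nFrom here\n"

for name, text in (("single newline", single), ("blank line", double)):
    print(name, count_messages(text, False), count_messages(text, True))
```

Under either rule, the "text\n\nFrom " case is split at least as often as the "text\nFrom " case, so a split that happens only without the blank line does not fit either simple rule.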
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan