[ mailman-Bugs-1661574 ] arch corrupts archives, but only for the last month

SourceForge.net noreply at sourceforge.net
Fri Feb 16 17:58:40 CET 2007

Bugs item #1661574, was opened at 2007-02-16 06:54
Message generated for change (Comment added) made by msapiro
Category: command line scripts
Group: 2.1 (stable)
Status: Open
Resolution: None
Priority: 9
Private: No
Submitted By: David A. Desrosiers (desrod)
Assigned to: Nobody/Anonymous (nobody)
Summary: arch corrupts archives, but only for the last month

Initial Comment:
I think this problem has been reported before in previous versions, and it's back again in 2.1.9.

When I regenerate archives for our lists, if ANY message contains a '<' character in the body, Mailman splits it as a new message, and everything after that gets corrupted. 

This means that if someone pastes some XML into the body of a message (which happens quite often on our lists), or some HTML, or the headers of an email, Mailman will break it, but *ONLY* for the latest month's messages, even if the message that triggered it was posted months or years earlier.

If a message sent in April 2003 has a '<' as the first character of any line in its body, February 2007's archive will be corrupted.

You can see the results of this over here: 


And also here: 


The raw mbox files are fine, every message is intact. 

I don't see this problem on other lists I maintain, it only seems to affect lists where HTML or XML or mail headers are pasted into the body of the message. 

I'd call this grave, because it's odd how it dumps itself on the latest month's archive when the latest month's messages don't even have the problem.


>Comment By: Mark Sapiro (msapiro)
Date: 2007-02-16 08:58

Logged In: YES 
Originator: NO

What am I looking for at
It looks OK to me.

returns a 404.

There is an issue in that if the body of some message in the
archives/private/<listname>.mbox/<listname>.mbox file (or whatever mbox is
input to bin/arch) contains a line that begins with "From ", the archiver
takes that line as an mbox message separator and the message is truncated
at that point, and the rest of the message is seen as a new message without
a date so it is archived with the current date.
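A minimal Python sketch of the splitting rule described above, assuming a naive reader that treats every line starting with 'From ' as a message separator. The function name split_mbox and the sample addresses are hypothetical; this illustrates the failure mode and is not the code in bin/arch:

```python
def split_mbox(text):
    """Split mbox text into messages, treating every line that begins
    with 'From ' as a separator (the classic mbox convention)."""
    messages = []
    current = None
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            # Any 'From ' line starts a new message, even one in a body.
            current = [line]
            messages.append(current)
        elif current is not None:
            current.append(line)
    return ["".join(m) for m in messages]


# One real message whose body contains an unescaped 'From ' line.
mbox = (
    "From alice@example.com Tue Apr  1 10:00:00 2003\n"
    "Date: Tue, 1 Apr 2003 10:00:00 +0000\n"
    "Subject: quoting\n"
    "\n"
    "As I said earlier:\n"
    "From what I can tell, it works.\n"
)
msgs = split_mbox(mbox)
```

The single message comes back as two: the second "message" begins at "From what I can tell" and carries no Date: header, which is why an archiver would file it under the current date.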

It sounds like that may be what you are seeing, but it has nothing to do
with a '<' as the first character of a line. It has to do with 'unescaped'
'From ' lines in the bodies of messages.

Mailman currently precedes any 'From ' at the beginning of a body line
with a '>' making it '>From ' in the .mbox and avoiding the problem, but
old .mbox files and .mbox files from other sources may have unescaped
'From ' lines.
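A short sketch of the escaping rule described above: any body line that begins with 'From ' gets a '>' prepended. The helper name escape_from_lines is hypothetical and this is not Mailman's actual implementation, only the same rule expressed with Python's re module:

```python
import re


def escape_from_lines(body):
    """Prefix '>' to every body line that begins with 'From '.

    re.MULTILINE makes '^' match at the start of each line, so only
    line-initial 'From ' is touched; 'From' elsewhere in a line, or
    words like 'Fromage', are left alone.
    """
    return re.sub(r"^From ", ">From ", body, flags=re.MULTILINE)


escaped = escape_from_lines("From here on\nand From there\nFromage\n")
```

After escaping, the body no longer contains a line that a 'From '-splitting reader would mistake for a message separator.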

There is a bin/cleanarch script distributed with Mailman to help 'fix' old
.mbox files with this problem.
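To illustrate the general idea of repairing an old .mbox, here is a hypothetical heuristic sketch, not the logic of bin/cleanarch itself: a 'From ' line is kept as a separator only if it is at the start of the file or follows a blank line, and looks like an envelope line ("From sender weekday month day hh:mm:ss year"); every other 'From ' line is escaped. The function name clean_mbox and the ENVELOPE_RE pattern are assumptions made for this example:

```python
import re

# Hypothetical pattern for a genuine envelope line, e.g.
# "From alice@example.com Tue Apr  1 10:00:00 2003"
ENVELOPE_RE = re.compile(
    r"^From \S+ +[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} "
    r"\d{2}:\d{2}:\d{2} \d{4}\s*$"
)


def clean_mbox(text):
    """Escape 'From ' lines that do not look like real separators."""
    out = []
    prev_blank = True  # the start of the file counts as a boundary
    for line in text.splitlines(keepends=True):
        if line.startswith("From ") and not (
            prev_blank and ENVELOPE_RE.match(line)
        ):
            line = ">" + line  # body text, not a separator: escape it
        out.append(line)
        prev_blank = line.strip() == ""
    return "".join(out)


mbox = (
    "From alice@example.com Tue Apr  1 10:00:00 2003\n"
    "Subject: quoting\n"
    "\n"
    "From what I can tell, it works.\n"
    "\n"
    "From bob@example.com Wed Apr  2 11:00:00 2003\n"
    "Subject: reply\n"
    "\n"
    "Thanks.\n"
)
cleaned = clean_mbox(mbox)
```

Both envelope lines survive and the body line becomes ">From what I can tell, it works.". The heuristic would still misfire on a body line that follows a blank line and happens to be formatted exactly like an envelope line (for example a pasted mbox), so the script distributed with Mailman is the better tool for real archives.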



More information about the Mailman-coders mailing list