[ mailman-Bugs-1661574 ] arch corrupts archives, but only for the last month

Bugs item #1661574, was opened at 2007-02-16 06:54
Message generated for change (Comment added) made by msapiro
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100103&aid=1661574&group_id=103

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: command line scripts
Group: 2.1 (stable)
Status: Pending
Resolution: None
Priority: 9
Private: No
Submitted By: David A. Desrosiers (desrod)
Assigned to: Nobody/Anonymous (nobody)
Summary: arch corrupts archives, but only for the last month
Initial Comment:
I think this problem has been reported before in previous versions, and it's back again in 2.1.9.

When I regenerate archives for our lists, if ANY message contains a '<' character in the body, Mailman splits it off as a new message, and everything after that gets corrupted. This means that if someone pastes some XML into the body of a message (which happens quite often on our lists), or some HTML, or the headers of an email, Mailman will break it, but *ONLY* for the latest month's messages, even if the message that started it was sent months or years ago.

If a message sent in April of 2003 includes a '<' as the first character of a line anywhere in the body of the message, February 2007's archive will be corrupted.

You can see the results of this over here:
http://lists.plkr.org/pipermail/plucker-list/2007-February/thread.html

And also here:
http://lists.plkr.org/pipermail/plucker-dev/2007-February/thread.html

The raw mbox files are fine; every message is intact.

I don't see this problem on other lists I maintain. It only seems to affect lists where HTML, XML, or mail headers are pasted into the body of a message.

I'd call this grave, because it's odd how it dumps itself on the latest month's archive when the latest month's messages don't even have the problem.

----------------------------------------------------------------------
Comment By: Mark Sapiro (msapiro)
Date: 2007-04-02 09:20

Message:
Logged In: YES
user_id=1123998
Originator: NO

No response after 6 weeks. I'm setting status to Pending which will automatically close in 2 more weeks.

----------------------------------------------------------------------

Comment By: Mark Sapiro (msapiro)
Date: 2007-02-16 08:58

Message:
Logged In: YES
user_id=1123998
Originator: NO

What am I looking for at <http://lists.plkr.org/pipermail/plucker-list/2007-February/thread.html>? It looks OK to me. <http://lists.plkr.org/pipermail/plucker-dev/2007-February/thread.html> returns a 404.

There is an issue in that if the body of some message in the archives/private/<listname>.mbox/<listname>.mbox file (or whatever mbox is input to bin/arch) contains a line that begins with "From ", the archiver takes that line as an mbox message separator and the message is truncated at that point, and the rest of the message is seen as a new message without a date so it is archived with the current date.

It sounds like that may be what you are seeing, but it has nothing to do with a '<' as the first character of a line. It has to do with 'unescaped' 'From ' lines in the bodies of messages.

Mailman currently precedes any 'From ' at the beginning of a body line with a '>' making it '>From ' in the .mbox and avoiding the problem, but old .mbox files and .mbox files from other sources may have unescaped 'From ' lines.

There is a bin/cleanarch script distributed with Mailman to help 'fix' old .mbox files with this problem.

----------------------------------------------------------------------
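To illustrate the 'From ' escaping described above, here is a minimal Python sketch. It is not the bin/cleanarch script and makes no claim about how that script works internally. The function name escape_from_lines and the SEPARATOR pattern are hypothetical, and the pattern assumes separators of the form "From address Weekday Month Day HH:MM:SS Year", which real mbox files do not always follow. A line starting with "From " is treated as a genuine separator only if it follows a blank line (or starts the file) and matches that pattern; any other such line is prefixed with '>'.

```python
import re

# Assumed shape of a genuine mbox separator line, for example:
#   From user@example.com Fri Feb 16 06:54:00 2007
# Real-world separators vary; this pattern is an illustration only.
SEPARATOR = re.compile(
    r"^From \S+ +(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) +\d{1,2} "
    r"\d{2}:\d{2}:\d{2} \d{4}\s*$"
)


def escape_from_lines(lines):
    """Return a copy of lines with suspicious body 'From ' lines escaped.

    A 'From ' line is kept as a separator only when it follows a blank
    line (or is the first line) and matches SEPARATOR. Every other line
    beginning with 'From ' becomes '>From '.
    """
    out = []
    prev_blank = True  # the start of the file counts as following a blank line
    for line in lines:
        if line.startswith("From "):
            genuine = prev_blank and SEPARATOR.match(line) is not None
            if not genuine:
                line = ">" + line
        out.append(line)
        prev_blank = line.strip() == ""
    return out


if __name__ == "__main__":
    sample = [
        "From alice@example.com Fri Feb 16 06:54:00 2007\n",
        "Subject: test\n",
        "\n",
        "From what I can tell, this works.\n",
    ]
    for text in escape_from_lines(sample):
        print(text, end="")
```

A heuristic like this cannot catch everything: a complete, well-formed separator line pasted into a message body after a blank line would still be taken as a real separator, so the output should be checked by hand before feeding it to bin/arch.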