[Mailman-Users] Mailman archive messages(not rm, but install!)

Paul Tomblin ptomblin at xcski.com
Thu Dec 7 23:22:06 CET 2006

Quoting Alan McConnell (alan at patriot.net):
> Meanwhile, I am adminning(sp?), through my ISP, a new but quite active
> E-list.  But their mailman install is incomplete; they haven't put in
> Pipermail(about which I know _nothing_).  I'm saving all the messages --
> mbox format -- and have the hope that when the Pipermail archiving
> program is installed, I will be able to collect, collate, shuffle,
> and massage these messages and then ship them off to the new very
> skilful tech staff that my ISP is allegedly hiring, and they will
> be able to slip this collection adroitly into place.  And it will
> be as if archiving was always in place . . .
> Can this be done?  or am I dreaming wild dreams?

I've just spent two days manipulating a bunch of mbox files into archives.
Let me tell you how it goes:

1. Blow away the html archives.  You may prefer to use that arch command
we were just discussing, but I used "rm -rf

2. Stop mailman's qrunners using "/etc/init.d/mailman stop"

3. Run bin/arch on the huge mbox file.

4. Discover that bin/arch is consuming all the memory and swap on the
system, and your system has ground to a halt.

5. Kill bin/arch.  Wait for the system to recover the swap space.  At this
point, I should have rebooted because I think this is when my list
config.pck file got corrupted.  Restore the config.pck file from backup.

6. Discover an awk script in the mailman archives that will split the mbox
archive into manageable chunks.  Fix it so that it splits them into 500
message chunks instead of the 80 message chunks it defaults to.

7. Run bin/arch on all the chunks one at a time.

8. Discover that the mbox file had a bunch of un-escaped "From " lines
that confused bin/arch and so you have a bunch of half-articles in today's
archive page that shouldn't be there.  Run bin/cleanarch to fix them, blow
away the html archives, and then resplit the mbox file and run bin/arch on
the splits.

9. Discover that in early 2000 some members of your mailing list were
using a MUA that set the year to "100" in the "Date: " header, which confused
bin/arch.  Fix those up with sed, then blow away the html archives, then
resplit the mbox and run bin/arch on the splits.

10. Discover a couple of "From " lines that bin/cleanarch didn't fix
because somebody was quoting the mail headers of another message.  Fix
them with sed, then blow away the html archives, then resplit the mbox and
run bin/arch on the splits.

11. Discover you missed a "From " line in one message, say "to hell with
it", restart mailman, and go to bed.
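
For anyone who can't find the awk script from step 6, here is a rough
Python sketch of the same splitting idea.  It is not that script; the
function name, the chunk file names and the paths in the usage note are
made up for illustration.  It assumes a message starts at a "From " line
that follows a blank line (or the start of the file), and it works on
bytes so odd encodings in old messages don't trip it up.

```python
import os


def split_mbox(src_path, out_dir, chunk_size=500):
    """Split an mbox file into chunk files of at most chunk_size messages.

    Returns the list of chunk file paths, in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out = None
    count = 0            # messages written to the current chunk
    prev_blank = True    # the start of the file counts as a boundary

    with open(src_path, "rb") as src:
        for line in src:
            # Assumption: "From " right after a blank line starts a message.
            if line.startswith(b"From ") and prev_blank:
                if out is None or count == chunk_size:
                    if out is not None:
                        out.close()
                    name = "chunk-%04d.mbox" % len(chunk_paths)
                    path = os.path.join(out_dir, name)
                    out = open(path, "wb")
                    chunk_paths.append(path)
                    count = 0
                count += 1
            # Anything before the first "From " line is dropped.
            if out is not None:
                out.write(line)
            prev_blank = line in (b"\n", b"\r\n")

    if out is not None:
        out.close()
    return chunk_paths
```

Something like split_mbox("list.mbox", "chunks") would then give you the
files to feed to bin/arch one at a time, as in step 7.  Note that an
un-escaped "From " line in a message body that happens to follow a blank
line will be taken for a new message, which is the same trap as in step 8,
so it is probably worth running bin/cleanarch before splitting.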

Paul Tomblin <ptomblin at xcski.com> http://blog.xcski.com/
"The means of defense against foreign danger historically have become the
instruments of tyranny at home." - James Madison
