[Mailman-Users] Importing old archives into Mailman
larry at qhpress.org
Fri May 10 17:27:15 CEST 2013
This is not a request for help but a report of experience in case
someone else finds it helpful.
I recently migrated some old mailing lists into Mailman. They had
previously run on different software (my own), and at first I assumed
I'd need to keep two sets of archives, putting the old ones on my
regular website (not the "lists." subdomain created by Mailman).
Then I saw in the FAQ that it was possible to edit list archives. The
emphasis there was on deleting posts, but I thought, if this works for
deleting posts it should also work for adding them.
Fortunately my old archives were already in mbox format. Or rather,
almost in mbox format. The old incarnation of my lists had been on a
server where I had a low usage quota, so I had been downloading all
archives over a year old and storing them on my home computer. In doing
so, I had passed them through a word processor macro to do some minimal
cleanup, which was chiefly to remove the ">" that mbox files put in
front of body lines beginning with "From " ("From the historian's
viewpoint," one subscriber wrote).
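For anyone who has to reverse the same kind of cleanup, here is a minimal Python sketch of putting the quoting back. It assumes the classic "mboxo" convention, in which a ">" is added only to body lines that begin exactly with "From "; the function name requote_from_lines is hypothetical. It should be applied to one message body at a time, never to the whole file, because the genuine "From " separator line that starts each message must stay unquoted.

```python
import re

# Body lines that start with "From " look like message separators to an
# mbox reader, so the classic ("mboxo") convention prefixes them with ">".
_FROM_LINE = re.compile(r"^From ", re.MULTILINE)


def requote_from_lines(body: str) -> str:
    """Re-add the ">" in front of body lines that start with "From ".

    Hypothetical helper: pass in the body of a single message only, not
    the separator line that begins each message in the mbox file.
    """
    return _FROM_LINE.sub(">From ", body)


if __name__ == "__main__":
    body = "Dear all,\nFrom the historian's viewpoint, this matters.\n"
    print(requote_from_lines(body))
```

A regular expression anchored with re.MULTILINE touches only line starts, so "From " in the middle of a line is left alone.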
Undoing that change was easy enough, but what I didn't notice was that
word wrap had been applied to some very long header lines (such as
"DomainKey-Signature:"). This damaged the headers and made them appear
to end sooner, with some of their data falling through into the message
body.
Usually, when this happened, the "Date:" line would be in the part that
fell through. Mailman seems to rely on this line when sorting posts by
date (it does _not_ rely on the physical order of messages in the mbox
file). In the absence of a "Date:" line in the header, Mailman seems to
use the current time (when it is indexing the archive).
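Before indexing, a quick audit can list the messages that are likely to end up misdated. The sketch below uses Python's standard mailbox and email modules, not anything inside Mailman, and the function name audit_mbox is hypothetical. It relies on the fact that Python's email parser stops reading headers at the first line that is neither a "Name: value" field nor an indented continuation, which is exactly how a wrapped header pushes "Date:" into the body. It is only a heuristic: a wrapped fragment that happens to contain a colon (such as the "h=From:Subject:..." tag inside a signature header) is read as a new header and will not be flagged.

```python
import mailbox
from email import errors
from email.utils import parsedate_to_datetime


def audit_mbox(mbox_path: str) -> list[tuple[int, str]]:
    """Return (position, problem) pairs for messages likely to be misdated.

    Hypothetical helper; positions count from 0 in file order.
    """
    report = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for position, msg in enumerate(box):
            # The parser ends the header block at the first line that is neither
            # "Name: value" nor an indented continuation, and records a defect.
            if any(isinstance(d, errors.MissingHeaderBodySeparatorDefect)
                   for d in msg.defects):
                report.append((position, "header block ends early"))
            date_value = msg["Date"]
            if date_value is None:
                report.append((position, "no Date: header"))
                continue
            try:
                parsedate_to_datetime(date_value)
            except (TypeError, ValueError):
                # Python < 3.10 raises TypeError here; 3.10+ raises ValueError.
                report.append((position, "unparseable Date: header"))
    finally:
        box.close()
    return report
```

An empty result does not prove the headers are clean, but anything it does report is worth opening in an editor before regenerating the archive.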
To fix this I had to go back through the imported mbox files and clean
up the headers. Since I was doing this in vi over an SSH connection and
couldn't see clearly whether there was a newline character or only a
line that was too long for the screen, I decided the safest method was
just to delete all those overlong headers. They shouldn't be needed in
the archive anyway. (The "Received:" and "Delivered-To:" lines had long
since been removed by my program, when it saved out a week's files and
started a new archive.)
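Deleting signature headers can also be scripted rather than done by hand in vi. This sketch copies an mbox to a new file with the named headers removed; the choice of DomainKey-Signature and DKIM-Signature and the function name strip_headers are assumptions for illustration, not what I actually ran. It should be run on a copy whose headers are still intact (or already repaired), since a parser cannot recognize a header fragment that has already fallen into the body.

```python
import mailbox
import os

# Hypothetical selection: signature headers that serve no purpose in an archive.
HEADERS_TO_DROP = ("DomainKey-Signature", "DKIM-Signature")


def strip_headers(src_path: str, dst_path: str, names=HEADERS_TO_DROP) -> int:
    """Copy every message from src_path into a new mbox at dst_path with
    the named headers removed, and return the number of messages copied."""
    if os.path.exists(dst_path):
        # mailbox.mbox would append to an existing file, so refuse instead.
        raise FileExistsError(dst_path)
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path, create=True)
    count = 0
    try:
        for msg in src:
            for name in names:
                del msg[name]  # deletes every occurrence; no error if absent
            dst.add(msg)  # keeps the message's original "From " separator line
            count += 1
        dst.flush()
    finally:
        src.close()
        dst.close()
    return count
```

Writing to a new file rather than editing in place leaves the original mbox untouched if something goes wrong.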
I also found some "Date:" lines that had been wrong from the
beginning. One of my subscribers wrote that he had just switched to a
Mac in order to clear a Windows-based virus out of his mailbox. Somehow
his Macintosh had its system date set to August 27, 1956! Mailman made
this the first post on the list, followed by a silence of over 40 years.
I went back and corrected the date as well as I could and then indexed
the archive all over again.
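A wrong but well-formed date like the 1956 one parses without complaint, so it can only be caught by a plausibility check. A minimal sketch, assuming you know the range of years in which the list was actually active; the function name date_is_plausible and the bounds used in the example call are hypothetical.

```python
from email.utils import parsedate_to_datetime


def date_is_plausible(date_value: str, first_year: int, last_year: int) -> bool:
    """Return True if the Date: value parses and its year lies within the
    caller-supplied range of years when the list was active."""
    try:
        sent = parsedate_to_datetime(date_value)
    except (TypeError, ValueError):
        # Unparseable: TypeError before Python 3.10, ValueError afterwards.
        return False
    # Compare years only, which avoids mixing timezone-aware and naive
    # datetimes (a "-0000" zone yields a naive result).
    return first_year <= sent.year <= last_year


if __name__ == "__main__":
    # Hypothetical bounds for the list's active years.
    print(date_is_plausible("Mon, 27 Aug 1956 10:00:00 -0700", 1998, 2013))
```

With those example bounds the call prints False, flagging the message for a manual look before the archive is indexed.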
Moral: You can import old mbox files to a Mailman archive, but be sure
to clean up the headers before you generate the index.
larry at qhpress.org