[Mailman-Users] Importing old archives into Mailman

Larry Kuenning larry at qhpress.org
Fri May 10 17:27:15 CEST 2013


This is not a request for help but a report of experience in case 
someone else finds it helpful.

I recently migrated some old mailing lists into Mailman.  They had 
previously run on different software (my own), and at first I assumed 
I'd need to keep two sets of archives, putting the old ones on my 
regular website (not the "lists." subdomain created by Mailman).

Then I saw in the FAQ that it was possible to edit list archives.  The 
emphasis there was on deleting posts, but I thought, if this works for 
deleting posts it should also work for adding them.

Fortunately my old archives were already in mbox format.  Or rather, 
almost in mbox format.  The old incarnation of my lists had been on a 
server where I had a low usage quota, so I had been downloading all 
archives over a year old and storing them on my home computer.  In doing 
so, I had passed them through a word processor macro to do some minimal 
cleanup, which was chiefly to remove the ">" that mbox files put in 
front of body lines beginning with "From " ("From the historian's 
viewpoint," one subscriber wrote).
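For reference, the quoting in question is simple. Below is a minimal Python sketch of the plain ("mboxo") variant. It assumes it is given one message's text without the leading "From " separator line, and the function name `escape_from_lines` is made up for illustration. The mboxrd variant also adds a ">" to lines that are already quoted, which this sketch does not do.

```python
def escape_from_lines(message_text):
    """Re-apply mboxo-style quoting to the body of ONE message.

    Hypothetical helper. message_text is headers, a blank line, then the
    body, with no "From " separator line in front. Headers are untouched.
    """
    head, sep, body = message_text.partition("\n\n")
    if not sep:  # no blank line, so there is no body to quote
        return message_text
    # Only body lines that begin with "From " get a ">" in front.
    quoted = [">" + line if line.startswith("From ") else line
              for line in body.split("\n")]
    return head + sep + "\n".join(quoted)
```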

Undoing that change was easy enough, but what I didn't notice was that
word wrap had also been applied to some very long header lines (such as
"DomainKey-Signature:").  A wrapped piece no longer looked like a header
line, so the header block appeared to end early and the rest of the
headers fell through into the message body.
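One rough way to find messages damaged like this is to scan each header block for the first line that is neither a "Name:" field nor an indented continuation line. That is roughly where a mail parser (Python's email package, for one) decides the body starts. The Python sketch below is a heuristic under stated assumptions: it takes one message's text without the "From " separator, and the name `find_broken_header` is hypothetical. A wrapped fragment that happens to contain a colon before any space will look like a header and be missed.

```python
import re

# Field name = printable ASCII except space and colon, then ":".
_FIELD = re.compile(r"^[\x21-\x39\x3b-\x7e]+:")


def find_broken_header(message_text):
    """Hypothetical helper: check one message (no "From " separator line).

    Returns None if the header block ends cleanly at a blank line.
    Otherwise returns (bad_line, spilled): bad_line is the first line that
    is neither a header field nor a folded continuation, and spilled lists
    later lines, up to the first blank line, that still look like header
    fields (such as "Date: ...") but would end up in the body.
    """
    lines = message_text.split("\n")
    for i, line in enumerate(lines):
        if line == "":
            return None                      # proper header/body separator
        if line[0] in " \t" or _FIELD.match(line):
            continue                         # header field or continuation
        spilled = []
        for later in lines[i + 1:]:
            if later == "":
                break
            if _FIELD.match(later):
                spilled.append(later)
        return line, spilled
    return None                              # headers only, no body
```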

Usually, when this happened, the "Date:" line would be in the part that 
fell through.  Mailman seems to rely on this line when sorting posts by 
date (it does _not_ rely on the physical order of messages in the mbox 
file).  In the absence of a "Date:" line in the header, Mailman seems to 
use the current time (when it is indexing the archive).
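To list the messages that would be dated at indexing time, something like the following Python sketch could be run over each mbox file before indexing. It uses the standard `mailbox` and `email.utils` modules. The function name `undated_messages` and its report format are illustrative assumptions, not part of Mailman.

```python
import mailbox
from email.utils import parsedate_to_datetime


def undated_messages(mbox_path):
    """Report messages whose Date header is missing or unparseable.

    Hypothetical helper: returns a list of (index, subject, problem)
    tuples, where index is the message's position in the file from 0.
    """
    problems = []
    box = mailbox.mbox(mbox_path, create=False)  # existing file only
    try:
        for index, msg in enumerate(box):
            subject = str(msg.get("Subject", "(no subject)"))
            raw_date = msg.get("Date")
            if raw_date is None:
                problems.append((index, subject, "missing Date header"))
                continue
            try:
                parsedate_to_datetime(str(raw_date))
            except (TypeError, ValueError):
                # Depending on the Python version, a failed parse raises
                # TypeError or ValueError.
                problems.append((index, subject,
                                 "unparseable Date: " + str(raw_date)))
    finally:
        box.close()
    return problems
```

It only reports problems; the right date usually has to be reconstructed by hand.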

To fix this I had to go back through the imported mbox files and clean 
up the headers.  Since I was doing this in vi over an SSH connection and 
couldn't see clearly whether there was a newline character or only a 
line that was too long for the screen, I decided the safest method was 
just to delete all those overlong headers.  They shouldn't be needed in 
the archive anyway.  (The "Received:" and "Delivered-To:" lines had long 
since been removed by my program, when it saved out a week's files and 
started a new archive.)
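The same cleanup could also be done by script rather than by eye. This Python sketch deletes named header fields together with their continuation lines. It also deletes any following line that does not look like a header, on the assumption that such a line is a wrapped fragment of the field above it. The function name `drop_fields` and the default field list (`DomainKey-Signature`, `DKIM-Signature`) are assumptions. A fragment containing a colon before any space would be mistaken for a new field and kept, so check the output before reindexing.

```python
import re

# Captures a field name (printable ASCII except space and colon) before ":".
_FIELD = re.compile(r"^([\x21-\x39\x3b-\x7e]+):")


def drop_fields(message_text, names=("DomainKey-Signature", "DKIM-Signature")):
    """Hypothetical helper: remove named fields from one message's headers.

    message_text has no "From " separator line; the header section is
    taken to end at the first blank line. Lines that do not start a new
    field inherit the keep/drop decision of the field before them.
    """
    wanted = {n.lower() for n in names}
    head, sep, body = message_text.partition("\n\n")
    kept, dropping = [], False
    for line in head.split("\n"):
        m = _FIELD.match(line)
        if m:  # a new field starts here; decide whether to drop it
            dropping = m.group(1).lower() in wanted
        if not dropping:
            kept.append(line)
    return "\n".join(kept) + sep + body
```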

I also found some "Date:" lines that had been wrong from the
beginning.  One of my subscribers wrote that he had just switched to a 
Mac in order to clear a Windows-based virus out of his mailbox.  Somehow 
his Macintosh had its system date set to August 27, 1956!  Mailman made 
this the first post on the list, followed by a silence of over 40 years.
I went back and corrected the date as well as I could and then indexed
the archive all over again.
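A date like that can be caught mechanically if you know roughly when the list started. In this Python sketch, `implausible_dates` is a hypothetical helper and the caller supplies the date window. Headers that cannot be parsed are skipped, since a missing or broken "Date:" line is a separate problem.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def implausible_dates(dates, earliest, latest=None):
    """Hypothetical helper: return Date values outside [earliest, latest].

    dates is an iterable of Date header strings; earliest and latest are
    timezone-aware datetimes, and latest defaults to the current time.
    """
    if latest is None:
        latest = datetime.now(timezone.utc)
    flagged = []
    for raw in dates:
        try:
            when = parsedate_to_datetime(raw)
        except (TypeError, ValueError):
            continue                      # unparseable: not handled here
        if when.tzinfo is None:           # "-0000" zone: assume UTC
            when = when.replace(tzinfo=timezone.utc)
        if not earliest <= when <= latest:
            flagged.append(raw)
    return flagged
```

For example, with a made-up window of 1997 through 2013, the header "Mon, 27 Aug 1956 10:00:00 -0400" is flagged and "Fri, 10 May 2013 17:27:15 +0200" is not.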

Moral:  You can import old mbox files to a Mailman archive, but be sure 
to clean up the headers before you generate the index.

-- 
Larry Kuenning
larry at qhpress.org

