[Mailman-Users] does arch interpret mbox files or read them literally?

Adam Lipson adaml at jbase.com
Wed Jun 4 15:10:18 CEST 2003


Thank you, that is very interesting and well worth knowing.  So, if I follow your posting correctly, anyone who has an mbox file not created by Mailman and wants to build an archive from it must somehow strip out all the encoding and reduce it to a text-only version.  If that is the case, I see a few options for those of us who wish to convert an archive of historical emails from a previous list:

1) find some sort of parser that will scrub out the MIME encoding
2) use an alternative program for the archives.  My guess is that this is the best way to go: use something like MHonArc, which is designed for this, and let the list server do a great job of being a list server while an archive management program does the archiving.
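
As a rough illustration of option 1, here is a minimal sketch using Python's standard email package. It keeps only the decoded text/plain parts of a message and drops HTML alternatives and attachments. The function name plain_text_body is hypothetical, and real mail from many different clients will probably need more care (for example, messages that contain only an HTML part).

```python
from email.message import Message


def plain_text_body(msg: Message) -> str:
    """Return the decoded text/plain parts of a message, joined together."""
    chunks = []
    for part in msg.walk():
        # Skip multipart containers, HTML alternatives and other non-text parts.
        if part.get_content_type() != "text/plain":
            continue
        # Skip plain-text files that were sent as attachments.
        if part.get_filename():
            continue
        # decode=True undoes base64 / quoted-printable transfer encoding.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset name in the message; fall back to UTF-8.
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)
```

Each message read from the mbox (for instance with the standard mailbox module) could be passed through this function before being written to a text-only mbox.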

Please let me know if I did not get your message straight.


-----Original Message-----
From: Richard Barrett [mailto:r.barrett at openinfo.co.uk]
Sent: Wednesday, June 04, 2003 2:53 AM
To: Adam Lipson; Mailman-Users at python.org
Subject: Re: [Mailman-Users] does arch interpret mbox files or read them

At 00:17 04/06/2003, Adam Lipson wrote:
>When I run arch I get a different format than I do from pine or elm, i.e.
>does arch interpret the mbox file and post the messages in the archive as
>formatted text, or does it just literally look for the divisions between the
>messages and post the messages into the archive?  The reason I ask is that I
>have been struggling to convert 4000+ messages from Outlook, Outlook
>Express, Eudora, Mozilla, you name it, to an mbox (and this I can do easily)
>and then import that mbox into the archives or move the messages via
>IMAP.  My Unix mail programs read the mbox easily and the messages appear
>properly formatted, but when arch parses them and posts them on the web
>page everything looks like HTML-encoded text, i.e.

I am talking about MM 2.1.2 here, but the same principle applies in MM 2.0.x;
only some of the detail differs.

Assuming you have the normal default config variable ARCHIVE_TO_MBOX = 2
set, Mailman will append a verbatim copy of each post to a list to that
list's mbox archive (in
$prefix/archives/private/<listname>.mbox/<listname>.mbox) and generate an
HTML version of the post (in a subdirectory
of $prefix/archives/private/<listname>).
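
For reference, a sketch of how that setting would look as a site override. Mailman 2.x site settings normally live in $prefix/Mailman/mm_cfg.py, which starts with "from Defaults import *"; that import is left out here so the fragment stands alone. The meanings of the other values are summarised from the comments in the Defaults.py shipped with Mailman 2.1 and should be checked against your own installation.

```python
# Fragment for $prefix/Mailman/mm_cfg.py (site overrides of Defaults.py).
#
# ARCHIVE_TO_MBOX values, as summarised from the Defaults.py comments:
#  -1  do not archive at all
#   0  build the HTML archive only, keep no mbox
#   1  keep the mbox only (for use with an external archiver)
#   2  keep the mbox and build the HTML archive (the shipped default)
ARCHIVE_TO_MBOX = 2
```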

If you have a bunch of historical stuff in mbox format then prepend it to 
the list's mbox archive before running $prefix/bin/arch with the --wipe 
option to generate the initial HTML archives, including the historical stuff.
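
A minimal sketch of the prepend step, assuming nothing is being posted to the list while it runs. The function name prepend_mbox and the ".new" and ".bak" file suffixes are hypothetical choices, not part of Mailman.

```python
import shutil
from pathlib import Path


def prepend_mbox(historical: Path, list_mbox: Path) -> None:
    """Rewrite list_mbox so that the historical messages come first."""
    tmp = list_mbox.with_name(list_mbox.name + ".new")
    with tmp.open("wb") as out:
        for src in (historical, list_mbox):
            if not src.exists():
                continue
            data = src.read_bytes()
            out.write(data)
            # Make sure a blank line separates the last message of one file
            # from the first "From " line of the next.
            if data and not data.endswith(b"\n\n"):
                out.write(b"\n" if data.endswith(b"\n") else b"\n\n")
    # Keep a copy of the original list mbox before replacing it.
    if list_mbox.exists():
        shutil.copy2(list_mbox, list_mbox.with_name(list_mbox.name + ".bak"))
    tmp.replace(list_mbox)
```

After this, running $prefix/bin/arch with the --wipe option for the list rebuilds the HTML archive from the combined mbox.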

The HTML archive pages are constructed using a template for which the
English language default is $prefix/templates/en/article.html. The raw email
content (as appended to the mbox file) is massaged by MM's internal
archiver to give an HTML page, and this massaging discards most of the
headers and may also extract attachments and such.

You cannot fully reconstruct the raw email from the Mailman HTML archive
version of an email; that is why the default of ARCHIVE_TO_MBOX = 2 retains
both the mbox and the HTML archives.

><font = something> <size= something> text here </font> </size> or
>something to that effect.
>Is there a flag that I am missing?  I can't find it anywhere: ./arch -h
>tells me to just give the filename, and man arch just tells me about my
>system architecture.  Is there something I have done wrong?

There isn't a man page for MM's arch script; you are seeing the man page 
for the system's arch command.

$prefix/bin/arch wants a list name and a UNIX mailbox file to work from, and
it constructs/reconstructs the list's HTML archives. If you do not nominate
the mbox file then arch looks for the list's mbox archive file in the
default location described above.
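
As a sketch of the two ways of invoking it, the helper below just assembles the command line. The function name arch_command is hypothetical, and the prefix and list name in the usage note are placeholders for your own installation.

```python
from typing import List, Optional


def arch_command(prefix: str, listname: str,
                 mbox: Optional[str] = None, wipe: bool = False) -> List[str]:
    """Build the argument list for Mailman's bin/arch script."""
    cmd = [f"{prefix}/bin/arch"]
    if wipe:
        # --wipe discards the existing HTML archive before regenerating it.
        cmd.append("--wipe")
    cmd.append(listname)
    if mbox is not None:
        # Without an explicit mbox, arch uses the list's default mbox archive.
        cmd.append(mbox)
    return cmd
```

For example, subprocess.run(arch_command("/usr/local/mailman", "mylist", wipe=True), check=True) would rebuild the archive of a list called mylist from its default mbox, assuming Mailman is installed under /usr/local/mailman.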

Is something missing? That depends on what you are looking for.


Richard Barrett                                      http://www.openinfo.co.uk
