[Mailman-Users] Importing large archives ... design limit hit, and possible bug
courtney at 4th.com
Sun Jun 2 16:15:16 CEST 2002
On Sunday 02 June 2002 10:00 am, LuKreme wrote:
> On Saturday, June 1, 2002, at 09:21 PM, Scott Courtney wrote:
> > On Saturday 01 June 2002 10:59 pm, LuKreme wrote:
> >> Out of curiosity, how did you split the mbox? I have about 1200 emails I
> >> want to add to the archive.
> > I wrote a little "awk" program to split them into 80-message chunks. Here is
> > the source code:
> Ah.. awk. I hate awk. Good thing you wrote it. :)
I love awk! You probably would use <gasp!> <ugh!> .... Perl.
(I mean that humorously, by the way. Not trying to start a flamewar or
anything. Linux has lots of good tools, and it's great that each of us
can choose the ones we like best. <GRIN>)
> All the emails got loaded (thanks!) but I'm still getting errors when it's
> trying to finish.
> Updating index files for archive [2002-June]
> Computing threaded index
> Updating HTML for article 52
> article file /Users/mailman/archives/private/list/2002-June/000052.html is
My suggestion now is to do the following:
1. Fix up the "From " --> "rom " errors, since that is a known, obvious, and
severe problem. You've probably already done that.
2. Read my later emails. I found a better way to deal with the archives at
my end, namely by fixing the data so that "arch" doesn't fall out due to
excessive errors. It appears that was the root cause of my problem -- bad
input, and "arch" not having enough error diagnostics inside. Once I added
some new error reports to "arch", I started getting answers that led me
to the problem.
3. Consider using my *other* awk program, goodheaders.awk, to filter a copy
of your data, then try the import as one single file. Steps for this:
    cp archives/private/mylist.mbox/mylist.mbox mylist.mbox.original
    ./goodheaders.awk < mylist.mbox.original > mylist.mbox.filtered
    cp mylist.mbox.filtered archives/private/mylist.mbox/mylist.mbox
    rm -r archives/private/mylist/*
Do these things with your qrunner and cron tasks temporarily halted.
The "rm" command will zap all the old HTML files so you can rebuild
from scratch. It also zaps the stateful information from previous
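
The steps in item 3 could be wrapped in a small shell sketch like the one
below. Treat it as an illustration under stated assumptions, not a tested
procedure: the function names, the argument order, and the empty-output check
are hypothetical additions, and the filter is passed in as a command so that
goodheaders.awk (not reproduced here) can be supplied by the caller. The first
function counts lines beginning with "rom ", the damaged separator from item 1.

```shell
#!/bin/sh
# Sketch only. Function names and arguments are illustrative assumptions,
# not part of Mailman. Run with qrunner and cron tasks halted.

# Count lines that begin with "rom " (a "From " separator that lost its "F").
# $1 = path to an mbox file
count_damaged_separators() {
    awk '/^rom / { n++ } END { print n + 0 }' "$1"
}

# Back up the mbox, filter the copy, install the result, then clear the
# old HTML archive so it can be rebuilt from scratch.
# $1 = archive root (e.g. "archives"), $2 = list name,
# $3 = filter command that reads stdin and writes stdout
refilter_mbox() {
    root=$1; list=$2; filter=$3
    [ -n "$root" ] && [ -n "$list" ] && [ -n "$filter" ] || return 1

    mbox="$root/private/$list.mbox/$list.mbox"
    html="$root/private/$list"

    # Keep an untouched copy in the current directory before changing anything.
    cp "$mbox" "$list.mbox.original" || return 1
    "$filter" < "$list.mbox.original" > "$list.mbox.filtered" || return 1

    # Added safety check (an assumption): never install an empty result.
    if [ ! -s "$list.mbox.filtered" ]; then
        echo "filtered mbox is empty; leaving $mbox untouched" >&2
        return 1
    fi

    cp "$list.mbox.filtered" "$mbox" || return 1
    # Remove the old HTML files; the .mbox directory is a sibling, so it stays.
    rm -rf "$html"/*
}

# Example call, matching the paths used in the steps above:
#   refilter_mbox archives mylist ./goodheaders.awk
```

A non-zero count from count_damaged_separators suggests item 1 still needs
attention before the import is retried. The backup stays in the current
directory as mylist.mbox.original, so the original data can be copied back
if the rebuilt archive looks wrong.
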
This worked quite well for me. I'm now mostly done transferring my lists,
and the remainder is just mechanics, not troubleshooting. It was 0500 here
and I was ready to get some sleep. ;-)
Scott Courtney | "I don't mind Microsoft making money. I mind them
courtney at 4th.com | having a bad operating system." -- Linus Torvalds
http://www.4th.com/ | ("The Rebel Code," NY Times, 21 February 1999)