[Mailman-Users] mmarch mbox-splitting weird...

IEM - network operating center (IOhannes m zmoelnig) noc at iem.at
Mon Mar 16 17:22:43 CET 2009


i recently upgraded my mailing list server from debian etch to lenny, 
which included an upgrade from mailman 2.1.9 to 2.1.11;

due to some local hackery this somewhat broke my mailing list archives 
(the hackery only makes monthly archives available as e.g. 2009-03 
rather than 2009-March or 2009-Maerz or whatever).
i fixed my hacks, but in order to get the archives right i ran
"mmarch --wipe ...", and found myself surprised that this did not produce 
the desired results: the new archives seemed to contain some more emails 
than the original ones, all of them having "No subject" and appearing in 
the current archive directory.

it turned out that these new emails were parts of old emails.

the problem seems to be within the parsing of the mbox file: at some (to 
me) arbitrary points, Mailbox.py would decide that the mail has finished 
and start a new one; since the new one had no proper header, it ended up 
as "No subject" (and no author information).
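
one way to narrow this down is to look at where the mbox could be split. 
the following is only a minimal sketch, under the assumption (not verified 
here) that the splits happen at body lines starting with "From " that were 
never escaped as ">From ". the helper name suspicious_from_lines and the 
separator pattern are made up for illustration and are not part of 
mailman; the pattern only covers the common 
"From addr Mon Mar 16 17:22:43 2009" form.

```python
import re

# assumption: a genuine mbox separator looks like
#   "From addr Mon Mar 16 17:22:43 2009"
# and is preceded by a blank line (or is the first line of the file)
SEPARATOR = re.compile(
    r"^From \S+\s+(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    r"[ \d]\d \d\d:\d\d:\d\d \d{4}\s*$"
)


def suspicious_from_lines(lines):
    """Return (line_number, text) for "From " lines that do not look
    like genuine separators and might therefore split a mail in two."""
    hits = []
    prev_blank = True  # the start of the file counts as a boundary
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if text.startswith("From "):
            if not (prev_blank and SEPARATOR.match(text)):
                hits.append((number, text))
        prev_blank = (text == "")
    return hits
```

it can be fed an open file, e.g. 
suspicious_from_lines(open(path, errors="replace")) with path pointing at 
the list's mbox file; any line it reports is a candidate for the place 
where a mail got cut. mailman also ships a bin/cleanarch script that is 
meant to escape such lines, which might be worth trying on a copy of the 
mbox first.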

i did not get any error messages during building of the archives.
(else i would have thought of out-of-memory problems or similar)

i noticed that this "bug" (or whatever it is) might have been around 
for quite some time: after some searching of my original archives, i 
found at least one similar case from when i rebuilt the entire archive in 
2006-03 (where part of an email from 2003 (or so) ended up in my 2006-03 
folder with "No subject")

any ideas?

my archives are rather big by now (i think); e.g. one list has about 
68000 emails archived; rebuilding the archives with the renumbering 
described above somehow breaks the entire archive; fixing it manually is 
no real option :-(


