[Mailman-Users] mmarch mbox-splitting weird...
IEM - network operating center (IOhannes m zmoelnig)
noc at iem.at
Mon Mar 16 17:22:43 CET 2009
i recently upgraded my mailing list server from debian etch to lenny,
which included an upgrade from mailman 2.1.9 to 2.1.11;
due to some local hackery this somewhat broke my mailing list archives
(the hack makes monthly archives available under names like 2009-03
rather than 2009-March or 2009-Maerz or whatever).
i fixed my hacks, but in order to get the archives right i ran
"mmarch --wipe ...", and was surprised that this did not produce
the desired results: the new archives seemed to contain more emails
than the original ones, all of them having "No subject" and appearing in
the current archive directory.
it turned out that these new emails were fragments of old emails.
the problem seems to be in the parsing of the mbox file: at some points
(arbitrary, as far as i can tell), Mailbox.py would decide that the mail
had finished and start a new one; since the new one had no proper header,
it ended up as "No subject" (and without author information).
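one way to picture this splitting, assuming (this is not confirmed above) that the archiver's mbox reader treats every line beginning with `From ` as the start of a new message, the way a lenient "portable" mbox reader does. the sample mailbox and the `naive_split`/`subjects` helpers are hypothetical; the sketch only shows how an unescaped body line such as `From here on ...` turns the rest of a message into a headerless fragment that gets indexed as "No subject".

```python
import email

# hypothetical sample mbox: the first message's body contains a line that
# starts with "From " and was not escaped as ">From " when it was archived
MBOX = (
    "From alice@example.org Mon Mar 16 17:22:43 2009\n"
    "From: alice@example.org\n"
    "Subject: hello\n"
    "\n"
    "first paragraph\n"
    "From here on the body continues\n"
    "second paragraph\n"
)


def naive_split(text):
    """split mbox text at every line starting with 'From ' (lenient rule)."""
    messages, current = [], None
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            if current is not None:
                messages.append("".join(current))
            current = []  # the separator line itself is not part of the message
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append("".join(current))
    return messages


def subjects(text):
    """parse each fragment and report its subject, like an archive index would."""
    return [
        email.message_from_string(raw).get("Subject", "No subject")
        for raw in naive_split(text)
    ]


print(subjects(MBOX))  # ['hello', 'No subject']
```

the second "message" consists only of `second paragraph`, so the parser finds no headers at all, which matches the symptom of fragments showing up without subject or author.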
i did not get any error messages while building the archives
(otherwise i would have suspected out-of-memory problems or similar).
i noticed that this "bug" (or whatever it is) has probably been around
for quite some time: after searching my original archives, i
found at least one similar case from when i rebuilt the entire archive in
2006-03 (part of an email from 2003 or so ended up in my 2006-03
folder with "No subject").
my archives are rather big by now (i think); e.g. one list has about
68000 emails archived; rebuilding the archives with the extra messages
described above shifts the numbering and thus breaks the entire archive;
fixing it manually is not really an option :-(
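before rebuilding a large archive it might help to find out how many messages are affected. as far as i know, mailman 2.1 ships a `bin/cleanarch` script meant to escape such lines in an archive mbox (check its documentation before running it on a live archive). the read-only sketch below only lists candidate lines; it assumes (again unconfirmed) that the splitting is caused by unescaped `From ` lines, and the envelope regex and the `suspicious_from_lines` name are my own heuristic, not mailman's rule.

```python
import re

# heuristic (an assumption, not Mailman's own rule): a genuine separator looks
# like "From <sender> <Www> <Mmm> <d> <hh:mm[:ss]> ... <yyyy>" and follows a
# blank line or the start of the file
ENVELOPE = re.compile(
    r"^From \S+\s+\w{3}\s+\w{3}\s+\d{1,2}\s+\d{1,2}:\d{2}(:\d{2})?.*\d{4}\s*$"
)


def suspicious_from_lines(lines):
    """yield (line_number, text) for 'From ' lines that do not look like real
    envelope separators but would still split a message under a lenient reader."""
    prev_blank = True  # the start of the file counts as a message boundary
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if text.startswith("From ") and not (prev_blank and ENVELOPE.match(text)):
            yield lineno, text
        prev_blank = text == ""


SAMPLE = [
    "From alice@example.org Mon Mar 16 17:22:43 2009\n",
    "Subject: hello\n",
    "\n",
    "first paragraph\n",
    "From here on the body continues\n",
]
print(list(suspicious_from_lines(SAMPLE)))  # [(5, 'From here on the body continues')]
```

for a real archive, pass an open file object instead of `SAMPLE`, e.g. `open(path, encoding="utf-8", errors="replace")`, since the function accepts any iterable of lines.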
IEM - network operation center
mailto:noc at iem.at