[Mailman-Users] Need help rebuilding archives...

Barry A. Warsaw barry at digicool.com
Wed Mar 28 17:49:15 CEST 2001


>>>>> "P" == Phydeaux  <reb at taco.com> writes:

    P> For those with any interest in this, the problem seems to be
    P> that Mailman sees the break between messages a bit differently
    P> than most mail packages. A simple "^N^NFrom " isn't enough to
    P> convince Mailman that a new message has started. Instead it
    P> also looks for the date/time on the "From" line. Inside
    P> Mailbox.py is this nifty chunk of code that appears to cause
    P> this behaviour:

    >> 'From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d\d?\s+' \
    >> r'\d\d?:\d\d(:\d\d)?(\s+\S+)?\s+\d\d\d\d\s*$'

So far, all correct, at least for Mailman 2.0.x.
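
To see what that pattern accepts, here is a minimal sketch.  It assumes
both quoted fragments are meant as raw strings joined into one pattern;
the helper name is made up for illustration, and the pattern in any
given mailbox.py may differ slightly.

```python
import re

# The pattern quoted above, joined into a single raw string.
# Assumption: both halves are raw strings, as they would need to be
# for the backslash escapes to reach the regex engine intact.
FROM_LINE = re.compile(
    r'From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d\d?\s+'
    r'\d\d?:\d\d(:\d\d)?(\s+\S+)?\s+\d\d\d\d\s*$'
)


def is_strict_separator(line):
    """Return True if the line looks like a full 'From ' envelope line
    (sender, weekday, month, day, time, optional zone, 4-digit year)."""
    return FROM_LINE.match(line) is not None


# A full envelope line is accepted, with or without a time zone.
print(is_strict_separator('From reb@taco.com Wed Mar 28 17:49:15 2001'))
print(is_strict_separator('From reb@taco.com Wed Mar 28 17:49:15 CEST 2001'))
# A bare 'From ' with no date/time is not treated as a separator.
print(is_strict_separator('From reb@taco.com'))
```

So an mbox whose separator lines lack the date/time part will look like
one giant message to this pattern.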

    P> I have no idea what the actual RFC for mbox format states, but
    P> for now I have solved the mbox format part of the problem...

There is no RFC, just "standard" practices.  The best description of
the issue I've found is contained in this url:

http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html

Note that this is really a Python issue, since Mailman just uses the
mailbox module to split the mbox up.  I've actually refactored this
code in Python 2.1 to be more conformant with the note above.  By
default, Mailman 2.1 will use just 'From ' at the beginning of a line
to separate messages.  Making it exactly '\n\nFrom ' is harder given
the mailbox.py code, but the current implementation should be Good
Enough.
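
For illustration only, the looser rule can be sketched like this.  This
is not the mailbox.py code; the function name is hypothetical, and it
assumes the whole mbox is already in one string.

```python
def split_mbox_lenient(text):
    """Split mbox text into messages, treating any line that starts
    with 'From ' as the start of a new message."""
    messages = []
    current = None
    for line in text.splitlines(keepends=True):
        if line.startswith('From '):
            # Close off the previous message, if any, and start a new one.
            if current is not None:
                messages.append(''.join(current))
            current = [line]
        elif current is not None:
            current.append(line)
        # Lines before the first 'From ' line are ignored.
    if current is not None:
        messages.append(''.join(current))
    return messages


sample = (
    'From a@example.com\n'
    'Subject: one\n'
    '\n'
    'first body\n'
    '\n'
    'From b@example.com\n'
    'Subject: two\n'
    '\n'
    'second body\n'
)
print(len(split_mbox_lenient(sample)))
```

The trade-off is that an unescaped body line beginning with 'From '
also starts a new message, which is why mbox writers conventionally
escape such lines as '>From '.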

    P> MemoryError

Well, now this might be a different problem though.  Pipermail via
bin/arch slurps the entire archive into memory, so if the archive is
big you could run out of memory.  Have I mentioned that Pipermail
could use a good rewrite?  Volunteers are welcomed!  :)
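
As a rough sketch of the alternative, a present-day Python can walk an
mbox one message at a time with the standard mailbox module instead of
reading the whole file first.  This is not how bin/arch works; the
function name and the choice to collect only subjects are assumptions
made for the example.

```python
import mailbox


def list_subjects(mbox_path):
    """Collect the Subject of each message, handling one message at a
    time rather than holding every message body in memory."""
    subjects = []
    # create=False: raise an error instead of creating a missing file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            subjects.append(message.get('Subject', ''))
    finally:
        box.close()
    return subjects
```

Only the current message and the list of subjects are kept around, so
memory use should grow with the number of messages rather than with
the total size of the archive.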

-Barry



