[Mailman-Developers] Re: [Mailman-Users] Poking and prodding the archiver

Phil Stracchino alaric@babcom.com
Mon, 9 Jul 2001 20:15:18 -0700

On Mon, Jul 09, 2001 at 10:28:19PM -0400, Barry A. Warsaw wrote:
> [Note: this discussion is more appropriate for mailman-developers, so
> I've changed the Cc: -baw]

> >>>>> "PS" == Phil Stracchino <alaric@babcom.com> writes:
>     PS> I've looked at some length through the code for the archiver
>     PS> now, and although I still don't understand python, I've
>     PS> figured out enough of what the archiver is doing to see that
>     PS> it's apparently intentional that the path to mbox archives is
>     PS> .../mailman/archives/private/list.mbox/list.mbox.
> Yes, and this is for security reasons as explained in the comment in
> Archiver.py (see InitVars()).  The comment is slightly out-of-date in
> that the file under listname.mbox/ is also called listname.mbox.

Right.  I understand why all archives are stored under archives/private;
what I wasn't understanding was why the last pathname element was
duplicated, because until I'd (a) worked around it temporarily and
(b) seen mailman/bin/arch in action, I didn't understand that it was a
case of a listname.mbox directory containing a listname.mbox file.

>     PS> nor why it is that the archiver is written in such a way that
>     PS> it attempts to access this mbox archive directory with its
>     PS> duplicated final pathname element even when mbox archives are
>     PS> disabled, and fails if it doesn't exist.
> If this is true (and I haven't tested it), then it's most likely just
> old lurking bugs.  The archiver/Pipermail stuff is the most neglected
> part of the codebase.  People keep threatening to help rewrite it, but
> so far nothing's materialized, and I have little time or energy to
> devote to the Pipermail side.

Well, I'm trying to figure out the problem, but my Python-fu is small.  :)

>     PS> I find this behavior even more curious in light of the fact
>     PS> that newlist apparently creates archives/private/list.mbox
>     PS> when it sets up the list, but does not create the
>     PS> archives/private/list.mbox/list.mbox without the existence of
>     PS> which the archiver fails.
> Do you mean the archiver fails or that the web access to the archiver
> fails?  Certainly not the former (unless I misunderstand) because it
> works for me, and loads of other people.  It's a known buglet that the
> pipermail url doesn't work until the first message is posted to the list.

Both.  If the mbox file does not exist, the pipermail URL points to a
zero-length document, so web access to the archives fails; and no HTML
archive files are created, so there are no HTML archives to access in the
first place.  Only flat text archives are created.

>     PS> I've applied the following patch to my HyperArch.py file
>     PS> (patch also attached separately):

[Note:  Now that I understand what's happening and why, I have removed
 the patch.]

> [patch deleted]
>     PS> I don't know what impact this has on mbox archives, but for
>     PS> me, it makes the HTML archiver work.
> Hmm, odd.  What I think will break is private archives.  If you toggle
> an archive to private, I seem to remember that you can craft a url to
> trick the web server into vending an archive page for you directly,
> instead of forcing you to go through authentication with the
> private.py cgi.

Actually, on further examination, I think what it'll do is break mbox

>     PS> It's still a mystery to me why the archiver should
>     PS> even *care* whether or not the mbox archive directory exists,
>     PS> when mbox archives are disabled in the master configuration
>     PS> anyway.
> It probably shouldn't, but then Mailman probably shouldn't support
> ARCHIVE_TO_MBOX=0.  Archiving to the mbox is about as fast as it gets,
> since it is just a file append, and it's /incredibly/ handy to have
> that .mbox file around (even as large as it can get), in case you want
> to regenerate your archive, or you want to migrate to a different
> external archiver.

True, though it can be regenerated from the year-month.txt files created
by the web archiver.  (This is what I did in order to regenerate my back
archives with arch.)
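
For illustration, here is a rough sketch of that kind of regeneration: it
concatenates the monthly text files into a single mbox in chronological
order.  It is written in present-day Python and is not Mailman code.  The
names rebuild_mbox and month_key are hypothetical, and it assumes that the
monthly files are named like 2001-July.txt and are already in mbox format.

```python
import glob
import os

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def month_key(path):
    """Map '.../2001-July.txt' to (2001, 7); return None if the name does not fit."""
    stem = os.path.splitext(os.path.basename(path))[0]
    try:
        year, month = stem.split("-", 1)
        return int(year), MONTHS.index(month) + 1
    except ValueError:
        # Not a year-month file (assumed naming scheme), so skip it.
        return None


def rebuild_mbox(archive_dir, mbox_path):
    """Concatenate the year-month .txt files, oldest first, into mbox_path."""
    candidates = glob.glob(os.path.join(archive_dir, "*-*.txt"))
    paths = sorted((p for p in candidates if month_key(p) is not None),
                   key=month_key)
    with open(mbox_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                data = src.read()
            out.write(data)
            # Keep the next file's first "From " line at the start of a line.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
    return [os.path.basename(p) for p in paths]
```

The resulting file could then be handed to arch; check a copy first, since
whether the monthly files round-trip cleanly depends on how they were written.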

Right now I'm doing a test to find out whether the archiver cares whether
archives/private/listname.mbox/listname.mbox is non-zero length, so long
as it actually exists.  From what I've been able to glean from the code, I
don't think it should care.  If this is the case, then a simple workaround
for the pipermail problems exists: touch the file at the time the list
directories are created.
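
A minimal sketch of that "touch" step, assuming the layout described above
(archives/private/listname.mbox/listname.mbox).  This is present-day Python,
not a patch against Mailman; ensure_empty_mbox and its archive_root argument
are hypothetical names used only for illustration.

```python
import os


def ensure_empty_mbox(archive_root, listname):
    """Create <archive_root>/private/<listname>.mbox/<listname>.mbox if missing."""
    mbox_dir = os.path.join(archive_root, "private", listname + ".mbox")
    os.makedirs(mbox_dir, exist_ok=True)
    mbox_file = os.path.join(mbox_dir, listname + ".mbox")
    # Append mode creates the file when it is absent and never truncates
    # an existing mbox, so this is safe to run more than once.
    with open(mbox_file, "ab"):
        pass
    return mbox_file
```

Opening in append mode rather than write mode is deliberate: running it
against a list that already has mail in its mbox leaves that mail alone.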

I've looked at newlist to see if I can see where this is done, and I think
I've traced it to Utils.MakeDirTree(), but I don't know enough about
Python or the internals of Mailman to figure out where the path data for
it is coming from yet.  I'll continue to study it to see if I can figure
it out, but it'd probably be a lot quicker if one of the Real Developers
could suggest a test patch to add creation of an empty .mbox file to the
list creation operation.

 Linux Now!   ..........Because friends don't let friends use Microsoft.
 phil stracchino   --   the renaissance man   --   mystic zen biker geek
        alaric@babcom.com                halmayne@sourceforge.net
   2000 CBR929RR, 1991 VFR750F3 (foully murdered), 1986 VF500F (sold)