[Mailman-Developers] Re: [Mailman-Users] Poking and prodding the archiver
Phil Stracchino
alaric@babcom.com
Mon, 9 Jul 2001 20:15:18 -0700
On Mon, Jul 09, 2001 at 10:28:19PM -0400, Barry A. Warsaw wrote:
>
> [Note: this discussion is more appropriate for mailman-developers, so
> I've changed the Cc: -baw]
> >>>>> "PS" == Phil Stracchino <alaric@babcom.com> writes:
>
> PS> I've looked at some length through the code for the archiver
> PS> now, and although I still don't understand Python, I've
> PS> figured out enough of what the archiver is doing to see that
> PS> it's apparently intentional that the path to mbox archives is
> PS> .../mailman/archives/private/list.mbox/list.mbox.
>
> Yes, and this is for security reasons as explained in the comment in
> Archiver.py (see InitVars()). The comment is slightly out-of-date in
> that the file under listname.mbox/ is also called listname.mbox.
Right. I understand why all archives are stored under archives/private;
what I wasn't understanding was why the last pathname element was
duplicated, because until I'd (a) worked around it temporarily and
(b) seen mailman/bin/arch in action, I didn't understand that it was a
case of a listname.mbox directory containing a listname.mbox file.
> PS> nor why it is that the archiver is written in such a way that
> PS> it attempts to access this mbox archive directory with its
> PS> duplicated final pathname element even when mbox archives are
> PS> disabled, and fails if it doesn't exist.
>
> If this is true (and I haven't tested it), then it's most likely just
> old lurking bugs. The archiver/Pipermail stuff is the most neglected
> part of the codebase. People keep threatening to help rewrite it, but
> so far nothing's materialized, and I have little time or energy to
> devote to the Pipermail side.
Well, I'm trying to figure out the problem, but my Python-fu is small. :)
> PS> I find this behavior even more curious in light of the fact
> PS> that newlist apparently creates archives/private/list.mbox
> PS> when it sets up the list, but does not create the
> PS> archives/private/list.mbox/list.mbox without the existence of
> PS> which the archiver fails.
>
> Do you mean the archiver fails or that the web access to the archiver
> fails? Certainly not the former (unless I misunderstand) because it
> works for me, and loads of other people. It's a known buglet that the
> pipermail url doesn't work until the first message is posted to the list.
Both. If the mbox file does not exist, the pipermail URL points to a
zero-length document, so web access to the archives fails; and no HTML
archive files are created, so there are no HTML archives to access in the
first place. Only flat text archives are created.
> PS> I've applied the following patch to my HyperArch.py file
> PS> (patch also attached separately):
[Note: Now that I understand what's happening and why, I have removed
the patch.]
>
> [patch deleted]
>
> PS> I don't know what impact this has on mbox archives, but for
> PS> me, it makes the HTML archiver work.
>
> Hmm, odd. What I think will break is private archives. If you toggle
> an archive to private, I seem to remember that you can craft a url to
> trick the web server into vending an archive page for you directly,
> instead of forcing you to go through authentication with the
> private.py cgi.
Actually, on further examination, I think what it'll do is break mbox
archives.
> PS> It's still a mystery to me why the archiver should
> PS> even *care* whether or not the mbox archive directory exists,
> PS> when mbox archives are disabled in the master configuration
> PS> anyway.
>
> It probably shouldn't, but then Mailman probably shouldn't support
> ARCHIVE_TO_MBOX=0. Archiving to the mbox is about as fast as it gets,
> since it is just a file append, and it's /incredibly/ handy to have
> that .mbox file around (even as large as it can get), in case you want
> to regenerate your archive, or you want to migrate to a different
> external archiver.
True, though it can be regenerated from the year-month.txt files created
by the web archiver. (This is what I did in order to regenerate my back
archives with arch.)
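For anyone wanting to do the same, here is a rough sketch of that regeneration step. It assumes the flat text files are named like 2001-July.txt and are already in mbox format, so they can simply be concatenated in date order; the names rebuild_mbox and period_key are made up for illustration and are not part of Mailman.

```python
import glob
import os
import shutil

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def period_key(path):
    """Sort key for a file named like 2001-July.txt (assumed naming)."""
    stem = os.path.splitext(os.path.basename(path))[0]
    year, month = stem.split("-", 1)
    return int(year), MONTHS.index(month)


def rebuild_mbox(txt_dir, mbox_path):
    """Concatenate the year-month.txt files, oldest first, into one mbox."""
    files = sorted(glob.glob(os.path.join(txt_dir, "*-*.txt")), key=period_key)
    with open(mbox_path, "wb") as out:
        for name in files:
            with open(name, "rb") as src:
                shutil.copyfileobj(src, out)
    return files
```

The resulting file would then go in as listname.mbox/listname.mbox before running arch over it.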
Right now I'm doing a test to find out whether the archiver cares whether
archives/private/listname.mbox/listname.mbox is non-zero length, so long
as it actually exists. From what I've been able to glean from the code, I
don't think it should care. If this is the case, then there is a simple
workaround for the pipermail problems: touch the file at the time the list
directories are created.
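As a standalone illustration of that workaround (not a patch to newlist, and untested against Mailman itself), something like the following should do. The function name touch_mbox and its arguments are hypothetical; only the listname.mbox/listname.mbox layout comes from the discussion above.

```python
import os


def touch_mbox(private_archive_dir, listname):
    """Create an empty listname.mbox/listname.mbox file if it is missing.

    private_archive_dir is assumed to be the archives/private directory.
    """
    mbox_dir = os.path.join(private_archive_dir, listname + ".mbox")
    mbox_file = os.path.join(mbox_dir, listname + ".mbox")
    os.makedirs(mbox_dir, exist_ok=True)
    # Append mode creates the file when absent and never truncates an
    # existing mbox, so this is safe to run against a live list.
    with open(mbox_file, "a"):
        pass
    return mbox_file
```

Calling touch_mbox("/path/to/mailman/archives/private", "mylist") would leave an empty mylist.mbox/mylist.mbox in place, or do nothing if one is already there.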
I've looked at newlist to see if I can see where this is done, and I think
I've traced it to Utils.MakeDirTree(), but I don't know enough about
Python or the internals of Mailman to figure out where the path data for
it is coming from yet. I'll continue to study it to see if I can figure
it out, but it'd probably be a lot quicker if one of the Real Developers
could suggest a test patch to add creation of an empty .mbox file to the
list creation operation.
--
Linux Now! ..........Because friends don't let friends use Microsoft.
phil stracchino -- the renaissance man -- mystic zen biker geek
alaric@babcom.com halmayne@sourceforge.net
2000 CBR929RR, 1991 VFR750F3 (foully murdered), 1986 VF500F (sold)