[pydotorg-www] Archives corruption (was: [PythonInfo Wiki] Update of "tftp" by 79.132.252.94)

Paul Boddie paul at boddie.org.uk
Tue Jul 6 21:32:25 CEST 2010


On Tuesday 06 July 2010 16:39:36 Barry Warsaw wrote:
>
> Here's the issue.
>
> Pipermail has never maintained a database between message-ids and the urls.
> This is true even before Pipermail was bolted into Mailman and that's never
> changed, despite being high on my wish list for a decade.  In any case, the
> problem occurs because Pipermail messages are numbered sequentially, and
> there is a difference between generating the archive on the fly (i.e. as
> messages arrive) and as a regenerated whole.  This is complicated by the
> fact that there was a bug in Mailman years ago that broke the mbox
> separator so that regens couldn't be done reproducibly.  This is why
> Mailman has a cleanarch script.

Thanks for the summary! I knew it had something to do with that thread I 
referenced, but I didn't really put all the pieces together.

> The best way to regenerate a clean archive is to take the mbox file, run
> cleanarch over it, then run 'arch --wipe'.  The urls will probably be
> broken, so if the original urls can be retrieved then I think the easiest
> way to "fix" them is to write some alias rules for Apache to do permanent
> redirects to the new urls.  This is not a trivial amount of work, which is
> probably why it hasn't been done yet.  Who wants to - and can - volunteer
> to see this through to the end?

Is it not possible to get an old version of Mailman to generate archives which
presumably have the same numbering as those previously generated, record the
correspondence between each archive identifier and its Message-Id (or other
"anchoring" property), and then relabel the messages in the fixed archives
with the old identifiers? Or does the whole "as messages arrive" thing
completely prevent any possibility of reproducing the original archived
message ordering?
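
To make the idea concrete, here is a rough Python sketch of the matching step.
It assumes we have already managed to extract, for both the old and the
regenerated archive, a mapping from Message-Id to archive-relative path; how
to get that out of Pipermail is exactly the open question. The function name,
the host and the list path are all made up for illustration. The output uses
the "Redirect permanent" directive from Apache's mod_alias, as suggested above.

```python
def build_redirects(old_ids, new_ids,
                    url_path="/pipermail/example-list",       # hypothetical path
                    target_host="http://lists.example.org"):  # hypothetical host
    """Pair old and new archive paths by Message-Id.

    old_ids and new_ids map a Message-Id string to an archive-relative
    path such as '2010-July/000123.html'. Returns (rules, unmatched):
    rules is a list of Apache 'Redirect permanent' lines, unmatched is
    the list of Message-Ids found only in the old archive.
    """
    rules = []
    unmatched = []
    # Walk the old archive in path order so the output is stable.
    for msgid, old_path in sorted(old_ids.items(), key=lambda kv: kv[1]):
        new_path = new_ids.get(msgid)
        if new_path is None:
            # No counterpart in the regenerated archive: needs a human.
            unmatched.append(msgid)
        elif new_path != old_path:
            # Only messages whose number actually moved need a redirect.
            rules.append("Redirect permanent %s/%s %s%s/%s" % (
                url_path, old_path, target_host, url_path, new_path))
    return rules, unmatched


if __name__ == "__main__":
    old = {"<a@example>": "2010-July/000001.html",
           "<b@example>": "2010-July/000002.html"}
    new = {"<a@example>": "2010-July/000001.html",
           "<b@example>": "2010-July/000003.html"}
    rules, unmatched = build_redirects(old, new)
    print("\n".join(rules))
```

Messages that end up in the unmatched list (no Message-Id, or a duplicated
one) would still need some other anchoring property, such as date plus
subject, or manual attention.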

> On a second level, I've been searching for volunteers to fix this wart in
> Pipermail for at least a decade.  No one's stepped forward so far.  If
> you're interested in helping, the Mailman project would love to have you.

It sounds like a fun project, and I'm tempted, but I also have a fair amount 
of other stuff to do right now, including writing a talk for EuroPython, 
although this might make for some interesting material for that talk. ;-)

Paul

