[pydotorg-www] Archives corruption (was: [PythonInfo Wiki] Update of "tftp" by 79.132.252.94)

Fri Jul 30 22:47:33 CEST 2010

Trying to catch up on some old threads...

On Jul 06, 2010, at 09:32 PM, Paul Boddie wrote:

>> The best way to regenerate a clean archive is to take the mbox file,
>> run cleanarch over it, then run 'arch --wipe'.  The urls will
>> probably be broken, so if the original urls can be retrieved then I
>> think the easiest way to "fix" them is to write some alias rules for
>> Apache to do permanent redirects to the new urls.  This is not a
>> trivial amount of work, which is probably why it hasn't been done
>> yet.  Who wants to - and can - volunteer to see this through to the
>> end?
>
>Is it not possible to get an old version of Mailman to generate
>archives which presumably have the same traits as those previously
>generated, record the identifier to Message-Id (or other "anchoring"
>property) correspondence, and then relabel the messages in the fixed
>archives with the old identifiers? Or does the whole "as messages
>arrive" thing completely prevent any possibility of reproducing the
>correct archived message ordering?

I think it will be problematic with an archive as old as python-dev.  It's
worth a shot of course <wink>, but python-dev's mbox definitely spans the
problematic region and cleanarch is just a heuristic.  The on-demand archive
generation is different enough (even though it uses much of the same code
path) that I'm not positive it's stable even without the mbox bug, and it will
be difficult to verify.
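
For anyone who wants to experiment with the relabeling idea, here is a minimal
sketch in Python. It is not Mailman or Pipermail code. It assumes that each
archived message's identifier is simply its position in the mbox, that URLs
look like <prefix>/<six-digit number>.html, and that Message-ID is a usable
anchor (duplicates and missing headers are skipped). The function names and
the URL layout are hypothetical; real Pipermail URLs also include a per-month
directory, which this ignores.

```python
import mailbox


def message_id_index(mbox_path):
    """Map each Message-ID to its 0-based position in the mbox file.

    Assumption: position in the mbox stands in for the archive identifier.
    Messages with no Message-ID, or with one already seen, are skipped.
    """
    index = {}
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for position, msg in enumerate(box):
            msg_id = msg.get("Message-ID")
            if msg_id is None:
                continue
            msg_id = msg_id.strip()
            if msg_id not in index:
                index[msg_id] = position
    finally:
        box.close()
    return index


def redirect_rules(old_index, new_index, url_prefix):
    """Build Apache 'Redirect permanent' lines from old to new identifiers.

    Only messages present in both indexes whose position changed get a rule.
    The <prefix>/<number>.html layout is an illustrative assumption.
    """
    rules = []
    for msg_id, old_pos in sorted(old_index.items(), key=lambda kv: kv[1]):
        new_pos = new_index.get(msg_id)
        if new_pos is None or new_pos == old_pos:
            continue
        rules.append(
            "Redirect permanent %s/%06d.html %s/%06d.html"
            % (url_prefix, old_pos, url_prefix, new_pos)
        )
    return rules
```

Running message_id_index() over the original mbox and over the cleaned one,
then passing both results to redirect_rules(), would give a first
approximation of the alias rules mentioned above. Whether the old identifiers
can really be recovered this way for python-dev is exactly the open question.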

I just don't have the cycles to do much testing of this, but I'll answer
questions for anyone who does.

>> On a second level, I've been searching for volunteers to fix this
>> wart in Pipermail for at least a decade.  No one's stepped forward
>> so far.  If you're interested in helping, the Mailman project would
>> love to have you.
>
>It sounds like a fun project, and I'm tempted, but I also have a fair
>amount of other stuff to do right now, including writing a talk for
>EuroPython, although this might make for some interesting material for
>that talk. ;-)

Hope that went well!
-Barry