[pydotorg-www] Archives corruption (was: [PythonInfo Wiki] Update of "tftp" by 188.8.131.52)
techtonik at gmail.com
Tue Jul 6 19:20:14 CEST 2010
On Tue, Jul 6, 2010 at 5:39 PM, Barry Warsaw <barry at python.org> wrote:
> Here's the issue.
> Pipermail has never maintained a database between message-ids and the urls.
> This is true even before Pipermail was bolted into Mailman and that's never
> changed, despite being high on my wish list for a decade. In any case, the
> problem occurs because Pipermail messages are numbered sequentially, and there
> is a difference between generating the archive on the fly (i.e. as messages
> arrive) and as a regenerated whole. This is complicated by the fact that
> there was a bug in Mailman years ago that broke the mbox separator so that
> regens couldn't be done reproducibly. This is why Mailman has a cleanarch
So, the bug is fixed, but archives still need to be repaired with
1. Is that right?
2. If the bug is fixed, how did the Python archives become corrupted?
3. If they were not corrupted, why were they regenerated?
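To make the separator problem quoted above concrete, here is a minimal sketch of one plausible way a separator issue changes the message count, and with it every sequential number that follows. The actual bug is not described in this thread, so this is an illustration only: the sample text, the addresses, and the STRICT_FROM pattern are all my assumptions, and the pattern is not the rule cleanarch really uses.

```python
import re

# Assumption: a genuine separator looks like
# "From addr Day Mon DD HH:MM:SS YYYY". Illustrative pattern only.
STRICT_FROM = re.compile(
    r"^From \S+\s+\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4}$"
)

def count_messages(mbox_text, strict):
    """Count messages by counting the lines treated as separators."""
    count = 0
    for line in mbox_text.splitlines():
        if strict:
            if STRICT_FROM.match(line):
                count += 1
        elif line.startswith("From "):
            count += 1
    return count

# Hypothetical two-message mbox; the first body contains an
# unescaped line that happens to start with "From ".
SAMPLE = (
    "From alice@example.org Tue Jul  6 17:39:00 2010\n"
    "Message-ID: <1@example.org>\n"
    "\n"
    "Hello,\n"
    "From what I can tell the archive is broken.\n"
    "\n"
    "From bob@example.org Tue Jul  6 19:20:14 2010\n"
    "Message-ID: <2@example.org>\n"
    "\n"
    "Reply.\n"
)

naive = count_messages(SAMPLE, strict=False)   # splits on any "From " line
strict = count_messages(SAMPLE, strict=True)   # splits on full separators only
```

With this sample the naive reader sees one message more than the strict one, so every message after the stray line would get a different sequential number depending on which reader built the archive.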
> The best way to regenerate a clean archive is to take the mbox file, run
> cleanarch over it, then run 'arch --wipe'. The urls will probably be broken,
> so if the original urls can be retrieved then I think the easiest way to "fix"
> them is to write some alias rules for Apache to do permanent redirects to the
> new urls. This is not a trivial amount of work, which is probably why it
> hasn't been done yet. Who wants to - and can - volunteer to see this through
> to the end?
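The redirect idea quoted above could be sketched roughly as follows, assuming a mapping from old to new URL paths has already been recovered somehow. The paths, the list name, and the "/pipermail-new" prefix are hypothetical; the only real piece is the "Redirect permanent" directive from Apache's mod_alias.

```python
def redirect_rules(old_to_new):
    """Return one mod_alias 'Redirect permanent' line per moved path."""
    rules = []
    for old_path, new_path in sorted(old_to_new.items()):
        if old_path != new_path:
            # Depending on the Apache version, the target may have to be
            # a full URL rather than a path.
            rules.append("Redirect permanent %s %s" % (old_path, new_path))
    return rules

def chained_targets(old_to_new):
    """List targets that are themselves redirected (redirect chains)."""
    moved = {o: n for o, n in old_to_new.items() if o != n}
    return sorted(n for n in moved.values() if n in moved)

# Hypothetical mapping; the new archive lives under a different prefix.
mapping = {
    "/pipermail/example-list/2010-July/000011.html":
        "/pipermail-new/example-list/2010-July/000012.html",
    "/pipermail/example-list/2010-July/000012.html":
        "/pipermail-new/example-list/2010-July/000013.html",
}

rules = redirect_rules(mapping)
chains = chained_targets(mapping)
```

The chain check matters because sequential numbers overlap: if the regenerated archive reused the same paths, the target of one rule would match the source of the next, and a reader would be bounced on to the wrong message. That is why the sketch assumes a separate prefix for the new URLs.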
We need to clearly define the problem first.
- mbox file - some kind of binary file with messages inside
- archive site served by Apache with some content
1. What is this site?
2. How are the .html files generated (statically/dynamically)?
3. What are the linking rules?
4. What are the formulas for generating file names?
- de facto information loss; symptoms: broken URL links, broken thread chains
Need to find out:
- source of information loss
- if the recovery is possible
- recovery scenarios
- implement recovery scenario
- run implementation
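One possible recovery scenario, as a minimal sketch: if the message order of the old archive can still be recovered from somewhere (an assumption; nothing in this thread says it can), pages could be matched between the old and the regenerated archive by Message-ID. The function names, the six-digit file name format, and the starting number 0 are my assumptions for illustration.

```python
def number_messages(message_ids, start=0):
    """Assign sequential page names to Message-IDs in archive order."""
    # Duplicate Message-IDs would overwrite each other here; a real
    # tool would have to handle them explicitly.
    return {mid: "%06d.html" % n for n, mid in enumerate(message_ids, start)}

def moved_pages(old_ids, new_ids):
    """Map old page name -> new page name for messages that moved."""
    old = number_messages(old_ids)
    new = number_messages(new_ids)
    return {
        old[mid]: new[mid]
        for mid in old
        if mid in new and old[mid] != new[mid]
    }

# Hypothetical data: the regenerated archive has one extra message.
old_ids = ["<a@example.org>", "<b@example.org>", "<c@example.org>"]
new_ids = ["<a@example.org>", "<extra@example.org>",
           "<b@example.org>", "<c@example.org>"]

moved = moved_pages(old_ids, new_ids)
```

A single extra message shifts every page after it, which would explain why both URL links and thread chains break at once. Messages missing from the new archive are simply skipped, so they would need separate handling.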
> On a second level, I've been searching for volunteers to fix this wart in
> Pipermail for at least a decade. No one's stepped forward so far. If you're
> interested in helping, the Mailman project would love to have you.
It will be expensive to get me for the whole project. The only thing I
can promise is to put some effort into this specific data