[pydotorg-www] Archives corruption

Stephan Deibel sdeibel at wingware.com
Tue Jul 6 20:00:22 CEST 2010

anatoly techtonik wrote:
> On Tue, Jul 6, 2010 at 5:39 PM, Barry Warsaw <barry at python.org> wrote:
>> Here's the issue.
>> Pipermail has never maintained a database between message-ids and the urls.
>> This is true even before Pipermail was bolted into Mailman and that's never
>> changed, despite being high on my wish list for a decade.  In any case, the
>> problem occurs because Pipermail messages are numbered sequentially, and there
>> is a difference between generating the archive on the fly (i.e. as messages
>> arrive) and as a regenerated whole.  This is complicated by the fact that
>> there was a bug in Mailman years ago that broke the mbox separator so that
>> regens couldn't be done reproducibly.   This is why Mailman has a cleanarch
>> script.
> So the bug is fixed, but the archives still need to be repaired with
> the `cleanarch` script.
> 1. Is that right?
> 2. If the bug is fixed, how did the Python archives become corrupted?
> 3. If they were not corrupted, why were they regenerated?
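Here is a rough Python sketch of the numbering problem described above. It is not Pipermail's actual code. It assumes a tiny hand-made mbox in which one message body contains an unescaped line starting with "From " (one way an mbox separator can break), and the helpers split_mbox, message_id and regenerate are hypothetical. A full regen numbers whatever it splits out in order, so the bogus extra "message" shifts every later number and the URL of message 3 changes. Escaping the line as ">From", which is the kind of repair cleanarch performs, restores the original numbering.

```python
# Hypothetical mbox: the second message's body has a line starting with
# "From " that was NOT escaped as ">From " (assumed damage, for illustration).
MBOX = """\
From alice@example.com Mon Jul  5 10:00:00 2010
Message-ID: <1@example.com>

First post.
From bob@example.com Mon Jul  5 11:00:00 2010
Message-ID: <2@example.com>

Quoting an old mail:
From the archives, as we discussed...
From carol@example.com Mon Jul  5 12:00:00 2010
Message-ID: <3@example.com>

Third post.
"""


def split_mbox(text):
    """Naively treat every line starting with 'From ' as a message separator."""
    messages, current = [], []
    for line in text.splitlines():
        if line.startswith("From ") and current:
            messages.append(current)
            current = []
        current.append(line)
    if current:
        messages.append(current)
    return messages


def message_id(lines):
    """Return the Message-ID header value, or None for a bogus 'message'."""
    for line in lines:
        if line == "":  # headers end at the first blank line
            break
        if line.lower().startswith("message-id:"):
            return line.split(":", 1)[1].strip()
    return None


def regenerate(text):
    """Number messages sequentially in mbox order, as a full regen would."""
    return {message_id(msg): n for n, msg in enumerate(split_mbox(text))}


# Numbers assigned as messages arrived (incremental, on-the-fly archiving).
live = {"<1@example.com>": 0, "<2@example.com>": 1, "<3@example.com>": 2}

print(regenerate(MBOX)["<3@example.com>"])  # 3, not 2: message 3's URL moves

# Escape the body line as ">From" and the regen matches the live numbering.
fixed = MBOX.replace("\nFrom the archives", "\n>From the archives")
print(regenerate(fixed) == live)  # True
```

Because nothing ties a Message-ID to its number, URLs under this scheme stay stable across a regen only if the mbox splits into exactly the same sequence of messages it had when they arrived.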

Note that for some Python email lists the mbox file contains messages
for which we received and acted on takedown notices.  I'm fairly sure
we removed those messages from the HTML archive but not from the mbox.
So regenerating the archive from those mbox files would re-publish
those messages.  Perhaps not a big deal (it was years ago), but I
thought I'd mention it.

>> On a second level, I've been searching for volunteers to fix this wart in
>> Pipermail for at least a decade.  No one's stepped forward so far.  If you're
>> interested in helping, the Mailman project would love to have you.
> It would be expensive to get me for the whole project. The only thing I
> can promise is to put some effort into this specific data
> transformation scenario.

"Volunteer" means doing it for free as a service to the community.  What
is needed is actual help, not attempts to flog or embarrass others into
doing work that you think is important.  I think many agree it's a
problem, but frankly it's a bit odd that in response to a call for
volunteers to fix the problem you essentially say "I can help but it'll
cost you a lot".

Hopefully I'm just misunderstanding your email.

- Stephan
