[Mailman-Developers] Re: 2006 archives already online!

Owen Taylor otaylor@redhat.com
01 May 2001 09:00:06 -0400

barry@digicool.com (Barry A. Warsaw) writes:

> >>>>> "OT" == Owen Taylor <otaylor@redhat.com> writes:
>     OT> What I did for the gnome.org archives (using mhonarc plus
>     OT> custom perl) is to use the Received: header for the date.
> Ah, but which one? :)  There's going to be a Received: header for
> each hop that message takes.  By the time your message got to me, it
> had 7 Received: headers, and 3 (I think) by the time it reached
> Mailman.

The most recent. So you'll only get the wrong date if clobber_date
would also get it wrong; people will tell you pretty quickly
if your mail server has an incorrect clock.
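
As a rough illustration of "use the most recent Received: header"
(a Python sketch, not the mhonarc/perl code mentioned above; the
function name received_date is made up for the example): each hop
prepends its own header, so the first Received: in the message is
the newest, and its timestamp is the part after the last semicolon.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def received_date(raw_message):
    """Return the timestamp of the topmost (most recent) Received: header, or None."""
    msg = message_from_string(raw_message)
    received = msg.get_all("Received") or []
    if not received:
        return None
    # Each hop prepends its header, so the first one listed is the newest.
    # The timestamp follows the last ';' in the header value.
    _, sep, stamp = received[0].rpartition(";")
    if not sep:
        return None
    try:
        return parsedate_to_datetime(stamp.strip())
    except (TypeError, ValueError):
        # Unparseable timestamp: let the caller decide on a fallback.
        return None
```

An archiver could call this per message and fall back to the Date:
header only when it returns None.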

>     OT> Which is, almost always, quite close to the time the person
>     OT> actually sent it, and assuming that your local server's time
>     OT> isn't screwed up (which is a much bigger problem...) does
>     OT> not have the 2004 problem.
>     OT> And it has the advantage over clobber_date of:
>     |  - Not munging the mail
> True, with the disadvantage that if you use an external archiver,
> it'll have to handle checking for outrageous dates.  clobber_date
> munges the message before it hits either archiver (Pipermail or
> external). 

I guess I'd consider getting the date right to be the concern
of the archiver, not of the system feeding to the archiver - 
after all, it's pretty common to also want to feed a chunk
of old pre-mailman archives into an archiver along with the
mailman ones.

> If I was smart, I'd also count as a major disadvantage the
> fact that I'll have to track down all the places where the Date:
> header is used in Pipermail, and I /hate/ diving in that code. ;(

Yeah, there are always practical concerns ;-)

>     |  - Not being skewed by moderation delays
> Dang, yep, but fixable.
>     |  - Being independent of the archiving process, so if you
>     |    import a bunch of old mail with incorrect Date: lines
>     |    into the archiving process you still get the 2004 
>     |    protection.
> True, with the caveat above.
> This would be a reasonable option, however if you use the most recent
> Received: header, won't you still be subject to local server clock
> skew?  And if you use the earliest Received: you'll be subject to the
> same bogosity in the Date: header.  Or do you just start parsing the
> Received:'s back from the most recent and take the first sane one you
> find?

As mentioned above, local server clock breakage isn't really something
I worry about; my concern with the clobber_date method wasn't
that the local clock could be wrong, but rather, that it was discarding
information from the message headers and replacing it with 
the current time, which, in some circumstances could be significantly

Also, I did actually have a lot of old mail with bogus Date: fields
that I was importing...
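
For the variant raised in the quoted text (parse the Received:
headers back from the most recent and take the first sane one),
here is a minimal Python sketch. The function name
first_sane_received and the earliest/latest window are assumptions
for illustration; what counts as "sane" is a policy choice for the
archiver, and this is not the perl used for the gnome.org archives.

```python
from datetime import datetime, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def first_sane_received(raw_message, earliest, latest):
    """Scan Received: headers newest-first; return the first timestamp
    that falls inside [earliest, latest], or None if none does."""
    msg = message_from_string(raw_message)
    for header in msg.get_all("Received") or []:
        _, sep, stamp = header.rpartition(";")
        if not sep:
            continue
        try:
            when = parsedate_to_datetime(stamp.strip())
        except (TypeError, ValueError):
            continue
        if when.tzinfo is None:
            # No usable zone in the header: treat it as UTC (an assumption).
            when = when.replace(tzinfo=timezone.utc)
        if earliest <= when <= latest:
            return when
    return None
```

When importing old mail, the window could be set to the known
lifetime of the list, so a hop with a broken clock is skipped in
favour of the next older Received: line.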