Re: [Mailman-Developers] Re: 2006 archives already online!
barry@digicool.com (Barry A. Warsaw) writes:
"OT" == Owen Taylor <otaylor@redhat.com> writes:
OT> What I did for the gnome.org archives (using mhonarc plus custom perl) is to use the Received: header for the date.
Ah, but which one? :) There's going to be a Received: header for each hop that message takes. By the time your message got to me, it had 7 Received: headers, and 3 (I think) by the time it reached Mailman.
The most recent. So you'll only get the wrong date if clobber_date would also get it wrong; people will tell you pretty quickly if your mail server has an incorrect clock.
OT> Which is, almost always, quite close to the time the person actually sent it, and assuming that your local server's time isn't screwed up (which is a much bigger problem...) does not have the 2004 problem.
OT> And it has the advantage over clobber_date of:
| - Not munging the mail
True, with the disadvantage that if you use an external archiver, it'll have to handle checking for outrageous dates. clobber_date munges the message before it hits either archiver (Pipermail or external).
I guess I'd consider getting the date right to be the concern of the archiver, not of the system feeding to the archiver - after all, it's pretty common to also want to feed a chunk of old pre-mailman archives into an archiver along with the mailman ones.
If I was smart, I'd also count as a major disadvantage the fact that I'll have to track down all the places where the Date: header is used in Pipermail, and I /hate/ diving in that code. ;(
Yeah, there are always practical concerns ;-)
| - Not being skewed by moderation delays
Dang, yep, but fixable.
| - Being independent of the archiving process, so if you import a bunch of old mail with incorrect Date: lines into the archiving process you still get the 2004 protection.
True, with the caveat above.
This would be a reasonable option; however, if you use the most recent Received: header, won't you still be subject to local server clock skew? And if you use the earliest Received: you'll be subject to the same bogosity as in the Date: header. Or do you just start parsing the Received:'s back from the most recent and take the first sane one you find?
As mentioned above, local server clock breakage isn't really something I worry about. My concern with the clobber_date method wasn't that the local clock could be wrong, but rather that it was discarding information from the message headers and replacing it with the current time, which, in some circumstances, could be significantly different.
Also, I did actually have a lot of old mail with bogus Date: fields that I was importing...
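For illustration, here is a minimal sketch of the "walk the Received: headers back from the most recent and take the first sane one" idea raised above. It uses Python's standard email package and is not the mhonarc/perl setup used for gnome.org, nor anything in Mailman or Pipermail. The function name received_date, the max_age parameter, and the sanity window (no more than a day in the future, no older than roughly 30 years) are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def received_date(msg, now=None, max_age=timedelta(days=365 * 30)):
    """Return the first plausible Received: timestamp, or None.

    Hypothetical helper: the name and the sanity window are assumptions.
    """
    now = now or datetime.now(timezone.utc)
    # Each hop prepends its Received: header, so get_all() yields the
    # most recent hop (normally the local server) first.
    for header in msg.get_all("Received", []):
        # The timestamp follows the last ';' in a Received: header.
        _, sep, stamp = header.rpartition(";")
        if not sep:
            continue
        try:
            when = parsedate_to_datetime(stamp.strip())
        except (TypeError, ValueError):
            continue  # unparseable timestamp, try the next hop
        if when.tzinfo is None:
            # No usable zone information; treat the value as UTC.
            when = when.replace(tzinfo=timezone.utc)
        # Skip timestamps that are implausibly old or in the future.
        if now - max_age <= when <= now + timedelta(days=1):
            return when
    return None
```

An archiver could call received_date(msg) and fall back to the Date: header, or to the current time, only when it returns None. Because nothing in the message is rewritten, the same check also applies to old mail imported later.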
Regards, Owen