[Mailman-Developers] Improving the archives
Dale Newfield
Dale at Newfield.org
Thu Jul 26 09:37:37 CEST 2007
Jeff Breidenbach wrote:
> So I just looked at 2 million raw messages from 2007, spread over
> a few thousand mailing lists (all data is from mail-archive.com). My
> first question was - when comparing only with messages from the
> same list - how many times do I see a repeated message-id? The
> answer was ... drumroll please ... 260 thousand. What the hell?
I think the question you were originally going to ask got sidetracked.
If we assume that all these "multiple paths from list to archive"
duplicates not only share a Message-ID but also a Date (they were the
same message originally, so they should!), then both schemes (messageid,
and messageid+date) would decide that all (but one of) these messages
are redundant.
What we really want to know is: how many (non-empty) Message-ID
collisions are there that *don't* share a Date? That is the number of
messages that the messageid-only scheme loses and that the composite
(messageid+date) scheme would keep.
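
For what it's worth, here is a rough sketch of how that count could be
computed. It assumes the messages have already been reduced to
(list_name, message_id, date) tuples, which is a made-up input format
and not anything mail-archive.com or Mailman actually provides, and it
compares Date headers as raw strings, so two spellings of the same
instant would count as different dates.

```python
from collections import defaultdict


def count_lost_by_messageid_only(messages):
    """Count messages that messageid-only dedup drops but messageid+date keeps.

    `messages` is an iterable of (list_name, message_id, date) tuples.
    This tuple layout is hypothetical, chosen only for illustration.
    """
    # Map each (list, Message-ID) pair to the set of distinct Date values seen.
    dates_by_id = defaultdict(set)
    for list_name, message_id, date in messages:
        if not message_id:
            # Empty Message-IDs are excluded from the question as posed.
            continue
        dates_by_id[(list_name, message_id)].add(date)

    # With messageid only, one message survives per key. With messageid+date,
    # one survives per distinct date. The difference is what would be lost.
    return sum(len(dates) - 1 for dates in dates_by_id.values())
```

Exact duplicates (same Message-ID and same Date) contribute nothing to
the total, since both schemes treat them as redundant.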
-Dale