[Mailman-Developers] Improving the archives
jeff at jab.org
Wed Aug 8 06:44:19 CEST 2007
> What we really want to know is how many (non-empty) Message-ID
> collisions are there that *don't* share a Date? This is the number of
> messages that only-messageid loses, and that the composite identifier
> method would not lose.
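To make sure we mean the same number, here is a rough sketch of the count as I understand it. It assumes the composite identifier is simply the (Message-ID, Date) pair and that both headers have already been pulled out as strings; the function name and the sample data are made up for illustration.

```python
from collections import defaultdict

def extra_losses(messages):
    """Count messages that Message-ID-only dedup drops but (Message-ID, Date) keeps.

    `messages` is an iterable of (message_id, date) string pairs.
    Hypothetical helper, not part of Mailman.
    """
    dates_by_id = defaultdict(set)
    for message_id, date in messages:
        if message_id:  # ignore empty Message-IDs, as in the question
            dates_by_id[message_id].add(date)
    # Message-ID-only keeps one message per id; the composite key keeps one
    # per distinct date, so the difference is (distinct dates - 1) per id.
    return sum(len(dates) - 1 for dates in dates_by_id.values())

sample = [
    ("<a@example.org>", "Wed, 8 Aug 2007 06:44:19 +0200"),
    ("<a@example.org>", "Wed, 8 Aug 2007 06:44:19 +0200"),  # true duplicate
    ("<a@example.org>", "Thu, 9 Aug 2007 10:00:00 +0200"),  # same id, different date
    ("", "Thu, 9 Aug 2007 11:00:00 +0200"),                 # empty id, skipped
]
print(extra_losses(sample))  # 1
```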
I took a look at a larger dataset: 5.85 million messages from several
thousand lists. Of the messages that share a Message-ID but not a Date,
most come from a small number of web-based services.
875 come from forums.slimdevices.com
378 come from lists.openplans.org
265 come from nabble.com
164 come from egroups.com
135 come from yahoo.com
166 come from elsewhere
That's 1,983 messages, or 0.03% of the whole dataset. It drops to
0.008% if you discard the top three offenders, all of which I have
contacted.
I didn't try contacting Yahoo/eGroups because in my past
experience, talking to a brick wall is easier. I have not analyzed
how many of these messages are spam or have duplicate bodies,
either of which would lower these percentages further.
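For anyone who wants to check the arithmetic, the percentages can be reproduced from the counts listed above. This is only a sketch: it treats the rounded figure of 5.85 million as the exact total, and the variable names are mine.

```python
# Total is the rounded figure quoted in this message, taken as exact here.
total_messages = 5_850_000

# Messages sharing a Message-ID but not a Date, by origin.
collisions = {
    "forums.slimdevices.com": 875,
    "lists.openplans.org": 378,
    "nabble.com": 265,
    "egroups.com": 164,
    "yahoo.com": 135,
    "elsewhere": 166,
}
top_three = ["forums.slimdevices.com", "lists.openplans.org", "nabble.com"]

all_colliding = sum(collisions.values())
remaining = all_colliding - sum(collisions[name] for name in top_three)

pct_all = 100 * all_colliding / total_messages
pct_remaining = 100 * remaining / total_messages

print(all_colliding, f"{pct_all:.2f}%")      # 1983 0.03%
print(remaining, f"{pct_remaining:.3f}%")    # 465 0.008%
```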
Hope this data helps.