-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On Jul 20, 2007, at 9:21 AM, Stephen J. Turnbull wrote:
How likely is it that two messages with the same message-id and date are /not/ duplicates?
For message id generators that include a time-stamp in the generated id, approximately the same as the probability that two messages with the same message-id are not duplicates, no?
Good point, though clearly not all message-ids have timestamp
information in them. It does help explain why I see 600-odd more
collisions when taking other data into account too. I've modified my
script to sort collisions and dupes into maildir folders, so I'll
take a closer look when that finishes running (it takes a long time
to slog through all 5 mboxes, even on a fairly zippy dual-G5).
Heck, at that point, I'd feel justified in simply automatically rejecting the duplicate and chucking it from the archive.
I'd rather not go there. There may be applications for the archiver that require that all mail received be filed.
True. It would ultimately be an archiver policy though.
Counterproposal: have a "collisions" namespace, and provide an interface for the list owner to decide what to do with them. They could be thrown away, they could be given an alternative global ID somehow and added (e.g., the archive page could add a "See probable duplicates too" link), or they could be put into a moderation-like queue for list admins to decide about.
I like this.
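To make the "collisions" namespace idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the interface proposed in this thread: the names global_key, MessageStore, and the policy strings are hypothetical, and the key is assumed to be a hash of the Message-ID and Date headers. The policy decides whether a colliding message raises DuplicateMessageError, is dropped, or is filed in a separate namespace for the list owner to review.

```python
import hashlib
from email.message import Message


class DuplicateMessageError(Exception):
    """Raised when a message's key is already present in the store."""


def global_key(msg):
    # Assumption: the key is derived from Message-ID and Date only.
    raw = "{}\n{}".format(msg.get("Message-ID", ""), msg.get("Date", ""))
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


class MessageStore:
    """Hypothetical in-memory store with a per-archive collision policy."""

    POLICIES = ("raise", "discard", "collect")

    def __init__(self, policy="collect"):
        if policy not in self.POLICIES:
            raise ValueError("unknown policy: {!r}".format(policy))
        self.policy = policy
        self.messages = {}    # key -> first message seen with that key
        self.collisions = {}  # key -> later messages sharing that key

    def add(self, msg):
        key = global_key(msg)
        if key not in self.messages:
            self.messages[key] = msg
        elif self.policy == "raise":
            raise DuplicateMessageError(key)
        elif self.policy == "collect":
            # File the message in the "collisions" namespace for review.
            self.collisions.setdefault(key, []).append(msg)
        # policy == "discard": the colliding message is silently dropped.
        return key
```

With the "collect" policy the store never raises, so a caller such as the list manager has nothing to handle; with "raise" the caller would have to log and ignore the exception.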
So now, think of the interface to a message store that supports this addressing scheme. Well, it's something like:
I don't understand how the calling application is supposed to deal with a DuplicateMessageError exception since it should not change either the Message-ID or the Date if present.
I see this as a major problem with any proposal to use only author headers in computing the "global id".
Mailman would probably log and ignore DuplicateMessageErrors. It
wouldn't be Mailman's responsibility to ensure the message gets
archived, although I concede that as currently defined, you could end
up with list copies that had a global id header that wasn't unique.
OTOH, if the archiver implements a collision resolution policy such
as a 'collisions' namespace, it wouldn't ever raise
Or by using the global id, or by rejecting messages with duplicate message ids.
Er, the MTA has already accepted it. Do you plan to generate a list manager bounce to the poster? This has the unpleasant misfeature that it could be used to bounce spam off the list manager, since the poster needs to see content to determine whether this is a multiple send or actually the "intended version" after a "fat-finger" send; we already know the message-id isn't good enough.
Yes, this wouldn't be an MTA bounce, it would be a Mailman bounce.
But it would have to be subject to the same bounce rules as any other
auto-response which could be used as a spam vector, e.g. limit the
number of bounces per time period and don't include the entire
original message in the bounce (as both can be, and are used as spam