[Mailman-Users] attachements question

Thu Sep 29 07:29:58 CEST 2005

>>>>> "Brad" == Brad Knowles <brad at stop.mail-abuse.org> writes:

    Brad> At 2:03 PM +0900 2005-09-28, Stephen J. Turnbull wrote:

    >> Why archivers don't use Message-Id for the URL, I don't know.

    Brad> 	Because some MUAs generate message-ids that are likely
    Brad> to collide.

Can we stop pandering to the broken mailers, please?  Are we not
hackers?  We know how to handle collisions.  Here's the algorithm:

1)  Look for a unique ID in the (X-)List-Archive-Message-ID field.  If
    not found:
    a)  Generate a unique ID according to the usual algorithm as if
        the post were about to be sent from the archive host.
    b)  Add it to the header in the (X-)List-Archive-Message-ID field.

2)  Extract the message ID from the message.  If none, set the program
    variable equal to the ID obtained in 1, and (optionally) add it as
    Message-ID to the message's header.

3)  Generate the URL for the archived message based on *Message-ID*.

4)  Check for collision.

5)  If there is a collision, make a directory (could be a file-system
    directory, could be just an HTML file, could be a digest message)
    with the URL generated in 3.  Generate URLs for the colliding
    messages based on Message-ID plus List-Archive-Message-ID, and
    include them in the directory.  Conforming implementations MAY also
    extract MUA information and make nasty comments about the broken
    MUAs, their implementers, and their users to go with the directory.

    If Message-ID == List-Archive-Message-ID, go to 1a.  At this point
    a conforming implementation MAY mail /vmunix to its implementer,
    who obviously snafu'ed.

6)  PUT the colliding messages at those URLs.
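A minimal Python sketch of the steps above, assuming the archive is nothing more than an in-memory dict from URL to stored message text, with a list of URLs standing in for the collision "directory".  The header name X-List-Archive-Message-ID, the host list-archive.example.org, the URL layout in url_for(), and the Archive class are all hypothetical; only the standard-library email and urllib.parse calls are real.

```python
import email
from email.utils import make_msgid
from urllib.parse import quote

# Hypothetical names for illustration; the post fixes only the steps, not these.
ARCHIVE_HOST = "list-archive.example.org"
ARCHIVE_ID = "X-List-Archive-Message-ID"
BASE = "http://" + ARCHIVE_HOST + "/"


def url_for(*ids):
    """Join one or more IDs (angle brackets dropped, '/' percent-encoded) into a URL."""
    return BASE + "/".join(quote(i.strip().strip("<>"), safe="@") for i in ids)


def new_archive_id(msg):
    """Steps 1a/1b: generate an ID as if sending from the archive host, then record it."""
    aid = make_msgid(domain=ARCHIVE_HOST)
    del msg[ARCHIVE_ID]  # removes any old value; no error if the header is absent
    msg[ARCHIVE_ID] = aid
    return aid


class Archive:
    def __init__(self):
        # URL -> stored message text, or a list of URLs acting as the collision "directory".
        self.pages = {}

    def add(self, msg):
        aid = msg.get(ARCHIVE_ID) or new_archive_id(msg)  # step 1
        mid = msg.get("Message-ID")                        # step 2
        if mid is None:
            mid = aid
            msg["Message-ID"] = mid                        # the optional part of step 2
        url = url_for(mid)                                 # URL based on Message-ID alone
        existing = self.pages.get(url)
        if existing is None:                               # no collision
            self.pages[url] = msg.as_string()
            return url
        if mid == aid:                                     # archive's own ID collided: "go to 1a"
            aid = new_archive_id(msg)
        if isinstance(existing, str):
            # First collision at this URL: move the earlier message to a
            # Message-ID + List-Archive-Message-ID URL and turn url into a directory.
            old_url = url_for(mid, email.message_from_string(existing)[ARCHIVE_ID])
            self.pages[old_url] = existing
            existing = self.pages[url] = [old_url]
        new_url = url_for(mid, aid)
        self.pages[new_url] = msg.as_string()              # PUT the colliding message
        existing.append(new_url)
        return new_url
```

The bare Message-ID URL keeps resolving either way: to the message itself in the normal case, or, once a broken MUA has reused an ID, to the list of URLs for the colliding copies.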

Rationale:

1.  You could actually derive a URN from this:
    archived-message://list-archive.your.org/MESSAGE-ID.

2.  The URL is unique and will persist across regeneration of the
    archive as long as the message is present.

3.  People who use conformant software implemented competently should
    be given precedence.

4.  Users who don't subscribe to the archiver's client but somehow get
    their hands on a message ID can use Google to find it (and the
    rest of the thread).

5.  People who use software that doesn't conform will suffer.
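Points 1, 2 and 4 all follow from the name depending on nothing but the archive host and the Message-ID.  A small sketch of that, assuming the archived-message scheme exactly as written in point 1 (it is not a registered scheme) and an ordinary Google phrase search; the function names are hypothetical.

```python
from urllib.parse import quote, urlencode


def _bare(message_id):
    # Drop surrounding whitespace and the angle brackets around the Message-ID.
    return message_id.strip().strip("<>")


def archived_message_uri(archive_host, message_id):
    """Point 1: a name derived only from the archive host and the Message-ID."""
    return "archived-message://%s/%s" % (archive_host, quote(_bare(message_id), safe="@"))


def search_url(message_id):
    """Point 4: a phrase search for the bare Message-ID, needing nothing else."""
    return "https://www.google.com/search?" + urlencode({"q": '"%s"' % _bare(message_id)})


if __name__ == "__main__":
    # Same Message-ID in, same name out, however often the archive is rebuilt.
    print(archived_message_uri("list-archive.your.org", "<20050929.abc@example.com>"))
    # prints archived-message://list-archive.your.org/20050929.abc@example.com
```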

    Brad> For some time now, I've been arguing that they should use a
    Brad> hash of the relevant information (maybe all the headers,
    Brad> maybe just selected headers, maybe the entire message,
    Brad> whatever is reasonable to assume will survive), making sure
    Brad> to at least include the value of the "Date:", "Message-ID:",
    Brad> and "Received:" headers as part of that input.

This gives 1 and 2, but not 3, 4, and 5.  (No, you can't generate a
Google search item from knowledge of the algorithm because you don't
necessarily have the Received headers.)  Seems like overkill for Step
1 of the algorithm, too.
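For comparison, Brad's proposal might be sketched like this, assuming SHA-1 over just the Date, Message-ID and every Received header (he leaves both the hash and the exact header set open; hash_archive_id is a hypothetical name).  The NUL separators keep different splits of the same characters across headers from hashing alike.

```python
import hashlib
from email import message_from_string


def hash_archive_id(raw):
    """Hypothetical hash-based archive ID over Date, Message-ID and all Received headers."""
    msg = message_from_string(raw)
    h = hashlib.sha1()  # choice of hash is an assumption
    for name in ("Date", "Message-ID"):
        h.update((msg.get(name) or "").encode("utf-8"))
        h.update(b"\0")
    for received in msg.get_all("Received") or []:
        h.update(received.encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()
```

Because the Received headers are part of the input, someone holding only the Message-ID (or a copy whose Received headers differ from the archive's) cannot recompute the value, which is the objection to points 3, 4, and 5.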

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.