[Mailman-Developers] Improving the archives

Stephen J. Turnbull stephen at xemacs.org
Wed Jul 25 18:40:08 CEST 2007


Barry Warsaw writes:

 > I agree, I just don't think message-ids are user friendly enough to  
 > be this canonical url.  Especially in this context, which is exactly  
 > where urls are thrown in users faces.  An archiving service is  
 > exactly the right place for redirecting human readable urls to the  
 > archiver's canonical url (by, I agree, 302).

I'm confused (to be precise, you're confusing me).  If human readable
URLs are exactly right for redirection to the canonical URL, why does
the canonical URL need to be user friendly?

A quick remark: the git SCM uses BASE16 SHA1s for object names, but
allows you to abbreviate them to any unique prefix.  A friendly
archive could do the same for your BASE32 ids.
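
To make the git-style abbreviation concrete, here is a minimal sketch
of prefix resolution over a set of ids.  The function names, the
case-folding, and the 4-character minimum are my assumptions for
illustration, not anything git or an existing archiver defines.

```python
def resolve_prefix(prefix, known_ids):
    """Return the single id in known_ids that starts with prefix.

    Raises KeyError if nothing matches, ValueError if the prefix is
    ambiguous.  Comparison is case-insensitive (an assumption, so that
    users may type BASE32 ids in lower case).
    """
    wanted = prefix.upper()
    matches = [i for i in known_ids if i.upper().startswith(wanted)]
    if not matches:
        raise KeyError(prefix)
    if len(matches) > 1:
        raise ValueError(
            "ambiguous prefix %r matches %d ids" % (prefix, len(matches)))
    return matches[0]


def shortest_unique_prefix(full_id, known_ids, minimum=4):
    """Return the shortest prefix of full_id that no other known id shares.

    `minimum` is a hypothetical floor on the abbreviation length.
    """
    others = [i.upper() for i in known_ids if i.upper() != full_id.upper()]
    for length in range(minimum, len(full_id) + 1):
        candidate = full_id[:length].upper()
        if not any(o.startswith(candidate) for o in others):
            return full_id[:length]
    # No shorter unique prefix exists; fall back to the full id.
    return full_id
```

An archive would call shortest_unique_prefix() when printing a URL
and resolve_prefix() when serving one; an ambiguous prefix is an
error rather than a guess.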

Without going much into implementation, here's how I would write the
conformance section for our RFC.  The point is that I don't see any
need to discuss user-friendliness or the implementation of UUIDs for
the RFC!  This means that getting those right from the start is
not that important.

    0. Conformance

    0.1 List managers

    A conforming list manager MUST provide the List-Archive header
    field if the post is being archived.

    A conforming list manager MAY provide the List-Archive-UUID header
    field.  If so, the value MUST be guaranteed unique, and it MUST be
    present in the post as provided to the archiver.  The contents of
    this header need not be distinct from the contents of the
    Message-ID header, as long as the uniqueness guarantee is
    maintained.

    0.2 Archives

    A conforming archive MUST reserve the namespaces "message-id/" and
    "list-post-id/" relative to its base URL for the uses described
    below.

    A conforming archive MUST support retrieval by Message-ID, using
    the namespace "message-id/$(MESSAGE-ID)" relative to its base URL.
    The archive specified in the List-Archive header field MUST
    support access using the value of that field as its base URL.

    A conforming archive SHOULD support retrieval by UUID, using the
    namespace "list-post-id/$(LIST-ARCHIVE-UUID)" relative to its base
    URL.  If the scheme is "http" or "https", a conforming archive
    that does not support retrieval by UUID SHOULD return status 501
    NOT IMPLEMENTED with an entity explaining that retrieval by UUID
    is not implemented.

    A conforming archive MAY support "friendlyurls" for use where
    space is constrained (e.g., in a post's footer).  A conforming
    archive may support any other URIs it wants to, too.<wink>  A
    third party SHOULD be able to regenerate a friendlyurl from the
    original message contents.

    0.3 Software

    Conforming archive software SHOULD provide interfaces for
    generating UUIDs and friendlyurls, if retrieval by them is
    supported.  Conforming list managers SHOULD use these interfaces.
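
As a sketch of what the two reserved namespaces in 0.2 imply, this is
roughly how a client might build retrieval URLs from the List-Archive
value.  The draft does not say how ids are escaped, so stripping the
angle brackets and percent-encoding every reserved character are
assumptions, as are the function names and the example host.

```python
from urllib.parse import quote


def _join(list_archive, namespace, value):
    # List-Archive values are usually written as <URL>; drop the brackets.
    base = list_archive.strip().strip("<>").rstrip("/")
    # Assumption: percent-encode everything outside the unreserved set,
    # including "/" and "@", so the id stays a single path segment.
    return base + "/" + namespace + "/" + quote(value, safe="")


def message_id_url(list_archive, message_id):
    """URL for retrieval by Message-ID (MUST be supported)."""
    return _join(list_archive, "message-id", message_id.strip().strip("<>"))


def list_post_id_url(list_archive, uuid_value):
    """URL for retrieval by List-Archive-UUID (SHOULD be supported)."""
    return _join(list_archive, "list-post-id", uuid_value.strip())


if __name__ == "__main__":
    print(message_id_url("<http://archive.example.org/mylist/>",
                         "<12345.abc@mail.example.org>"))
```

Note that nothing here depends on how the UUID was generated or on
whether the archive also offers friendlyurls.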

Some comments:

The interfaces for generating UUIDs and friendlyurls should be
provided as command line utilities as well as callable functions.

Although the conformance level for friendlyurl support is "may", I
expect that essentially all archives will support friendlyurls.

The namespace for UUIDs and friendlyurls should probably be more
restricted than "any valid URI".

"List manager" denotes any source of archival content (e.g., you could
imagine a user storing their outbox in an archive, so that the "list
manager" would actually be the user's MUA).  The namespaces suggested
above are good enough, I think, but there may be better ones.

Instead of 501 NOT IMPLEMENTED, I considered 410 GONE, but that
implies a request to delete the reference.  Since the UUID is carried
in a header of the post itself, the archive could be augmented to
support retrieval by it later.
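
The status choice can be sketched as a small decision function for
the "list-post-id/" namespace.  The function name, the dict-based
index, and the 404 for an unknown UUID are assumptions; the draft
only specifies the 501 case.

```python
from http import HTTPStatus


def uuid_lookup_status(uuid_supported, index, uuid_value):
    """Return (status, body) for a request under list-post-id/.

    `index` is a hypothetical mapping from UUID to stored message text.
    """
    if not uuid_supported:
        # 501 rather than 410: the reference stays valid, and the
        # archive may start honoring it once UUID lookup is added.
        return (HTTPStatus.NOT_IMPLEMENTED,
                "Retrieval by UUID is not implemented by this archive.")
    if uuid_value in index:
        return HTTPStatus.OK, index[uuid_value]
    # Assumption: an unknown UUID is an ordinary 404.
    return HTTPStatus.NOT_FOUND, "No post with that UUID."
```

Because the UUID is stored with each post, flipping uuid_supported on
later requires only building the index, not touching old messages.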

In the phrase "guaranteed unique", "guaranteed" means "to the level
provided by uuidgen or standard Message-ID generators".
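
For a sense of that level of guarantee, the Python standard library
has rough analogues of both generators: uuid.uuid4() for a random
UUID and email.utils.make_msgid() for a Message-ID.  The wrapper
names and the example domain below are hypothetical.

```python
import uuid
from email.utils import make_msgid


def new_list_archive_uuid():
    """Random (version 4) UUID as text; unique only probabilistically."""
    return str(uuid.uuid4())


def new_message_id(domain):
    """Message-ID of the form <...@domain>, built from time, pid and
    random bits by the standard library."""
    return make_msgid(domain=domain)


if __name__ == "__main__":
    print(new_list_archive_uuid())
    print(new_message_id("lists.example.org"))
```

Neither is a proof of uniqueness; both are "unique enough" in the
sense intended above.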

Generation of friendlyurls or unique ids based on message body content
is probably a bad idea.


