[Mailman-Developers] Google Summer of Code: Integration of Search Code

Barry Warsaw barry at list.org
Thu Mar 29 01:07:47 CEST 2012


On Mar 28, 2012, at 10:29 AM, Stephen J. Turnbull wrote:

>The only tricky issue is that we *do* have to worry about message-ID
>collisions of truly different messages and about messages without message
>IDs, especially for converted historical archives.  So the API needs to be
>able to deal with these issues, probably by returning a set or sequence of
>messages.

Mailman 3 itself requires unique Message-IDs.  IIRC, the Mail Archive guys
found a very very low collision rate over millions of messages, and I think
all such cases were basically spam.  The LMTP runner doesn't yet reject
duplicates, but it should (LP: #967951).

>I would guess she'll probably store messages in YY-MM/MSGID, or as git does
>in "unpacked" XX/YYYYYYYY... format, where XX are the first two digits of the
>hash ID, and YY... are the remaining ones.  But it could easily be backed by
>an IMAP store or something more specialized; we don't really care as long as
>it's object-ID-addressable.

Don't forget too that the LMTP runner automatically adds the X-Message-ID-Hash
header, which is a Base32 encoding of the SHA1 hash of the Message-ID contents
(without the angle brackets).  This hash could be used as well.
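
As a rough illustration of that header value, here is a minimal Python sketch
of the computation described above: SHA1 over the Message-ID with the angle
brackets stripped, then Base32.  The function names are hypothetical, the
UTF-8 encoding step is an assumption, and the XX/YYYY... path helper only
mirrors the git-style layout mentioned in the quoted text; none of this is
taken from the Mailman source.

```python
import base64
import hashlib


def message_id_hash(message_id):
    """Return the Base32-encoded SHA1 hash of a Message-ID.

    Hypothetical helper: strips surrounding whitespace and one pair of
    angle brackets before hashing.  Encoding to UTF-8 is an assumption.
    """
    msgid = message_id.strip()
    if msgid.startswith('<') and msgid.endswith('>'):
        msgid = msgid[1:-1]
    digest = hashlib.sha1(msgid.encode('utf-8')).digest()
    # A SHA1 digest is 20 bytes, so the Base32 form is 32 characters
    # with no '=' padding.
    return base64.b32encode(digest).decode('ascii')


def sharded_path(hash_value):
    """Hypothetical git-style layout: first two characters, then the rest."""
    return hash_value[:2] + '/' + hash_value[2:]


if __name__ == '__main__':
    # Made-up Message-ID, for illustration only.
    h = message_id_hash('<example-1234@mail.example.com>')
    print(h)
    print(sharded_path(h))
```

Because the brackets are stripped first, the same Message-ID hashes to the
same value whether or not it arrives with them, which is what would make the
hash usable as a storage key.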

-Barry

