
On Mar 28, 2012, at 10:29 AM, Stephen J. Turnbull wrote:
>The only tricky issue is that we *do* have to worry about message-ID collisions of truly different messages and about messages without message IDs, especially for converted historical archives. So the API needs to be able to deal with these issues, probably by returning a set or sequence of messages.
Mailman 3 itself requires unique Message-IDs. IIRC, the Mail Archive guys found a very, very low collision rate over millions of messages, and I think all such cases were basically spam. The LMTP runner doesn't yet reject duplicates, but it should (LP: #967951).
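A minimal sketch of how a Message-ID-keyed store could reconcile both points, assuming a hypothetical in-memory `MessageStore` (not Mailman's actual API): `get()` always returns a tuple, so callers handling converted archives can cope with colliding IDs, while `add()` rejects duplicates by default, as the LMTP runner is meant to. Messages without a Message-ID are refused outright here, matching Mailman 3's requirement that IDs be present.

```python
from dataclasses import dataclass, field
from email.message import EmailMessage


class DuplicateMessageError(Exception):
    """Raised when a Message-ID is already stored and duplicates are rejected."""


@dataclass
class MessageStore:
    """Hypothetical in-memory store keyed by Message-ID (illustration only)."""

    _by_id: dict[str, list[EmailMessage]] = field(default_factory=dict)

    def add(self, msg: EmailMessage, reject_duplicates: bool = True) -> None:
        message_id = msg.get("Message-ID")
        if message_id is None:
            # Mailman 3 requires a Message-ID, so this sketch refuses the message.
            raise ValueError("message has no Message-ID")
        key = str(message_id).strip()
        if reject_duplicates and key in self._by_id:
            # Live delivery: refuse a second message with the same ID.
            raise DuplicateMessageError(key)
        # Historical imports can pass reject_duplicates=False to keep collisions.
        self._by_id.setdefault(key, []).append(msg)

    def get(self, message_id: str) -> tuple[EmailMessage, ...]:
        # Empty tuple means unknown ID; more than one entry means a collision.
        return tuple(self._by_id.get(message_id.strip(), ()))
```

Returning a sequence even in the common single-message case keeps one code path for both live lists and imported archives.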
>I would guess she'll store messages in YY-MM/MSGID, or as git does
>(in "unpacked" XX/YYYYYYYY... format, where XX are the first two digits of the hash ID and YYYYYYYY... are the remaining ones). But it could easily be backed by an IMAP store or something more specialized; we don't really care as long as it's object-ID-addressable.
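The git-style "unpacked" layout can be sketched as a path computation. This assumes, purely for illustration, that the object ID is the hex SHA1 of the bracket-stripped Message-ID and that the archive root is a directory named `archive`; the function name `fanout_path` is hypothetical, and the real archiver may pick a different hash or layout.

```python
import hashlib
from pathlib import PurePosixPath


def fanout_path(message_id: str, root: str = "archive") -> PurePosixPath:
    """Map a Message-ID to a git-style root/XX/YYYY... path (hypothetical)."""
    value = message_id.strip()
    # Assumption: angle brackets are not part of the ID being hashed.
    if value.startswith("<") and value.endswith(">"):
        value = value[1:-1]
    # Assumption: hex SHA1 as the object ID (40 hex digits).
    object_id = hashlib.sha1(value.encode("utf-8")).hexdigest()
    # First two hex digits name the directory, the remaining 38 name the file,
    # which spreads objects across at most 256 subdirectories.
    return PurePosixPath(root, object_id[:2], object_id[2:])
```

Any object-ID-addressable backend could use the same ID; only this mapping from ID to location would change.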
Don't forget, too, that the LMTP runner automatically adds the X-Message-ID-Hash header, which is the Base32 encoding of the SHA1 hash of the Message-ID contents (without the angle brackets). That hash could serve as the object ID as well.
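A minimal sketch of the X-Message-ID-Hash computation as described above: strip the angle brackets, SHA1-hash the remainder, and Base32-encode the digest. The UTF-8 encoding step and the function name `message_id_hash` are assumptions for illustration, not a copy of Mailman's code.

```python
import base64
import hashlib


def message_id_hash(message_id: str) -> str:
    """Return the Base32-encoded SHA1 of a Message-ID, brackets removed."""
    value = message_id.strip()
    # The angle brackets delimit the Message-ID; they are not hashed.
    if value.startswith("<") and value.endswith(">"):
        value = value[1:-1]
    # Assumption: the ID is encoded as UTF-8 before hashing.
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    # A 20-byte (160-bit) digest Base32-encodes to exactly 32 characters,
    # so no "=" padding appears.
    return base64.b32encode(digest).decode("ascii")
```

Because the result uses only A-Z and 2-7, it is safe to use directly as a file name or URL component.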
-Barry