Re: [Mailman-Developers] Google Summer of Code: Integration of Search Code
On Mar 28, 2012, at 10:29 AM, Stephen J. Turnbull wrote:
The only tricky issue is that we *do* have to worry about message-ID collisions of truly different messages and about messages without message IDs, especially for converted historical archives. So the API needs to be able to deal with these issues, probably by returning a set or sequence of messages.
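A minimal sketch of the lookup behavior described above, assuming a hypothetical in-memory `MessageStore` class (not part of Mailman): a Message-ID maps to a list of messages rather than a single one, and messages with no Message-ID are kept in a separate bucket.

```python
from collections import defaultdict
from email.message import EmailMessage


class MessageStore:
    """Hypothetical store that tolerates Message-ID collisions."""

    def __init__(self):
        # One Message-ID may map to several truly different messages.
        self._by_id = defaultdict(list)
        # Messages that carry no Message-ID header at all.
        self._without_id = []

    def add(self, msg):
        msgid = msg.get("Message-ID")
        if msgid is None:
            self._without_id.append(msg)
        else:
            self._by_id[str(msgid).strip()].append(msg)

    def lookup(self, message_id):
        """Return a (possibly empty) list of messages with this ID."""
        return list(self._by_id.get(message_id.strip(), []))

    def without_id(self):
        """Return the messages that had no Message-ID header."""
        return list(self._without_id)
```

Callers always receive a sequence, so zero, one, or several matches are handled through the same code path.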
Mailman 3 itself requires unique Message-IDs. IIRC, the Mail Archive guys found a very very low collision rate over millions of messages, and I think all such cases were basically spam. The LMTP runner doesn't yet reject duplicates, but it should (LP: #967951).
I would guess she'll probably store messages in YY-MM/MSGID, or as git does in "unpacked" XX/YYYYYYYY... format (where XX are the first two digits of the hash ID, and YY... are the remaining ones). But it could easily be backed by an IMAP store or something more specialized; we don't really care as long as it's object-ID-addressable.
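As an illustration of the git-style "unpacked" layout, here is a sketch that splits a hex digest into a two-character directory and a file name made of the remaining characters. Hashing the Message-ID with SHA-1 is an assumption made for this example (git itself hashes object content), and `object_path` is a hypothetical helper name.

```python
import hashlib
from pathlib import PurePosixPath


def object_path(message_id: str) -> PurePosixPath:
    """Map a Message-ID to an XX/YYYY... relative path, git-style."""
    # Assumption: the object ID is the SHA-1 hex digest of the Message-ID.
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    # The first two hex digits name the directory, the other 38 the file.
    return PurePosixPath(digest[:2]) / digest[2:]
```

Splitting on the first two digits spreads objects over at most 256 directories, which keeps any single directory from growing too large.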
Don't forget too that the LMTP runner automatically adds the X-Message-ID-Hash header, which is a Base32 encoding of the SHA1 hash of the Message-ID contents (without the angle brackets). This hash could be used as well.
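The hash described above can be recomputed from a Message-ID alone. The following is a sketch based only on that description (SHA-1 of the Message-ID without its angle brackets, then Base32); the function name `message_id_hash` is hypothetical, and the UTF-8 encoding and whitespace stripping are assumptions rather than confirmed Mailman behavior.

```python
import base64
import hashlib


def message_id_hash(message_id: str) -> str:
    """Base32-encoded SHA-1 digest of a Message-ID, angle brackets removed."""
    # Assumption: surrounding whitespace is not part of the Message-ID.
    msgid = message_id.strip()
    # Drop the enclosing angle brackets if both are present.
    if msgid.startswith("<") and msgid.endswith(">"):
        msgid = msgid[1:-1]
    # Assumption: the ID is encoded as UTF-8 before hashing.
    digest = hashlib.sha1(msgid.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii")
```

A SHA-1 digest is 20 bytes, so the Base32 form is always 32 characters with no padding, which makes it usable as a fixed-width key.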
-Barry
On Thu, Mar 29, 2012 at 8:07 AM, Barry Warsaw <barry@list.org> wrote:
Mailman 3 itself requires unique Message-IDs.
So? FWIW, I don't think I agree with that requirement (even RFC 5322 doesn't make it a "MUST"), but I'm not going to argue with you about Mailman 3 design, that's your pidgin. There's nothing particularly Mailman-3-dependent about archiver web UIs, though. I don't see any reason why the front end shouldn't be used on my several gigs of personal archives going back to about 1980, e.g., or as a poor man's webmail.
IIRC, the Mail Archive guys found a very very low collision rate over millions of messages, and I think all such cases were basically spam.
Sure, but XEmacs archives go back to at least 1994. mailarchive.com is a more recent phenomenon. In the early days of Linux/*BSD diffusion, there were lots of buggy MUAs/very simple MTAs out there.
hash ID, and YY... are the remaining ones). But it could easily be backed by an IMAP store or something more specialized; we don't really care as long as it's object-ID-addressable.
Don't forget too that the LMTP runner automatically adds the X-Message-ID-Hash header, which is a Base32 encoding of the SHA1 hash of the Message-ID contents (without the angle brackets). This hash could be used as well.
It doesn't do that for subobject content IDs, and more important, users don't necessarily have the X-Message-ID-Hash (they may have not-metoo set, they may have gotten the message as a direct Cc). True, it's easy enough to compute -- if you're a Mailman 3 developer and know it's present.<wink/> And, of course, why have a Mailman 3 dependency that is absolutely unnecessary?