
On Wed, Mar 28, 2012 at 4:21 AM, Terri Oda <terri@zone12.com> wrote:
Looks like the archiver for mm3 is still in the development stage. As far as I understand, the searcher depends on the archiver, right? Not completely, but it somewhat depends on the archiver. I am not sure if the searcher can be implemented without the archiver. If possible I can implement it for mm3 also.
Searcher and archiver are interdependent *if* we want to share caches and data stores, which we probably do for any installation with larger archives where storing 2 copies vs 4 of each message would make a difference. Plus, many archive views may be basically searches: "messages in the last month", "messages which are replies to messageid $foo", etc.
Actually, as far as I can see, the summary/search/index/retrieval functions depend only on the API for the message store. If you want, you can split this into the database layer and a presentation layer, of course. However, the database layer is surely going to have its own schema optimized for the kinds of retrieval its designer considers important. If the designer emphasizes threads, however, she is *not* going to try to store messages in thread order or anything like that. Rather, any reasonable store will be message-ID-addressable.
The only tricky issue is that we *do* have to worry about message-ID collisions of truly different messages and about messages without message IDs, especially for converted historical archives. So the API needs to be able to deal with these issues, probably by returning a set or sequence of messages.
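To make the "set or sequence" point concrete, here is a minimal sketch in Python. The class name MessageStore, its add/get methods, and the synthetic-ID scheme for messages lacking a Message-ID are all assumptions made up for illustration, not an existing Mailman API.

```python
import hashlib
from collections import defaultdict
from email import message_from_string


class MessageStore:
    """Toy in-memory store addressable by Message-ID (hypothetical API)."""

    def __init__(self):
        # One Message-ID may map to several truly different messages.
        self._by_id = defaultdict(list)

    def add(self, msg):
        """Store msg and return the key it was filed under."""
        key = msg.get("Message-ID")
        if key is None:
            # Assumption: messages without a Message-ID get a synthetic
            # key derived from a hash of their content.
            digest = hashlib.sha1(msg.as_bytes()).hexdigest()
            key = "<%s@synthetic.invalid>" % digest
        else:
            key = key.strip()
        raw = msg.as_bytes()
        # Byte-identical copies are stored once; colliding but different
        # messages are kept side by side.
        if all(existing.as_bytes() != raw for existing in self._by_id[key]):
            self._by_id[key].append(msg)
        return key

    def get(self, message_id):
        """Always return a list: empty, one message, or several."""
        return list(self._by_id.get(message_id, []))


store = MessageStore()
first = message_from_string("Message-ID: <1@example.org>\n\nfirst\n")
second = message_from_string("Message-ID: <1@example.org>\n\nsecond\n")
key = store.add(first)
store.add(second)
print(len(store.get(key)))
```

Because get() always returns a list, callers have to decide what to do with collisions instead of silently getting whichever message happened to be stored first.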
Oh, and we probably ought to have a more general notion of retrievable "object" rather than just messages, as some archive/retrieval backends may store some types of MIME part separately. Hopefully these would be presented to us as MIME parts with external bodies and content IDs.
I would guess she'll probably store messages in YY-MM/MSGID, or as git does in "unpacked" XX/YYYYYYYY... format (where XX are the first two digits of the hash ID, and YY... are the remaining ones). But it could easily be backed by an IMAP store or something more specialized; we don't really care as long as it's object-ID-addressable.
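For illustration, the XX/YYYY... layout can be sketched in a few lines of Python. The function names object_id and object_path are hypothetical, and using SHA-1 over the raw message bytes is an assumption; git itself hashes a header plus the content, so these IDs would not match git's.

```python
import hashlib
from pathlib import PurePosixPath


def object_id(raw: bytes) -> str:
    """Hex hash of the raw message bytes (assumed scheme, not git's)."""
    return hashlib.sha1(raw).hexdigest()


def object_path(oid: str) -> PurePosixPath:
    """Split a hex ID into a two-character directory and the remainder."""
    return PurePosixPath(oid[:2]) / oid[2:]


raw = b"Message-ID: <1@example.org>\n\nhello\n"
print(object_path(object_id(raw)))
```

The two-character prefix only keeps any single directory from growing too large; the store stays addressable by the full object ID.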
And that's all we want to say about the archiver and the associated message-retrieval logic, I think. (In fact, it occurs to me that maybe we should say "RFC 3501" and be done with it. I don't mean that we necessarily implement the IMAP protocol per se, but some subset of its functionality is probably what we need from an archiver.)
Then the schema-specific stuff will use hash IDs to represent message objects in a portable but schema-specific way. As it's schema-specific, I don't really see how data structures can be shared by different searchers.
So I would say not to worry about the archiver side at all. If large installations want to implement specialized message-retrieval, bully for them. But we can go with simple backends, maildir, mbox, and maybe IMAP, I think.
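As a rough sketch of how little the simple backends need: Python's standard mailbox module exposes mbox and Maildir through the same Mailbox interface, so one small indexing function covers both. The function name index_by_message_id and the file name archive.mbox are made up for this example, and filing messages that lack a Message-ID under the empty string is just an assumption.

```python
import mailbox
import os
import tempfile
from collections import defaultdict


def index_by_message_id(box):
    """Map each Message-ID to the list of mailbox keys that carry it.

    Works on any mailbox.Mailbox subclass (mbox, Maildir, ...).
    Messages with no Message-ID are filed under the empty string.
    """
    index = defaultdict(list)
    for key in box.iterkeys():
        msg = box[key]
        index[msg.get("Message-ID", "").strip()].append(key)
    return dict(index)


# Usage: build a throwaway mbox and index it.
tmpdir = tempfile.mkdtemp()
box = mailbox.mbox(os.path.join(tmpdir, "archive.mbox"))
box.add("From: a@example.org\nMessage-ID: <1@example.org>\n\nhi\n")
box.add("From: b@example.org\nMessage-ID: <1@example.org>\n\nother\n")
box.flush()
print(index_by_message_id(box))
box.close()
```

Swapping mailbox.mbox for mailbox.Maildir changes only the constructor call; the index still maps each Message-ID to a list of keys, so colliding IDs remain visible.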