Re: [Mailman-Developers] Google Summer of Code: Integration of Search Code

On Wed, Mar 28, 2012 at 6:59 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
On Wed, Mar 28, 2012 at 4:21 AM, Terri Oda <terri@zone12.com> wrote:
It looks like the archiver for mm3 is still in the development stage. As far as I understand, the searcher depends on the archiver, right? Not completely, but it does depend on the archiver somewhat. I am not sure whether the searcher can be implemented without the archiver. If possible, I can implement it for mm3 as well.
Searcher and archiver are interdependent *if* we want to share caches and data stores, which we probably do for any installation with larger archives where storing 2 copies vs 4 of each message would make a difference. Plus, many archive views may basically be searches: "messages in the last month", "messages which are replies to messageid $foo", etc.
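To illustrate the point that archive views can be treated as searches, here is a minimal sketch in Python using only the standard library. The function names `since` and `replies_to` are hypothetical and not part of Mailman; it assumes messages are already available as `email.message.Message` objects.

```python
from datetime import datetime, timezone
from email import message_from_string
from email.message import Message
from email.utils import parsedate_to_datetime
from typing import Iterable, List


def replies_to(messages: Iterable[Message], message_id: str) -> List[Message]:
    """View: messages whose In-Reply-To header names `message_id`."""
    return [m for m in messages
            if str(m.get("In-Reply-To") or "").strip() == message_id]


def since(messages: Iterable[Message], cutoff: datetime) -> List[Message]:
    """View: messages dated at or after `cutoff` (a timezone-aware datetime)."""
    result = []
    for m in messages:
        raw = m.get("Date")
        if raw is None:
            continue  # undated messages cannot appear in a date-based view
        try:
            sent = parsedate_to_datetime(str(raw))
        except (TypeError, ValueError):
            continue  # unparseable Date header
        if sent.tzinfo is None:
            # Assumption for this sketch: treat zone-less dates as UTC.
            sent = sent.replace(tzinfo=timezone.utc)
        if sent >= cutoff:
            result.append(m)
    return result
```

A "last month" view is then just `since(messages, cutoff)` with a cutoff 30 days back, so the same query code could serve both the archive pages and the search form.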
Actually, as far as I can see, the summary/search/index/retrieval functions depend only on the API for the message store. If you want, you can split this into the database layer and a presentation layer, of course. However, the database layer is surely going to have its own schema optimized for the kinds of retrieval its designer considers important. If the designer emphasizes threads, however, she is *not* going to try to store messages in thread order or anything like that. Rather, any reasonable store will be message-ID-addressable.
The only tricky issue is that we *do* have to worry about message-ID collisions of truly different messages and about messages without message IDs, especially for converted historical archives. So the API needs to be able to deal with these issues, probably by returning a set or sequence of messages.
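A retrieval API that copes with both problems might look like the sketch below. This is only an illustration under stated assumptions: the class `InMemoryStore`, its methods, and the empty-string bucket for ID-less messages are all hypothetical, not an existing Mailman interface.

```python
from collections import defaultdict
from email import message_from_string
from email.message import Message
from typing import Dict, List


class InMemoryStore:
    """Hypothetical message store addressed by Message-ID."""

    NO_ID = ""  # bucket key for messages that carry no Message-ID header

    def __init__(self) -> None:
        self._by_id: Dict[str, List[Message]] = defaultdict(list)

    def add(self, msg: Message) -> None:
        key = str(msg.get("Message-ID") or self.NO_ID).strip()
        # Colliding IDs are kept side by side instead of overwriting.
        self._by_id[key].append(msg)

    def get(self, message_id: str) -> List[Message]:
        # Always a list: empty if unknown, several entries on a collision.
        return list(self._by_id.get(message_id, []))


store = InMemoryStore()
store.add(message_from_string("Message-ID: <1@example.org>\n\nfirst"))
store.add(message_from_string("Message-ID: <1@example.org>\n\na different message"))
store.add(message_from_string("Subject: converted archive\n\nno id at all"))
```

Because `get` never returns a bare message, callers are forced to handle the zero, one, and many cases uniformly.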
Oh, and we probably ought to have a more general notion of retrievable "object" rather than just messages, as some archive/retrieval backends may store some types of MIME part separately. Hopefully these would be presented to us as MIME parts with external bodies and content IDs.
I would guess she'll probably store messages in YY-MM/MSGID, or as git does in "unpacked" XX/YYYYYYYY... format (where XX are the first two digits of the hash ID, and YY... are the remaining ones). But it could easily be backed by an IMAP store or something more specialized; we don't really care as long as it's object-ID-addressable.
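The git-style "unpacked" layout can be sketched in a few lines of Python. The choice of SHA-1 and of what gets hashed (the raw object bytes here) are assumptions for illustration; the thread does not settle either, and `object_path` is a hypothetical helper.

```python
import hashlib
from pathlib import PurePosixPath


def object_path(data: bytes) -> PurePosixPath:
    """Map an object to XX/YYYY..., splitting its hex hash after two digits."""
    digest = hashlib.sha1(data).hexdigest()  # 40 hex digits
    return PurePosixPath(digest[:2]) / digest[2:]
```

Splitting on the first two hex digits spreads objects over at most 256 directories, which keeps any single directory from growing too large.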
Assuming that we have something like this (object-ID-addressable; if I am not wrong, mailman3 has made this possible, but it is not yet implemented because it is part of the archiver), is it overambitious to plan to implement an indexer/searcher for mailman3, a REST API for this searcher, a client extension that uses this API, and a Django search form on top of the client API? All of this would be independent of the archiver. Since the only part shared with the archiver is message retrieval, if we implement the whole searcher and the rest of the client code now, then later, when the archiver is implemented, its message-retrieval code can be used in the searcher. When the archiver is completely mature, maybe we can even merge them together. Is that possible? Or does this plan have any 'nonsense' parts?
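One way to picture a searcher that is independent of the archiver is to hand it nothing but a retrieval callable. The sketch below is a toy inverted index, not a proposal for the real indexer; `Searcher`, `Fetch`, and the tokenization rule are all hypothetical.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Set

# The only thing the searcher needs from an archiver: object ID -> stored texts.
Fetch = Callable[[str], List[str]]


class Searcher:
    """Toy inverted index that knows nothing about how objects are stored."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._index: Dict[str, Set[str]] = defaultdict(set)

    def index(self, object_id: str, text: str) -> None:
        for term in set(re.findall(r"\w+", text.lower())):
            self._index[term].add(object_id)

    def search(self, query: str) -> List[str]:
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return []
        # An object matches only if it contains every query term.
        ids = set.intersection(*(self._index.get(t, set()) for t in terms))
        return [text for oid in sorted(ids) for text in self._fetch(oid)]
```

Swapping a temporary retrieval function for the archiver's own code later would then only mean passing a different `fetch`.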
And that's all we want to say about the archiver and the associated message-retrieval logic, I think. (In fact, it occurs to me that maybe we should say "RFC 3501" and be done with it. I don't mean that we necessarily implement the IMAP protocol per se, but some subset of its functionality is probably what we need from an archiver.)
Then the schema-specific stuff will use hash IDs to represent message objects in a portable but schema-specific way. As it's schema-specific, I don't really see how data structures can be shared by different searchers.
So I would say not to worry about the archiver side at all. If large installations want to implement specialized message-retrieval, bully for them. But we can go with simple backends, maildir, mbox, and maybe IMAP, I think.
Participants (1): Shayan Md