Re: [Mailman-Developers] Grackle archive framework
On Sun, Mar 18, 2012 at 10:55:19AM +0530, Aamir Khan wrote:
On Sun, Mar 18, 2012 at 4:24 AM, Barry Warsaw <barry@list.org> wrote:
On Mar 18, 2012, at 12:23 AM, Aamir Khan wrote:
On Fri, Feb 17, 2012 at 12:55 AM, Barry Warsaw <barry@list.org> wrote:
On IRC, we talked about a backend based on Storm and the Python mailbox library, with a REST+JSON WSGI application vending the data. This would allow us to integrate fairly easily with MM3, I think, and would possibly better enable some of the archiver work being done by Terri and others.
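For illustration, here is a minimal sketch of a REST+JSON WSGI application vending messages from a maildir, using only the standard library (`mailbox`, `json`, `wsgiref`). Storm is left out entirely, and the `/messages` and `/messages/<key>` routes and the `make_archive_app` name are hypothetical, not anything MM3 actually exposes.

```python
import json
import mailbox
from wsgiref.util import shift_path_info


def _respond(start_response, status, obj):
    """Serialize obj as JSON and send it with the given HTTP status."""
    body = json.dumps(obj).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def make_archive_app(maildir_path):
    """Return a WSGI app exposing one Maildir as JSON (hypothetical URL layout)."""
    archive = mailbox.Maildir(maildir_path, create=True)

    def app(environ, start_response):
        # Hypothetical routes: /messages lists keys, /messages/<key> returns one message.
        if shift_path_info(environ) != "messages":
            return _respond(start_response, "404 Not Found", {"error": "not found"})
        key = shift_path_info(environ)
        if not key:
            return _respond(start_response, "200 OK",
                            {"messages": sorted(archive.keys())})
        try:
            msg = archive[key]
        except KeyError:
            return _respond(start_response, "404 Not Found",
                            {"error": "no such message"})
        return _respond(start_response, "200 OK", {
            "key": key,
            "from": msg["From"],
            "subject": msg["Subject"],
            # Multipart bodies are skipped to keep the sketch short.
            "body": None if msg.is_multipart() else msg.get_payload(),
        })

    return app
```

It could be tried locally with `wsgiref.simple_server.make_server("", 8000, make_archive_app("archive")).serve_forever()`; a real archiver would still need pagination, threading and proper MIME handling.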
I understand that we will store the messages in .mbox format. But I don't understand why we need to use Storm for archiving.
I meant to say "maildir". Please let's not use mbox format! It's way too easy to corrupt the file, as we did with a bug once in MM2.1, and we've paid the price ever since.
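A minimal sketch of why maildir sidesteps the shared-file problem, using Python's `mailbox` module: `Maildir.add()` writes each message into its own file, while the mbox variant has to lock one shared file for every append. The helper names `archive_message` and `archive_message_mbox` are hypothetical.

```python
import mailbox


def archive_message(maildir_path, raw_message):
    """Store one message in a Maildir and return its key (hypothetical helper)."""
    box = mailbox.Maildir(maildir_path, create=True)
    # add() writes the message to tmp/ and then moves it into new/, so every
    # message is its own file and no lock on a shared file is needed.
    return box.add(raw_message)


def archive_message_mbox(mbox_path, raw_message):
    """The mbox equivalent, for contrast: all messages share one file (hypothetical helper)."""
    box = mailbox.mbox(mbox_path, create=True)
    box.lock()  # every writer must take the lock, or appends can interleave
    try:
        return box.add(raw_message)
    finally:
        box.close()  # close() flushes pending changes and releases the lock
```

With mbox, one buggy writer that skips the lock or truncates the file can damage every message in it; with maildir, the damage is limited to a single file.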
I read up on the differences between the maildir and mbox formats, and it's clear that mbox is prone to corruption while maildir is not. Maildir has further advantages, such as avoiding file-locking problems. But since each mail is stored in a separate file, searching through them will not be fast enough. Using a database alone also has problems: it uses more disk space and consumes more CPU cycles.
So we could store the messages in maildir format and keep a copy of them in a database. Search requests could then be served by database queries powered by a full-text search engine. But that introduces the problem of keeping the maildir messages and the database copies in sync. What are your thoughts on this?
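One way to limit the synchronization problem, sketched under the assumption that the maildir is the single source of truth and the database is only a rebuildable index: reconcile the two by maildir key. SQLite and the `LIKE` query below are stand-ins for whatever database and full-text engine would actually be chosen; the table layout and the `open_index`, `sync_index` and `search` names are hypothetical.

```python
import mailbox
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) the hypothetical search table keyed by maildir key."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS messages ("
                 " key TEXT PRIMARY KEY, subject TEXT, body TEXT)")
    return conn


def sync_index(conn, maildir_path):
    """Make the table mirror the Maildir; return (added, removed) counts."""
    box = mailbox.Maildir(maildir_path, create=False)
    on_disk = set(box.keys())
    in_db = {row[0] for row in conn.execute("SELECT key FROM messages")}
    added, removed = on_disk - in_db, in_db - on_disk
    with conn:  # one transaction for the whole reconciliation
        for key in added:
            msg = box[key]
            body = "" if msg.is_multipart() else msg.get_payload()
            conn.execute("INSERT INTO messages (key, subject, body) VALUES (?, ?, ?)",
                         (key, msg["Subject"] or "", body))
        conn.executemany("DELETE FROM messages WHERE key = ?",
                         [(key,) for key in removed])
    return len(added), len(removed)


def search(conn, term):
    """Naive substring search; a stand-in for a real full-text engine."""
    pattern = f"%{term}%"
    rows = conn.execute("SELECT key FROM messages WHERE subject LIKE ? OR body LIKE ?"
                        " ORDER BY key", (pattern, pattern))
    return [row[0] for row in rows]
```

Because the index can always be rebuilt from the maildir, a missed or failed sync costs freshness of search results rather than archived data.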
As for searching the archive, there are solutions like Elasticsearch, Solr, and Lucene. Can we use one of them to search directly through the maildir?
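Engines such as Lucene (on which Solr and Elasticsearch are built) don't scan raw files at query time; they ingest documents into an inverted index first. The toy sketch below contrasts a brute-force scan of a maildir with a small in-memory inverted index built from it. It only illustrates the idea and is not any of those engines' APIs; all function names are hypothetical.

```python
import mailbox
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def tokenize(text):
    return WORD.findall(text.lower())


def _body_text(msg):
    """Concatenate the text/plain parts of a message, decoding transfer encodings."""
    parts = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                parts.append(payload.decode(charset, errors="replace"))
            except LookupError:  # unknown charset name in the header
                parts.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(parts)


def scan_search(maildir_path, term):
    """Brute force: open and parse every message file on every query."""
    box = mailbox.Maildir(maildir_path, create=False)
    hits = [key for key, msg in box.iteritems()
            if term.lower() in tokenize((msg["Subject"] or "") + "\n" + _body_text(msg))]
    return sorted(hits)


def build_inverted_index(maildir_path):
    """Read the archive once; map each word to the set of message keys containing it."""
    box = mailbox.Maildir(maildir_path, create=False)
    index = defaultdict(set)
    for key, msg in box.iteritems():
        for word in tokenize((msg["Subject"] or "") + "\n" + _body_text(msg)):
            index[word].add(key)
    return index


def index_search(index, term):
    """Each query is now a dictionary lookup instead of a pass over every file."""
    return sorted(index.get(term.lower(), set()))
```

The scan's cost grows with the size of the archive on every query, while the index pays that cost once up front and then has to be kept in step with the maildir as new mail arrives.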
Note that a few of us have been playing with a searching archiver. An initial prototype used notmuch. We looked into using raw Xapian at PyCon. And now one of our developers (pingou on IRC) has pushed out a prototype that uses MongoDB for the backend.
You can take a look at our development copy here:
http://mm3test.fedoraproject.org/2/list/devel@fp.o
I'll be working on splitting out a tested copy from an in-development copy later today. That way we won't be creating web pages with tracebacks all the time :-)
Code for this is available in the hyperkitty mongodb branch:
bzr branch bzr://bzr.fedorahosted.org/bzr/hyperkitty/mongodb
-Toshio
Toshio Kuratomi