Mailman 3 Re: [Mailman-Developers] Grackle archive framework - Mailman-Developers

19 Mar 2012


On Sun, Mar 18, 2012 at 10:55:19AM +0530, Aamir Khan wrote:
...
On Sun, Mar 18, 2012 at 4:24 AM, Barry Warsaw <barry@list.org> wrote:
...
On Mar 18, 2012, at 12:23 AM, Aamir Khan wrote:
...
On Fri, Feb 17, 2012 at 12:55 AM, Barry Warsaw <barry@list.org> wrote:
...
On IRC, we talked about a storm + Python mailbox library based backend,
with a REST+JSON wsgi based application vending the data.  This would
allow us to integrate fairly easily with MM3 I think, and would possibly
better enable some of the archiver work being done by Terri and others.
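To make the "REST+JSON wsgi application vending the data" idea concrete,
here is a minimal sketch using only the Python standard library. It is an
illustration under assumptions, not the design discussed on IRC: it leaves
out storm entirely, and the names make_app and demo, the JSON layout, and
the example address are hypothetical.

```python
import json
import mailbox
import os
import tempfile
from email.message import EmailMessage
from wsgiref.util import setup_testing_defaults


def make_app(maildir_path):
    """Return a WSGI app that lists the messages of one maildir as JSON.

    Hypothetical sketch: a real archiver would add routing, paging and
    per-message endpoints.
    """

    def app(environ, start_response):
        box = mailbox.Maildir(maildir_path, create=True)
        try:
            listing = [
                {
                    "key": key,
                    "subject": str(msg.get("Subject", "")),
                    "from": str(msg.get("From", "")),
                }
                for key, msg in box.iteritems()
            ]
        finally:
            box.close()
        body = json.dumps({"messages": listing}).encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "application/json"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app


def demo():
    """Store one message in a temporary maildir and fetch it via the app."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "archive")
        box = mailbox.Maildir(path, create=True)
        msg = EmailMessage()
        msg["Subject"] = "Grackle archive framework"
        msg["From"] = "someone@example.org"  # made-up address
        msg.set_content("hello")
        box.add(msg)
        box.close()

        environ = {}
        setup_testing_defaults(environ)  # fills in a minimal GET request
        captured = {}

        def start_response(status, headers):
            captured["status"] = status

        body = b"".join(make_app(path)(environ, start_response))
        return captured["status"], json.loads(body)
```

Calling demo() exercises the app without opening a network socket; to
serve it for real, the same app object could be handed to any WSGI server,
for example wsgiref.simple_server.make_server.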
I understand that we will store the messages in .mbox format, but I don't
understand why we need to use storm for archiving.
I meant to say "maildir".  Please let's not use mbox format!  It's way too
easy to corrupt the file, as we did with a bug once in MM2.1, and we've
paid the price ever since.
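As a rough illustration of why maildir limits the damage, the sketch below
uses the standard library's mailbox.Maildir to store each message as its
own file, so a bad write can touch only that one message. The helper names
archive_message and count_message_files are hypothetical.

```python
import mailbox
import os
from email.message import EmailMessage


def archive_message(maildir_path, sender, subject, text):
    """Add one message to the maildir and return its key.

    mailbox.Maildir writes the message to tmp/ first and then moves it
    into new/, so readers never see a half-written file.
    """
    box = mailbox.Maildir(maildir_path, create=True)
    try:
        msg = EmailMessage()
        msg["From"] = sender
        msg["Subject"] = subject
        msg.set_content(text)
        return box.add(msg)
    finally:
        box.close()


def count_message_files(maildir_path):
    """Count the files under new/ and cur/: one file per stored message."""
    return sum(
        len(os.listdir(os.path.join(maildir_path, subdir)))
        for subdir in ("new", "cur")
    )
```

With mbox, the same two calls would append to a single shared file, which
is where one bad write can damage every later message.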
I read up on the difference between maildir and mbox format, and it clearly
states that mbox is prone to corruption while maildir is not. Maildir has
the further advantage that there is no file locking problem. But since we
will be storing each mail in a separate file, searching through them will
not be fast enough. Using a database alone also has problems: it will use
more disk space and consume more CPU cycles. So, if we store the messages
in maildir format and keep a copy of them in a database, we can serve
search requests with database queries powered by a full-text search
engine. But then there will be the problem of synchronization between the
maildir messages and the messages stored in the database. What are your
thoughts about it?
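One way to picture the maildir-plus-database idea and its synchronization
problem is the sketch below. It treats the maildir as the source of truth
and rebuilds the database side from it. Everything here is an assumption
for illustration: the table layout, the function names, and the use of
SQLite with a plain LIKE query standing in for a real full-text engine.

```python
import mailbox
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) the hypothetical search index database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages "
        "(key TEXT PRIMARY KEY, subject TEXT, body TEXT)"
    )
    return conn


def extract_text(msg):
    """Collect the decoded text/plain parts of a message."""
    parts = []
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            parts.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset name in the message
            parts.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(parts)


def sync_index(maildir_path, conn):
    """Make the index match the maildir; return (added, removed) counts."""
    box = mailbox.Maildir(maildir_path, create=True)
    try:
        on_disk = set(box.keys())
        indexed = {row[0] for row in conn.execute("SELECT key FROM messages")}
        missing = on_disk - indexed
        stale = indexed - on_disk
        for key in missing:
            msg = box.get_message(key)
            conn.execute(
                "INSERT INTO messages (key, subject, body) VALUES (?, ?, ?)",
                (key, str(msg.get("Subject", "")), extract_text(msg)),
            )
        conn.executemany(
            "DELETE FROM messages WHERE key = ?", [(key,) for key in stale]
        )
        conn.commit()
        return len(missing), len(stale)
    finally:
        box.close()


def search(conn, term):
    """Return keys of messages whose subject or body contains term."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT key FROM messages WHERE subject LIKE ? OR body LIKE ? "
        "ORDER BY key",
        (pattern, pattern),
    )
    return [row[0] for row in rows]
```

Because the index can always be rebuilt from the maildir, a lost or stale
database costs only a re-run of sync_index, though scanning every key on
each sync would probably not scale to a large archive.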
As for searching the archive, there are solutions like Elasticsearch,
Solr, and Lucene. Can we use one of them to search directly through the
maildir?
Note that a few of us have been playing with a searching-archiver.  An
initial prototype used notmuch.  We looked into using raw Xapian at PyCon.
And now, one of our developers (pingou on IRC) has pushed out a prototype
that uses MongoDB for the backend.
You can take a look at our development copy here:
http://mm3test.fedoraproject.org/2/list/devel@fp.o
I'll be working on splitting out a tested copy from an in-development copy
later today.  That way we won't be creating web pages with tracebacks all
the time :-)
Code for this is available in the hyperkitty mongodb branch:
bzr branch bzr://bzr.fedorahosted.org/bzr/hyperkitty/mongodb
-Toshio

Toshio Kuratomi