[Mailman-Developers] Grackle archive framework

Mon Mar 19 20:32:45 CET 2012

On Sun, Mar 18, 2012 at 10:55:19AM +0530, Aamir Khan wrote:
> On Sun, Mar 18, 2012 at 4:24 AM, Barry Warsaw <barry at list.org> wrote:
> 
> > On Mar 18, 2012, at 12:23 AM, Aamir Khan wrote:
> >
> > >On Fri, Feb 17, 2012 at 12:55 AM, Barry Warsaw <barry at list.org> wrote:
> > >> On IRC, we talked about a storm + Python mailbox library based backend,
> > >> with a REST+JSON WSGI-based application vending the data.  This would
> > >> allow us to integrate fairly easily with MM3 I think, and would possibly
> > >> better enable some of the archiver work being done by Terri and others.
> > >>
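Barry's suggestion of a REST+JSON WSGI application vending the archive data could look roughly like the following sketch. It uses only the standard library (`json`, `wsgiref`); the in-memory `ARCHIVE` dict, the `/lists/<name>/messages` URL layout, and the `call` helper are hypothetical stand-ins, not Mailman 3 or Storm APIs.

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical in-memory data standing in for the real storage backend
# (the thread proposes Storm plus Python's mailbox library for that).
ARCHIVE = {
    "devel": [
        {"id": "1", "subject": "Grackle archive framework"},
        {"id": "2", "subject": "Re: Grackle archive framework"},
    ],
}


def archive_app(environ, start_response):
    """WSGI app: GET /lists/<name>/messages returns that list's messages as JSON."""
    parts = [p for p in environ.get("PATH_INFO", "").split("/") if p]
    found = (
        environ.get("REQUEST_METHOD") == "GET"
        and len(parts) == 3
        and parts[0] == "lists"
        and parts[2] == "messages"
        and parts[1] in ARCHIVE
    )
    if found:
        status, payload = "200 OK", ARCHIVE[parts[1]]
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call(path):
    """Invoke the app in-process with a minimal test environ (no server needed)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in REQUEST_METHOD=GET etc.
    seen = {}

    def start_response(status, headers):
        seen["status"] = status

    body = b"".join(archive_app(environ, start_response))
    return seen["status"], json.loads(body)


if __name__ == "__main__":
    status, messages = call("/lists/devel/messages")
    print(status, len(messages))  # 200 OK 2
```

To serve it over HTTP for experimentation, `wsgiref.simple_server.make_server("", 8000, archive_app).serve_forever()` should work; any other WSGI server could host the same callable.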
> > >
> > >I understand that we will store the messages in .mbox format, but I don't
> > >understand why we need to use storm for archiving.
> >
> > I meant to say "maildir".  Please let's not use mbox format!  It's way too
> > easy to corrupt the file, as we did with a bug once in MM2.1, and we've paid
> > the price ever since.
> >
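Maildir sidesteps the single-file corruption problem because each message lives in its own file; by the maildir convention a message is written into `tmp/` first and then moved into `new/`, so an interrupted write never leaves a half-written message in the mailbox. A minimal sketch with Python's standard `mailbox` module, assuming a throwaway temporary directory in place of a real archive path and placeholder addresses:

```python
import email.message
import mailbox
import os
import tempfile


def make_message(subject: str, body: str) -> email.message.EmailMessage:
    """Build a small test message (addresses are placeholders)."""
    msg = email.message.EmailMessage()
    msg["From"] = "someone@example.org"
    msg["To"] = "devel@example.org"
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


# Throwaway location standing in for a real list archive directory.
archive_dir = os.path.join(tempfile.mkdtemp(), "devel-archive")
archive = mailbox.Maildir(archive_dir, create=True)  # creates tmp/, new/, cur/

# Each add() returns a unique key and writes one file; Maildir needs no
# mailbox-wide lock, unlike a single mbox file.
keys = [archive.add(make_message(f"Post {i}", f"Body {i}\n")) for i in range(3)]

new_files = os.listdir(os.path.join(archive_dir, "new"))
print(len(new_files))               # 3: one file per message
print(archive[keys[0]]["Subject"])  # Post 0
```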
> 
> I read about the difference between the maildir and mbox formats, and it
> clearly states that mbox is prone to corruption while maildir is not.
> Maildir has a further advantage: there are no file-locking problems.
> But since we will be storing each mail in a separate file, searching
> through them will not be fast enough. Using a database alone also has
> problems: it will use more hard disk space and consume more CPU cycles.
> 
> So we could store the messages in maildir format with a copy of each in a
> database, and serve search requests with database queries powered by a
> full-text search engine. But then we would have to keep the maildir
> messages and the messages stored in the database synchronized. What are
> your thoughts about it?
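One way to handle the synchronization concern is to treat the maildir as the source of truth and periodically reconcile a database copy against its message keys. The sketch below assumes SQLite via the standard `sqlite3` module and uses a plain `LIKE` query as a stand-in for a real full-text engine; the `messages` table, its columns, and `sync_index` are hypothetical.

```python
import email.message
import mailbox
import os
import sqlite3
import tempfile


def sync_index(box: mailbox.Maildir, db: sqlite3.Connection) -> tuple[int, int]:
    """Reconcile the database copy with the maildir (the source of truth).

    Returns (added, removed) row counts.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages "
        "(msg_key TEXT PRIMARY KEY, subject TEXT, body TEXT)"
    )
    on_disk = set(box.keys())
    in_db = {row[0] for row in db.execute("SELECT msg_key FROM messages")}

    # Index messages that exist on disk but are not yet in the database.
    missing = on_disk - in_db
    for key in missing:
        msg = box[key]
        raw = msg.get_payload(decode=True) or b""  # None for multipart: stored empty
        db.execute(
            "INSERT INTO messages (msg_key, subject, body) VALUES (?, ?, ?)",
            (key, msg.get("Subject", ""), raw.decode("utf-8", "replace")),
        )

    # Drop rows whose maildir file has disappeared.
    stale = in_db - on_disk
    db.executemany("DELETE FROM messages WHERE msg_key = ?", [(k,) for k in stale])
    db.commit()
    return len(missing), len(stale)


box = mailbox.Maildir(os.path.join(tempfile.mkdtemp(), "archive"), create=True)
keys = []
for subject, body in [("maildir vs mbox", "mbox corrupts easily\n"),
                      ("search", "maybe try xapian\n")]:
    m = email.message.EmailMessage()
    m["Subject"] = subject
    m.set_content(body)
    keys.append(box.add(m))

db = sqlite3.connect(":memory:")
print(sync_index(box, db))  # (2, 0)
query = "SELECT subject FROM messages WHERE body LIKE ?"
print(db.execute(query, ("%xapian%",)).fetchall())  # [('search',)]

box.remove(keys[0])         # the file disappears behind the database's back...
print(sync_index(box, db))  # (0, 1): ...and the next sync notices
```

Because the reconciliation only compares key sets, it can be rerun at any time (from a cron job or after each delivery) without duplicating rows.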
> 
> As for searching the archive, there are solutions like Elasticsearch,
> Solr, and Lucene. Can we use one of them to search directly through the maildir?
> 
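Engines such as Lucene (which also underlies Solr and Elasticsearch) avoid re-reading every maildir file on each query by building an inverted index: a map from each term to the messages that contain it. The toy version below only illustrates that idea and does not reflect any of those engines' APIs; the tokenizer and the sample message keys are assumptions.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that isn't an ASCII letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of message keys whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for key, text in docs.items():
        for term in tokenize(text):
            index[term].add(key)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return keys of messages containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical message texts keyed by made-up maildir keys.
docs = {
    "msg-1": "mbox is prone to corruption",
    "msg-2": "maildir avoids file locking",
    "msg-3": "searching a maildir needs an index",
}
index = build_index(docs)
print(sorted(search(index, "maildir")))        # ['msg-2', 'msg-3']
print(sorted(search(index, "maildir index")))  # ['msg-3']
```

A query now touches only the posting sets for its terms instead of opening every file; real engines add stemming, ranking, and persistent on-disk storage on top of this structure.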
Note that a few of us have been playing with a searching archiver.  An
initial prototype used notmuch.  We looked into using raw Xapian at PyCon.
And now, one of our developers (pingou on IRC) has pushed out a prototype
that uses MongoDB for the backend.

You can take a look at our development copy here:

http://mm3test.fedoraproject.org/2/list/devel@fp.o

I'll be working on splitting out a tested copy from an in-development copy
later today.  That way we won't be creating web pages with tracebacks all
the time :-)

Code for this is available in the hyperkitty mongodb branch:

bzr branch bzr://bzr.fedorahosted.org/bzr/hyperkitty/mongodb

-Toshio