Re: [Mailman-Developers] Grackle archive framework
On Mar 18, 2012, at 12:23 AM, Aamir Khan wrote:
On Fri, Feb 17, 2012 at 12:55 AM, Barry Warsaw <barry@list.org> wrote:
On IRC, we talked about a storm + Python mailbox library based backend, with a REST+JSON wsgi based application vending the data. This would allow us to integrate fairly easily with MM3 I think, and would possibly better enable some of the archiver work being done by Terri and others.
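A minimal sketch of what a "REST+JSON wsgi based application vending the data" could look like, using only the standard library. The URL layout (`/messages/<id>`), the `MESSAGES` dictionary, and the `call` helper are assumptions made for illustration; nothing in the thread specifies them.

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical in-memory store standing in for the real archive backend.
MESSAGES = {
    "abc123": {"subject": "Grackle archive framework",
               "sender": "barry@list.org"},
}


def application(environ, start_response):
    """Serve /messages/<id> as JSON; any other path is a 404."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if len(parts) == 2 and parts[0] == "messages" and parts[1] in MESSAGES:
        status, payload = "200 OK", MESSAGES[parts[1]]
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def call(path):
    """Invoke the app in-process (no sockets) and return (status, data)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], json.loads(body)
```

Because `application` is a plain WSGI callable, it could be served by any WSGI server, for example `wsgiref.simple_server.make_server` during development.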
I understand that we will store the messages in .mbox format. But I don't understand why we need to use storm for archiving.
I meant to say "maildir". Please let's not use mbox format! It's way too easy to corrupt the file, as we did with a bug once in MM2.1, and we've paid the price ever since.
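As a rough illustration of the maildir approach, here is a sketch using the standard library's `mailbox.Maildir`, which writes each message to its own file. The helper names `store_message` and `load_subject` and the `archive` directory name are hypothetical.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def store_message(maildir_path, msg):
    """Add one message to the maildir and return its key."""
    # create=True builds the tmp/new/cur layout if the path does not exist.
    box = mailbox.Maildir(maildir_path, create=True)
    try:
        return box.add(msg)
    finally:
        box.close()


def load_subject(maildir_path, key):
    """Read a stored message back by key and return its Subject header."""
    box = mailbox.Maildir(maildir_path, create=False)
    try:
        return box[key]["Subject"]
    finally:
        box.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "archive")
        msg = EmailMessage()
        msg["Subject"] = "Grackle archive framework"
        msg["Message-ID"] = "<example@list.example>"
        msg.set_content("Hello, archive.")
        key = store_message(path, msg)
        print(key, load_subject(path, key))
```

Since every message lives in a separate file, a failed write can damage at most that one message rather than the whole archive.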
As for archiving, it isn't strictly necessary to use storm, it's just a nice lightweight ORM I happen to like. But I think it *does* make sense to back a full-fledged archiver with a database and a full-text search engine. For example, using our RFC 5064+X-Message-ID-Hash scheme, the database would handle the lookup from hash to actual message storage location.
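A sketch of the hash-to-location lookup described above. Two assumptions are worth flagging: the hash is computed here as the Base32 encoding of the SHA-1 digest of the Message-ID with its angle brackets stripped, and `sqlite3` stands in for storm so the example stays dependency-free. The table and function names are hypothetical.

```python
import base64
import hashlib
import sqlite3


def message_id_hash(message_id):
    """Base32(SHA-1(Message-ID without angle brackets)); assumed scheme."""
    mid = message_id.strip()
    if mid.startswith("<") and mid.endswith(">"):
        mid = mid[1:-1]
    digest = hashlib.sha1(mid.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii")


def record_location(conn, message_id, maildir_key):
    """Remember where a message is stored, keyed by its hash."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS locations ("
        "hash TEXT PRIMARY KEY, maildir_key TEXT NOT NULL)")
    digest = message_id_hash(message_id)
    conn.execute("INSERT OR REPLACE INTO locations VALUES (?, ?)",
                 (digest, maildir_key))
    conn.commit()
    return digest


def find_location(conn, digest):
    """Return the maildir key for a hash, or None if it is unknown."""
    row = conn.execute(
        "SELECT maildir_key FROM locations WHERE hash = ?",
        (digest,)).fetchone()
    return row[0] if row else None
```

With this in place, a URL containing the hash can be resolved to a storage location with one indexed query, without scanning the maildir.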
Cheers, -Barry
On Sun, Mar 18, 2012 at 4:24 AM, Barry Warsaw <barry@list.org> wrote:
On Mar 18, 2012, at 12:23 AM, Aamir Khan wrote:
On Fri, Feb 17, 2012 at 12:55 AM, Barry Warsaw <barry@list.org> wrote:
On IRC, we talked about a storm + Python mailbox library based backend, with a REST+JSON wsgi based application vending the data. This would allow us to integrate fairly easily with MM3 I think, and would possibly better enable some of the archiver work being done by Terri and others.
I understand that we will store the messages in .mbox format. But I don't understand why we need to use storm for archiving.
I meant to say "maildir". Please let's not use mbox format! It's way too easy to corrupt the file, as we did with a bug once in MM2.1, and we've paid the price ever since.
I read up on the difference between maildir and mbox, and mbox is clearly described as prone to corruption while maildir is not. Maildir also has the advantage of avoiding file-locking problems. But since we will be storing each mail in a separate file, searching through them will not be fast enough. Using a database alone also has problems: it will use more disk space and consume more CPU cycles.
So we could store the messages in maildir format and keep a copy of each in a database. We could then serve search requests with database queries powered by a full-text search engine. But then there is the problem of keeping the maildir messages and the messages stored in the database synchronized. What are your thoughts on this?
As for searching the archive, there are solutions like Elasticsearch, Solr, and Lucene. Can we use one of them to search directly through the maildir?
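One possible way to handle the synchronization concern is a periodic reconcile pass that treats the maildir as the source of truth and brings the database index in line with it. This is only a sketch under that assumption; the `indexed` table and the `reconcile` function are hypothetical, and `sqlite3` is used purely to keep the example self-contained.

```python
import mailbox
import os
import sqlite3
import tempfile
from email.message import EmailMessage


def reconcile(box, conn):
    """Make the index match the maildir; return (added, removed) counts."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS indexed ("
        "maildir_key TEXT PRIMARY KEY, subject TEXT)")
    on_disk = set(box.keys())
    in_db = {row[0] for row in
             conn.execute("SELECT maildir_key FROM indexed")}
    missing = on_disk - in_db   # stored on disk but not yet indexed
    stale = in_db - on_disk     # indexed but no longer on disk
    for key in missing:
        subject = str(box[key]["Subject"] or "")
        conn.execute("INSERT INTO indexed VALUES (?, ?)", (key, subject))
    for key in stale:
        conn.execute("DELETE FROM indexed WHERE maildir_key = ?", (key,))
    conn.commit()
    return len(missing), len(stale)
```

Running `reconcile` twice in a row without touching the maildir should report no changes on the second pass, which makes it safe to schedule repeatedly.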
As for archiving, it isn't strictly necessary to use storm, it's just a nice lightweight ORM I happen to like. But I think it *does* make sense to back a full-fledged archiver with a database and a full-text search engine. For example, using our RFC 5064+X-Message-ID-Hash scheme, the database would handle the lookup from hash to actual message storage location.
Cheers, -Barry
-- Aamir Khan | 3rd Year | Computer Science & Engineering | IIT Roorkee
participants (2)
- Aamir Khan
- Barry Warsaw