Thanks for posting this Pierre-Yves!
On Apr 23, 2012, at 08:17 PM, Pierre-Yves Chibon wrote:
> mm-core (handles the lists themselves) --send emails to archivers-->
Note that the core doesn't *have* to send an email to the archiver. From the core's perspective, the IArchiver interface has three functions:
- add a message to the archive
- get a 'permalink' to the message in the archive
- get the url to the "top" of the list's archive
The important things are 1) calculating the 'permalink' should not require a round-trip with the archiver; 2) the details of adding a message to the archiver are irrelevant to the core.
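To make the three operations concrete, here is a minimal Python sketch. The class and method names (`Archiver`, `archive_message`, `permalink`, `list_url`), the example host, and the permalink scheme (a base32-encoded SHA-1 of the Message-ID) are assumptions for illustration, not Mailman's actual API. What it shows is requirement 1): the permalink is computed from the message alone, so the core never needs a round-trip to the archiver.

```python
import hashlib
from abc import ABC, abstractmethod
from base64 import b32encode
from email.message import EmailMessage


def message_id_hash(msg: EmailMessage) -> str:
    """Stable key derived only from the Message-ID header (assumed scheme)."""
    message_id = str(msg["Message-ID"]).strip().encode("utf-8")
    # SHA-1 yields 20 bytes; base32 turns them into 32 URL-safe characters.
    return b32encode(hashlib.sha1(message_id).digest()).decode("ascii")


class Archiver(ABC):
    """Hypothetical interface: the three operations the core relies on."""

    @abstractmethod
    def archive_message(self, list_name: str, msg: EmailMessage) -> None:
        """Deliver the message to the archiver by whatever means it needs."""

    @abstractmethod
    def permalink(self, list_name: str, msg: EmailMessage) -> str:
        """Return the message's URL without contacting the archiver."""

    @abstractmethod
    def list_url(self, list_name: str) -> str:
        """Return the URL of the top of the list's archive."""


class ExampleArchiver(Archiver):
    """Toy implementation; the host name and URL layout are made up."""

    base_url = "https://archive.example.com"

    def __init__(self):
        self.received = []

    def archive_message(self, list_name, msg):
        # Stand-in for "send an email" or "shell out to a command".
        self.received.append((list_name, msg))

    def list_url(self, list_name):
        return f"{self.base_url}/list/{list_name}/"

    def permalink(self, list_name, msg):
        # Built purely from local data, so no round-trip is needed.
        return f"{self.list_url(list_name)}message/{message_id_hash(msg)}/"
```

Because the permalink depends only on the Message-ID, the same message always maps to the same URL, whichever archiver implementation sits behind the interface.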
For external archivers, such as M-A or Gmane, the implementation of IArchiver may indeed send an email. For a local archiver like MHonArc, the implementation just shells out to a command. For HK or anything else, it could be anything. Every archiver needs a way to get messages sent to it, and the core can adapt to any of those.
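As an illustration of the "shells out to a command" case, a local archiver might pipe each raw message into an external program. This is a hypothetical sketch: the `CommandArchiver` class is made up, and the actual command line for MHonArc or any other tool is deliberately left out, since it would have to come from that tool's own documentation.

```python
import subprocess
from email.message import EmailMessage


class CommandArchiver:
    """Hypothetical local archiver that pipes each message into a command.

    `command` is an argv list (for example, some MHonArc invocation); the
    exact flags depend on the tool and are not specified here.
    """

    def __init__(self, command):
        self.command = list(command)

    def archive_message(self, list_name, msg: EmailMessage):
        # Feed the raw message bytes to the command's stdin and raise
        # subprocess.CalledProcessError if the command exits non-zero.
        subprocess.run(self.command, input=msg.as_bytes(), check=True)
```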
> archive-core (store the emails and expose them through an API) --> archivers/stats/NNTP
> The questions are then:
> - how do we store the emails?
> - how do we expose the API?
> - how do we make it easy to extend? (i.e. the stats module wants to read the db, but probably also to store information in it)
Sharing is good, but it's also important to remember that any specific system may or may not have a local archiver. I could certainly imagine a site that only archives on M-A or Gmane and doesn't waste the space to archive locally.
I think we've pretty much come to agreement that the core itself doesn't need a full copy of all the messages after it's sent them, but of course, the "prototype" archiver could be used to keep a local copy of everything in a maildir. That could be shared at the lower level (maildir) or through some kind of API in minikitty.
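A rough sketch of such a "prototype" archiver, using only Python's standard `mailbox` module. The `MaildirArchiver` class and the one-maildir-per-list layout are assumptions for illustration, not an existing component; the sharing "at the lower level" would simply mean other tools reading these maildirs directly.

```python
import mailbox
import os
from email.message import EmailMessage


class MaildirArchiver:
    """Hypothetical 'prototype' archiver: one raw maildir per list under root."""

    def __init__(self, root):
        self.root = root

    def _maildir(self, list_name):
        # create=True makes the list's directory (with cur/new/tmp) if missing;
        # the root directory itself must already exist.
        return mailbox.Maildir(os.path.join(self.root, list_name), create=True)

    def archive_message(self, list_name, msg: EmailMessage):
        # Returns the maildir key of the stored copy.
        return self._maildir(list_name).add(msg)

    def messages(self, list_name):
        # Iterating a Maildir yields mailbox.MaildirMessage objects.
        return list(self._maildir(list_name))
```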
> Having played with mongodb (HK relies on it atm), I quite like the possibilities it gives us. We can easily store the emails in it and query them, and since it is a NoSQL database system, extending it is also easy. On the other hand, having the archive-core rely on the same system as the core itself would be nicer from a sysadmin pov. I have not tried to load archives into an RDBMS and test its speed, but for mongodb the results of the tests are presented at [1].
> The challenge will be speed and designing an API that allows each component to do its work.
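For comparison with the mongodb approach, here is a minimal relational sketch using SQLite as a stand-in for "an RDBMS". The table and function names are hypothetical, and it says nothing about speed. It only illustrates one way extension could work in a relational store: the stats module reads the shared `email` table and writes to a table of its own.

```python
import sqlite3
from email.message import EmailMessage
from email.utils import parsedate_to_datetime

SCHEMA = """
CREATE TABLE IF NOT EXISTS email (
    message_id TEXT PRIMARY KEY,
    list_name  TEXT NOT NULL,
    sender     TEXT,
    subject    TEXT,
    day        TEXT,
    content    TEXT
);
-- Owned by a hypothetical stats module: it reads `email` and writes here.
CREATE TABLE IF NOT EXISTS stats_posts_per_day (
    list_name TEXT NOT NULL,
    day       TEXT NOT NULL,
    posts     INTEGER NOT NULL,
    PRIMARY KEY (list_name, day)
);
"""


def _header(msg, name):
    value = msg[name]
    return None if value is None else str(value)


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_email(conn, list_name, msg: EmailMessage):
    body = msg.get_body(preferencelist=("plain",))
    # Day as written in the Date header (sender's time zone, not normalized).
    day = parsedate_to_datetime(str(msg["Date"])).date().isoformat()
    with conn:  # commits on success
        conn.execute(
            "INSERT OR IGNORE INTO email VALUES (?, ?, ?, ?, ?, ?)",
            (_header(msg, "Message-ID"), list_name, _header(msg, "From"),
             _header(msg, "Subject"), day,
             body.get_content() if body is not None else ""),
        )


def refresh_daily_stats(conn, list_name):
    """Example extension: recompute per-day post counts from the email table."""
    with conn:
        conn.execute("DELETE FROM stats_posts_per_day WHERE list_name = ?",
                     (list_name,))
        conn.execute(
            "INSERT INTO stats_posts_per_day (list_name, day, posts) "
            "SELECT list_name, day, COUNT(*) FROM email "
            "WHERE list_name = ? GROUP BY list_name, day",
            (list_name,),
        )
```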
I think the archiver should *definitely* have a REST API for programmatic access to its messages and data.
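A minimal sketch of what read-only programmatic access could look like, written as a plain WSGI application (PEP 3333) so it needs no framework. The `make_app` function, the URL layout, and the JSON shape are assumptions for illustration, not a proposed spec.

```python
import json


def make_app(messages):
    """Hypothetical read-only JSON API over {list_name: {msg_hash: dict}}.

    Assumed routes:
      GET /lists/<list>/messages         -> sorted list of message hashes
      GET /lists/<list>/messages/<hash>  -> the stored dict for one message
    """

    def app(environ, start_response):
        parts = [p for p in environ.get("PATH_INFO", "").split("/") if p]
        status, payload = "404 Not Found", {"error": "not found"}
        if environ.get("REQUEST_METHOD", "GET") != "GET":
            status, payload = "405 Method Not Allowed", {"error": "read-only"}
        elif len(parts) in (3, 4) and parts[0] == "lists" and parts[2] == "messages":
            archive = messages.get(parts[1])
            if archive is not None and len(parts) == 3:
                status, payload = "200 OK", sorted(archive)
            elif archive is not None and parts[3] in archive:
                status, payload = "200 OK", archive[parts[3]]
        body = json.dumps(payload).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]

    return app
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, make_app(data)).serve_forever()` would serve it on port 8000.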
> I think it would be nice if we could reach some kind of agreement before GSoC starts (even if we change our minds later on) to make sure that, if we get students, their work doesn't overlap too much.
> The second point I want to present is with respect to the archiver itself. At the moment we have HyperKitty (HK); the current version:
> - exposes single emails
> - exposes single threads
> - presents the archives for one month or day
> - allows searching the archives by sender, subject, content, or subject and content
> - presents a summary of recent activity on the list (including how the number of posts has evolved over the last month)
> I think these are the basic features that we would like to see in an archiver. But HK aims at much more: the ultimate goal of HK is to provide a "forum-like" interface to the mailing lists. With it, HK would provide a number of (social-web-like) options allowing users to "like" or "dislike" a post or a thread, to "+1" someone, and to tag mails or assign them categories. These are all nice features but, imho, they go beyond what one would want from a basic archiver.
I think it would be fine for a basic archiver to be essentially feature-equivalent to Pipermail, with two caveats:
- Truly stable URLs, so that when you regenerate the archive from the raw maildir, none of your links break.
- Search.
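Search along the lines HK already offers (by sender, subject, content, or subject and content) can be sketched as a naive filter over parsed messages. The `search` function and its field names are hypothetical, and a linear scan like this is only a placeholder: a real archiver would want a full-text index.

```python
from email.message import EmailMessage


def _text(msg: EmailMessage) -> str:
    # Prefer the text/plain body; messages without one contribute nothing.
    body = msg.get_body(preferencelist=("plain",))
    return body.get_content() if body is not None else ""


FIELDS = {
    "sender": lambda m: str(m["From"] or ""),
    "subject": lambda m: str(m["Subject"] or ""),
    "content": _text,
    "subject_and_content": lambda m: str(m["Subject"] or "") + "\n" + _text(m),
}


def search(messages, query, field="subject_and_content"):
    """Naive case-insensitive substring search over one field of each message."""
    extract = FIELDS[field]
    needle = query.casefold()
    return [m for m in messages if needle in extract(m).casefold()]
```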
Other than that, it's all gravy (as we say :). Nice-to-have features like CSS for customizing the look and feel, dynamic rendering of raw messages, etc. would be cool, but IMHO of secondary importance.
Cheers, -Barry