We spoke on IRC about the archiver the other day and I said that I should present my thoughts about it here. So here they are (beware, this might be long).
First, I think we should think about the structure/architecture of things. We have a number of components which need to be archive-aware; without being exhaustive, I'm thinking about:
health of the list itself (emails/month, last threads, biggest threads...)
download the archives since the creation of the list/the last year/month)
All of these components need to be aware of the archives. We agreed that the core does not want to know about them.
So we have several solutions:
its own way of storing the archives (and eventually its own system to do so)
an API to access, modify, extend them.
Of course, we prefer the second solution :) So we would have the following architecture:
mm-core (handles the lists themselves) --send emails to archivers--> archive-core (store the emails and expose them through an API) --> archivers/stats/NNTP
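To make the architecture above a bit more concrete, here is a minimal sketch in Python of what the archive-core part could look like. Everything in it is an assumption made for illustration: the ArchiveCore class, its archive_message/get_messages methods and the emails_per_month consumer are hypothetical names, and the storage is a plain in-memory dict rather than a real database.

```python
from collections import Counter
from email import message_from_string
from email.utils import parsedate_to_datetime


class ArchiveCore:
    """Hypothetical archive-core: stores emails, exposes them via a small API."""

    def __init__(self):
        # list name -> list of parsed messages (in-memory stand-in for a db)
        self._messages = {}

    def archive_message(self, list_name, raw_email):
        """Entry point mm-core would call for each email sent to a list."""
        msg = message_from_string(raw_email)
        self._messages.setdefault(list_name, []).append(msg)
        return msg["Message-ID"]

    def get_messages(self, list_name):
        """Read API used by the consumers (archivers, stats, NNTP...)."""
        return list(self._messages.get(list_name, []))


def emails_per_month(archive, list_name):
    """Example consumer: a stats module that only talks to the API."""
    counts = Counter()
    for msg in archive.get_messages(list_name):
        sent = parsedate_to_datetime(msg["Date"])
        counts[sent.strftime("%Y-%m")] += 1
    return dict(counts)


RAW_1 = (
    "Message-ID: <1@example.org>\n"
    "Date: Mon, 02 Apr 2012 10:00:00 +0000\n"
    "Subject: Hello\n\nFirst post\n"
)
RAW_2 = (
    "Message-ID: <2@example.org>\n"
    "Date: Tue, 01 May 2012 09:30:00 +0000\n"
    "Subject: Re: Hello\n\nA reply\n"
)

archive = ArchiveCore()
archive.archive_message("mailman-developers", RAW_1)
archive.archive_message("mailman-developers", RAW_2)
stats = emails_per_month(archive, "mailman-developers")
```

The point of the sketch is only that the stats module never touches the storage directly: it goes through get_messages, so the storage backend can change without the consumers noticing.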
The questions are then:
module wants to read the db, but probably also to store information on it)
Having played with mongodb (HK relies on it atm), I quite like the possibilities it gives us. We can easily store the emails in it and query them, and since it is a NoSQL database system, extending it is also easy. On the other hand, having the archive-core rely on the same system as the core itself would be nicer from a sysadmin pov. I have not tried to upload archives to an RDBMS and test its speed, but for mongodb the results of the tests are presented at .
The challenge will be speed and designing an API which allows each component to do its work. I think it would be nice if we could reach some kind of agreement before the GSoC starts (even if we change our mind later on), to be sure that if we get students their work doesn't overlap too much.
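As a rough illustration of why a document store is easy to extend, here is a small Python sketch that turns an email into a dict-shaped document. It does not talk to mongodb at all: the email_to_document and matches helpers and all the field names are hypothetical, and matches is only a tiny stand-in for an equality query, not the real query language.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def email_to_document(list_name, raw_email):
    """Convert a raw email into a dict that a document store could hold."""
    msg = message_from_string(raw_email)
    return {
        "list": list_name,
        "message_id": msg["Message-ID"],
        "sender": parseaddr(msg["From"])[1],   # keep only the address
        "subject": msg["Subject"],
        "date": parsedate_to_datetime(msg["Date"]),
        "in_reply_to": msg["In-Reply-To"],     # None for a thread starter
        "content": msg.get_payload(),          # assumes a non-multipart mail
    }


def matches(document, query):
    """Stand-in for an equality query: every queried field must match."""
    return all(document.get(key) == value for key, value in query.items())


RAW = (
    "Message-ID: <1@example.org>\n"
    "From: Alice <alice@example.org>\n"
    "Date: Mon, 02 Apr 2012 10:00:00 +0000\n"
    "Subject: Archiver design\n\nSome thoughts\n"
)

doc = email_to_document("mailman-developers", RAW)

# Extending the stored data needs no schema change: a module that wants
# to record tags simply adds a new field to the document.
doc["tags"] = ["archiver"]
```

With an RDBMS the same extension would typically mean a new column or table, which is the trade-off against the sysadmin convenience mentioned above.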
The second point I want to present is with respect to the archiver itself. At the moment we have HyperKitty (HK), the current version:
subject and content
evolution of the number of posts sent over the last month)
I think this is the basic functionality that we would like to see in an archiver. But HK aims at much more: its ultimate goal is to provide a "forum-like" interface to the mailing lists. With it, HK would provide a number of (social-web-like) options, allowing users to "like" or "dislike" a post or a thread, to "+1" someone, and to tag the mails or assign them categories. These are all nice features but, imho, they go beyond what one would want from a basic archiver.
So what I would like to propose is to split out of HK a sub-project (MiniKitty?) which would provide this basic functionality.
We would keep HyperKitty as a more extensive archiver and try to bring HK to its ultimate goal. This will need some more work and time, as we will have to make HK speak with core for authentication, find a way to send emails to core/the lists, and of course add all the other features (tags, categories...).
Comments welcome :)