[Mailman-Developers] Speaking about kitties (or archivers)

Mon Apr 23 20:17:16 CEST 2012

Meeow miaou*

We spoke on IRC about the archiver the other day and I said that I
should present my thoughts about it here. So here they are (beware,
this might be long).

First I think we should think about the structure/architecture of
things. We have a number of components which need to be archive-aware.
Without being exhaustive, I'm thinking about:
- the archiver itself, which presents the archive (ie: mails and threads)
- the NNTP bits which should be able to return emails and/or threads
- the stats module which wants to give information to the user about the
health of the list itself (emails/month, last threads, biggest
threads...)
- archives retrieval (we probably want to give the user a way to
download the archives since the creation of the list/the last
year/month)
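
To make the stats item concrete, here is a minimal sketch of the
"emails/month" figure. It assumes the stats module is simply handed the
stored messages as Python email objects; the function name
emails_per_month is made up for illustration, and nothing like it
exists in Mailman or HK today.

```python
from collections import Counter
from email import message_from_string
from email.utils import parsedate_to_datetime


def emails_per_month(messages):
    """Count messages per (year, month), skipping any without a Date header."""
    counts = Counter()
    for msg in messages:
        if msg["Date"]:
            date = parsedate_to_datetime(msg["Date"])
            counts[(date.year, date.month)] += 1
    return counts


# Three made-up messages standing in for a list's archive.
sample = [
    message_from_string("Date: Mon, 23 Apr 2012 20:17:16 +0200\n\nfirst"),
    message_from_string("Date: Tue, 24 Apr 2012 09:00:00 +0200\n\nsecond"),
    message_from_string("Date: Tue, 01 May 2012 10:00:00 +0200\n\nthird"),
]
counts = emails_per_month(sample)
```

The point is only that such a module needs read access to every stored
email, which is why it has to be archive-aware.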

All of these components need to be aware of the archives. We agreed
that the core does not want to know about them.

So we have several solutions:
- each module becomes an "archiver" wrt core, meaning each module has
its own way of storing the archives (and possibly its own system to do
so)
- we create an archive-core module which manages the archives and
provides an API to access, modify and extend them.

Of course, we prefer the second solution :)
So we would have the following architecture:

  mm-core (handles the lists themselves) --send emails to archivers-->
archive-core (store the emails and expose them through an API) -->
archivers/stats/NNTP
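
A rough sketch of what the archive-core side of that diagram could look
like, to give the discussion something to point at. Everything here is
an assumption: the class InMemoryArchiveStore, its method names and the
in-memory dicts are hypothetical, and a real archive-core would sit on
a database. Threading is done naively on the In-Reply-To header only.

```python
from email import message_from_string


class InMemoryArchiveStore:
    """Hypothetical archive-core backend; illustration only."""

    def __init__(self):
        self._messages = {}   # (list_name, message_id) -> Message
        self._children = {}   # (list_name, parent_id) -> [message_id, ...]

    def add_message(self, list_name, msg):
        """Store one email handed over by mm-core; return its Message-ID."""
        message_id = msg["Message-ID"]
        if message_id is None:
            raise ValueError("message has no Message-ID header")
        message_id = message_id.strip()
        self._messages[(list_name, message_id)] = msg
        parent_id = msg["In-Reply-To"]
        # Ignore a message that claims to reply to itself.
        if parent_id is not None and parent_id.strip() != message_id:
            key = (list_name, parent_id.strip())
            self._children.setdefault(key, []).append(message_id)
        return message_id

    def get_message(self, list_name, message_id):
        """What an archiver or the NNTP bits would call for a single email."""
        return self._messages[(list_name, message_id)]

    def get_thread(self, list_name, root_id):
        """Return the root message followed by all its replies, depth-first."""
        thread = [self._messages[(list_name, root_id)]]
        for child_id in self._children.get((list_name, root_id), []):
            thread.extend(self.get_thread(list_name, child_id))
        return thread


store = InMemoryArchiveStore()
store.add_message("devel", message_from_string(
    "Message-ID: <1@x>\nSubject: Kitties\n\nMeeow"))
store.add_message("devel", message_from_string(
    "Message-ID: <2@x>\nIn-Reply-To: <1@x>\nSubject: Re: Kitties\n\nMiaou"))
subjects = [m["Subject"] for m in store.get_thread("devel", "<1@x>")]
```

The archiver, the stats module and the NNTP bits would then all go
through get_message/get_thread instead of each keeping its own copy of
the emails.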

The questions are then:
- how do we store the emails ?
- how do we expose the API ?
- how do we make it easy to extend ? (ie: the stats module wants to
read the db, but probably also to store information in it)

Having played with mongodb (HK relies on it atm), I quite like the
possibilities it gives us. We can easily store the emails in it and
query them, and since it is a NoSQL database system, extending it is
easy too.
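
To illustrate the "easy to extend" part, here is a sketch of an email
flattened into the kind of dict a document store holds. The field names
and the message_to_document helper are invented for this example and
are not HK's actual layout; with a driver such as pymongo, a dict like
this is what would be handed to the collection's insert call.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def message_to_document(list_name, msg):
    """Flatten an email into a plain dict (hypothetical field names)."""
    date_header = msg["Date"]
    return {
        "list": list_name,
        "message_id": msg["Message-ID"],
        "in_reply_to": msg["In-Reply-To"],  # None for a thread starter
        "sender": parseaddr(msg["From"] or "")[1],
        "subject": msg["Subject"],
        "date": parsedate_to_datetime(date_header) if date_header else None,
        # Multipart handling is left out of this sketch.
        "content": None if msg.is_multipart() else msg.get_payload(),
    }


raw = (
    "Message-ID: <1@example.org>\n"
    "From: Alice <alice@example.org>\n"
    "Subject: Speaking about kitties\n"
    "Date: Mon, 23 Apr 2012 20:17:16 +0200\n"
    "\n"
    "Meeow miaou\n"
)
doc = message_to_document("mailman-developers", message_from_string(raw))

# Another component can attach its own data later; no schema to migrate.
doc["stats"] = {"thread_length": 1}
```

With an RDBMS the same extension would typically mean a new column or a
new table, which is the trade-off against the sysadmin argument.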
On the other hand, having archive-core rely on the same system as the
core itself would be nicer from a sysadmin pov. I have not tried
loading archives into an RDBMS to test its speed, but for mongodb the
results of the tests are presented at [1].

The challenge will be speed and designing an API which allows each
component to do its work.
I think it would be nice if we could reach some kind of agreement before
the GSoC starts (even if we change our mind later on) to be sure that if
we get students their work doesn't overlap too much.

The second point I want to present is with respect to the archiver
itself.
At the moment we have HyperKitty (HK). The current version:
- exposes single emails
- exposes single threads
- presents the archives for one month or day
- allows searching the archives by sender, subject, content, or
subject and content
- presents a summary of the recent activity on the list (including the
evolution of the number of posts sent over the last month)

I think these are the basic functionality that we would like to see in
an archiver.
But HK aims at much more. Its ultimate goal is to provide a
"forum-like" interface to the mailing lists. With it, HK would provide
a number of (social-web-like) options: "liking" or "disliking" a post
or a thread, giving someone a "+1", tagging the mails or assigning them
categories.
These are all nice features but, imho, they go beyond what one would
want from a basic archiver.

So what I would like to propose is to split off from HK a sub-project
(MiniKitty?) which would provide this basic functionality.

We would keep HyperKitty as a more extensive archiver and try to bring
HK to its ultimate goal. This will need some more work and time as we
will have to make HK speak with core for authentication, find a way to
send emails to core/the lists and of course add all the other features
(tags, categories...)

Comments welcome :)

Thanks,
Pierre

[1]
http://blog.pingoured.fr/index.php?post/2012/03/16/Mailman-archives-and-mongodb
* Hi everyone