
On Mon, Apr 23, 2012 at 06:20:18PM -0400, Barry Warsaw wrote:
> Thanks for posting this Pierre-Yves!
>
> On Apr 23, 2012, at 08:17 PM, Pierre-Yves Chibon wrote:
>> archive-core (store the emails and expose them through an API) --> archivers/stats/NNTP
>> The questions are then:
>> - how do we store the emails ?
>> - how do we expose the API ?
>> - how to make it such that it becomes easy to extend ? (ie: the stats module wants to read the db, but probably also to store information on it)
>
> Sharing is good, but it's also important to remember that any specific system may or may not have a local archiver. I could certainly imagine a site that only archives on M-A or Gmane and doesn't waste the space to archive locally.
>
> I think we've pretty much come to agreement that the core itself doesn't need a full copy of all the messages after it's sent them, but of course, the "prototype" archiver could be used to keep a local copy of everything in a maildir. That could be shared at the lower level (maildir) or through some kind of API in minikitty.
I've been thinking about this and I'm in mild disagreement. I think that a mailing list system should give people an archive-store which is accessible behind a generalized API. That store may be a non-local archiver, as long as it can still implement the API. The archive-store should be pluggable (the storage could be SQL, MongoDB, or remote), but having the store be accessible is important.
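As a rough illustration of "pluggable", here is a sketch of one SQL backend (using the stdlib sqlite3 module) plus a tiny registry so a site picks its backend by name in its configuration. SqliteStore, STORE_BACKENDS and open_store are hypothetical names, not anything in mailman3, and the add/search signatures are assumptions made for the example.

```python
from __future__ import annotations

import sqlite3
from typing import Any, Callable


class SqliteStore:
    """Hypothetical SQL backend for an archive store, using stdlib sqlite3."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            " list_name TEXT, message_id TEXT, subject TEXT, body TEXT,"
            " PRIMARY KEY (list_name, message_id))"
        )

    def add(self, list_name: str, message_id: str, subject: str, body: str) -> None:
        with self.db:  # the connection context manager commits on success
            self.db.execute(
                "INSERT OR REPLACE INTO messages VALUES (?, ?, ?, ?)",
                (list_name, message_id, subject, body),
            )

    def search(self, list_name: str, text: str) -> list[str]:
        # LIKE is case-insensitive for ASCII in SQLite; % and _ in text are not escaped here.
        pattern = f"%{text}%"
        rows = self.db.execute(
            "SELECT message_id FROM messages WHERE list_name = ?"
            " AND (subject LIKE ? OR body LIKE ?) ORDER BY rowid",
            (list_name, pattern, pattern),
        )
        return [row[0] for row in rows]


# Hypothetical registry: a MongoDB or remote backend would register its own factory here.
STORE_BACKENDS: dict[str, Callable[..., Any]] = {"sqlite": SqliteStore}


def open_store(config: dict[str, str]) -> Any:
    """Instantiate the backend named by config["backend"], passing the other keys through."""
    name = config["backend"]
    try:
        factory = STORE_BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown archive store backend: {name!r}") from None
    return factory(**{k: v for k, v in config.items() if k != "backend"})
```

For example, open_store({"backend": "sqlite", "path": "/var/lib/archives.db"}) would hand every front end the same SQLite-backed store; the idea is that the front ends would not need to know which backend is configured.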
The store may be accessible via a REST API, but I'm not certain that it's the correct level to deal with when talking about it in this context. The current mailman3 doesn't have an API for plugging in archivers via REST... it has an API for plugging in archivers via Python. That may be the correct level to be looking at this.
Now the important part -- why an archive store is more integral than the current architecture makes it out to be...
One way to look at this is conceptually. Mailman2 is what I've come to think of as a complete mailing list system. By contrast mailman3-core is only a mailing list manager. Mailman3 contains the information necessary to send messages to an address and have those message disseminated to a wider audience. By itself, this is just fancy management of email aliases. Mailing lists seem to be something more than this. In addition to being management of where email is sent, they're also repositories of knowledge on a particular subject. This is the role filled by archives.
One could also look at it from a sysadmin standpoint. Suppose a sysadmin wants to deploy mailman3 with archives, and wants a forum-like interface, an NNTP interface, a standard archives interface, and a REST interface to the archives. Are they going to want to set up four different storage technologies for those, import the generic archives into all four, and then maintain and update each storage technology to keep it safe and secure? Will they want to buy warrantied storage for all of them? I think that they'll be happier if the design of our system lets them consolidate those.
A different way to look at this is from a programmer's standpoint. Many of the interfaces to archives that we're talking about are going to share common needs. They need access to the email messages. They need to know how the email messages thread together. They're going to want to search the messages. Under the current scheme, programmers will be writing very similar code to access the email messages in their particular store, even if they all choose to use the same underlying storage technology.
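To make those shared needs concrete, here is a minimal sketch of what a common archive-store interface might look like in Python: fetch a message, reconstruct its thread, and search. ArchiveStore, MemoryStore and make_message are hypothetical names (nothing like this exists in mailman3 today), the in-memory backend is a toy, and threading here follows only In-Reply-To, ignoring References.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from email.message import EmailMessage


class ArchiveStore(ABC):
    """Hypothetical archive-store API shared by every archive front end."""

    @abstractmethod
    def add(self, list_name: str, msg: EmailMessage) -> str:
        """Store a message and return its Message-ID."""

    @abstractmethod
    def get(self, list_name: str, message_id: str) -> EmailMessage | None:
        """Fetch one message, or None if it is not archived."""

    @abstractmethod
    def thread(self, list_name: str, message_id: str) -> list[str]:
        """Message-IDs of the whole thread containing message_id, root first."""

    @abstractmethod
    def search(self, list_name: str, text: str) -> list[str]:
        """Message-IDs whose subject or plain-text body contains text."""


def _header(msg: EmailMessage, name: str) -> str:
    value = msg.get(name)
    return str(value).strip() if value is not None else ""


class MemoryStore(ArchiveStore):
    """Toy in-memory backend; a real one might use SQL, MongoDB or a remote service."""

    def __init__(self) -> None:
        self._lists: dict[str, dict[str, EmailMessage]] = {}

    def add(self, list_name, msg):
        message_id = _header(msg, "Message-ID")
        self._lists.setdefault(list_name, {})[message_id] = msg
        return message_id

    def get(self, list_name, message_id):
        return self._lists.get(list_name, {}).get(message_id)

    def thread(self, list_name, message_id):
        messages = self._lists.get(list_name, {})
        if message_id not in messages:
            return []
        # Walk In-Reply-To links up to the thread root; `seen` guards against loops.
        root, seen = message_id, {message_id}
        parent = _header(messages[root], "In-Reply-To")
        while parent in messages and parent not in seen:
            root = parent
            seen.add(root)
            parent = _header(messages[root], "In-Reply-To")
        # Build a parent -> children map, then walk down depth-first from the root.
        children: dict[str, list[str]] = {}
        for mid, msg in messages.items():
            children.setdefault(_header(msg, "In-Reply-To"), []).append(mid)
        ordered, visited, stack = [], set(), [root]
        while stack:
            mid = stack.pop()
            if mid in visited:
                continue
            visited.add(mid)
            ordered.append(mid)
            stack.extend(reversed(children.get(mid, [])))
        return ordered

    def search(self, list_name, text):
        needle = text.lower()
        hits = []
        for mid, msg in self._lists.get(list_name, {}).items():
            body = msg.get_body(preferencelist=("plain",))
            content = body.get_content() if body is not None else ""
            if needle in f"{_header(msg, 'Subject')}\n{content}".lower():
                hits.append(mid)
        return hits


def make_message(message_id, subject, body, in_reply_to=None):
    """Convenience helper for building example messages."""
    msg = EmailMessage()
    msg["Message-ID"] = message_id
    msg["Subject"] = subject
    if in_reply_to:
        msg["In-Reply-To"] = in_reply_to
    msg.set_content(body)
    return msg
```

A web forum, an NNTP gateway and a stats module could all be written against ArchiveStore alone, so the threading and search code would live in one place per backend instead of once per front end.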
At the beginning I said that I was only in mild disagreement... where does the qualifier come in? I think that what we have with mailman3 right now is something like this:
[mailman3 core] -- maintenance of the list metadata, sending and receiving; provides a REST API
[Web UIs] -- web UI to the core functions
[Archivers] -- mailing list storage and user interface to those stored messages
I think we should look into something a little more symmetrical:
[mailman3 core] -- maintenance of list metadata, sending and receiving; provides a REST API
[Web UIs] -- web UI to core functions
[Archive-stores] -- stores the messages sent to the mailing lists; provides a (REST?) API to apps built on top of it
[Archiver UIs] -- web UI, NNTP interface, REST API (if not implemented at the storage layer), etc. to the archive-store
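The layering above could look roughly like this in Python: a core-facing archiver and two archiver UIs all depend on one store API and contain no storage code of their own. Every class here is hypothetical, the archive_message method name is only illustrative, and the NNTP output is a simplified subset of the real overview fields.

```python
from __future__ import annotations

from collections.abc import Iterator
from email.message import EmailMessage
from typing import Protocol


class ArchiveStore(Protocol):
    """Hypothetical store API; SQL, MongoDB or remote backends could all satisfy it."""

    def add(self, list_name: str, msg: EmailMessage) -> None: ...

    def messages(self, list_name: str) -> Iterator[EmailMessage]: ...


class DictStore:
    """Trivial in-memory backend that satisfies ArchiveStore structurally."""

    def __init__(self) -> None:
        self._data: dict[str, list[EmailMessage]] = {}

    def add(self, list_name: str, msg: EmailMessage) -> None:
        self._data.setdefault(list_name, []).append(msg)

    def messages(self, list_name: str) -> Iterator[EmailMessage]:
        return iter(self._data.get(list_name, []))


class StoreArchiver:
    """Core-facing layer: the core hands each posted message to the shared store."""

    def __init__(self, store: ArchiveStore) -> None:
        self.store = store

    def archive_message(self, list_name: str, msg: EmailMessage) -> None:
        self.store.add(list_name, msg)


class NntpOverview:
    """Archiver UI: NNTP-style overview lines (a simplified subset of the real fields)."""

    def __init__(self, store: ArchiveStore) -> None:
        self.store = store

    def overview(self, list_name: str) -> list[str]:
        return [
            f"{n}\t{m['Subject']}\t{m['From']}\t{m['Message-ID']}"
            for n, m in enumerate(self.store.messages(list_name), start=1)
        ]


class WebIndex:
    """Archiver UI: a web-style subject index built from the same store."""

    def __init__(self, store: ArchiveStore) -> None:
        self.store = store

    def subjects(self, list_name: str) -> list[str]:
        return sorted({str(m["Subject"]) for m in self.store.messages(list_name)})
```

Swapping DictStore for another backend would leave StoreArchiver, NntpOverview and WebIndex unchanged, which is the consolidation a sysadmin would be after.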
By splitting the archive storage from the archive UI, similar to how mailman3-core is split from the web UI, we can allow a sysadmin to choose one archive-store for all of the archive front-ends that they run on their systems.
Question: Why have multiple stores? The big reason is that archives are being much more rapidly developed right now. So I anticipate that people are going to be working on different storage technology with different tradeoffs. One storage might be faster. Another might be more generally available. We'll have to reexamine this in the future. It's possible that we'll find one storage system that is perfect for all cases. It's also possible that we'll find all storage solutions have tradeoffs in which case we'll likely want to support third-party stores forever.
Question: This is all dangling off of the archiver interface for mailman3 anyway, so how can we affect the outcome? Well, in some ways people can create anything they want in there, so we can't enforce a solution. However, if we think that it's desirable, we can certainly document this (maybe with an interface if we go the Python route for that layer of API, or with a specification of what the REST API should look like). We can also enhance our current archivers to provide the API that we come up with. I have a feeling that the prototype archiver with maildir will be a little slow, but if it provides the API and comments about the separation between core, storage, and archive UI, it gives people a starting point for creating their own.
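For the maildir idea, a store can be sketched with the stdlib mailbox module. MaildirStore is a hypothetical name and this is not the actual prototype archiver; it only shows one plausible shape. The linear scans in get() and search() are one concrete reason a plain maildir backend would likely be slow.

```python
from __future__ import annotations

import mailbox
import os
from email.message import Message


class MaildirStore:
    """Hypothetical archive store keeping one maildir per list under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _box(self, list_name: str) -> mailbox.Maildir:
        # create=True builds the maildir (cur/, new/, tmp/) on first use.
        return mailbox.Maildir(os.path.join(self.root, list_name), factory=None, create=True)

    def add(self, list_name: str, msg: Message) -> str:
        self._box(list_name).add(msg)
        return str(msg["Message-ID"]).strip()

    def get(self, list_name: str, message_id: str) -> Message | None:
        # A maildir has no index by Message-ID, so every lookup is a linear scan.
        for msg in self._box(list_name):
            if str(msg["Message-ID"]).strip() == message_id:
                return msg
        return None

    def search(self, list_name: str, text: str) -> list[str]:
        # Also a full scan; multipart bodies decode to None, so only their subject is searched.
        needle = text.lower()
        hits = []
        for msg in self._box(list_name):
            payload = msg.get_payload(decode=True) or b""
            haystack = f"{msg['Subject'] or ''}\n{payload.decode('utf-8', 'replace')}"
            if needle in haystack.lower():
                hits.append(str(msg["Message-ID"]).strip())
        return hits
```

A faster backend could keep the same add/get/search methods and add an index, which is the kind of starting point described above.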
Question: Where do we start? I think that we'll either succeed or fail very quickly by trying to define what the API between archive-store and archiver-ui should look like. We'll either be able to agree on a common set of features there (from which we'll be able to go forth and create our own archive-storage plugins) or we'll decide that we all need/want to do different things that no common API can address. If there's no common API definition then we won't be able to do any of the rest of this so there won't be any sense continuing down that path.
-Toshio