On Apr 24, 2012, at 11:12 AM, Toshio Kuratomi wrote:
I've been thinking about this and I'm in mild disagreement. I think that a mailing list system should give people an archive-store which is accessible behind a generalized API.
I'm warming up to this.
The IArchiver interface is generic enough to support both internal and external archivers. If there are deficiencies in either, we can fix the API, as long as both use cases are supportable, in a manner similar to IArchiver.permalink() returning None if the archiver doesn't support stable urls.
(A known omission from the current IArchiver API is that there's no way to access attachments. Does anybody have good ideas about that?)
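As a minimal sketch of the permalink() convention above: the classes below are hypothetical stand-ins written with plain abc rather than the real IArchiver, and the method signatures and the URL scheme are assumptions made for illustration only.

```python
import hashlib
from abc import ABC, abstractmethod
from email.message import EmailMessage
from typing import Optional


class Archiver(ABC):
    """Hypothetical stand-in for IArchiver; not the real Mailman interface."""

    name: str

    @abstractmethod
    def archive_message(self, list_name: str, msg: EmailMessage) -> None:
        """Hand one message to the archiver."""

    def permalink(self, list_name: str, msg: EmailMessage) -> Optional[str]:
        """Return a stable URL, or None if this archiver has no stable URLs."""
        return None


class ExternalArchiver(Archiver):
    """An archiver that lives elsewhere and can compute a stable URL."""

    name = "external"

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")
        self.sent = []

    def archive_message(self, list_name, msg):
        # A real implementation would forward the message; here we record it.
        self.sent.append((list_name, msg["Message-ID"]))

    def permalink(self, list_name, msg):
        # Assumed URL scheme: a hash of the Message-ID under the list name.
        digest = hashlib.sha1(msg["Message-ID"].encode("utf-8")).hexdigest()
        return f"{self.base_url}/{list_name}/{digest}"


class LocalArchiver(Archiver):
    """A storage-only archiver with no web UI, so no stable URLs."""

    name = "local"

    def __init__(self) -> None:
        self.stored = []

    def archive_message(self, list_name, msg):
        self.stored.append(msg)
    # permalink() is inherited and returns None.
```

Callers would then have to treat None as "no Archived-At URL available" rather than as an error.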
The store may be accessible via a REST API but I'm not certain that it's the correct level to deal with when talking about it in this context. The current mailman3 doesn't have an API for plugging in archivers via REST... it has an API for plugging in archivers via Python. That may be the correct level to be looking at this.
From a systems perspective, yes. Archivers must be enabled system-wide via the config file, but I think we should allow individual lists to opt-in or -out of system-enabled archivers. I'm on the fence as to what to do about the prototype archiver, which is beginning to seem much more like the default archiver-core, i.e. sans ui.
I think we should look into something a little more symmetrical:
[mailman3 core] -- maintenance of list metadata, sending and receiving, provides a REST API
[Web UIs] -- web ui to Core functions
[Archive-stores] -- stores the messages sent to the mailing lists. Provides a (REST?) API to apps built on top of it
[Archiver UIs] -- web ui, nntp interface, REST API (if not implemented at the storage layer), etc to the archive-store
This is compelling.
Question: Why have multiple stores? The big reason is that archives are being much more rapidly developed right now. So I anticipate that people are going to be working on different storage technology with different tradeoffs. One storage might be faster. Another might be more generally available. We'll have to reexamine this in the future. It's possible that we'll find one storage system that is perfect for all cases. It's also possible that we'll find all storage solutions have tradeoffs in which case we'll likely want to support third-party stores forever.
I always envisioned the core's storage being splittable into three main partitions. One would be the list-centric data, another would be the user-centric data, and the third would be the message-centric data. If you look carefully for example, you'll see that there are no direct foreign key references between members and the mailing lists they're associated with. This link is by fqdn listname, *not* mailinglist table ids. This is deliberate.
(It's entirely possible the implementation doesn't actually allow these three partitions to be stored in completely separate places. I'd consider that a bug.)
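To illustrate linking members to lists by fqdn listname instead of by table id, here is a small sketch that keeps the list-centric and user-centric data in two separate SQLite databases. The table and column names are invented for the example and are not the actual core schema.

```python
import sqlite3

# Two independent databases stand in for two storage partitions.
list_db = sqlite3.connect(":memory:")
list_db.execute(
    "CREATE TABLE mailinglist (id INTEGER PRIMARY KEY, fqdn_listname TEXT UNIQUE)"
)

user_db = sqlite3.connect(":memory:")
# No foreign key to mailinglist.id: the only link is the list's name.
user_db.execute(
    "CREATE TABLE member (id INTEGER PRIMARY KEY, address TEXT, fqdn_listname TEXT)"
)

list_db.execute(
    "INSERT INTO mailinglist (fqdn_listname) VALUES (?)", ("dev@example.com",)
)
user_db.executemany(
    "INSERT INTO member (address, fqdn_listname) VALUES (?, ?)",
    [
        ("bart@example.com", "dev@example.com"),
        ("anne@example.com", "dev@example.com"),
        ("cris@example.com", "users@example.com"),
    ],
)


def members_of(fqdn_listname):
    """Look up a list in one partition and its members in the other."""
    known = list_db.execute(
        "SELECT 1 FROM mailinglist WHERE fqdn_listname = ?", (fqdn_listname,)
    ).fetchone()
    if known is None:
        return []
    rows = user_db.execute(
        "SELECT address FROM member WHERE fqdn_listname = ? ORDER BY address",
        (fqdn_listname,),
    )
    return [address for (address,) in rows]
```

Because the join happens by name in application code, neither database needs to know the other exists.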
OTOH, I don't think it makes sense for the core to rely on more than one ORM. For now, that's Storm.
(I'm slightly lying here because the technology that shows the most promise for supporting schema migrations is Alembic which is based on a stripped down version of SQLAlchemy. But migrations are probably a completely off-line operation.)
Question: This is all dangling off of the archiver interface for mailman3 anyway so how can we affect the outcome? Well, in some ways people can create anything they want in there so we can't enforce a solution. However, if we think that it's desirable, we can certainly document this (maybe with an interface if we go the Python route for that layer of API or with a specification of what the REST API should look like for that.) We can also enhance our current archivers to provide the API that we come up with. I have a feeling that the prototype archiver with maildir will be a little slow but if it provides the API and comments about separation between core, storage, and archive UI it gives people a starting point for creating their own.
Some IArchiver implementations will be purely external archivers. I like that we can have a Mail Archive implementation, or potentially a Gmane implementation. Those are very different from a MHonArc implementation, which is again different from the prototype (default? built-in? always-enabled?) archiver. Having a common API for all of these simplifies the parts of the core that send messages to the archives, but what happens once the data is inserted into the different archivers is another question.
Remember too that archiver speed is less important, since that doesn't live in the critical path for message delivery. There is a handler that basically copies the message to the archiver queue, and there's a separate runner that dequeues those messages and sends them off to the individual archivers, via the IArchiver interface. So I think the performance of message insertion isn't something we should worry about for now.
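The hand-off described above can be sketched roughly as follows. This uses an in-process queue.Queue purely for illustration; the function names are hypothetical and the real queue and runner machinery in the core is not shown here.

```python
import queue

# Stand-in for the archiver queue; an assumption for this sketch only.
archive_queue = queue.Queue()


def to_archive(list_name, msg):
    """Delivery-path step: copy the message to the queue and return at once."""
    archive_queue.put((list_name, msg))


def run_archive_queue(archivers):
    """Separate runner: drain the queue and pass each message to every archiver.

    `archivers` is a list of callables taking (list_name, msg).
    Returns (number of messages processed, list of failures).
    """
    processed = 0
    failures = []
    while True:
        try:
            list_name, msg = archive_queue.get_nowait()
        except queue.Empty:
            break
        for archive in archivers:
            try:
                archive(list_name, msg)
            except Exception as error:
                # A slow or broken archiver must not block the others.
                failures.append((archive, error))
        processed += 1
    return processed, failures
```

Since to_archive() only enqueues, a slow archiver delays the runner but never the delivery of the message itself.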
Question: Where do we start? I think that we'll either succeed or fail very quickly by trying to define what the API between archive-store and archiver-ui should look like. We'll either be able to agree on a common set of features there (from which we'll be able to go forth and create our own archive-storage plugins) or we'll decide that we all need/want to do different things that no common API can address. If there's no common API definition then we won't be able to do any of the rest of this so there won't be any sense continuing down that path.
Places to start:
Look at the IArchiver interface and try to figure out whether it's complete from a message-insertion POV. Maybe in that case, we don't care about attachments since the archiver will do whatever it wants with them.
Look at the IMessageStore API. Is this complete? IOW, could you build a purely Python-level archiver like HyperKitty on top of this API? Here's where proper attachment handling would probably be necessary.
How would you want to expose the IMessageStore interface into the REST API? My sense is that you could probably take a fairly straightforward translation of IMessageStore into REST and *that* would be what you'd build the various archiver UIs on top of. REST needs to answer questions like batching, which is necessary for efficient transfer of data over HTTP but not for direct Python calls.
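As a sketch of what batching adds on top of a direct Python call: the store below is an in-memory stand-in, not the real IMessageStore, and the JSON field names (start, total_size, entries) are assumptions chosen for the example rather than a proposed wire format.

```python
import json
from email.message import EmailMessage


class InMemoryStore:
    """Hypothetical message store keyed by Message-ID."""

    def __init__(self):
        self._messages = {}  # dicts preserve insertion order

    def add(self, msg):
        self._messages[msg["Message-ID"]] = msg

    def get_message_by_id(self, message_id):
        return self._messages.get(message_id)

    @property
    def messages(self):
        return list(self._messages.values())


def messages_page(store, start=0, count=50):
    """Render one batch of the store as the JSON body of a REST response."""
    all_messages = store.messages
    batch = all_messages[start:start + count]
    return json.dumps({
        "start": start,
        "total_size": len(all_messages),
        "entries": [
            {"message_id": m["Message-ID"], "subject": m["Subject"]}
            for m in batch
        ],
    })


def make_message(number):
    """Build a tiny test message; the addresses are made up."""
    msg = EmailMessage()
    msg["Message-ID"] = f"<{number}@example.com>"
    msg["Subject"] = f"Message {number}"
    msg.set_content("body")
    return msg
```

A Python caller can simply iterate over store.messages, while an HTTP client would request successive pages by advancing start until it reaches total_size.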
Should threading information be part of the IMessageStore, or a separate interface? If the prototype archiver becomes the default implementation for the IMessageStore, it probably needs to grow a lot more functionality to support threading information.
The way I'm seeing it is that IArchiver is the interface for getting messages *into* the IMessageStore. The IMessageStore is the interface for making Python level queries needed to get the raw messages out of the system, and a REST API is how you publish this data for the various ui consumers.