Re: [Mailman-Developers] Requirements for a new archiver
On Thu, 30 Oct 2003 07:04:19 +0100 Brad Knowles <brad.knowles@skynet.be> wrote:
At 12:40 AM -0500 2003/10/30, J C Lawrence wrote:
I've already said my bits there and proposed what I see as the cheap, easy, incremental improvement course: Twisted's NNTP support for storage, Message-IDs for keys, a best-effort detection and rewriting policy for collisions, and a MeoWWW derivative for HTML presentation/posting.
I don't know anything about Twisted or MeoWWW, so I can't say how they address the subjects above.
Twisted is a Pythonic library that implements most of the basic network protocols. Among other things it has RFC-conformant NNTP server and client implementations. Creating an NNTP server with a backing message store is, literally, three lines in Python. Of course it doesn't support all the nifties that real netnews servers do, a la expires, administrative controls, feeds, etc. It's not intended for that market, and Mailman doesn't need that support. If deployment sites need that, they're going to be using inn2|[BCD]News|Diablo anyway.
MeoWWW is a (very inefficient but fixable) pythonic CGI which supports reading and posting to netnews via NNTP. It has various nice UI points, a decent feature set (more than we have now), and does The Right Thing in almost every aspect I've checked except for performance in the spool reads.
I can say that I'm not sure about an NNTP-based storage solution...
We should really start out by splitting that discussion. NNTP is an access protocol. Netnews servers have various storage formats and techniques. Currently NNTP and IMAP are the only standardised wide-deployment protocols for message spool access. I'm not interested in IMAP for the reasons previously discussed. NNTP isn't great, but it is already supported by Mailman for the new gating features and adds a clean abstraction model which allows trivial replacement of Mailman's implementation by inn2|[BCD]news|Diablo|whatever should the deployment site wish. Additionally, again as a standards-etc based protocol, it allows clean abstraction for archive presentation: anything that talks NNTP can now be an effective Mailman archive presenter. Ditto for archive indexing.
As a dev I'm interested in arguments about how to handle the store behind the NNTP interface -- I find that stuff fun and intriguing -- but also think they are fairly uninteresting right now for Mailman specifically. The 90% case for Mailman will have fewer than 200K messages in the site-wide spool, and most of those an order of magnitude fewer. For me the interesting point is that once we abstract the message storage behind a well-supported standards-based protocol we can incrementally improve our implementation, and those really concerned with the larger cases can throw in inn2 or whatever else, like a filter to SQL, instead. In the meantime we get the flexibility and time to grow and do it Really Right. Additionally, having adopted such a well-defined abstraction model once, should something better appear down the road it should be a comparatively small cost to support that in addition or instead.
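To make the abstraction argument concrete, here is a minimal sketch of a store hidden behind a narrow interface. All the names (MessageStore, DictStore, archive_size) are hypothetical and appear in neither Mailman nor Twisted; the point is only that code written against the interface does not care which backend sits behind it.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Optional


class MessageStore(ABC):
    """Hypothetical storage interface sitting behind the access protocol."""

    @abstractmethod
    def put(self, message_id: str, raw: bytes) -> None:
        """Store the raw message under its Message-ID."""

    @abstractmethod
    def get(self, message_id: str) -> Optional[bytes]:
        """Return the raw message, or None if the key is unknown."""

    @abstractmethod
    def ids(self) -> Iterator[str]:
        """Iterate over every stored Message-ID."""


class DictStore(MessageStore):
    """Simplest possible backend: an in-memory dict (illustration only)."""

    def __init__(self) -> None:
        self._messages: Dict[str, bytes] = {}

    def put(self, message_id: str, raw: bytes) -> None:
        self._messages[message_id] = raw

    def get(self, message_id: str) -> Optional[bytes]:
        return self._messages.get(message_id)

    def ids(self) -> Iterator[str]:
        return iter(self._messages)


def archive_size(store: MessageStore) -> int:
    """Presenter/indexer-side code: depends only on the interface."""
    return sum(1 for _ in store.ids())
```

A larger site could, under the same assumption, swap DictStore for a backend that talks to a real news server or an SQL database without touching archive_size or anything else written against MessageStore.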
... although certain storage techniques we've recently discussed borrow a lot from extant NNTP implementations, and I'm not sure how much sense it would make to rip out just those parts we know we need, or if we could actually reasonably take the whole thing, kit-n-caboodle.
Which may indeed happen.
I do believe that we need an alternative solution to the Message-ID header as it was presented to us in the message, as a stable guaranteed unique (well, as good as MD5 or SHA-1 gets) message identifier that can always be used to refer to the exact same message no matter what.
I'm of two minds here. I see the temptation. I like using Message-IDs, and they are a natural fit to the model semantically, but messing with Message-IDs has unpleasant effects for some other systems.
<shrug>
Whether we use this message identifier as a replacement for the message-id header value as it was presented to us -- I think that's a more philosophical discussion, and I think we should address it by allowing both options but deciding which would be a reasonable default to take.
<nod> I'm on the side of rewriting Message-IDs if we do generate our own keys. I don't like it, but it seems the cleanest approach.
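One way the "keep the Message-ID when possible, rewrite on collision" policy could look is sketched below. This is an assumption-laden illustration, not Mailman code: the archive() function, the dict used as a store, the X-Original-Message-ID header name, and the archive.invalid domain are all made up for the example, and SHA-1 over the raw bytes stands in for whatever digest would actually be chosen.

```python
import hashlib
from email import message_from_bytes
from typing import Dict


def content_key(raw: bytes) -> str:
    """Stable identifier derived from the message bytes (SHA-1 hex digest)."""
    return hashlib.sha1(raw).hexdigest()


def archive(store: Dict[str, bytes], raw: bytes,
            domain: str = "archive.invalid") -> str:
    """Store raw under its Message-ID, rewriting the ID on a collision.

    Returns the key the message ends up stored under.
    """
    msg = message_from_bytes(raw)
    msgid = (msg.get("Message-ID") or "").strip()

    if msgid:
        existing = store.get(msgid)
        if existing is None:
            # Common case: the ID is new, keep the message untouched.
            store[msgid] = raw
            return msgid
        if existing == raw:
            # Exact duplicate of something already archived.
            return msgid

    # Missing ID, or same ID with different content: generate our own key
    # and rewrite the header, remembering the original value if there was one.
    new_id = "<%s@%s>" % (content_key(raw), domain)
    if msgid:
        msg["X-Original-Message-ID"] = msgid  # hypothetical header name
    del msg["Message-ID"]  # no-op when the header is absent
    msg["Message-ID"] = new_id
    store[new_id] = msg.as_bytes()
    return new_id
```

The detection here is best-effort in the sense discussed above: it only notices a collision when the earlier message is still in the store, and it leaves the overwhelmingly common non-colliding messages byte-for-byte unchanged.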
Given that the Mailman UI is basically completely contained within the CGI, I'm inclined to leave it there and work on improving it internally, allowing us to continue to work with most any webserver the client may have.
Agreed.
I don't know how MeoWWW addresses this issue, either by replacing the webserver, or providing additional tools that may make it easier to present a good and consistent UI.
MeoWWW is a CGI as discussed above. Twisted implements both sides of HTTP in addition to the NNTP discussed above, but I haven't looked at the details.
--
J C Lawrence
---------(*) Satan, oscillate my metallic sonatas.
claw@kanga.nu He lived as a devil, eh?
http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live.