
Barry Warsaw writes:
Remember too that in MM3, messages only get fed to the registered IArchiver interfaces by a separate archive runner. So they aren't a bottleneck for delivery to the user, but on heavily trafficked sites, they can potentially consume a lot of resources if the archiver is local and relatively inefficient.
I'm talking about total load on the server host, not load on the Mailman subsystem. So I don't think the Mailman-to-archive function will consume many resources compared to delivery to subscribers if there are any remote users at all. A local archiver communicates at CPU-to-disk speed basically once or maybe twice as I understand it. The MTA resources for queuing alone will exceed and probably overwhelm this. Then there are the multiple Mailman queues, etc, etc.
Of course the *other* side of the archiver (the client access to the message store) can be extremely resource consuming. I'm just saying that in the grand scheme of message distribution (including to the archiver), the efficiency of a local archiver is not going to be a bottleneck.
In the long run (ie, when nobody who's anybody uses Python 2 at all) I think everybody would be happier if you refactor to keep KittyStore at arm's length from Mailman core.
Agreed, with of course the caveat that we'll need a thin HK IArchiver implementation in the core to generate the permalink and communicate with HK over IPC. Generally we want the permalink to be able to be generated without direct communication with the archiver (see the motivation for X-Message-ID-Hash),
By the way, I would say to adopt modern IETF practice here and drop the "X-" (in practice collisions are rare while the annoyance of fixing platforms to use the standardized name is frequent), and include the algorithm in the name. Eg, Message-ID-MD5 or Hashed-Message-ID-MD5. Or we could use the List-* namespace.
We should do this while we still can. :-) If you want I can try to write an RFC to make it official.
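
To make the naming idea concrete, here is a minimal sketch of a header whose name carries the algorithm. The function name, the "Message-ID-<ALGORITHM>" naming pattern, and the choice of Base32 over the Message-ID with its angle brackets stripped are all assumptions for illustration, not an agreed format.

```python
import base64
import hashlib


def message_id_hash_header(message_id, algorithm="sha1"):
    """Return a (header_name, header_value) pair for a hashed Message-ID.

    Hypothetical helper: the naming scheme and the encoding are
    assumptions made for this sketch, not a standardized format.
    """
    # Drop surrounding whitespace and the enclosing angle brackets, if any.
    bare = message_id.strip()
    if bare.startswith("<") and bare.endswith(">"):
        bare = bare[1:-1]
    # hashlib.new() accepts the algorithm by name, so the header name and
    # the digest always agree on which algorithm was used.
    digest = hashlib.new(algorithm, bare.encode("utf-8")).digest()
    value = base64.b32encode(digest).decode("ascii")
    name = "Message-ID-" + algorithm.upper()
    return name, value


name, value = message_id_hash_header("<12345@example.com>")
print(name + ": " + value)
```

Because the algorithm is part of the header name, a reader who sees the header knows how to recompute the value without consulting the archiver.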
but if the core *has* to talk to HK to generate the permalink,
I personally don't think that is a good idea, but see below.
then I don't think an LMTP channel will work.
The only reason I can think of is that you want to check that the permalink isn't already occupied (that's the only thing HyperKitty proper knows that can't be computed the same way in the IArchiver as in HyperKitty proper AFAICS), and that can be implemented as follows:
    Mailman>    LHLO mailman-host
    HyperKitty> 250 OK
    Mailman>    MAIL FROM Mailman@mailman-host
    HyperKitty> 250 OK
    Mailman>    RCPT TO <permalink-variable-part>@archiver-host
    HyperKitty> 553 Permalink already occupied
    Mailman>    RCPT TO <new-permalink-variable-part>@archiver-host
    HyperKitty> 250 OK
    Mailman>    DATA
    HyperKitty> 354 Go for it!
and so on. I don't think this even violates the spirit of the LMTP protocol, but it certainly conforms to the letter as long as permalink variable parts are valid email localparts. (One could quibble about which 5xx response to give. AFAICS only "551 user not local" is out.)
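
The retry logic in that dialog can be sketched without any network code. Everything here is hypothetical: ToyArchiver stands in for the archiver's LMTP front end, and negotiate_permalink plays the Mailman side, offering candidate localparts until one is accepted.

```python
class ToyArchiver:
    """In-memory stand-in for the archiver side of the dialog (hypothetical)."""

    def __init__(self, occupied):
        # Permalink variable parts that are already taken.
        self.occupied = set(occupied)

    def rcpt_to(self, localpart):
        # Mirror the dialog: refuse an occupied permalink with a 5xx reply,
        # accept a free one with 250.
        if localpart in self.occupied:
            return 553, "Permalink already occupied"
        return 250, "OK"

    def data(self, localpart, message_text):
        # Storing the message is what actually occupies the permalink.
        self.occupied.add(localpart)
        return 250, "OK"


def negotiate_permalink(archiver, candidates):
    """Offer candidates in order; return the first one accepted, else None."""
    for localpart in candidates:
        code, _text = archiver.rcpt_to(localpart)
        if code == 250:
            return localpart
    return None


archiver = ToyArchiver({"ABC123"})
chosen = negotiate_permalink(archiver, ["ABC123", "ABC123-1", "ABC123-2"])
archiver.data(chosen, "Subject: hello\n\nbody\n")
print(chosen)
```

A real client could issue the same commands with Python's smtplib.LMTP class, whose rcpt() method returns the reply code and text as a pair, but that needs a live server and is left out of the sketch.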
My own preference is for a permalink that can be computed from the originator header data (author, recipients, date, message ID, subject) by anyone with access to the message, and that means you need the archive server to be able to deal gracefully with collisions. (In practice message IDs are not perfect UUIDs, although they're very close, and some messages don't have them or have different ones assigned by mediating hosts at arrival at multiple recipients.)
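
As a rough illustration of that preference, the sketch below derives a key from the originator headers alone and lets the store keep several messages under one key. The header list, the SHA-1/Base32 encoding, and the ToyStore class are assumptions chosen for the example, not how HyperKitty works.

```python
import base64
import hashlib
from email import message_from_string

# Originator fields assumed for this sketch: author, recipients, date,
# message ID, subject. A missing field contributes an empty value.
ORIGINATOR_HEADERS = ("From", "To", "Cc", "Date", "Message-ID", "Subject")


def originator_permalink(msg):
    """Compute a key that anyone holding the message can recompute."""
    parts = []
    for name in ORIGINATOR_HEADERS:
        value = str(msg.get(name, ""))
        # Collapse whitespace so header folding does not change the key.
        parts.append(name.lower() + ":" + " ".join(value.split()))
    digest = hashlib.sha1("\n".join(parts).encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii")


class ToyStore:
    """Hypothetical archive store that tolerates key collisions."""

    def __init__(self):
        self.by_key = {}

    def add(self, msg):
        key = originator_permalink(msg)
        bucket = self.by_key.setdefault(key, [])
        text = msg.as_string()
        # An exact duplicate maps to the existing slot; a different message
        # with the same key gets the next slot instead of being rejected.
        if text not in bucket:
            bucket.append(text)
        return key, bucket.index(text)


raw = "From: a@example.com\nTo: list@example.com\nSubject: hi\n\nfirst\n"
store = ToyStore()
print(store.add(message_from_string(raw)))
print(store.add(message_from_string(raw.replace("first", "second"))))
```

Here a collision is resolved on the archive side by returning a slot number alongside the key, so the key itself never depends on what the archive already holds.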
Steve