[Mailman-Developers] Re: Introduction, FOSDEM, scaling down, latency, OpenPGP support

3 Mar 2024

      Hi :)
"Stephen J. Turnbull" <turnbull.stephen.fw@u.tsukuba.ac.jp> writes:
...
Hi Justus!
...
Besides cleanups and bugfixes, there are three things I'd like to
do:

Improve Mailman to better scale down to small installations

Not sure what you can really do about that without rearchitecting.
The full suite of daemons is something like 13, including 3 WSGI
processes, the master daemon for mailman and about 7 or 8 runners.
But I'm pretty sure people have run Mailman 3 on a Raspberry Pi.  How
constrained an environment are you aiming for?
I had problems on my shared host, which provides 1 gigabyte of RAM per
user (I'm not 100% sure how they measure that).  I first noticed the
problem because every now and then the OOM killer would kill a Mailman
runner process, and because of a bug in the master process [0] it wasn't
restarted, resulting in stalled mail processing with no indication,
quite frustrating.
0: https://gitlab.com/mailman/mailman/-/merge_requests/1094
And, while I fixed the reliability issue, seeing my small installations
(I'd be surprised if we see more than 1 message per day on average)
consume so much memory was frustrating [1].
1: https://gitlab.com/mailman/mailman/-/issues/1050
...
...

Improve latency of messages

What latency are you observing?  My last project was getting about
100,000 incoming per day across 20K lists, two incoming runners, 8
outgoing, 1 each for the other Mailman runners. Never saw more than
about 5 seconds dwell in the Mailman system, except when the Mailman
to outgoing Postfix SMTP connection started glitching.  We fixed that
by reconfiguring the Mailman host (in Dallas) to use an MX in the same
datacenter instead of one in Boston. (!!)  And the normal case with a
process where I'd do "ls queue/*" every 5s was completely empty queues.
Stuff just didn't stay around long enough for ls to see it.
I see no reason to suppose you can do much better than that, but
again, tell me what you're seeing.  I'm not experienced in dealing
with Mailman at scale, and that host was quite beefy.  Still I have a
strong feeling that latency is mostly a communication with MTA issue,
not in Mailman 3 itself.
The latency may be currently small, in absolute terms, but this comes
at a considerable cost: the runners are polling their queues in loops.
My installations that hardly see any traffic at all are all doing: do I
have work, no, sleep 1, do I have work, no, sleep 1... I can see that
this will amortize in big installations, but for small ones this is
quite sad.
And even for big installations, or if we say that efficiency is not
important, if a mail goes through the hands of three queue runners, the
worst-case latency is three seconds in an otherwise idle installation!
We can definitely improve upon that.
The key insight here is that emails in queues don't appear out of thin
air, another runner is putting them there.  If each runner that goes to
sleep does so by waiting on a condition variable associated with its
queue, and every runner that deposits a mail into the queue signals the
sleeping runners, that latency goes away while at the same time
improving efficiency by no longer having to poll the queue every second.
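
A minimal sketch of the signalling idea, assuming the runners are threads
in one process so they can share a threading.Condition.  SignalledQueue,
relay_once and demo are hypothetical names for illustration only; this is
not Mailman's on-disk queue code:

```python
import threading
from collections import deque


class SignalledQueue:
    """In-memory stand-in for a runner queue (hypothetical, not Mailman code)."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()

    def enqueue(self, msg):
        # The depositing runner wakes one runner sleeping on this queue.
        with self._cond:
            self._items.append(msg)
            self._cond.notify()

    def dequeue(self, timeout=None):
        # Sleep until signalled instead of waking up once per second.
        with self._cond:
            if not self._cond.wait_for(lambda: len(self._items) > 0, timeout=timeout):
                return None  # timed out with no work
            return self._items.popleft()


def relay_once(src, dst):
    # One runner step: take a message from one queue, hand it to the next.
    dst.enqueue(src.dequeue())


def demo():
    incoming, outgoing = SignalledQueue(), SignalledQueue()
    worker = threading.Thread(target=relay_once, args=(incoming, outgoing))
    worker.start()             # the worker blocks in dequeue(), using no CPU
    incoming.enqueue("hello")  # signalling wakes it immediately
    worker.join()
    return outgoing.dequeue(timeout=5)
```

With separate runner processes the same pattern would need a cross-process
wake-up mechanism instead of threading.Condition; which one fits is left
open here.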
...
...

Implement OpenPGP support

What does that mean?
OpenPGP can be used to provide confidentiality and integrity for email.
What exactly that means in the setting of mailing lists varies by threat
model and policy.  My prototype [2] simply records associations between
addresses and OpenPGP certificates by consuming Autocrypt headers [3]
and, when sending an outgoing mail, opportunistically encrypts it if a
certificate is known.  Details and future work in [2].
2: https://gitlab.com/mailman/mailman/-/merge_requests/1166
3: https://autocrypt.org
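
To illustrate the "record associations" step, here is a rough sketch using
only Python's standard library.  It assumes the usual Autocrypt header
shape (addr=...; keydata=<base64>) and is not the code from [2];
parse_autocrypt, record_certificate and the plain dict used as a store are
hypothetical.  It only handles a single header and does not validate the
key material:

```python
import base64
from email import message_from_string
from email.utils import parseaddr


def parse_autocrypt(header_value):
    """Split an Autocrypt header value into a dict of its attributes."""
    attrs = {}
    for part in header_value.split(";"):
        if "=" not in part:
            continue
        # Split on the first "=" only: base64 keydata may end in "=" padding.
        name, value = part.split("=", 1)
        attrs[name.strip()] = value.strip()
    return attrs


def record_certificate(store, msg):
    """Remember the sender's certificate; return True if one was recorded."""
    header = msg.get("Autocrypt")
    if header is None:
        return False
    attrs = parse_autocrypt(str(header))
    if "addr" not in attrs or "keydata" not in attrs:
        return False
    # Only accept a header whose addr matches the From address.
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if attrs["addr"].lower() != sender:
        return False
    # Folded headers may contain whitespace inside the base64 data.
    keydata = "".join(attrs["keydata"].split())
    store[sender] = base64.b64decode(keydata)
    return True
```

On the outgoing side, a list would then look up each recipient in the
store and encrypt only when an entry exists.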
...
...
Here are the things I did so far:

I have Mailman running with runners in threads instead of
processes, but that is in a proof-of-concept stage at this
point and needs some cleaning up

I guess this is supposed to address the resource consumption (memory
footprint?) issue?
Yes.
...
After working with Mailman 3 and Postfix, I've become fond of the HUPD
(HUPD of Uncontrolled Proliferation of Daemons) model of application
design, at least for email.  I feel *much* more comfortable messing
with individual daemons this way, knowing that I can't affect the
others.  I'm not going to object to providing the threaded version if
people want it, but I would object to wholesale conversion to that
model without a lot of production experience based on it.
My prototype lets you choose, for every kind of runner, whether to use
the process or thread model, so it is actually a continuum between the
current model, and using threads for all runners (with the exception of
the REST runner, because gunicorn doesn't like to be run in the non-main
thread).
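
The per-runner choice could look roughly like the sketch below.  The
RUNNER_MODEL mapping and make_worker are hypothetical illustrations, not a
real Mailman configuration option or the prototype's actual code:

```python
import multiprocessing
import threading

# Hypothetical per-runner setting: "thread" or "process".
RUNNER_MODEL = {"in": "thread", "pipeline": "thread", "out": "process"}


def make_worker(name, target, model=None):
    """Build (but do not start) the worker for one runner."""
    model = RUNNER_MODEL if model is None else model
    # Runners without an entry keep today's one-process-per-runner model.
    kind = model.get(name, "process")
    if name == "rest":
        # The REST runner stays a process: gunicorn wants the main thread.
        kind = "process"
    if kind == "thread":
        return threading.Thread(target=target, name=name, daemon=True)
    if kind == "process":
        return multiprocessing.Process(target=target, name=name)
    raise ValueError(f"unknown model {kind!r} for runner {name!r}")
```

Both worker types expose start() and join(), so the code that supervises
the runners can treat them alike.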
I don't quite buy (or maybe I'm not understanding the whole picture)
into the argument that having individual processes improves the
robustness of the whole system.
From my experience, having individual runners killed can render Mailman
unusable [0].  To my then-untrained eye it was impossible to see that a
runner was missing; if Mailman had been a single process, or a
significantly smaller number of processes, a missing process would have
been more apparent.  And when a runner has picked up a mail from a queue
and then crashes, that mail is lost forever (i.e. runner operations are
not atomic).
...
...
(I understand that Mailman is a GNU project that wants copyright
assignments, and I have done that in the past for other GNU
projects, and would be happy to do that for Mailman, but at the
same time I feel like putting up *any* barrier to contributing is
unfortunate.)
My experience has been that about 2/3 of resistance has been to any
paperwork as such, only about 1/3 to assignment vs. some sort of
formal license ("contributor agreement", as the PSF calls it).
As far as Mailman is concerned, a lot of the core code has been
completely rewritten for Mailman 3.  However, I know that in
implementing Mailman 2 features not yet in Mailman 3 I've been at
least heavily influenced by Mailman 2 code.  Not sure that anybody
else has been particularly careful about "clean implementations",
although Barry has said that the core of Mailman 3 core is completely
rewritten from scratch.  In any case, the last time licensing was
discussed, the founder (John Viega) was not on board with a separation
from GNU and a permissive license, and Barry and I at least are pretty
sentimental about that.  For those reasons, I believe at least this
generation of Mailman core devs is unlikely to move in that direction.
I have no issue with the license, and I don't want to open a can of
worms.  I merely observed little activity and was concerned about the
project dying, and wanted to mention that reducing barriers to
contributions may be a way to attract more developers and drive-by
contributions.
...
I will take a look at the work you mention, but it will be a couple of
weeks at least before I have useful comments.
Cool, thanks!
Best,
Justus

Justus Winter