I've been doing some (limited) performance testing lately, and I wanted
to share some numbers and get some feedback. I've also been having fun
re-reading some old mm-dev threads related to performance. :)
I'm specifically looking for places to improve Mailman's raw
throughput. I understand that MTA tuning can have a huge impact on the
system, but I think that subject's been hashed out quite well in the
past. On the table are anything from low-hanging fruit hacks to
mm3-level redesigns. What I can actually implement all depends. :)
I've been testing the following set up:
- Postfix 2.0.9 configured with a special test transport such that all
email @example.com gets dd'd to /dev/null
- Postfix running on the same machine as Mailman 2.1.2+ and a second
test with Postfix (similarly configured) on a separate, very
unloaded, but less beefy machine sitting next to me on a 100Mb link
- RH9 2.4.20-9 kernel, 863MHz Dell PIII, 512MB (1723 bogomips), ext3,
a WDC IDE drive of some 2 y.o. vintage.
- Python 2.2.2 built from source
My list consists of 8000 members like abcdefg@example.com where the
localpart varies randomly. I've tried deliveries of 10KB, 50KB, 220KB,
1MB of text/plain, and a 220KB multipart/related snapshot of a web
page. I have VERP and personalization both turned on. I started looking
at memory usage, but I'm not so concerned about that now. It may be
something to address later but I think it's "reasonable".
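For the record, a test roster like that is trivial to whip up in Python
(the helper below is just a sketch I'm using, not code from the tree):

```python
import random
import string

def make_members(n, domain="example.com", localpart_len=7, seed=42):
    """Generate n unique random addresses like abcdefg@example.com."""
    rng = random.Random(seed)          # seeded so runs are repeatable
    members = set()
    while len(members) < n:
        local = "".join(rng.choice(string.ascii_lowercase)
                        for _ in range(localpart_len))
        members.add("%s@%s" % (local, domain))
    return sorted(members)

members = make_members(8000)
```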
First the (approximate) numbers. All deliveries are to 8000 members,
each with their own personalized copy. SMTP_MAX_RCPTS is 500 unless
otherwise specified (it seemed to have minimal impact).
msgsz   type         time    msg/hour
-----   ----         ----    --------
 10KB   text/plain     6min  80k/hr
 50KB   text/plain   9.5min  50k/hr (SMTP_MAX_RCPTS=5)
220KB   text/plain    24min  20k/hr
  1MB   text/plain   105min  4500/hr
220KB   m/related     44min  10k/hr
220KB   m/related     41min  11k/hr (SMTP_MAX_RCPTS=5)
220KB   m/related     46min  10k/hr (remote MTA)
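The msg/hour column is just the 8000 copies scaled up to an hour; for
the curious, the arithmetic is nothing more than:

```python
def msgs_per_hour(recipients, minutes):
    """Throughput if `recipients` personalized copies go out in `minutes`."""
    return recipients * 60.0 / minutes

# 8000 copies in 6 minutes is the 80k/hr figure in the first row.
rate = msgs_per_hour(8000, 6)
```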
A few high-level bits:
- Disk I/O probably isn't much of an issue. Once the message hits the
out qrunner, it's only two files, and all the personalization weaving
happens in memory just before the message goes out on the socket.
Since using a remote MTA was actually slightly slower, I'm guessing
that MTA overhead in the /dev/null pipe is minimal (the remote
machine is a 500MHz, 128MB, 999 bogomips box, mostly idle).
- email.Generator.Generator (and email.Parser.Parser) are good
candidates for optimization. You can see that with the 220KB
messages, the fact that one has structure and the other doesn't,
affects performance significantly. That doesn't surprise me. ;)
- Even so, a factor of 100 in message size has a 20x hit on
performance. Part of that may be the way the personalization
weaving gets done. Right now, we make a copy.deepcopy() of the
original message object model, then poke in the personalization
parts in the headers and such, then do all the complex stuff in
Decorate.py (footers, headers, etc.), then generate the flat text.
Maybe we can speed things up by converting the message to flat text
as early as possible and just doing string substitution at the point
of delivery.
- What kind of a hit does the memberdb-in-a-pickle take? Would things
go faster if we stored the member data in Berkeley DB, MySQL, or
another real database? I'd like to do some testing with my BDB member
code and I'm wondering if the folks working on other member adapters
have any performance feedback.
- XVERP might be interesting, but it seems useless once personalization
is on, since we're already doing one recipient per transaction anyway.
- Do we win or lose with the process model, as compared to say, a
threading model? I've been wondering if our fears of the Python GIL
are unfounded. We could certainly reduce memory overhead by
multi-threading, and we might be able to leverage something like
Twisted, which is still in the back of my mind as a very cool way to
get multi-protocol support into Mailman.
- Do our "NFS-safe" locks impose too much complexity and overhead to
be worth it? Does anybody actually /use/ Mailman over NFS? Don't
we sorta suspect the LockFile implementation anyway? Would we be
better off using kernel file locks, or thread locks if we go to a MT
model?
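To make the flatten-early idea above concrete, here's a sketch (the
placeholder scheme and helper names are mine, not what Decorate.py does
today): decorate and flatten once with %(user)s-style slots, then do
plain string interpolation per member instead of a deepcopy per member.

```python
from copy import deepcopy
from email.message import EmailMessage

def prerender(msg, footer):
    """One-time work: decorate and flatten, leaving %(user)s slots."""
    m = deepcopy(msg)                  # one copy total, not one per member
    m.set_content(m.get_content() + "\n" + footer)
    return m.as_string()               # flat text, ready for interpolation

def personalize(template, member):
    """Per-member work: just string substitution on the flat text."""
    return template % {"user": member}

msg = EmailMessage()
msg["Subject"] = "test post"
msg.set_content("Hello %(user)s, here is the message body.")
template = prerender(msg, "You are subscribed as %(user)s")
copy1 = personalize(template, "abcdefg@example.com")
```

Real messages can contain literal % signs, so a production version would
need a less collision-prone placeholder; this just shows where the
per-member deepcopy and re-generation cost would go away.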
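On the locking question, a kernel file lock is only a few lines (sketch
below with a made-up class name; note that flock(2) is exactly what's
/not/ NFS-safe, which is the tradeoff LockFile exists to avoid):

```python
import fcntl

class FcntlLock:
    """Minimal kernel-level file lock.  Unlike Mailman's LockFile, this
    is NOT safe over NFS, but it's simple and cheap on a local disk."""

    def __init__(self, path):
        # Keep the file open for the lifetime of the lock object;
        # flock ties the lock to the open file description.
        self._fp = open(path, "a+")

    def lock(self):
        fcntl.flock(self._fp, fcntl.LOCK_EX)   # blocks until acquired

    def unlock(self):
        fcntl.flock(self._fp, fcntl.LOCK_UN)

lock = FcntlLock("/tmp/mylist.lock")
lock.lock()
# ... critical section: twiddle config.db, etc. ...
lock.unlock()
```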
Okay, now I'm rambling. What is the lowest hanging fruit that we might
be able to attack? I'm up for any other ideas people have.
(FWIW, the 220KB multipart/related snapshot was made with

 wget -E -H -k -p -nH -nd -Pdownload <url>

followed by a little Python script to multipart/related it.)
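That little Python script is roughly the following (shown here as a
sketch against the email package; the helper name and details are mine):
walk the wget download directory, make the HTML the root part, and
attach everything else as related parts with Content-IDs.

```python
import mimetypes
import os
from email.message import EmailMessage

def relatedize(directory, root_html):
    """Bundle a wget'd page snapshot into one multipart/related message."""
    msg = EmailMessage()
    msg["Subject"] = "page snapshot"
    with open(os.path.join(directory, root_html),
              encoding="utf-8", errors="replace") as fp:
        msg.set_content(fp.read(), subtype="html")
    msg.make_related()
    for name in sorted(os.listdir(directory)):
        if name == root_html:
            continue
        ctype = mimetypes.guess_type(name)[0] or "application/octet-stream"
        maintype, subtype = ctype.split("/", 1)
        with open(os.path.join(directory, name), "rb") as fp:
            msg.add_related(fp.read(), maintype=maintype,
                            subtype=subtype, cid="<%s>" % name)
    return msg
```

For a faithful reproduction you'd also rewrite the img src attributes to
cid: URLs, but for a pure size test the bundling is what matters.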