--On 2 June 2009 10:11:44 -0400 Bob Puff <bob@nleaudio.com> wrote:
---------- Original Message -----------
From: Ian Eiloart <iane@sussex.ac.uk>
To: Bob Puff <bob@nleaudio.com>, Mailman-Developers@python.org
Sent: Tue, 02 Jun 2009 10:49:39 +0100
Subject: Re: [Mailman-Developers] very large lists
--On 2 June 2009 00:16:39 -0400 Bob Puff <bob@nleaudio.com> wrote:
Hi gang,
I have a customer who wants a one-way distribution list set up that will handle millions of recipients... like 10 million. It is double opt-in, and it's not spam... <g>
I'm thinking this is probably too much for MM 2 to handle. Will MM3 be able to scale to this size?
If anyone has suggestions for me for current software that can do this, feel free to email me off-list. Bounce processing is important, as is user management.
You've not specified the problem very clearly. How frequently do you expect to send messages to this list? Do you have several lists? How urgent are the messages? Do they need to be delivered in the space of a few minutes, hours, or days?
Hi Ian,
The problem I've seen is that with the "large" lists I have now (up to 10,000), Python is the dominant process taking up time. I recall there being some discussion about how the list data files are locked during bounce processing, preventing parallel processes from doing much.
If your messages aren't personalised, you can do the VERP in your MTA. Then your Mailman server could deliver one message per recipient _domain_, instead of one per recipient. And, if you're out of server resources, you could build a cluster. If it's CPU bound, then multiple queue runners may help if you have multiple processor cores.
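As a rough illustration of the two ideas above, here is a minimal sketch of VERP-style envelope senders and of batching recipients by domain. It assumes one common VERP convention, `bounces+mailbox=host@listdomain`; the function names and example addresses are hypothetical, and an MTA doing VERP itself would implement this internally rather than in Python.

```python
def verp_sender(bounce_address, recipient):
    """Encode the recipient into the envelope sender (assumed format:
    bounces+mailbox=host@listdomain)."""
    local, domain = bounce_address.rsplit('@', 1)
    mailbox, host = recipient.rsplit('@', 1)
    return '%s+%s=%s@%s' % (local, mailbox, host, domain)


def verp_recipient(envelope_sender):
    """Recover the original recipient from a VERP envelope sender."""
    local, _domain = envelope_sender.rsplit('@', 1)
    _bounces, encoded = local.split('+', 1)
    mailbox, host = encoded.rsplit('=', 1)
    return '%s@%s' % (mailbox, host)


def group_by_domain(recipients):
    """Batch recipients so one message can be handed off per domain."""
    groups = {}
    for address in recipients:
        domain = address.rsplit('@', 1)[1].lower()
        groups.setdefault(domain, []).append(address)
    return groups
```

With per-domain batches, the list server hands the MTA one message per group, and a bounce arriving at a VERP address can still be mapped back to a single subscriber.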
We got huge improvements delivering mail when we introduced parallel queue runners. Our problem was that delivery to small but time-sensitive lists was held up when someone sent mail to a large list. So, it still takes a while to deliver to a large list, but other jobs aren't held up.
I'm not sure it's physically possible to deliver a million emails in a matter of a few minutes or even a few hours, unless it's a botnet! :-) Even on a 100 Mb/s link, I would expect more on the order of many hours. Messages would be sent at a maximum of one per week.
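A back-of-the-envelope estimate helps put a number on this. The sketch below computes only the raw transfer time for the message bodies; the 20 KB message size is an assumption, the link is taken to be 100 Mbit/s, and SMTP overhead, DNS lookups, remote rate limits and retries are all ignored, so real delivery would take longer.

```python
def min_transfer_hours(recipients, message_bytes, link_bits_per_second):
    """Lower bound on delivery time: total payload bits / link speed."""
    total_bits = recipients * message_bytes * 8
    return total_bits / float(link_bits_per_second) / 3600.0


# Assumed figures: 10 million recipients, 20 KB per message, 100 Mbit/s link.
hours = min_transfer_hours(10 * 1000 * 1000, 20 * 1000, 100 * 1000 * 1000)
print('%.1f hours at full line rate' % hours)
```

Under these assumptions the bodies alone need about 4.4 hours of a fully saturated link when each recipient gets an individual copy, which is why batching per domain matters.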
Well, you didn't say that your client wasn't gmail, for example!
If you don't need personalised messages, then you can do the VERP in your MTA. That should make it much more feasible. Configure Mailman with several parallel queue runners to prevent messages to large lists from holding up messages to small lists.
What and where would be the config statement specifying the number of queue runners?
we have this, in mm_cfg.py:
# Which queues should the qrunner master watchdog spawn?  This is a list of
# 2-tuples containing the name of the qrunner class (which must live in a
# module of the same name within the Mailman.Queue package), and the number of
# parallel processes to fork for each qrunner.  If more than one process is
# used, each will take an equal subdivision of the hash space.

# BAW: Eventually we may support weighted hash spaces.
# BAW: Although not enforced, the # of slices must be a power of 2

QRUNNERS = [
    ('ArchRunner',      4), # messages for the archiver
    ('BounceRunner',    4), # for processing the qfile/bounces directory
    ('CommandRunner',   4), # commands and bounces from the outside world
    ('IncomingRunner',  4), # posts from the outside world
    ('NewsRunner',      4), # outgoing messages to the nntpd
    ('OutgoingRunner', 32), # outgoing messages to the smtpd
    ('VirginRunner',    4), # internally crafted (virgin birth) messages
    ('RetryRunner',     4), # retry temporarily failed deliveries
    ]
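To make "an equal subdivision of the hash space" concrete, here is a minimal sketch of how a 160-bit SHA-1 space could be cut into contiguous slices, one per parallel runner. It illustrates the idea only and is not Mailman's actual code; the names `slice_bounds` and `slice_for` and the use of a string key are assumptions.

```python
import hashlib

SHA1_MAX = (1 << 160) - 1  # largest possible SHA-1 digest value


def slice_bounds(slice_index, num_slices):
    """Return the inclusive (lower, upper) digest range for one slice."""
    lower = ((SHA1_MAX + 1) * slice_index) // num_slices
    upper = ((SHA1_MAX + 1) * (slice_index + 1)) // num_slices - 1
    return lower, upper


def slice_for(file_key, num_slices):
    """Find which slice, and so which runner process, owns a queue entry."""
    digest = int(hashlib.sha1(file_key.encode('utf-8')).hexdigest(), 16)
    for index in range(num_slices):
        lower, upper = slice_bounds(index, num_slices)
        if lower <= digest <= upper:
            return index
    raise AssertionError('digest outside hash space')
```

In this sketch a power-of-two slice count divides 2**160 exactly, so every slice covers the same number of digests. Because digests are effectively uniform, each of the 32 outgoing runners in the configuration above would see roughly 1/32 of the queued messages.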
Bob
-- 
Ian Eiloart
IT Services, University of Sussex
01273-873148 x3148
For new support requests, see http://www.sussex.ac.uk/its/help/