Hello world,
I have a very strange performance problem which only affects one small announce-only list with approximately 11000 recipients: The smtp logfile shows that it takes Mailman about 8400 seconds to deliver the mails, which just doesn't make sense.
Setup: Mailman is configured to deliver outgoing mails to a dedicated Postfix smtpd(8) daemon listening on port 10031. That daemon is configured with all the usual stuff, no DNS lookups, no recipient verification, no pre-queue filters, dedicated DNS caches and so on (if it really matters, I can post the complete configuration to this list). VERP and personalization are turned off on the Mailman side. Postfix version is 2.8-20100213, but I ran a quick test with 2.6.5 just to be safe, and it doesn't change anything. Mailman version is 2.1.11 from Debian/stable.
After I first noticed the problem, I checked the logs - nothing suspicious there. So I decided to take some TCP captures.
For all other lists on this server, the conversation between Postfix and Mailman is very fast paced, but for that one list, it takes almost one second for a recipient to be specified (which is then acknowledged immediately by Postfix).
I really don't have any idea where I could start debugging, or how. Normally it is the MTA's performance that people need to worry about, but that particular mailserver isn't busy at all, not even handling 550k messages per day. Posting to all other lists takes only a fraction of the time, even for ones that are much larger. There are no old queue files around, no exceptions being thrown, no process running at 100% CPU load during delivery, nothing shunted - it's just slow.
As a quick workaround, I've increased the overall parallelism, as Ian Eiloart pointed out in [1], to ensure that the one slow list doesn't block anything, so the issue isn't really "top priority" - I'd be very grateful for any hints, though.
Stefan
[1] http://mail.python.org/pipermail/mailman-developers/2009-June/020643.html
On 2/20/2010 4:21 AM, Stefan Foerster wrote:
For all other lists on this server, the conversation between Postfix and Mailman is very fast paced, but for that one list, it takes almost one second for a recipient to be specified (which is then acknowledged immediately by Postfix).
So, without VERP or personalization, you should be seeing SMTP transactions that look like
HELO
  response
MAIL FROM
  response
RCPT TO
  response
  (repeated for up to SMTP_MAX_RCPTS recipients)
DATA
  response
(message data)
(MAIL FROM through DATA repeats until all recipients are delivered)
QUIT
And, if I understand what you're saying, the delay is in the RCPT TO/response loop and it occurs between the response and the next RCPT TO.
This is really weird. There is not even any Mailman code involved in this. The entire sequence from MAIL FROM to end of DATA is done by one call to the Python smtplib.SMTP.sendmail() method.
There is nothing list specific other than the envelope sender in the sendmail() call/MAIL FROM command in Mailman's interaction with smtplib, and if it were related somehow to that, I would expect the delay to be in Postfix between the RCPT TO and response.
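To make the transaction shape above concrete, here is a minimal sketch in present-day Python 3 (Mailman 2.1 itself runs on Python 2). The helper names `chunk_recipients` and `deliver` and the chunk size of 500 are assumptions for illustration only; Mailman's own chunking logic may differ, for instance in how it groups addresses.

```python
import smtplib


def chunk_recipients(recipients, max_rcpts):
    """Split the recipient list into groups of at most max_rcpts addresses."""
    return [recipients[i:i + max_rcpts]
            for i in range(0, len(recipients), max_rcpts)]


def deliver(host, port, sender, recipients, message, max_rcpts=500):
    """Send one message to all recipients, one sendmail() call per chunk.

    Each sendmail() call performs MAIL FROM, one RCPT TO per recipient
    in the chunk, and then DATA. Returns the refused recipients.
    """
    refused = {}
    with smtplib.SMTP(host, port) as conn:
        for chunk in chunk_recipients(recipients, max_rcpts):
            refused.update(conn.sendmail(sender, chunk, message))
    return refused
```

With roughly 11000 recipients and an assumed chunk size of 500, this would mean 22 sendmail() calls, each containing up to 500 RCPT TO round trips.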
I really don't have any idea where I could start debugging, or how.
Nor do I really. You could look at the FAQ at <http://wiki.list.org/x/-IA9> for the way to enable smtplib debugging (as noted in the FAQ, only for Python 2.4.x and newer). This will produce voluminous Mailman error log output which may help pinpoint where in smtplib.py the delay is, but probably the time-stamp granularity is not fine enough.
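On the time-stamp granularity point: outside Mailman, a current Python 3 (3.5 or newer) offers smtplib debug level 2, which prefixes each debug message with the time of day. The sketch below assumes a standalone test script rather than a change to Mailman itself, and `debug_connection` is a hypothetical helper name.

```python
import smtplib


def debug_connection(host="", port=0):
    """Return an SMTP object whose debug output carries timestamps."""
    # local_hostname is given explicitly so no FQDN lookup is needed.
    conn = smtplib.SMTP(local_hostname="localhost")
    # Level 1 prints the dialogue; level 2 additionally timestamps it.
    conn.set_debuglevel(2)
    if host:
        conn.connect(host, port)
    return conn
```

The debug output goes to stderr, so it would have to be redirected to a file to be compared against the MTA logs.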
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
- Mark Sapiro <mark@msapiro.net>:
On 2/20/2010 4:21 AM, Stefan Foerster wrote: So, without VERP or personalization, you should be seeing SMTP transactions that look like
HELO
  response
MAIL FROM
  response
RCPT TO
  response
  (repeated for up to SMTP_MAX_RCPTS recipients)
DATA
  response
(message data)
(MAIL FROM through DATA repeats until all recipients are delivered)
QUIT
Yes. I assume you wanted me to check if there were any errors in this dialogue - but there are none, neither in the logfiles from the Python smtplib nor in the Postfix logs. The TCP capture didn't show any errors, either. And yes, I verified that there really isn't any VERP or personalization involved (again, I read Postfix logs ("nrcpt=<large number>") and TCP streams).
And, if I understand what you're saying, the delay is in the RCPT TO/response loop and it occurs between the response and the next RCPT TO.
Yes. From debuglevel(1) logs:
Feb 20 19:03:15 2010 qrunner(7551): send: 'rcpt TO:<recipient1@example.com>\r\n'
Feb 20 19:03:15 2010 qrunner(7551): reply: '250 2.1.5 Ok\r\n'
Feb 20 19:03:15 2010 qrunner(7551): reply: retcode (250); Msg: 2.1.5 Ok
Feb 20 19:03:17 2010 qrunner(7551): send: 'rcpt TO:<recipient2@example.com>\r\n'
Feb 20 19:03:17 2010 qrunner(7551): reply: '250 2.1.5 Ok\r\n'
Feb 20 19:03:17 2010 qrunner(7551): reply: retcode (250); Msg: 2.1.5 Ok
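The gap between a reply and the next command can be pulled out of such a log mechanically. The following is a small sketch that assumes the exact line layout shown above (a 20-character timestamp followed by `qrunner(pid): `); the function name `reply_to_send_gaps` is made up for this example.

```python
from datetime import datetime

LOG = r"""Feb 20 19:03:15 2010 qrunner(7551): send: 'rcpt TO:<recipient1@example.com>\r\n'
Feb 20 19:03:15 2010 qrunner(7551): reply: '250 2.1.5 Ok\r\n'
Feb 20 19:03:15 2010 qrunner(7551): reply: retcode (250); Msg: 2.1.5 Ok
Feb 20 19:03:17 2010 qrunner(7551): send: 'rcpt TO:<recipient2@example.com>\r\n'
Feb 20 19:03:17 2010 qrunner(7551): reply: '250 2.1.5 Ok\r\n'
Feb 20 19:03:17 2010 qrunner(7551): reply: retcode (250); Msg: 2.1.5 Ok"""


def reply_to_send_gaps(lines):
    """Yield seconds between each 'reply: retcode' and the following 'send:'."""
    last_reply = None
    for line in lines:
        stamp = datetime.strptime(line[:20], "%b %d %H:%M:%S %Y")
        event = line.split("): ", 1)[1]
        if event.startswith("send:"):
            if last_reply is not None:
                yield (stamp - last_reply).total_seconds()
        elif event.startswith("reply: retcode"):
            last_reply = stamp


if __name__ == "__main__":
    print(list(reply_to_send_gaps(LOG.splitlines())))
```

Because the log only has one-second resolution, the reported gaps are only accurate to within a second or so.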
If you want me to, I can gather detailed timing data with tcpdump and/or wireshark.
I really don't have any idea where I could start debugging, or how.
Nor do I really.
Humorous remark: "That's not what you want to hear from the guy who actually wrote the application!" ;-)
I'm running out of ideas. My Postfix smtpd(8) for mailman looks like that:
127.0.0.1:10031 inet n - - - - smtpd
    -o mynetworks=127.0.0.0/8
    -o content_filter=
    -o smtpd_proxy_filter=
    -o receive_override_options=no_header_body_checks,no_address_mappings,no_unknown_recipient_checks
    -o smtpd_client_connection_count_limit=0
    -o smtpd_client_connection_rate_limit=0
    -o smtpd_error_sleep_time=0
    -o smtpd_soft_error_limit=1001
    -o smtpd_hard_error_limit=1000
    -o smtpd_restriction_classes=
    -o smtpd_client_restrictions=
    -o smtpd_helo_restrictions=
    -o smtpd_sender_restrictions=
    -o smtpd_recipient_restrictions=permit_mynetworks,reject
    -o smtpd_data_restrictions=
    -o smtpd_end_of_data_restrictions=
    -o smtpd_authorized_xforward_hosts=127.0.0.0/8
    -o syslog_name=postfix-mm
No magic involved here. My mm_cfg.py:
MAILMAN_SITE_LIST = 'mailman'
DEFAULT_URL_PATTERN = 'http://%s/mailman/'
PRIVATE_ARCHIVE_URL = '/mailman/private'
IMAGE_LOGOS = '/images/mailman/'
DEFAULT_EMAIL_HOST = 'lists.example.com'
DEFAULT_URL_HOST = 'lists.example.com'
add_virtualhost(DEFAULT_URL_HOST, DEFAULT_EMAIL_HOST)
DEFAULT_SERVER_LANGUAGE = 'en'
DEFAULT_SEND_REMINDERS = 0
USE_ENVELOPE_SENDER = 0
MTA=None # Misnomer, suppresses alias output on newlist
DEB_LISTMASTER='postmaster@example.net'
SMTPPORT = 10031
I am not able to find anything that is specific to this list, and I'm not sure where I could look further.
Stefan
On 2/20/2010 10:27 AM, Stefan Foerster wrote:
Yes. From debuglevel(1) logs:
Feb 20 19:03:15 2010 qrunner(7551): send: 'rcpt TO:<recipient1@example.com>\r\n'
Feb 20 19:03:15 2010 qrunner(7551): reply: '250 2.1.5 Ok\r\n'
Feb 20 19:03:15 2010 qrunner(7551): reply: retcode (250); Msg: 2.1.5 Ok
Feb 20 19:03:17 2010 qrunner(7551): send: 'rcpt TO:<recipient2@example.com>\r\n'
Feb 20 19:03:17 2010 qrunner(7551): reply: '250 2.1.5 Ok\r\n'
Feb 20 19:03:17 2010 qrunner(7551): reply: retcode (250); Msg: 2.1.5 Ok
So, in the above we see greater than 1 second between
reply: retcode (250); Msg: 2.1.5 Ok
and the next
send: 'rcpt TO:<recipient2@example.com>\r\n'
but virtually nothing occurs between those two events. We are in the sendmail method in a for loop over the recipient list. The first of those two messages is written at the end of getreply() which returns to rcpt() which returns to the for loop which checks the status and calls rcpt() again with the next recipient. rcpt() calls putcmd() which calls send() which writes the second message before doing anything else. There are no system calls of any kind (other than writing the messages themselves, but the delay exists without logging) in between those two messages.
If you want me to, I can gather detailed timing data with tcpdump and/or wireshark.
Presumably it will just show the delay between the response to one RCPT TO and the sending of the next RCPT TO. The delay in the above log narrows it even further.
And none of this is list specific, yet it only affects one list.
You could try strace or ?? on the OutgoingRunner, but I don't know what that might show beyond what we already know.
Does this delay occur uniformly over the entire list, or only within some group of recipients?
You could try running OutgoingRunner with Python's trace module <http://docs.python.org/library/trace.html#command-line-usage>, e.g.
python -m trace [trace opts] bin/qrunner --runner=OutgoingRunner:0:1
To do this, you'd probably want to stop OutgoingRunner(s), post to the list and then stop Mailman so you have only the one message to this list in the out/ queue, and then run the trace as above, but I would only do this as a last ditch effort, because I'm not sure it would be helpful.
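For reference, the trace module can also be driven from Python rather than from the command line. The sketch below is not Mailman-specific: `slow_loop` is a stand-in for whatever code is under suspicion and `run_traced` is a hypothetical helper. It prints each executed line together with the elapsed time since the tracer was created.

```python
import sys
import trace


def slow_loop(n):
    """Stand-in for the code under suspicion."""
    total = 0
    for i in range(n):
        total += i
    return total


def run_traced(func, *args):
    """Run func under the tracer, printing every executed line with timing."""
    tracer = trace.Trace(
        count=False,          # do not collect per-line execution counts
        trace=True,           # print each line as it is executed
        timing=True,          # prefix each line with the elapsed time
        ignoredirs=[sys.prefix, sys.exec_prefix],  # skip the standard library
    )
    return tracer.runfunc(func, *args)


if __name__ == "__main__":
    print(run_traced(slow_loop, 3))
```

Tracing every line slows execution considerably, so the absolute timings are distorted; it is the unusually large jumps between consecutive lines that would be of interest.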
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
- Mark Sapiro <mark@msapiro.net>:
Does this delay occur uniformly over the entire list, or only within some group of recipients?
It occurs for all recipients, more or less - sometimes, it gets about 5 recipients done per second, but that's still far too slow.
You could try running OutgoingRunner with Python's trace module <http://docs.python.org/library/trace.html#command-line-usage>, e.g.
python -m trace [trace opts] bin/qrunner --runner=OutgoingRunner:0:1
To do this, you'd probably want to stop OutgoingRunner(s), post to the list and then stop Mailman so you have only the one message to this list in the out/ queue, and then run the trace as above, but I would only do this as a last ditch effort, because I'm not sure it would be helpful.
I fear I've got a decision to make here: To "fix" that problem, I'd normally simply export the recipient list and recreate the mailing list thereafter. But since we don't know what causes this behaviour, I can't be sure that my backups include all files I need to recreate that problem on a different machine for debugging purposes.
So, if you are personally interested in this, I would talk to a lawyer to find a way how I can legally provide you with a copy of every file that is in any way related to this list.
If you are not _that_ interested, I'd just go ahead and wipe that list (and cross fingers).
Thank you for your time and your insightful comments.
Stefan
- Stefan Foerster <cite+mailman-users@incertum.net>:
I fear I've got a decision to make here: To "fix" that problem, I'd normally simply export the recipient list and recreate the mailing list thereafter.
Is this guaranteed to help? Have you tried this?
-- Ralf Hildebrandt Geschäftsbereich IT | Abteilung Netzwerk Charité - Universitätsmedizin Berlin Campus Benjamin Franklin Hindenburgdamm 30 | D-12203 Berlin Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962 ralf.hildebrandt@charite.de | http://www.charite.de
On Feb 20, 2010, at 09:56 PM, Stefan Foerster wrote:
I fear I've got a decision to make here: To "fix" that problem, I'd normally simply export the recipient list and recreate the mailing list thereafter. But since we don't know what causes this behaviour, I can't be sure that my backups include all files I need to recreate that problem on a different machine for debugging purposes.
So, if you are personally interested in this, I would talk to a lawyer to find a way how I can legally provide you with a copy of every file that is in any way related to this list.
If you are not _that_ interested, I'd just go ahead and wipe that list (and cross fingers).
Thank you for your time and your insightful comments.
Have you tried any of the Postfix debugging strategies?
http://www.postfix.org/DEBUG_README.html
-Barry
- Barry Warsaw <barry@python.org>:
Have you tried any of the Postfix debugging strategies?
Yes he did. Stefan usually knows what he's doing :)
-- Ralf Hildebrandt Geschäftsbereich IT | Abteilung Netzwerk Charité - Universitätsmedizin Berlin Campus Benjamin Franklin Hindenburgdamm 30 | D-12203 Berlin Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962 ralf.hildebrandt@charite.de | http://www.charite.de
On Feb 20, 2010, at 10:17 PM, Ralf Hildebrandt wrote:
Have you tried any of the Postfix debugging strategies?
Yes he did. Stefan usually knows what he's doing :)
Ah, sorry about that!
culling-inbox-during-pycon-talk-ly y'rs, -Barry
- Barry Warsaw <barry@python.org>:
On Feb 20, 2010, at 09:56 PM, Stefan Foerster wrote:
So, if you are personally interested in this, I would talk to a lawyer to find a way how I can legally provide you with a copy of every file that is in any way related to this list.
If you are not _that_ interested, I'd just go ahead and wipe that list (and cross fingers).
Thank you for your time and your insightful comments.
Have you tried any of the Postfix debugging strategies?
As expected, Postfix is not the culprit. Delivery to smtp-sink is running at the speed of molasses, too.
Stefan
On Feb 20, 2010, at 11:01 PM, Stefan Foerster wrote:
As expected, Postfix is not the culprit. Delivery to smtp-sink is running at the speed of molasses, too.
Now this is getting interesting <wink>.
http://mail.python.org/pipermail/mailman-users/2010-February/068829.html
has some perplexing numbers. If you're really seeing a 2 second delay between the reading of one RCPT reply to the next, then this points to problems in Python or its smtplib module. I did a quick search through the Python bug tracker and nothing jumped out at me.
As Mark said, Mailman basically just calls SMTP.sendmail() to send the message to each chunk of recipients. The part of that method that sends the RCPTs to Postfix is this code (in Py2.6):
    for each in to_addrs:
        (code,resp)=self.rcpt(each, rcpt_options)
        if (code != 250) and (code != 251):
            senderrs[each]=(code,resp)
It's hard to see what would cause that loop to sit there between the 19:03:15 retcode and 19:03:17 send. You're not even touching the socket between these calls. Looking at putcmd() and getreply() and the way they're called, I just don't see any opportunity for hanging. I suppose it's possible you're seeing Python issues, but that doesn't really explain why it would affect only this list.
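One way to measure that gap with better than one-second resolution is to time it inside the process. The following sketch assumes Python 3's smtplib (where `rcpt()` has the signature `rcpt(recip, options=())`) and a standalone test script; `TimedSMTP` is a hypothetical subclass, not something Mailman provides.

```python
import smtplib
import time


class TimedSMTP(smtplib.SMTP):
    """SMTP client that records the idle time between consecutive RCPT commands."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.gaps = []            # seconds between one reply and the next RCPT
        self._last_reply = None   # perf_counter() value after the previous reply

    def rcpt(self, recip, options=()):
        started = time.perf_counter()
        if self._last_reply is not None:
            self.gaps.append(started - self._last_reply)
        result = super().rcpt(recip, options)
        self._last_reply = time.perf_counter()
        return result
```

After a `sendmail()` call, `max(conn.gaps)` would show whether the time is really being lost between the reply and the next command, as the debug log suggests.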
I probably missed it but what platform are you running on? What version of Python?
I see that you've worked around the problem, which of course only adds oddness. If you're still able and interested in debugging this, I can think of a couple of things to do. Let me know and I'll lay out a few ideas.
-Barry
On 2/20/2010 12:56 PM, Stefan Foerster wrote:
I fear I've got a decision to make here: To "fix" that problem, I'd normally simply export the recipient list and recreate the mailing list thereafter. But since we don't know what causes this behaviour, I can't be sure that my backups include all files I need to recreate that problem on a different machine for debugging purposes.
Assuming your list doesn't use any custom MemberAdaptor, the lists/LISTNAME/config.pck file is the only list specific thing that could be involved. These get continuously updated, but since the problem is persistent, any one since the problem started should do.
Of course, if you just drop this config.pck into some other Mailman installation for testing, there's no guarantee you'd see the problem. At a minimum, you'd want the same Mailman version and Python version. I think you said you'd tried a different Postfix and it didn't change things.
So, if you are personally interested in this, I would talk to a lawyer to find a way how I can legally provide you with a copy of every file that is in any way related to this list.
As I said, I think it would just be the config.pck. Everything else is open source software, but I don't think I want it. It's not that I'm not curious because I definitely am, but I don't want to accidentally send mail to any of the list members. I suppose I could just create a pseudo MTA to listen on the SMTPPORT you use and just respond with 250 to every message.
Actually, you could try that too and see what it does with your list. I'll make a little Python script for that.
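A pseudo MTA of the kind described here could look roughly like the sketch below. This is not the script Mark refers to; it is a minimal illustration in Python 3 with made-up names (`SinkHandler`, `SinkServer`, `start_sink`, `send_probe`), and it answers 250 to everything except DATA and QUIT without checking any syntax.

```python
import smtplib
import socketserver
import threading


class SinkHandler(socketserver.StreamRequestHandler):
    """Accept every SMTP command and throw the message away."""

    def handle(self):
        self.wfile.write(b"220 localhost sink ready\r\n")
        in_data = False
        for raw in self.rfile:
            line = raw.rstrip(b"\r\n")
            if in_data:
                # Swallow message content until the terminating lone dot.
                if line == b".":
                    in_data = False
                    self.server.messages += 1
                    self.wfile.write(b"250 2.0.0 Ok: discarded\r\n")
                continue
            verb = line.split(b" ", 1)[0].upper()
            if verb == b"DATA":
                in_data = True
                self.wfile.write(b"354 End data with <CR><LF>.<CR><LF>\r\n")
            elif verb == b"QUIT":
                self.wfile.write(b"221 2.0.0 Bye\r\n")
                break
            else:
                if verb == b"RCPT":
                    self.server.recipients += 1
                self.wfile.write(b"250 2.0.0 Ok\r\n")


class SinkServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True


def start_sink(host="127.0.0.1", port=0):
    """Start the sink in a background thread; port 0 picks a free port."""
    server = SinkServer((host, port), SinkHandler)
    server.recipients = 0
    server.messages = 0
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_probe(port, recipients):
    """Send one small test message to the sink through smtplib."""
    with smtplib.SMTP("127.0.0.1", port, local_hostname="localhost") as conn:
        conn.sendmail("sender@example.com", recipients,
                      "Subject: probe\r\n\r\nbody\r\n")
```

Pointing SMTPPORT at such a sink would take the real MTA out of the picture entirely and guarantee that no list member receives mail during testing.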
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
On Feb 20, 2010, at 01:27 PM, Mark Sapiro wrote:
As I said, I think it would just be the config.pck. Everything else is open source software, but I don't think I want it. It's not that I'm not curious because I definitely am, but I don't want to accidentally send mail to any of the list members. I suppose I could just create a pseudo MTA to listen on the SMTPPORT you use and just respond with 250 to every message.
Actually, you could try that too and see what it does with your list. I'll make a little Python script for that.
Take a look at lazr.smtptest, which is what MM3 uses in its test framework.
https://edge.launchpad.net/lazr.smtptest
-Barry
On 2/20/2010 1:38 PM, Barry Warsaw wrote:
Take a look at lazr.smtptest, which is what MM3 uses in its test framework.
Thanks Barry,
That's helpful.
Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
On Feb 20, 2010, at 3:27 PM, Mark Sapiro wrote:
As I said, I think it would just be the config.pck. Everything else is open source software, but I don't think I want it. It's not that I'm not curious because I definitely am, but I don't want to accidentally send mail to any of the list members.
Another test would be to break up the large list into a number of smaller sub-lists with an umbrella list. That would allow Mailman to have a lot more internal parallelism, and not get into lock synchronization issues over config.pck. You could also run multiple sets of qrunners, if you split things correctly according to the "powers of 2" rule.
I suppose I could just create a pseudo MTA to listen on the SMTPPORT you use and just respond with 250 to every message.
Actually, you could try that too and see what it does with your list. I'll make a little Python script for that.
A much simpler solution here is to use the "smtp-sink" program that Wietse supplies as part of the test harness for Postfix.
-- Brad Knowles <bradknowles@shub-internet.org> LinkedIn Profile: <http://tinyurl.com/y8kpxu>
- Brad Knowles <brad@shub-internet.org>:
On Feb 20, 2010, at 3:27 PM, Mark Sapiro wrote:
As I said, I think it would just be the config.pck. Everything else is open source software, but I don't think I want it. It's not that I'm not curious because I definitely am, but I don't want to accidentally send mail to any of the list members.
Another test would be to break up the large list into a number of smaller sub-lists with an umbrella list. That would allow Mailman to have a lot more internal parallelism, and not get into lock synchronization issues over config.pck. You could also run multiple sets of qrunners, if you split things correctly according to the "powers of 2" rule.
What is a "smaller sub-list"? The list in question only holds 11k recipients, which is not exactly large. Some of my SVN announce lists are much larger.
Stefan
Hi!
...[very slow list]...
Seen from a very abstract standpoint, is there some pattern in the addresses to send? Some sorting algorithms go completely bonkers if fed with the wrong kind of pre-sorted or patterned input list. I did NOT look into this (not knowing enough Python yet), but having studied math and computer science (long time ago) I've seen examples of those.
Just an idea - may be off topic - Stucki
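As a generic illustration of that effect (nothing in this thread indicates that Mailman or smtplib sorts the recipients this way), a naive quicksort that always picks the first element as pivot degrades to quadratic work on already-sorted input. The function name is made up for this sketch.

```python
import random


def quicksort_comparisons(items):
    """Sort with a first-element pivot; return (sorted_list, comparisons)."""
    if len(items) <= 1:
        return list(items), 0
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    left, left_cost = quicksort_comparisons(smaller)
    right, right_cost = quicksort_comparisons(larger)
    # Every element of rest is compared against the pivot once.
    return left + [pivot] + right, len(rest) + left_cost + right_cost


if __name__ == "__main__":
    ordered = list(range(200))
    shuffled = ordered[:]
    random.Random(0).shuffle(shuffled)
    print("pre-sorted:", quicksort_comparisons(ordered)[1])
    print("shuffled:  ", quicksort_comparisons(shuffled)[1])
```

For 200 pre-sorted items this performs 200 * 199 / 2 = 19900 comparisons, while a shuffled input of the same size needs far fewer.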
--
Christoph von Stuckrad * * |nickname |Mail <stucki@mi.fu-berlin.de>
Freie Universitaet Berlin |/_*|'stucki' |Tel(Mo.,Mi.):+49 30 838-75 459|
Mathematik & Informatik EDV |\ *|if online| (Di,Do,Fr):+49 30 77 39 6600|
Takustr. 9 / 14195 Berlin * * |on IRCnet|Fax(home): +49 30 77 39 6601/
On Feb 20, 2010, at 4:15 PM, Stefan Foerster wrote:
What is a "smaller sub-list"? The list in question only holds 11k recipients, which is not exactly large. Some of my SVN announce lists are much larger.
Yeah, but an announce-only list that is larger doesn't really compare to a discussion list which is smaller. The smaller discussion list is likely to be much more active, and if you multiply the number of unique messages posted to the list by the number of subscribers, you may find that the smaller discussion list actually results in considerably more traffic than the larger announce-only list.
Now, I'm not saying that this is definitely the case. And for just 11k users, it does seem unlikely. But this is a possibility that would be useful to eliminate.
In your case, I think I'd probably first try the multiple queue-runners thing by doing powers-of-2 splits. Regretfully, this is not well documented, but I think there are one or two FAQ Wiki questions that discuss it.
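The "powers of 2" rule concerns how parallel qrunners share one queue: as far as I understand it, each runner is responsible for an equal slice of a hash space. The sketch below only illustrates that idea. The function names are hypothetical, and hashing an arbitrary key with SHA-1 is an assumption made for the example, not a copy of Mailman's queue code.

```python
import hashlib

HASH_SPACE = 1 << 160  # number of distinct SHA-1 digests


def slice_bounds(slice_no, numslices):
    """Return the inclusive (lower, upper) digest range owned by one runner."""
    lower = HASH_SPACE * slice_no // numslices
    upper = HASH_SPACE * (slice_no + 1) // numslices - 1
    return lower, upper


def slice_for(key, numslices):
    """Return the index of the runner whose slice contains the key's digest."""
    digest = int(hashlib.sha1(key).hexdigest(), 16)
    return digest * numslices // HASH_SPACE


if __name__ == "__main__":
    for name in (b"entry-1", b"entry-2", b"entry-3", b"entry-4"):
        print(name, "-> runner", slice_for(name, 4))
```

With the slice count a power of 2, the hash space divides into equal slices with no remainder, which matches the `--runner=OutgoingRunner:0:1` notation used earlier in the thread (slice 0 of 1).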
-- Brad Knowles <bradknowles@shub-internet.org> LinkedIn Profile: <http://tinyurl.com/y8kpxu>
- Mark Sapiro <mark@msapiro.net>:
On 2/20/2010 12:56 PM, Stefan Foerster wrote:
So, if you are personally interested in this, I would talk to a lawyer to find a way how I can legally provide you with a copy of every file that is in any way related to this list.
As I said, I think it would just be the config.pck. Everything else is open source software, but I don't think I want it. It's not that I'm not curious because I definitely am, but I don't want to accidentally send mail to any of the list members. I suppose I could just create a pseudo MTA to listen on the SMTPPORT you use and just respond with 250 to every message.
A plan! I will have a look at the wiki to see how I go about moving a list to another host. Then, tomorrow morning (it's 11pm here), I'll set up a VM, install the same set of packages and copy over anything that is closely Mailman related.
Actually, you could try that too and see what it does with your list. I'll make a little Python script for that.
I did the testing with smtp-sink, which is running just fine. I'll report back if I can reproduce the problem on the virtual machine.
Ralf just had the idea to compare the output from "config_list" to that of another announce-only list, but apart from owners, messages and the usual stuff, there is absolutely no difference.
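Such a comparison can be automated once both lists have been dumped to text files. The sketch below assumes the dumps consist of simple `name = value` lines plus `#` comments; multi-line values are not handled. The function name and the default set of ignored settings are assumptions for illustration.

```python
import difflib


def config_diff(dump_a, dump_b, ignore=("owner", "moderator")):
    """Return the differing 'name = value' lines of two configuration dumps."""

    def settings(text):
        kept = []
        for line in text.splitlines():
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                continue  # skip blank lines and comments
            name = stripped.split("=", 1)[0].strip()
            if name not in ignore:
                kept.append(stripped)
        return kept

    diff = difflib.unified_diff(settings(dump_a), settings(dump_b),
                                "list-a", "list-b", lineterm="", n=0)
    # Keep only added/removed lines, dropping file headers and hunk markers.
    return [line for line in diff
            if line[:1] in "+-" and not line.startswith(("+++", "---"))]
```

An empty result would confirm what is reported here: apart from the ignored, expected differences, the two lists are configured identically.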
Stefan
- Stefan Foerster <cite+mailman-users@incertum.net>:
- Mark Sapiro <mark@msapiro.net>:
As I said, I think it would just be the config.pck. Everything else is open source software, but I don't think I want it. It's not that I'm not curious because I definitely am, but I don't want to accidentally send mail to any of the list members. I suppose I could just create a pseudo MTA to listen on the SMTPPORT you use and just respond with 250 to every message.
A plan! I will have a look at the wiki to see how I go about moving a list to another host. Then, tomorrow morning (it's 11pm here), I'll set up a VM, install the same set of packages and copy over anything that is closely Mailman related.
Bad news. I was not able to reproduce the problem on a VM, using backups from the day the problem first occurred. And worse, this night, while I slept a troubled, disturbed sleep, dreaming of SMTP dialogues, the list roster changed (one new member) - and the problem is gone.
I've been doing system administration tasks since 1997, and this still feels more like voodoo than science, sometimes.
Stefan
On 2/21/2010 2:15 AM, Stefan Foerster wrote:
Bad news. I was not able to reproduce the problem on a VM, using backups from the day the problem first occurred. And worse, this night, while I slept a troubled, disturbed sleep, dreaming of SMTP dialogues, the list roster changed (one new member) - and the problem is gone.
Since I couldn't understand what possibly caused the problem in the first place, I'm not totally surprised.
I've been doing system administration tasks since 1997, and this still feels more like voodoo than science, sometimes.
Yes, it does, but I've found that there usually is an explanation. It's just that finding it may not be easy.
You could try taking the problem list's config.pck from the backup and dropping that into a lists/ directory with a different name (effectively creating a new list with the exact configuration of the old one).
Then you could install this withlist script in Mailman's bin/ directory as bin/test_smtp.py
from Mailman import mm_cfg
mm_cfg.SMTPPORT = 10123 # or whatever you want
from Mailman import Message
from Mailman.Handlers import CalcRecips
from Mailman.Handlers import SMTPDirect

def test_smtp(mlist):
    msg = Message.Message()
    msg['From'] = 'the usual poster to the list'
    msg.set_payload('message body')
    msgdata = {}
    CalcRecips.process(mlist, msg, msgdata)
    SMTPDirect.process(mlist, msg, msgdata)
And then run smtp-sink on the port defined in the script and run
bin/withlist -r test_smtp listname
where listname is the name of the new lists/ directory into which you put the old config.pck. This will short circuit a lot of Mailman stuff and strip it down to building the recipient list and sending the mail to the smtp-sink port.
Possibly you've already done something similar, but this would give you a low impact way to determine if you can duplicate the problem with the old config.pck.
Unfortunately, I don't have any good ideas as to how to proceed from there, even if this does duplicate the problem, but Barry indicated he has a couple of ideas.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
- Mark Sapiro <mark@msapiro.net>:
On 2/21/2010 2:15 AM, Stefan Foerster wrote:
Bad news. I was not able to reproduce the problem on a VM, using backups from the day the problem first occurred. And worse, this night, while I slept a troubled, disturbed sleep, dreaming of SMTP dialogues, the list roster changed (one new member) - and the problem is gone.
Since I couldn't understand what possibly caused the problem in the first place, I'm not totally surprised.
Good news (kinda) - another list on that server just started to slow down, and this time, it is a very unimportant and small list (472 members, 466 of them have mail delivery enabled), so I can take all the time in the world to try and debug this issue.
[instructions for list duplication /SMTP redirection]
Unfortunately, I don't have any good ideas as to how to proceed from there, even if this does duplicate the problem, but Barry indicated he has a couple of ideas.
Well, unfortunately, this doesn't reproduce the problem. Neither does stopping Mailman and copying every single file to another server. However, restarting Mailman (something I don't do very often) does _not_ solve the problem, either.
Do you think I can drop Barry a PM off-list and ask him for further advice if he doesn't read this? I'm really interested in debugging this, and as I said, this time I really don't care about the list delivery being slow.
Stefan
Stefan Foerster wrote:
- Mark Sapiro <mark@msapiro.net>:
[instructions for list duplication /SMTP redirection]
Unfortunately, I don't have any good ideas as to how to proceed from there, even if this does duplicate the problem, but Barry indicated he has a couple of ideas.
Well, unfortunately, this doesn't reproduce the problem. Neither does stopping Mailman and copying every single file to another server. However, restarting Mailman (something I don't do very often) does _not_ solve the problem, either.
Can you update/upgrade or simply reinstall Python on this server? The delays you observed _must_ be occurring in the Python interpreter itself, but this seems _impossible_ since the interpreter shouldn't be affected by which list or a change in list membership.
I wonder if there could somehow be some interaction through the file system.
Do you think I can drop Barry a PM off-list and ask him for further advice if he doesn't read this? I'm really interested in debugging this, and as I said, this time I really don't care about the list delivery being slow.
Barry is often on the #mailman irc channel at freenode.net. It might be best to ping him there.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
On Mar 10, 2010, at 05:51 AM, Stefan Foerster wrote:
Good news (kinda) - another list on that server just started to slow down, and this time, it is a very unimportant and small list (472 members, 466 of them have mail delivery enabled), so I can take all the time in the world to try and debug this issue.
I agree with Mark that this sounds like a problem with the Python interpreter. I just don't see what could be causing Mailman to slow down. I think if you want to continue to debug this, it will involve hacking SMTPDirect.py or replacing it with a simpler but instrumented handler for the list in question. Do you want to go down that route? (Installing say Python 2.6.5 and rebuilding Mailman might be an easier first step.)
Do you think I can drop Barry a PM off-list and ask him for further advice if he doesn't read this? I'm really interested in debugging this, and as I said, this time I really don't care about the list delivery being slow.
I read this list, but usually just skim it and sometimes it can take a long while to respond.
-Barry
participants (7)
- Barry Warsaw
- Barry Warsaw
- Brad Knowles
- Chr. von Stuckrad
- Mark Sapiro
- Ralf Hildebrandt
- Stefan Foerster