
"JCL" == J C Lawrence <claw@varesearch.com> writes:
JCL> I believe I've found out how to reliably reproduce the
JCL> performance problems I've noticed here at VA and at Kanga.Nu,
JCL> and which Barry and another (forget name, sorry) have
JCL> observed as well:
JCL> 1) Create a moderated list.
JCL> 2) Subscribe 200 addresses to the list (can be bogus
JCL> addresses but the local MTA must accept them)
JCL> 3) Post at least 30 messages of an average of at least 2K
JCL> size to the list.
JCL> 4) Go to the moderation page, approve every message, and hit
JCL> submit.
JCL> 5) Watch your system load peg and stay there for an
JCL> obscenely long time.
Just a quick note 'cause I have very little time. I'm currently seeing python.org massively pegged, and Guido and I were talking about some Python tools we'd like to develop that would help debug situations like this. What I wanted was something like gdb's ability to attach to and print stack traces of running external programs. We got into some brainstorming and came up with A Certified Very Cool Trick[1].
This yielded a traceback for where at least two pegged processes are spinning. Seems to make sense, but I'm not very familiar with the archiving guts, so I'm posting this traceback to spur some discussion. Maybe Scott or Harald can craft a fix.
Here's the traceback:
Looks like the archiver is doing way too much work for every message it has to process. When python.org came back up today, it got slammed with incoming mail for a bazillion lists. Each message spins in this HyperDatabase.clearIndex() loop.
-Barry
[1] CVCT:
Use gdb to attach to the running Python program, then type this at the gdb prompt:
(gdb) call PyRun_SimpleString("import sys, traceback; sys.stderr=open('/tmp/tb','w',0); traceback.print_stack()")
Sitting in /tmp/tb will be the stack trace of where the Python program was when you stopped it. There's reason to believe this will not always work, but it likely will, and you can even detach the program and let it continue on.
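For anyone who wants to see what that one-liner does before pointing gdb at a live process, here is a minimal standalone sketch of the same idea in plain Python. The names dump_stack and busy_archiver_step are made up for illustration and are not part of Mailman. It passes the file to traceback.print_stack() directly instead of rebinding sys.stderr, and it uses line buffering because Python 3 refuses buffering=0 for text-mode files (the one-liner above is written for the Python of this thread).

```python
import os
import tempfile
import traceback


def dump_stack(path):
    """Write the current call stack to `path` (hypothetical helper)."""
    # Line-buffered text mode; Python 3 rejects buffering=0 for text files.
    with open(path, "w", buffering=1) as out:
        # Prints every frame from the outermost caller down to this one.
        traceback.print_stack(file=out)


def busy_archiver_step():
    """Stand-in for whatever the pegged process happens to be doing."""
    path = os.path.join(tempfile.gettempdir(), "tb")
    dump_stack(path)
    return path


if __name__ == "__main__":
    with open(busy_archiver_step()) as tb:
        print(tb.read())
```

The resulting file lists one entry per frame, innermost last, which is the same shape of output the gdb trick leaves in /tmp/tb.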