Paul Moore <p.f.moore@gmail.com> writes:
> Presumably, you're inserting a pskill command somewhere into the actual
> build process. I don't know much about buildbot, but I thought that was
> controlled by the master and/or the Python build scripts, neither of
> which I can change.
>
> If I want to add a pskill command just after a build/test has run (i.e.,
> about where kill_python runs at the moment) how do I do that?
I haven't been able to. As you say, there's no good way to hook into the build process in real time, since any changes have to live outside the build tree or they'll get zapped on the next checkout. I suppose you could try to monitor the output of the build slave log file in near real time, but then you risk killing a process from the next step if you miss something or react too slowly. And I've had cases (after long periods of continuous runtime) where the build slave log stops being generated even while the slave is running fine.

Anyway, in the absence of changes to the build tree, I finally gave up and now run an external script (see below) that whacks any python_d process it finds that has been running for more than 2 hours (an arbitrary choice). I considered trying to dig deeper to identify processes with no logical test parent (closer to what the build tree's kill_python does itself), but decided it was too much effort for the minimal extra gain.

So it's not terribly different from your once-a-day pskill, though as you say, if you kill all python_d processes at an arbitrary point in time you risk interrupting an active test. So the AutoIt script covers pop-ups and the script below cleans up hung processes.

On the subject of pop-ups: I'm not sure, but if you find your service isn't showing them, try enabling the "Allow service to interact with the desktop" option in the service definition. In my experience, though, if a service can't perform a UI interaction the interaction just fails, so I wouldn't expect the process to get stuck in that case.

Anyway, in my case the kill script itself is Cygwin/bash based but uses the Sysinternals PsTools utilities: it simply kills (with pskill) any python_d process identified (via pslist) as having been running for 2 or more hours of wall time:

- - - - - - - - - - - - - - - - - - - - - - - - -
#!/bin/sh
#
# kill_python.sh
#
# Quick 'n dirty script to watch for python_d processes that exceed a few
# hours of runtime, then kill them assuming they're hung
#

PROC="python_d"
TIMEOUT="2"     # hours of wall time before a process is considered hung

while [ 1 ]; do
    echo "`date` Checking..."
    # pslist's last column is elapsed time (h:mm:ss.mmm); print the PID
    # (column 2) of any $PROC whose hours field has reached TIMEOUT
    PIDS=`pslist 2>&1 | grep "^$PROC" |
          awk -v TIMEOUT=$TIMEOUT '{split($NF,fields,":");
              if (int(fields[1]) >= int(TIMEOUT)) {print $2}}'`
    if [ "$PIDS" ]; then
        echo ===== `date`
        for pid in $PIDS; do
            pslist $pid 2>&1 | grep "^$PROC"
            pskill $pid
        done
        echo =====
    fi
    sleep 300
done
- - - - - - - - - - - - - - - - - - - - - - - - -

It's a kludge, but as you say, for us to impose this on the build slave side it has to live outside of the build tree. I've been running it for about a month now and it seems to be doing the job.

I run a similar script on OS X (my Tiger slave also sometimes sees stuck processes, though they just burn CPU rather than interfere with tests), but there I can identify stranded python_d processes by the fact that they are owned by init, so the script can react more quickly (a rough sketch of that check is further down in this message).

I'm pretty sure the best long-term fix is to move the kill processing into the clean script (as per issue 9973) rather than where it currently is in the build script, but so far the idea hasn't been able to attract the interest of anyone who can actually commit such a change. (See also the Dec continuation of this thread - http://www.mail-archive.com/python-dev@python.org/msg54389.html)

I had also created issue 10641 when I thought I had found a problem with kill_python itself, but that turned out to be incorrect: in subsequent tests kill_python in the build tree always worked, so the core issue seems to be the failure to run it at all rather than it not working.
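Incidentally, that OS X check is conceptually along these lines. This is just a rough sketch rather than the exact script I run (the 5 minute interval and the kill -9 here are arbitrary choices):

- - - - - - - - - - - - - - - - - - - - - - - - -
#!/bin/sh
#
# Sketch of the OS X variant: kill python_d processes that have been
# reparented to PID 1 (init/launchd), i.e. whose test parent has exited.
#
PROC="python_d"

while true; do
    # ps -axo pid,ppid,comm lists every process with its parent PID;
    # a $PROC owned by PID 1 has been orphaned by the test run
    PIDS=`ps -axo pid,ppid,comm |
          awk -v proc="$PROC" '$2 == 1 && $3 ~ proc {print $1}'`
    for pid in $PIDS; do
        echo "`date` killing orphaned $PROC (pid $pid)"
        kill -9 $pid
    done
    sleep 300
done
- - - - - - - - - - - - - - - - - - - - - - - - -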
For now though, these two external "monitors" seem to have helped contain the number of manual operations I have to do on my two Windows slaves. (Though recently I've begun seeing two new sorts of pop-ups under Windows 7, both related to memory, so I think I just need to give my VM a little more memory.)

-- David