[Python-Dev] Stable buildbots

Trent Nelson trent at snakebite.org
Tue Nov 23 09:40:50 CET 2010


On 14-Nov-10 3:48 AM, David Bolen wrote:
> This is a completely separate issue, though probably around just as
> long, and like the popup problem its frequency changes over time.  By
> "hung" here I'm referring to cases where something must go wrong with
> a test and/or its cleanup such that a python_d process remains
> running, usually several of them at the same time.

My guess: the "hung" (single-threaded) Python process has called 
select() without a timeout in order to wait for some data.  However, the 
data never arrives (due to a broken/failed test), and the select() never 
returns.
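To illustrate the difference a timeout makes, here is a minimal sketch (not taken from any actual test in the suite; wait_for_data and the socketpair demo are made up for illustration).  Passing a timeout to select() lets the caller notice that the data never arrived instead of blocking forever:

```python
import select
import socket


def wait_for_data(sock, timeout=5.0):
    """Return True if sock becomes readable within `timeout` seconds.

    Hypothetical helper: with timeout=None, select() would block
    indefinitely if the peer never sends anything.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable)


# Demo: a connected pair of sockets where the peer is initially silent.
a, b = socket.socketpair()
try:
    got_data = wait_for_data(a, timeout=0.2)             # nothing sent yet
    b.sendall(b"x")
    got_data_after_send = wait_for_data(a, timeout=0.2)  # data is pending
finally:
    a.close()
    b.close()
```

A test written this way can fail cleanly when its peer dies, rather than leaving a wedged process behind.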

On Windows, processes seem harder to kill once they get into this state. 
If I purposely wedge a Windows process by calling select() from the 
interactive interpreter, ctrl-c has absolutely no effect (whereas on 
Unix, ctrl-c will interrupt the select()).

As for why kill_python.exe doesn't seem to be able to kill said wedged 
processes, the MSDN documentation on TerminateProcess[1] states the 
following:

	The terminated process cannot exit until all
	pending I/O has been completed or canceled. (sic)

It's not unreasonable to assume a wedged select() constitutes pending 
I/O, so that's a possible explanation as to why kill_python.exe isn't 
able to terminate the processes.

(kill_python also currently assumes that TerminateProcess() always 
works; perhaps this optimism is misplaced.  Note, too, the XXX TODO 
about not killing processes that have loaded our python*.dll but are 
not named python_d.exe.  I don't think that's the issue here, though.)
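As a rough sketch of what "not assuming it worked" could look like (this is not how kill_python is written; terminate_and_verify is a made-up name), the idea is to request termination and then wait, with a timeout, for the process to actually exit.  The example uses Python's subprocess module, whose Popen.terminate() calls TerminateProcess() on Windows:

```python
import subprocess
import sys


def terminate_and_verify(proc, timeout=5.0):
    """Request termination of `proc`, then confirm it really exited.

    Hypothetical helper.  Returns True if the process is gone within
    `timeout` seconds, False if it is still around (e.g. wedged).
    """
    proc.terminate()  # TerminateProcess() on Windows, SIGTERM on POSIX
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return True


# Demo: start a child that would otherwise sleep for a minute.
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"]
)
killed = terminate_and_verify(child)
```

A False return would at least let the caller log that the kill did not take effect, instead of carrying on as if it had.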

On 14-Nov-10 5:32 AM, David Bolen wrote:
 > "Martin v. Löwis"<martin at v.loewis.de>  writes:
 >
 >> This is what kill_python.exe is supposed to solve. So I recommend to
 >> investigate why it fails to kill the hanging Pythons.
 >
 > Yeah, I know, and I can't say I disagree in principle - not sure why
 > Windows doesn't let the kill in that module work (or if there's an
 > issue actually running it under all conditions).
 >
 > At the moment though, I do know that using the sysinternals pskill
 > utility externally (which is what I currently do interactively)
 > definitely works so to be honest,

That's interesting (that kill_python.exe doesn't kill the wedged 
processes, but pskill does).  kill_python is pretty simple: it just 
calls TerminateProcess() after acquiring a handle with the 
PROCESS_TERMINATE access right.  That is the recommended way to kill a 
process, though, so I doubt pskill goes about it any differently 
(although it is sysinternals... you never know what kind of crazy black 
magic it's doing behind the scenes).

Are you calling pskill with the -t flag, i.e. kill the process and all 
its descendants?  That might be the ticket, especially if killing the 
child process that the wedged select() is waiting on causes the select() 
to return and thus makes the parent killable.

Otherwise, if it happens again, can you try kill_python.exe first, then 
pskill, and confirm whether the former fails but the latter succeeds?

	Trent.


[1]: http://msdn.microsoft.com/en-us/library/ms686714(VS.85).aspx

