[Python-Dev] socketserver ForkingMixin waiting for child processes
rmsr at lab.net
Fri Aug 11 18:34:51 EDT 2017
I agree that blocking shutdown by default isn't a good idea. A child will
eventually get indefinitely stuck on a nonresponsive connection and hang
the whole server. This behavior change is surprising and should be reverted
in master, and definitely not backported.
As for block-timeout or block-timeout-kill, waiting more than zero seconds
in server_close() should be optional, because you're right that the best
timeout is circumstantial. Since ThreadingMixIn also leaks threads,
server_close() could grow a timeout flag (following the socket module
timeout convention) and maybe a terminate boolean. ThreadingMixIn could
then also be fixed. I'm not sure how useful that is though, since I'd bet
almost all users of socketserver exit the process shortly after
server_close(). Plus it can't be backported to the feature-freeze branches.
It seems like this is getting complicated enough that putting the fix in
test_socketserver.py is probably best. Another solution is to add a secret
terminating-close flag to ForkingMixIn just for the tests. Is that good
practice in the stdlib?
On Fri, Aug 11, 2017 at 6:46 AM Victor Stinner <victor.stinner at gmail.com>
> I'm working on reducing the failure rate of Python CIs (Travis CI,
> AppVeyor, buildbots). For that, I'm trying to reduce test side effects
> using "environment altered" warnings. This week, I worked on
> support.reap_children() which detects leaked child processes (usually
> created with os.fork()).
> I found a bug in the socketserver module: it waits for child processes
> completion, but only in non-blocking mode. If a child process takes
> too long, the server will never reads its exit status and so the
> server leaks "zombie processes". Leaking processes can increase the
> memory usage, spawning new processes can fail, etc.
> => http://bugs.python.org/issue31151
> I changed the code to call waitpid() in blocking mode on each child
> process on server_close(), to ensure that all children completed when
> on server close:
> After pushing my change, I'm not sure anymore if it's a good idea.
> There is a risk that server_close() blocks if a child is stuck on a
> socket recv() or send() for some reasons.
> Should we relax the code by waiting a few seconds (problem: hardcoded
> timeouts are always a bad idea), or *terminate* processes (SIGKILL on
> UNIX) if they don't complete fast enough?
> I don't know which applications use socketserver. How I can test if it
> breaks code in the wild?
> At least, I didn't notice any regression on Python CIs.
> Well, maybe the change is ok for the master branch. But I would like
> your opinion because now I would like to backport the fix to 2.7 and
> 3.6 branches. It might break some applications.
> If we *cannot* backport such change to 2.7 and 3.6 because it changes
> the behaviour, I will fix the bug in test_socketserver.py instead.
> What do you think?
> Python-Dev mailing list
> Python-Dev at python.org
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Python-Dev