Hi,
# I'm not sure about netiquette here:
# I decided to continue posting to the python-list without CCing to
everyone.
First of all, here's the prototype. It's a prototype and I know it's far from perfect, but it works for me
# !/bin/env python
# -- coding: iso-8859-1 --
""" File ftools.py: Useful tools for working with files. """
import os import os.path import time import shutil
rmtree = shutil.rmtree move = shutil.move
builtin_open = open
if os.name != "nt": import fcntl
def open(fname, mode="r", bufsize=None): """ Like the "open" built-in, but does not inherit to child processes. The code is using os.open and os.fdopen. On Windows, to avoid inheritance, os.O_NOINHERIT is used directly in the open call, thus it should be thread-safe. On other operating systems, fcntl with FD_CLOEXEC is used right after opening the file; however in a mutli-threaded program it may happen that another thread starts a child process in the fraction of a second between os.open and fcntl.
Note: The bufsize argument is ignored (not yet implemented).
"""
flags = 0
if "r" in mode:
flags += os.O_RDONLY
elif "w" in mode:
flags += os.O_RDWR + os.O_CREAT + os.O_TRUNC
elif "a" in mode:
flags += os.O_RDWR + os.O_CREAT + os.O_APPEND
else:
raise NotImplementedError ("mode=" + mode)
if os.name == "nt":
if "b" in mode:
flags += os.O_BINARY
else:
flags += os.O_TEXT
flags += os.O_NOINHERIT
try:
fd = os.open (fname, flags)
if os.name != "nt":
old = fcntl.fcntl(fd, fcntl.F_GETFD)
fcntl.fcntl(fd, fcntl.F_SETFD, old | fcntl.FD_CLOEXEC)
return os.fdopen (fd, mode)
except OSError, x:
raise IOError(x.errno, x.strerror, x.filename)
def copyfile(src, dst): """ Copies a file - like shutil.copyfile, but the files are opened non-inheritable. Note: This prototype does not test _samefile like shutil.copyfile. """ fsrc = None fdst = None try: fsrc = open(src, "rb") fdst = open(dst, "wb") shutil.copyfileobj(fsrc, fdst) finally: if fdst: fdst.close() if fsrc: fsrc.close()
""" blah blah: I googled around a bit, and it more and more seems to me that the Posix system has a serious design flaw, since it seems to be SO hard to write multi-threaded programs that also start child-processes. It's funny that right now Linus Torvalds himself seems to be aware of this problem and that the Linux kernel developers are discussing ways to solve it. Let's hope they find a way to get around it on the OS level. To me, the design of MS Windows looks better in the aspect of process-creation, handle inheritance and multi-threading... Anyway, it has its drawbacks, too. For example, I still cannot specify in a thread-safe way that a handle should be inherited to one child process but not to another - I would have to use a lock to synchronize it, which has its own problems, as Ross Cohen noted. The best solution at the OS level would be to explitly specify the handles/fds I want to be inherited in a "create child process" system call.
BTW the Linux kernel developers face the same situation as we do: They could somehow implement a new system function like "open_noinherit", but there's a whole bunch of existing "standard code" that uses open and similar functions like socket(), accept() etc., and they don't want to change all these calls.
So perhaps, for Python development, we just have to accept that the problem persists and that at this time a 100% solution just does not exist - and we should watch the discussion on http://lwn.net/Articles/237722/ to see how they solve it for Linux. """
That being said, I still think it's necessary for Python to provide a solution as good as possible.
For example, in my production application, by consequently using the ftools module above, I could reduce the error rate dramatically:
There are still errors sometimes, and I suspect it has to do with the unintenional inheritance of socket handles (I did not dig into SocketServer.py, socket.py and socket.c to solve it). (However, the errors are so rare now that our clients think it's just errors in their network :-).
Martin, you mentioned that for sockets, inheritance is not a problem unless accept(), recv() or select() is called in the child process (as far as I understood it). Though I am not an expert in socket programming at the C level, I doubt that you are right here. Apart from by own experiences, I've found some evidence in the WWW (searching for "child process socket inherit respond", and for "socket error 10054 process").
Any Windows Guru around who can explain what's going on with socket handles and CreateProcess? I mean - is the explanation Martin gave for accept(), recv(), select() correct for Windows, too? And if so - how can the errors be explained that are mentioned in the URLs above?
Martin v. Löwis wrote:
In Python 3, it would be possible to implement the "n" flag for open(), as we call CreateFile directly. BTW, if it will be an additional flag to "open", let me correct myself: "n" is not a good character. Now I prefer "p" like personal, private, protected, process.
And to open a file non-inheritable should be possible in an easy and platform-independent way for the average python programmer. I don't see why it is a requirement to open the file in non-inheritable mode. Why is not sufficient to modify an open file to have its handle non-inheritable in an easy and platform-independent way? Because it wouldn't be thread-safe, unless a lock is used for synchronizing subprocess and open calls, which would cause other issues.
Are you still reading?
Here's a pragmatic proposal:
What do you think?
Henning
Footnote: I bet that about 50% of all unexplainable, seemingly random, hard-to-reproduce errors in any given program (written in any programming language) that uses child processes are caused by unintentional handle inheritance.