Re: [Python-Dev] Proposal for a new function "open_noinherit" to avoid problems with subprocesses and security risks

Hi, # I'm not sure about netiquette here: # I decided to continue posting to the python-list without CCing to everyone. First of all, here's the prototype. It's a prototype and I know it's far from perfect, but it works for me (in production code) - however, I did not yet test it on Non-Windows. --------------------------------- #!/bin/env python # -*- coding: iso-8859-1 -*- """ File ftools.py: Useful tools for working with files. """ import os import os.path import time import shutil rmtree = shutil.rmtree move = shutil.move builtin_open = open if os.name != "nt": import fcntl def open(fname, mode="r", bufsize=None): """ Like the "open" built-in, but does not inherit to child processes. The code is using os.open and os.fdopen. On Windows, to avoid inheritance, os.O_NOINHERIT is used directly in the open call, thus it should be thread-safe. On other operating systems, fcntl with FD_CLOEXEC is used right after opening the file; however in a mutli-threaded program it may happen that another thread starts a child process in the fraction of a second between os.open and fcntl. Note: The bufsize argument is ignored (not yet implemented). """ flags = 0 if "r" in mode: flags += os.O_RDONLY elif "w" in mode: flags += os.O_RDWR + os.O_CREAT + os.O_TRUNC elif "a" in mode: flags += os.O_RDWR + os.O_CREAT + os.O_APPEND else: raise NotImplementedError ("mode=" + mode) if os.name == "nt": if "b" in mode: flags += os.O_BINARY else: flags += os.O_TEXT flags += os.O_NOINHERIT try: fd = os.open (fname, flags) if os.name != "nt": old = fcntl.fcntl(fd, fcntl.F_GETFD) fcntl.fcntl(fd, fcntl.F_SETFD, old | fcntl.FD_CLOEXEC) return os.fdopen (fd, mode) except OSError, x: raise IOError(x.errno, x.strerror, x.filename) def copyfile(src, dst): """ Copies a file - like shutil.copyfile, but the files are opened non-inheritable. Note: This prototype does not test _samefile like shutil.copyfile. """ fsrc = None fdst = None try: fsrc = open(src, "rb") fdst = open(dst, "wb") shutil.copyfileobj(fsrc, fdst) finally: if fdst: fdst.close() if fsrc: fsrc.close() ------------------------------------------------ """ blah blah: I googled around a bit, and it more and more seems to me that the Posix system has a serious design flaw, since it seems to be SO hard to write multi-threaded programs that also start child-processes. It's funny that right now Linus Torvalds himself seems to be aware of this problem and that the Linux kernel developers are discussing ways to solve it. Let's hope they find a way to get around it on the OS level. To me, the design of MS Windows looks better in the aspect of process-creation, handle inheritance and multi-threading... Anyway, it has its drawbacks, too. For example, I still cannot specify in a thread-safe way that a handle should be inherited to one child process but not to another - I would have to use a lock to synchronize it, which has its own problems, as Ross Cohen noted. The best solution at the OS level would be to explitly specify the handles/fds I want to be inherited in a "create child process" system call. BTW the Linux kernel developers face the same situation as we do: They could somehow implement a new system function like "open_noinherit", but there's a whole bunch of existing "standard code" that uses open and similar functions like socket(), accept() etc., and they don't want to change all these calls. So perhaps, for Python development, we just have to accept that the problem persists and that at this time a 100% solution just does not exist - and we should watch the discussion on http://lwn.net/Articles/237722/ to see how they solve it for Linux. """ That being said, I still think it's necessary for Python to provide a solution as good as possible. For example, in my production application, by consequently using the ftools module above, I could reduce the error rate dramatically: * With built-in open and shutil.copyfile: Several "Permission denied" and other errors a day * With ftools.open and ftools.copyfile: program running for a week or more without errors. There are still errors sometimes, and I suspect it has to do with the unintenional inheritance of socket handles (I did not dig into SocketServer.py, socket.py and socket.c to solve it). (However, the errors are so rare now that our clients think it's just errors in their network :-). Martin, you mentioned that for sockets, inheritance is not a problem unless accept(), recv() or select() is called in the child process (as far as I understood it). Though I am not an expert in socket programming at the C level, I doubt that you are right here. Apart from by own experiences, I've found some evidence in the WWW (searching for "child process socket inherit respond", and for "socket error 10054 process"). * http://mail.python.org/pipermail/python-list/2003-November/236043.html "socket's strange behavior with subprocesses" Funny: Even notepad.exe is used there as an example child process... * http://mail.python.org/pipermail/python-bugs-list/2006-April/032974.html python-Bugs-1469163 SimpleXMLRPCServer doesn't work anymore on Windows (see also Bug 1222790). * http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4202949 Java has switched to non-inheritable sockets as well. * http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5069989 "(process) Runtime.exec unnecessarily inherits open files (win)" * http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4197666" "Socket descriptors leak into processes spawned by Java programs on windows" ... Any Windows Guru around who can explain what's going on with socket handles and CreateProcess? I mean - is the explanation Martin gave for accept(), recv(), select() correct for Windows, too? And if so - how can the errors be explained that are mentioned in the URLs above? Martin v. Löwis wrote: protected, process.
Are you still reading? Here's a pragmatic proposal: - Add the functions in ftools.py (in a more complete version) to the standard library. Perhaps even add them to the subprocess.py module? - Add a note about handle inheritance to the documentation for the subprocess module, saying that for the parent process, one should avoid using open and prefer ftools.open instead. - Add a global switch to the socket module to choose between new and old behaviour: - New behaviour: In the C level socket implementation, use os.O_NOINHERIT resp. fcntl FD_CLOEXEC Remember: In case you write a forking socket server in Python, you have to use the old behaviour (so, in the ForkingServerMixin, expliticly choose the old behaviour). - Change the logging file handler classes to use ftools.open, so that at least the logging module does not produce errors in a multi-threaded program with child processes. - For Python 3000, search the standard library for unintentional inheritance. What do you think? Henning Footnote: I bet that about 50% of all unexplainable, seemingly random, hard-to-reproduce errors in any given program (written in any programming language) that uses child processes are caused by unintentional handle inheritance.

[I assume you mean python-dev] Discussing this issue on the list is fine. Posting code is on the borderline, and will have no effect, i.e. no action will come out of (at least *I* will ignore the code entirely, unless it is an actual patch, and submitted to the bug tracker).
Exactly. My proposal is still to provide an API to toggle the flag after the handle was created.
I did not say "no problems". I said "there is no ambiguity whereto direct the data if the child processes don't perform accept/recv".
Sure: the system will not shutdown the connection as long as the handle is still open in the subprocess (as the subprocess *might* send more data - which it won't). I think the problem could be avoided by the parent process explicitly performing shutdown(2), but I'm uncertain as I have never actively used shutdown().
I don't understand how this is relevant. This is about CLO_EXEC not being available on Windows, and has nothing to do with socket inheritance.
* http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4202949 Java has switched to non-inheritable sockets as well.
Not surprisingly - they don't support fork(). If they would, they could not have made that change. The bug report is the same issue: clients will be able to connect as long as the listen backlog fills. Then they will be turned down, as notepad will never perform accept. [I'm getting bored trying to explain the other cases as well]
See my explanation above. Martin

OK, here is an API that I tested on Windows and for sockets only. Perhaps someone can test it on Non-Windows (Linux, for example?) I think the best place for it would be as a new method "set_noinherit" for file and socket objects or as a new function in the os module (thus the implementation should probably be rewritten at the C level). Note that for file objects, the code would need an additional call to win32file._get_osfhandle, because .fileno() returns the Windows handle only for sockets, but not for files. The code below uses thes Mark Hammond's win32all library. import os if os.name == "nt": import win32api, win32con def set_noinherit(socket, noinherit=True): """ Mark a socket as non-inheritable to child processes. This should be called right after socket creation if you want to prevent the socket from being inherited to child processes. Notes: Unintentional socket or file inheritance is a security risk and can cause errors like "permission denied", "adress already in use" etc. in programs that start subprocesses, particularly in multi-threaded programs. These errors tend to occur seemingly randomly and are hard to reproduce (race condition!) and even harder to debug. Thus it is good practice to call this function as soon as possible after opening a file or socket that you doesn't need to be inherited to subprocesses. Note that in a multi-threaded environment, it is still possible that another thread starts a subprocess after you created a file/socket, but before you call set_noinherit. Note that for sockets, the new socket returned from accept() will be inheritable even if the listener socket was not; so you should call set_noinherit for the new socket as well. Availability: Posix, Windows """ flags = 0 if noinherit: flags = flags | win32con.HANDLE_FLAG_INHERIT win32api.SetHandleInformation(socket.fileno(), win32con.HANDLE_FLAG_INHERIT, flags) else: import fcntl def set_noinherit(socket, noinherit=True): """ ... documentation copied from the nt case ... """ fd = socket.fileno() flags = fcntl.fcntl(fd, fcntl.F_GETFD) & ~fncl.FD_CLOEXEC if noinherit: flags = flags | fcntl.FD_CLOEXEC) fcntl.fcntl(fd, fcntl.F_SETFD, flags)
I think I more or less understand what happens. The fact remains, that a subprocess takes influence on the behaviour of server and client programs in an unwanted (and for many people: unexpected) manner. Regarding the influence of socket inheritance for server apps, I performed some tests on Windows. If a subprocess is started after the parent called accept(), then a client can still connect() and send() to the socket - even if the parent process has closed it and exited meanwhile. This works until the queue is full (whose size was specified in listen()). THEN the client will get (10061, 'Connection refused'); as you already explained. And client and server will have the socket in CLOSE_WAIT resp. FIN_WAIT2 status. However I doubt that this is actually a problem as long as the server continues accept()ing. But it means the client and server OS have to manage the socket until the subprocess exits - even though neither client nor server need the socket anymore. One might argue that it is not actually a big problem, since the subprocess will exit sooner or later. This is more or less true for Linux, etc. However, sometimes a subprocess might crash or hang. Now what happens if the server program is closed and then started again? On Linux, no problem (more or less). When the server program is closed, the subprocess will be killed by the OS (I think), and the socket is released (perhaps with a few minutes delay). On Windows the situation is worse. Subprocess hanging: When the server program is closed, the subprocess will NOT be killed by the OS ("officially" there isn't a parent-child relationship for processes). It will continue hanging. When the server program is restarted, it will run into an "address alread in use" error. Subprocess crashing: Unfortunately, on a default Windows installation, this will be very much the same like a hanging subprocess: Dr. Watson or whichever debugger comes to debug the crashed program. As long as nobody clicks on the messagebox popup, the crashed program will not be freed and all handles will be kept open. Of course, on a server computer, usually there is nobody watching the desktop... Note: It is possible to work around this by installing a different debugger (which usually includes hacking the registry). These problems can be avoided by calling set_noinherit for the lisener socket as well as for the new socket returned by accept() ASAP.
The original bug was about problems due to unwanted handle inheritance with SimpleXMLRPCServer. The bug fix was to set CLO_EXEC. The fix didn't work for Windows of course. A correct bug fix would to use the "set_noinherit" function above.
[I'm getting bored trying to explain the other cases as well]
OK, YOU do understand the issue and know what's going on under the hood. I understand as well - at least now. It cost be several weeks of frustrating debugging, changing code to avoid the built-in "open", reading library source code, searching the internet and cursing... (and I'm not a beginner, as well as others who mentioned they had similar problems in this list). Note that the solution REQUIRED avoiding and/or hacking the standard library. As mentioned in previous posts, in a multi-threaded program, the correct solution for files is to use the ftools.open on Windows and a correct solution is not possible for sockets and on Non-Windows due to possible race-conditions. Using set_noinherit will reduce the risk as best as possible. For Python 2.6 I propose to add the set_noinherit method to file and socket objects. For Python 3000 I propose that by default files and sockets should be created non-inheritable (though it will not work perfectly for mult-threaded programs on Non-Windows - see the doc in the code). A server that needs handles to be inherited can then still call set_noinherit(False). Typical uses would include SocketServer.ForkingMixIn and the 3 standard handles for subprocess/os popen. If this seems reasonable and I can help in implementing this, please let me know. The change would prevent other developers from making the same frustrating experiences as I did. Regards, Henning

Indeed. Can you come up with a C implementation of it? I think it should be a function in the posix/nt module, expecting OS handles; the function in the os module could additionally support sockets and file objects also in a polymorphic way.
That's for accept, yes. For send, you can continue sending until the TCP window closes (plus some unspecified amount of local buffering the OS might do).
That's not true. The child process can run indefinitely even though the parent process has terminated. You may be thinking of SIGHUP, which is sent to all processes when the user logs out of the terminal. Regards, Martin

[I assume you mean python-dev] Discussing this issue on the list is fine. Posting code is on the borderline, and will have no effect, i.e. no action will come out of (at least *I* will ignore the code entirely, unless it is an actual patch, and submitted to the bug tracker).
Exactly. My proposal is still to provide an API to toggle the flag after the handle was created.
I did not say "no problems". I said "there is no ambiguity whereto direct the data if the child processes don't perform accept/recv".
Sure: the system will not shutdown the connection as long as the handle is still open in the subprocess (as the subprocess *might* send more data - which it won't). I think the problem could be avoided by the parent process explicitly performing shutdown(2), but I'm uncertain as I have never actively used shutdown().
I don't understand how this is relevant. This is about CLO_EXEC not being available on Windows, and has nothing to do with socket inheritance.
* http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4202949 Java has switched to non-inheritable sockets as well.
Not surprisingly - they don't support fork(). If they would, they could not have made that change. The bug report is the same issue: clients will be able to connect as long as the listen backlog fills. Then they will be turned down, as notepad will never perform accept. [I'm getting bored trying to explain the other cases as well]
See my explanation above. Martin

OK, here is an API that I tested on Windows and for sockets only. Perhaps someone can test it on Non-Windows (Linux, for example?) I think the best place for it would be as a new method "set_noinherit" for file and socket objects or as a new function in the os module (thus the implementation should probably be rewritten at the C level). Note that for file objects, the code would need an additional call to win32file._get_osfhandle, because .fileno() returns the Windows handle only for sockets, but not for files. The code below uses thes Mark Hammond's win32all library. import os if os.name == "nt": import win32api, win32con def set_noinherit(socket, noinherit=True): """ Mark a socket as non-inheritable to child processes. This should be called right after socket creation if you want to prevent the socket from being inherited to child processes. Notes: Unintentional socket or file inheritance is a security risk and can cause errors like "permission denied", "adress already in use" etc. in programs that start subprocesses, particularly in multi-threaded programs. These errors tend to occur seemingly randomly and are hard to reproduce (race condition!) and even harder to debug. Thus it is good practice to call this function as soon as possible after opening a file or socket that you doesn't need to be inherited to subprocesses. Note that in a multi-threaded environment, it is still possible that another thread starts a subprocess after you created a file/socket, but before you call set_noinherit. Note that for sockets, the new socket returned from accept() will be inheritable even if the listener socket was not; so you should call set_noinherit for the new socket as well. Availability: Posix, Windows """ flags = 0 if noinherit: flags = flags | win32con.HANDLE_FLAG_INHERIT win32api.SetHandleInformation(socket.fileno(), win32con.HANDLE_FLAG_INHERIT, flags) else: import fcntl def set_noinherit(socket, noinherit=True): """ ... documentation copied from the nt case ... """ fd = socket.fileno() flags = fcntl.fcntl(fd, fcntl.F_GETFD) & ~fncl.FD_CLOEXEC if noinherit: flags = flags | fcntl.FD_CLOEXEC) fcntl.fcntl(fd, fcntl.F_SETFD, flags)
I think I more or less understand what happens. The fact remains, that a subprocess takes influence on the behaviour of server and client programs in an unwanted (and for many people: unexpected) manner. Regarding the influence of socket inheritance for server apps, I performed some tests on Windows. If a subprocess is started after the parent called accept(), then a client can still connect() and send() to the socket - even if the parent process has closed it and exited meanwhile. This works until the queue is full (whose size was specified in listen()). THEN the client will get (10061, 'Connection refused'); as you already explained. And client and server will have the socket in CLOSE_WAIT resp. FIN_WAIT2 status. However I doubt that this is actually a problem as long as the server continues accept()ing. But it means the client and server OS have to manage the socket until the subprocess exits - even though neither client nor server need the socket anymore. One might argue that it is not actually a big problem, since the subprocess will exit sooner or later. This is more or less true for Linux, etc. However, sometimes a subprocess might crash or hang. Now what happens if the server program is closed and then started again? On Linux, no problem (more or less). When the server program is closed, the subprocess will be killed by the OS (I think), and the socket is released (perhaps with a few minutes delay). On Windows the situation is worse. Subprocess hanging: When the server program is closed, the subprocess will NOT be killed by the OS ("officially" there isn't a parent-child relationship for processes). It will continue hanging. When the server program is restarted, it will run into an "address alread in use" error. Subprocess crashing: Unfortunately, on a default Windows installation, this will be very much the same like a hanging subprocess: Dr. Watson or whichever debugger comes to debug the crashed program. As long as nobody clicks on the messagebox popup, the crashed program will not be freed and all handles will be kept open. Of course, on a server computer, usually there is nobody watching the desktop... Note: It is possible to work around this by installing a different debugger (which usually includes hacking the registry). These problems can be avoided by calling set_noinherit for the lisener socket as well as for the new socket returned by accept() ASAP.
The original bug was about problems due to unwanted handle inheritance with SimpleXMLRPCServer. The bug fix was to set CLO_EXEC. The fix didn't work for Windows of course. A correct bug fix would to use the "set_noinherit" function above.
[I'm getting bored trying to explain the other cases as well]
OK, YOU do understand the issue and know what's going on under the hood. I understand as well - at least now. It cost be several weeks of frustrating debugging, changing code to avoid the built-in "open", reading library source code, searching the internet and cursing... (and I'm not a beginner, as well as others who mentioned they had similar problems in this list). Note that the solution REQUIRED avoiding and/or hacking the standard library. As mentioned in previous posts, in a multi-threaded program, the correct solution for files is to use the ftools.open on Windows and a correct solution is not possible for sockets and on Non-Windows due to possible race-conditions. Using set_noinherit will reduce the risk as best as possible. For Python 2.6 I propose to add the set_noinherit method to file and socket objects. For Python 3000 I propose that by default files and sockets should be created non-inheritable (though it will not work perfectly for mult-threaded programs on Non-Windows - see the doc in the code). A server that needs handles to be inherited can then still call set_noinherit(False). Typical uses would include SocketServer.ForkingMixIn and the 3 standard handles for subprocess/os popen. If this seems reasonable and I can help in implementing this, please let me know. The change would prevent other developers from making the same frustrating experiences as I did. Regards, Henning

Indeed. Can you come up with a C implementation of it? I think it should be a function in the posix/nt module, expecting OS handles; the function in the os module could additionally support sockets and file objects also in a polymorphic way.
That's for accept, yes. For send, you can continue sending until the TCP window closes (plus some unspecified amount of local buffering the OS might do).
That's not true. The child process can run indefinitely even though the parent process has terminated. You may be thinking of SIGHUP, which is sent to all processes when the user logs out of the terminal. Regards, Martin
participants (2)
-
"Martin v. Löwis"
-
Henning von Bargen