Internal counter to debug leaking file descriptors

Hi,

Is there any kind of internal file descriptor counter that can be queried to debug issues with leaking resources? It could be used in tests to check that all tests finish with 0 open descriptors. It would be very useful when porting Python applications from Unix to Windows. Unix is more tolerant of open files: it lets you overwrite them and do other nasty things.

See the thread from comment #17 - https://bugs.edge.launchpad.net/dulwich/+bug/557585/ - there is an example of mmap that starts holding a file descriptor somewhere long before an error occurs. How could one debug this? Right now I have to use FileMon. It shows the filenames being operated on, but gives no information about where in the source code this happens.

It would be nice to have some kind of counter with filename information inside Python, so that it is possible to get the full log of events without manually messing with external system-specific tools like FileMon.

-- anatoly t.

On Mon, Aug 30, 2010 at 11:49 PM, anatoly techtonik <techtonik@gmail.com> wrote:

If you wanted to do something like this in the Python stdlib, you'd have to monkey-patch (with a proxy/wrapper) all places that can open or close a file descriptor -- os.open, os.popen, os.close, file open/close, socket open/close, and probably a bunch more that I've forgotten. Also some extension modules may open file descriptors directly through the C interfaces.

I don't know if the Windows libc has some kind of tracking feature for file descriptors; of course it complicates things by using separate (numeric) namespaces for sockets and files.

On Linux you can look somewhere in /proc, but I don't know that it would help you find where a file was opened.

--
--Guido van Rossum (python.org/~guido)

On Aug 31, 2010, at 10:03 AM, Guido van Rossum wrote:
> On Linux you can look somewhere in /proc, but I don't know that it would help you find where a file was opened.
"/dev/fd" is actually a somewhat portable way of getting this information. I don't think it's part of a standard, but on Linux it's usually a symlink to "/proc/self/fd", and it's available on MacOS and most BSDs (based on a hasty and completely-not-comprehensive investigation). But it won't help you find out when the FDs were originally opened, no.

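As a rough sketch of the /dev/fd suggestion above: listing that directory gives a descriptor count that a test could compare before and after the code under suspicion. The function name open_fds is made up for this example, and it assumes a platform where /dev/fd exists and its entries are descriptor numbers.

```python
import os


def open_fds(fd_dir="/dev/fd"):
    """Return the sorted descriptor numbers currently open in this process."""
    fds = []
    for name in os.listdir(fd_dir):
        fd = int(name)
        try:
            os.fstat(fd)
        except OSError:
            # Scanning the directory uses a descriptor of its own, which
            # appears in the listing but is closed again by now; skip it.
            continue
        fds.append(fd)
    return sorted(fds)
```

len(open_fds()) is then the counter; as noted, it says nothing about when or where a descriptor was opened.
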
On 05:22 pm, glyph@twistedmatrix.com wrote:
On OS X and Solaris, dtrace and ustack will tell you exactly when and where the FDs were originally opened, though. On Linux, SystemTap might give you the same information (but I know much less about SystemTap). If http://bugs.python.org/issue4111 is resolved, then this may even be possible without using a patched version of Python.

Jean-Paul

On Tue, 2010-08-31 at 17:40 +0000, exarkun@twistedmatrix.com wrote:
I believe you can do something like this:

    $ cat /tmp/trace-all-syscalls.stp
    /* Watch all syscalls in a specified process, dumping a user-space backtrace */
    probe syscall.* {
      if (pid() == target()) {
        printf("%s(%s)\n", probefunc(), argstr)
        print_ubacktrace();
      }
    }

    $ sudo stap --ldd -d /usr/bin/python /tmp/trace-all-syscalls.stp -c "python -c 'print 42'"

This generates a torrent of debug data like this:

    sys_mmap_pgoff(0x0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
     0x38f44e17aa : mmap64+0xa/0x30 [libc-2.11.90.so]
     0x38f44673fc : _IO_file_doallocate+0x7c/0x110 [libc-2.11.90.so]
     0x38f447498c : _IO_doallocbuf+0x2c/0x50 [libc-2.11.90.so]
     0x38f4472ef4 : _IO_file_underflow@@GLIBC_2.2.5+0x1b4/0x230 [libc-2.11.90.so]
     0x38f44749ce : _IO_default_uflow+0xe/0x30 [libc-2.11.90.so]
     0x38f446fdcb : getc+0xab/0xf0 [libc-2.11.90.so]
     0x39054f3e13 : r_long+0x23/0x120 [libpython2.6.so.1.0]
     0x39054f3f3b : PyMarshal_ReadLongFromFile+0x2b/0x30 [libpython2.6.so.1.0]
     0x39054f0661 : load_source_module+0x271/0x640 [libpython2.6.so.1.0]
     0x39054f1cc5 : import_submodule+0x155/0x300 [libpython2.6.so.1.0]
     0x39054f1f85 : load_next+0x115/0x2a0 [libpython2.6.so.1.0]
     0x39054f2592 : import_module_level+0x212/0x730 [libpython2.6.so.1.0]
     0x39054f3314 : PyImport_ImportModuleLevel+0x44/0xb0 [libpython2.6.so.1.0]
     0x39054d843f : builtin___import__+0x8f/0xa0 [libpython2.6.so.1.0]
     0x3905443f43 : PyObject_Call+0x53/0x100 [libpython2.6.so.1.0]
     0x39054d89b3 : PyEval_CallObjectWithKeywords+0x43/0xf0 [libpython2.6.so.1.0]
     0x39054db674 : PyEval_EvalFrameEx+0x21b4/0x65b0 [libpython2.6.so.1.0]
     0x39054e03a8 : PyEval_EvalCodeEx+0x938/0x9e0 [libpython2.6.so.1.0]
     0x39054e0482 : PyEval_EvalCode+0x32/0x40 [libpython2.6.so.1.0]
     0x39054f02c2 : PyImport_ExecCodeModuleEx+0xc2/0x1f0 [libpython2.6.so.1.0]
     0x39054f07a6 : load_source_module+0x3b6/0x640 [libpython2.6.so.1.0]

You may want to name specific syscalls in the above to narrow the scope.

Issue 4111 patches CPython to statically mark Python frame entry/exit so that systemtap can directly instrument that; in Fedora 13 onwards I've built Python with systemtap hooks so that you can add:

    probe python.function.entry {
      printf("%s:%s:%d\n", filename, funcname, lineno);
    }

(Arguably the name is wrong: it's frame entry/exit, rather than function entry/exit.)

Potentially systemtap could be taught how to decipher/prettyprint Python backtraces in a similar way to how gdb does it (by hooking into PyEval_EvalFrameEx).

Hope this is helpful
Dave

On Tue, Aug 31, 2010 at 5:03 PM, Guido van Rossum <guido@python.org> wrote:
Actually I monkey-patched fdopen and open, but it turned out that this is not enough. Extensions cause the biggest problem. How can I monkey-patch the opening of a file descriptor deep inside the mmap module? How can I know that a file descriptor is opened there at all? I thought that maybe Python has an internal API for opening file descriptors and that it is possible to intercept the operation at this level. Is it feasible to route all file descriptor open operations through such an API, one that allows auditing open/close operations and filenames through a callback?

> On Linux you can look somewhere in /proc, but I don't know that it would help you find where a file was opened.

If I can query an FD counter, I can automate the process of walking through the code line by line to find the places where the counter is incremented or decremented. Of course it would be nice to get access to the FD stack so that the full filename can also be retrieved in this case. It would be nice if at least the Linux implementation provided a way to detect leaking descriptors.

Thanks for the suggestions, but my expertise and available resources are limited to Windows machines, so for now I won't be able to try anything more complicated than an unpack-and-launch Linux solution.

-- anatoly t.
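
One way the "query a counter around a piece of code" idea could look in a test suite is sketched below. It is an assumption-laden illustration, not an existing Python facility: probe_fds and assert_no_fd_leak are hypothetical names, and the counter is obtained by simply asking os.fstat about every small descriptor number, which avoids /proc and /dev/fd entirely. On Windows this would presumably see only C runtime file descriptors, since sockets live in a separate namespace there, as mentioned earlier in the thread.

```python
import contextlib
import os


def probe_fds(max_fd=1024):
    """Return the set of descriptor numbers below max_fd that os.fstat accepts."""
    found = set()
    for fd in range(max_fd):
        try:
            os.fstat(fd)
        except OSError:
            continue  # not an open descriptor
        found.add(fd)
    return found


@contextlib.contextmanager
def assert_no_fd_leak(max_fd=1024):
    """Fail if descriptors opened inside the block are still open at exit."""
    before = probe_fds(max_fd)
    yield
    leaked = probe_fds(max_fd) - before
    if leaked:
        raise AssertionError("leaked file descriptors: %s" % sorted(leaked))
```

Wrapping each test body in "with assert_no_fd_leak():" narrows a leak down to one test, and wrapping smaller and smaller pieces of code narrows it further. It compares descriptor numbers only, so a descriptor that is closed and replaced by a new one with the same number inside the block goes undetected.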

> Of course it would be nice to get access to FD stack so that a full filename can also be retrieved in this case.

On Linux, this can be easily achieved by using /proc. You can take a look at how this is done in the current development version of psutil:
http://code.google.com/p/psutil/source/browse/trunk/psutil/_pslinux.py?spec=svn633&r=630#266

Usage:

Same for sockets, a bunch of lines later:
http://code.google.com/p/psutil/source/browse/trunk/psutil/_pslinux.py?spec=svn633&r=630#284

Hope this helps

--- Giampaolo
http://code.google.com/p/pyftpdlib/
http://code.google.com/p/psutil/
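
To give an idea of the /proc approach without depending on psutil, here is a small Linux-only sketch. It is not psutil's actual code; open_file_names is a name invented for this example, and it relies only on the fact that each entry in /proc/<pid>/fd is a symlink to whatever the descriptor refers to.

```python
import os


def open_file_names(pid="self"):
    """Map descriptor number -> path for regular files open in a Linux process."""
    fd_dir = "/proc/%s/fd" % pid
    names = {}
    for entry in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, entry))
        except OSError:
            continue  # descriptor was closed while we were scanning
        # Sockets and pipes show up as "socket:[...]" or "pipe:[...]";
        # keep only targets that are regular files on disk.
        if target.startswith("/") and os.path.isfile(target):
            names[int(entry)] = target
    return names
```

Calling open_file_names() at two points in a program and comparing the results shows which files were opened in between and are still held.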

On 9/3/2010 6:09 AM, Giampaolo Rodolà wrote:
If you can use psutil itself, it has compiled Windows versions for 2.7 and 3.1:
https://code.google.com/p/psutil/

--
Terry Jan Reedy

2010/9/3 Terry Reedy <tjreedy@udel.edu>:

The Windows part slipped under my radar. =) Unfortunately the Windows binaries are still built from the current release, which doesn't include the open files and open connections functionality. To get those he'll have to fetch the latest code from svn and compile it with mingw32.

--- Giampaolo
http://code.google.com/p/pyftpdlib/
http://code.google.com/p/psutil/

participants (7)
- anatoly techtonik
- David Malcolm
- exarkun@twistedmatrix.com
- Giampaolo Rodolà
- Glyph Lefkowitz
- Guido van Rossum
- Terry Reedy