Hey

ResourceWarning instances that are created in some classes' __del__ function, like FileIO, are a great tool for tracking down a program's bad behavior.

In network programming, that's even more important, to avoid crashes or huge leaks.

But some of them are very hard to fix because we don't get much context: we just get warnings at the end of the program execution, when gc.collect is called.

sys:1: ResourceWarning: unclosed file <_io.FileIO name=18 mode='wb'>
sys:1: ResourceWarning: unclosed file <_io.FileIO name=17 mode='rb'>

Here I just know that somewhere, 2 file descriptors were not closed.

What I'd like to be able to do is to track down the origin of those warnings.

Since __del__ is called asynchronously, it's impossible to track it right now (or I don't know how).

What we need is a way to keep track of any resource allocation *when it happens*.

Here's an idea: let's add three private functions in Python's io:

def __allocate_resource(fd) => records the file descriptor that was allocated, along with the current traceback.
def __free_resource(fd) => removes the fd from the list.
def __is_resource_allocated(fd) => tells if the resource is in the list.

These three functions, plugged in somewhere in io's classes, could be used in conjunction with ResourceWarning: when __del__ is called, if the resource was not freed, we'd be able to know where it was created.

Of course these functions are just a brain dump - I have no idea how io internals work. But unless I missed it, something like I've just described is missing in Python.

Cheers
Tarek
On Mar 18, 2014, at 12:57, Tarek Ziadé wrote:

def __allocate_resource(fd) => records the file descriptor that was allocated, along with the current traceback.
def __free_resource(fd) => removes the fd from the list.
def __is_resource_allocated(fd) => tell if the resource is in the list
I like the general idea. I've actually written wrappers (in Python 2.x and other languages like C++, never Python 3, but similar idea...) to track these kinds of leaks myself.

I don't think you need all of these methods. Just the first one will do it: if you get to __del__ without a close and emit a ResourceWarning, use the info stashed by the allocate function; otherwise, it never gets looked at. And I'm not sure it needs to be a method after all.

What about cases where the io object doesn't actually allocate anything (because you got it from an fd or a socket object or similar?), but you still need to call close. Don't you want the traceback in those cases too? Also, in the most common cases (like open), you're actually creating a chain of two or three objects; do all of them need to store this info?

I don't think you want this on all the time... But when exactly _do_ you want it on? Debug mode? If the warning is enabled at allocation time? A global flag on the io module?

Also, storing an actual traceback is probably a bad idea, as it keeps all kinds of things alive for a very long time. Maybe storing a string representation, or enough info to generate such a representation in the ResourceWarning?

Also, I suspect that knowing the arguments used to allocate the resource (like the filename passed to open or the object's constructor, in the most common case, but also things like the cloexec flag when debugging multi-process apps) might be at least as useful as the traceback, so you might as well add that too.

Anyway, there's pure Python code in Lib/io.py that wraps up the C code. To experiment with this without getting into the C stuff, I think you could edit it to use your own classes that wrap _io.BufferedWriter, etc. instead of just exporting those directly.
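As a rough sketch of the wrapper approach described above, the class below remembers a string form of the call stack plus the arguments given to open, and only looks at them if __del__ is reached without a close. TrackedFile and its attribute names are hypothetical; it wraps the builtin open rather than the _io classes, and it is not how Lib/io.py is organised.

```python
import traceback
import warnings


class TrackedFile:
    """Hypothetical wrapper that remembers where and how a file was opened."""

    def __init__(self, *args, **kwargs):
        self._file = None  # set first so __del__ is safe if open() fails
        # Keep strings only, so no frames or locals are kept alive.
        self._origin = "".join(traceback.format_stack()[:-1])
        self._open_args = "args=%r kwargs=%r" % (args, kwargs)
        self._file = open(*args, **kwargs)

    def __getattr__(self, name):
        # Delegate read(), write(), fileno(), ... to the wrapped file.
        return getattr(self._file, name)

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()

    def close(self):
        self._file.close()

    def __del__(self):
        f = self._file
        if f is not None and not f.closed:
            # The stashed info is only consulted on the leak path.
            warnings.warn(
                "unclosed file %r (%s), opened at:\n%s"
                % (f, self._open_args, self._origin),
                ResourceWarning,
            )
            f.close()
```

Because only strings are stored, the wrapper should not extend the lifetime of the caller's frames, at the cost of formatting the stack on every open. That cost is one reason to put the tracking behind a switch of some kind.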
On 18/03/14 21:49, Andrew Barnert wrote:
On Mar 18, 2014, at 12:57, Tarek Ziadé wrote:

def __allocate_resource(fd) => records the file descriptor that was allocated, along with the current traceback.
def __free_resource(fd) => removes the fd from the list.
def __is_resource_allocated(fd) => tell if the resource is in the list

I like the general idea. I've actually written wrappers (in Python 2.x and other languages like C++, never Python 3, but similar idea...) to track these kinds of leaks myself.
I don't think you need all of these methods. Just the first one will do it: if you get to __del__ without a close and emit a ResourceWarning, use the info stashed by the allocate function; otherwise, it never gets looked at. And I'm not sure it needs to be a method after all.

True.
What about cases where the io object doesn't actually allocate anything (because you got it from an fd or a socket object or similar?), but you still need to call close. Don't you want the traceback in those cases too? Also, in the most common cases (like open), you're actually creating a chain of two or three objects; do all of them need to store this info?

I guess the allocate_resource function would only be called when the FD is created. And we'd want this initial traceback, AFAIK.
I don't think you want this on all the time... But when exactly _do_ you want it on? Debug mode? If the warning is enabled at allocation time? A global flag on the io module?
Yeah, that would be costly. I was thinking about some kind of debug flag to activate it. I would not mind having to compile Python --with-debug-leaks just to get that kind of tooling.
Also, storing an actual traceback is probably a bad idea, as it keeps all kinds of things alive for a very long time. Maybe storing a string representation, or enough info to generate such a representation in the ResourceWarning?
Yeah, good point.
Also, I suspect that knowing the arguments used to allocate the resource (like the filename passed to open or the object's constructor, in the most common case, but also things like the cloexec flag when debugging multi-process apps) might be at least as useful as the traceback, so you might as well add that too.
Anyway, there's pure Python code in Lib/io.py that wraps up the C code. To experiment with this without getting into the C stuff, I think you could edit it to use your own classes that wrap _io.BufferedWriter, etc. instead of just exporting those directly.
According to my attempts, it's hard to make sure all the calls are really going through those classes. Another approach is to hack ResourceWarning itself to give the idea a try.

Cheers
On 19 Mar 2014 06:09, "Tarek Ziadé"
Hey

ResourceWarning instances that are created in some classes' __del__ function, like FileIO, are a great tool for tracking down a program's bad behavior.

In network programming, that's even more important, to avoid crashes or huge leaks.

But some of them are very hard to fix because we don't get much context: we just get warnings at the end of the program execution, when gc.collect is called.

sys:1: ResourceWarning: unclosed file <_io.FileIO name=18 mode='wb'>
sys:1: ResourceWarning: unclosed file <_io.FileIO name=17 mode='rb'>

Here I just know that somewhere, 2 file descriptors were not closed.

What I'd like to be able to do is to track down the origin of those warnings.

Since __del__ is called asynchronously, it's impossible to track it right now (or I don't know how).

What we need is a way to keep track of any resource allocation *when it happens*.

Here's an idea: let's add three private functions in Python's io:

def __allocate_resource(fd) => records the file descriptor that was allocated, along with the current traceback.
def __free_resource(fd) => removes the fd from the list.
def __is_resource_allocated(fd) => tells if the resource is in the list.

These three functions, plugged in somewhere in io's classes, could be used in conjunction with ResourceWarning: when __del__ is called, if the resource was not freed, we'd be able to know where it was created.

Of course these functions are just a brain dump - I have no idea how io internals work. But unless I missed it, something like I've just described is missing in Python.
You should be able to experiment with something based on tracemalloc (although it may require patching the io implementation or else installing a GC callback that looks for particular types).

Cheers,
Nick.
Cheers
Tarek
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
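A minimal sketch of the tracemalloc-based experiment mentioned above, assuming tracing can be switched on before the leaking objects are created. tracemalloc.start and tracemalloc.get_object_traceback are real standard-library calls; open_raw and allocation_site are hypothetical helper names used only for this illustration.

```python
import tracemalloc

# Tracing must be started before the objects of interest are created.
tracemalloc.start(10)  # keep up to 10 frames per allocation


def open_raw(path):
    """Illustrative helper that creates an unbuffered FileIO object."""
    return open(path, "rb", buffering=0)


def allocation_site(obj):
    """Return formatted lines showing where obj was allocated, or []."""
    tb = tracemalloc.get_object_traceback(obj)
    return tb.format() if tb is not None else []
```

For a leaked file object f, "\n".join(allocation_site(f)) gives the file and line of the call that created it. Objects allocated before tracing started have no recorded traceback, so the helper returns an empty list for them.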
On 18/03/14 22:04, Nick Coghlan wrote:
..
You should be able to experiment with something based on tracemalloc (although it may require patching the io implementation or else installing a GC callback that looks for particular types).
Victor and I played with this a little today, using his script here:
https://bitbucket.org/haypo/misc/src/tip/python/res_warn.py

That did not fully work (I still need to investigate), but that's the general idea.

I suspect modifying ResourceWarning itself to track things down would ensure we're not missing anything.

Cheers
Tarek
participants (3)
- Andrew Barnert
- Nick Coghlan
- Tarek Ziadé