feature to make traceback objects usable without references to frame locals and globals
Hi,

I'm interested in a feature which allows users to discard the locals and globals references from frames held by a traceback object.

Currently, traceback objects are used when capturing and re-raising exceptions. However, they hold a reference to all frames, which in turn hold references to their locals and globals. These are not needed by the default traceback output, and they can cause serious memory bloat if a reference to a traceback object is kept for any significant length of time. There are even big red warnings in the Python docs about using them in one frame ( http://docs.python.org/release/3.1/library/sys.html#sys.exc_info ).

Example usage would be something like:

    import sys
    try:
        1/0
    except:
        t, v, tb = sys.exc_info()
        tb.clean()  # proposed method, does not exist today
    # ... much later ...
    raise t, v, tb  # Python 2 three-argument re-raise

Which would be basically a function to do this:

    import sys
    try:
        1/0
    except:
        t, v, tb = sys.exc_info()
        c = tb
        # walk the traceback chain and drop each frame's namespaces
        while c:
            c.tb_frame.f_locals = None
            c.tb_frame.f_globals = None
            c = c.tb_next
    # ... much later ...
    raise t, v, tb

Twisted has done a very similar thing with their twisted.python.failure.Failure object, which stringifies the traceback data and discards the reference to the Python traceback entirely ( http://twistedmatrix.com/trac/browser/tags/releases/twisted-10.0.0/twisted/p... ) - they also replicate a lot of traceback printing functions to make use of this stringified data.

It's worth noting that cgitb and other applications make use of locals and globals in their traceback output. However, I believe the vast majority of traceback usage does not make use of these references, and a significant penalty is paid as a result.

Is there any interest in such a feature?

-Greg
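For readers on later Python versions: since Python 3.4 the standard library has traceback.clear_frames(tb), which clears the local variables of every frame in a traceback and so covers much of what the proposed tb.clean() asks for (globals are left alone). A minimal sketch in Python 3 syntax follows; Payload, failing and capture are hypothetical names invented for the illustration.

```python
import gc
import traceback
import weakref


class Payload:
    """Stand-in for a large object living in a frame's locals."""


def failing(tracker):
    big = Payload()                    # local that the traceback keeps alive
    tracker.append(weakref.ref(big))   # lets us observe when it is freed
    return 1 / 0


def capture():
    tracker = []
    try:
        failing(tracker)
    except ZeroDivisionError as exc:
        saved = exc                    # copy out, so the traceback survives
    return saved, tracker[0]


saved, ref = capture()
alive_before = ref() is not None       # frame locals still referenced

traceback.clear_frames(saved.__traceback__)
gc.collect()
alive_after = ref() is not None        # locals released by clear_frames

# The traceback can still be formatted (and the exception re-raised).
formatted = traceback.format_exception(type(saved), saved, saved.__traceback__)
```

The traceback object itself stays usable for printing and for re-raising; only the frame locals are gone.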
Do you have profiling data to support your claim?

On Fri, Jun 25, 2010 at 7:48 PM, <ghazel@gmail.com> wrote:
I'm interested in a feature which allows users to discard the locals and globals references from frames held by a traceback object.
[...]
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
http://mail.python.org/mailman/listinfo/python-ideas

--
--Guido van Rossum (python.org/~guido)
Well, I discovered this property of traceback objects when a real-world server of mine began eating all the memory on the server. To me, this is the most convincing reason to address the issue.

I'm not sure what sort of profiling you're looking for, but I have since produced a contrived example which demonstrates a serious difference in memory consumption even with a very short traceback object lifetime: http://codepad.org/F23cwezb

If you run the test with "s.e = sys.exc_info()" commented out, the observed memory footprint of the process quickly approaches and sits at 5,677,056 bytes. Totally reasonable.

If you uncomment that line, the memory footprint climbs to 283,316,224 bytes quite rapidly. That's a difference of nearly two orders of magnitude! If you also uncomment the "gc.collect()" line, the process still hits 148,910,080 bytes.

-Greg

On Fri, Jun 25, 2010 at 7:58 PM, Guido van Rossum <guido@python.org> wrote:
Do you have profiling data to support your claim?
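The codepad script linked above is not reproduced here. As a rough, hypothetical stand-in (Python 3 syntax, with tracemalloc used in place of watching the process size), the sketch below compares how much memory stays allocated when caught exceptions are stored versus dropped. The names work and run, the list size and the round count are all assumptions made for illustration.

```python
import tracemalloc


def work():
    scratch = [0] * 100_000        # sizeable local owned by this frame
    raise ValueError("boom")


def run(keep, rounds=20):
    """Return bytes still allocated after `rounds` caught exceptions."""
    held = []
    tracemalloc.start()
    for _ in range(rounds):
        try:
            work()
        except ValueError as exc:
            if keep:
                # exc -> __traceback__ -> work() frame -> scratch
                held.append(exc)
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current


dropped = run(keep=False)   # exceptions discarded at the end of each block
kept = run(keep=True)       # exceptions stored, frames and locals retained
```

With the exceptions stored, every round's scratch list remains reachable through the traceback, so kept grows with the number of rounds while dropped stays small.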
ghazel@gmail.com wrote:
I'm interested in a feature which allows users to discard the locals and globals references from frames held by a traceback object.
I'd like to take this further and remove the need for traceback objects to refer to a frame object at all. The standard traceback printout only needs two pieces of information from the traceback, the file name and line number.

The line number is already present in the traceback object. All it would take is the addition of a file name attribute to the traceback object, and the frame reference could be made optional.

This would be a big help for Pyrex and Cython, which currently have to create entire dummy frame objects in order to add entries to the traceback. Not only is this tedious and inefficient, it ties them to internal details of the frame object that are vulnerable to change.

It would be much nicer to have a simple API function such as PyTraceback_AddEntry(filename, lineno) to add a frameless traceback object.

--
Greg
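The point that the standard printout needs little more than a file name and line number per entry can be illustrated from Python code with the existing traceback.extract_tb and traceback.format_list functions, which reduce a traceback to plain records that hold no frame references. This is only a sketch of the idea in Python 3 syntax, not the proposed C-level PyTraceback_AddEntry; fail and snapshot are made-up names.

```python
import traceback


def fail():
    secret = "x" * 10              # a local the report does not need
    raise KeyError("missing")


def snapshot():
    try:
        fail()
    except KeyError as exc:
        # Plain (filename, lineno, name, line) records; no frames retained.
        entries = traceback.extract_tb(exc.__traceback__)
        summary = traceback.format_exception_only(type(exc), exc)
    return entries, summary


entries, summary = snapshot()
names = [entry.name for entry in entries]
report = "".join(traceback.format_list(entries) + summary)
```

The resulting report string carries the same file, line and function information as the default traceback output, which is roughly the approach the thread attributes to Twisted's Failure object.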
On Sat, 26 Jun 2010 20:23:52 +1200 Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
ghazel@gmail.com wrote:
I'm interested in a feature which allows users to discard the locals and globals references from frames held by a traceback object.
I'd like to take this further and remove the need for traceback objects to refer to a frame object at all. The standard traceback printout only needs two pieces of information from the traceback, the file name and line number.
Both ideas seem reasonable, but they need a concrete proposal and/or a patch.

Regards

Antoine.
Greg Ewing wrote:
ghazel@gmail.com wrote:
I'm interested in a feature which allows users to discard the locals and globals references from frames held by a traceback object.
Wouldn't it be better to write safer code and not store a reference to the traceback object in the first place ?

Working with traceback objects can easily introduce hidden circular references, so it is usually better not to access them at all if you don't have a need for them.

Either like this:

    try:
        raise Exception
    except Exception, reason:
        pass

or by using slicing:

    import sys
    try:
        raise Exception
    except Exception, reason:
        # take only the class and the instance, never the traceback
        errorclass, errorobject = sys.exc_info()[:2]
        pass

If you do need to access them, make sure you clean up the reference as soon as you can:

    import sys
    try:
        raise Exception
    except Exception, reason:
        errorclass, errorobject, tb = sys.exc_info()
        # ... use tb here ...
        tb = None  # drop the reference as early as possible
I'd like to take this further and remove the need for traceback objects to refer to a frame object at all. The standard traceback printout only needs two pieces of information from the traceback, the file name and line number.
The line number is already present in the traceback object. All it would take is the addition of a file name attribute to the traceback object, and the frame reference could be made optional.
How would you make that reference optional ? The frames are needed to inspect the locals and globals of the call stack and debugging code relies on them being available. Also: What's the use case for creating traceback objects outside the Python interpreter core ? -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 26 2010)
Python/Zope Consulting and Support ... http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
2010-07-19: EuroPython 2010, Birmingham, UK 22 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/
On Sat, Jun 26, 2010 at 4:03 AM, M.-A. Lemburg <mal@egenix.com> wrote:
Greg Ewing wrote:
I'd like to take this further and remove the need for traceback objects to refer to a frame object at all. The standard traceback printout only needs two pieces of information from the traceback, the file name and line number.
First off, Greg Ewing's idea fully covers my use case and may even simplify implementation, so I'm in favor of it. I have never used (and very much question the use of) references to locals and globals. Having some backwards-compatible way to avoid ever having to deal with them would be preferable.

On Sat, Jun 26, 2010 at 4:03 AM, M.-A. Lemburg <mal@egenix.com> wrote:
Greg Ewing wrote:
ghazel@gmail.com wrote:
I'm interested in a feature which allows users to discard the locals and globals references from frames held by a traceback object.
Wouldn't it be better to write safer code and not store a reference to the traceback object in the first place ?
Working with traceback objects can easily introduce hidden circular references, so it is usually better not to access them at all if you don't have a need for them:
Those are strong words against using traceback objects. This feature idea is about creating a way to make traceback objects usable without the gotcha you're referencing. -Greg
On Sat, 26 Jun 2010 13:03:38 +0200 "M.-A. Lemburg" <mal@egenix.com> wrote:
Greg Ewing wrote:
ghazel@gmail.com wrote:
I'm interested in a feature which allows users to discard the locals and globals references from frames held by a traceback object.
Wouldn't it be better to write safer code and not store a reference to the traceback object in the first place ?
In Python 3, tracebacks are stored as an attribute of the corresponding exception:
>>> try: 1/0
... except Exception as _: e = _
...
>>> e.__traceback__
<traceback object at 0x7ff69fdbf908>
Also: What's the use case for creating traceback objects outside the Python interpreter core ?
He's not talking about creating traceback objects outside the core, but being able to reuse tracebacks created by the core without keeping alive a whole chain of objects. It's a real need when you want to do careful error handling/reporting without wasting too many resources. As already mentioned, Twisted has a bunch of code to work around that problem, since errors can be quite long-lived in a pipelined asynchronous execution model. Antoine.
Antoine Pitrou wrote:
On Sat, 26 Jun 2010 13:03:38 +0200 "M.-A. Lemburg" <mal@egenix.com> wrote:
Greg Ewing wrote:
ghazel@gmail.com wrote:
I'm interested in a feature which allows users to discard the locals and globals references from frames held by a traceback object.
Wouldn't it be better to write safer code and not store a reference to the traceback object in the first place ?
In Python 3, tracebacks are stored as an attribute of the corresponding exception:
>>> try: 1/0
... except Exception as _: e = _
...
>>> e.__traceback__
<traceback object at 0x7ff69fdbf908>
Ouch.

So you explicitly need to get rid of the traceback in Python 3 if you want to avoid keeping the associated objects alive during exception processing ?

I think that design decision needs to be revisited. Tracebacks are needed for error reporting, but (normally) not for managing error handling or recovery.

E.g. it is not uncommon to store exception objects in a list for later batched error reporting. With the traceback being referenced on those objects and the traceback chain keeping references to all frames alive, this kind of processing won't be feasible anymore.

What's even more important is that programmers are unlikely to be aware of this detail and its implications.
Also: What's the use case for creating traceback objects outside the Python interpreter core ?
He's not talking about creating traceback objects outside the core, but being able to reuse tracebacks created by the core without keeping alive a whole chain of objects.
With the question I was referring to the suggestion by Greg Ewing in which he seemed to imply that Pyrex and Cython create traceback objects.
It's a real need when you want to do careful error handling/reporting without wasting too many resources. As already mentioned, Twisted has a bunch of code to work around that problem, since errors can be quite long-lived in a pipelined asynchronous execution model.
With the above detail, I completely agree. In fact, more than that: I think we should make storing the traceback in exception.__traceback__ optional and not the default, much like .__context__ and .__cause__.

--
Marc-Andre Lemburg
eGenix.com
Please don't act so surprised. There are about 4 relevant PEPs: 344, 3109, 3110, 3134 (the latter replacing 344).

Also note that the traceback is only kept alive if the exception object is explicitly copied out of the except block that caught it -- normally the exception object is deleted when that block is left.

--Guido

On Sat, Jun 26, 2010 at 2:53 PM, M.-A. Lemburg <mal@egenix.com> wrote:
Ouch.
So you explicitly need to get rid of the traceback in Python 3 if you want to avoid keeping the associated objects alive during exception processing ?
[...]

--
--Guido van Rossum (python.org/~guido)
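The deletion mentioned above is visible from Python 3 code: the name bound by "except ... as" is unbound when the block ends, so the exception (and its traceback) survives only if it is copied to another name. A small sketch, with probe as a made-up helper:

```python
def probe(copy_out):
    saved = None
    try:
        1 / 0
    except ZeroDivisionError as err:
        if copy_out:
            saved = err       # explicit copy keeps exception and traceback alive
    try:
        err                   # the 'as' name was deleted when the block ended
    except NameError:         # UnboundLocalError is a subclass of NameError
        still_bound = False
    else:
        still_bound = True
    return still_bound, saved


not_copied = probe(copy_out=False)
copied = probe(copy_out=True)
```

In both calls the name err is gone after the block; only the explicit copy in saved keeps the exception, and with it the traceback and its frames, alive.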
Guido van Rossum wrote:
Please don't act so surprised. There are about 4 relevant PEPs: 344, 3109, 3110, 3134 (the latter replacing 344).
I knew about the discussions around chained exceptions. I wasn't aware of the idea to keep a traceback object on the exception object itself.

PEP 3134 also mentioned the case we're currently discussing:

"""
Open Issue: Garbage Collection

The strongest objection to this proposal has been that it creates cycles between exceptions and stack frames [12]. Collection of cyclic garbage (and therefore resource release) can be greatly delayed.

    >>> try:
    >>>     1/0
    >>> except Exception, err:
    >>>     pass

will introduce a cycle from err -> traceback -> stack frame -> err, keeping all locals in the same scope alive until the next GC happens.

Today, these locals would go out of scope. There is lots of code which assumes that "local" resources -- particularly open files -- will be closed quickly. If closure has to wait for the next GC, a program (which runs fine today) may run out of file handles.

Making the __traceback__ attribute a weak reference would avoid the problems with cyclic garbage. Unfortunately, it would make saving the Exception for later (as unittest does) more awkward, and it would not allow as much cleanup of the sys module.

A possible alternate solution, suggested by Adam Olsen, would be to instead turn the reference from the stack frame to the 'err' variable into a weak reference when the variable goes out of scope [13].
"""

So obviously this case had already been discussed before. Was a solution found and implemented that addresses the problem ?
Also note that the traceback is only kept alive if the exception object is explicitly copied out of the except block that caught it -- normally the exception object is deleted when that block is left.
Right, but only if you do not use the exception object for other purposes elsewhere. If you do that a lot in your application, it appears that the only way around keeping lots of traceback objects alive is by explicitly setting .__traceback__ to None before storing away the exception object.

Think of e.g. an application that does a long running calculation. Such applications typically want to continue processing even in case of errors and report all errors at the end of the run. If a programmer is unaware of the traceback issue, he'd likely run into a memory problem without really knowing where to look for the cause.

Also note that garbage collection will not necessarily do what the user expects: it is quite possible that large amounts of memory will stay allocated as unused space in pymalloc. This is not specific to the discussed case, but still a valid user concern. Greg Hazel observed this situation in his example.
--
Marc-Andre Lemburg
eGenix.com
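The workaround described in the message above, dropping the traceback before storing an exception for later batched reporting, might look like this in Python 3. It is a sketch under assumptions: risky and process are hypothetical names, and the report text is rendered up front so that nothing needs the frames afterwards. BaseException.with_traceback(None) sets __traceback__ to None and returns the same exception object.

```python
import traceback


def risky(n):
    buffer = bytearray(1_000_000)   # large local we do not want kept alive
    if n % 2:
        raise ValueError(f"item {n} failed")
    return n


def process(items):
    errors = []
    for item in items:
        try:
            risky(item)
        except ValueError as exc:
            # Render the report now, while the traceback still exists ...
            text = "".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__)
            )
            # ... then store the exception without its traceback.
            errors.append((exc.with_traceback(None), text))
    return errors


errors = process(range(4))
```

Each stored exception keeps its message and type, and the pre-rendered text keeps the stack information, but no frame or local variable is retained.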
On Sat, Jun 26, 2010 at 4:45 PM, M.-A. Lemburg <mal@egenix.com> wrote:
Also note that garbage collection will not necessarily do what the user expects: it is well possible that big amounts of memory will stay allocated as unused space in pymalloc. This is not specific to the discussed case, but still a valid user concern. Greg Hazel observed this situation in his example.
Aha. So whereas the process size ballooned, there is no actual memory leak (his example threw away the exception each time through the loop), it's just that looking at process size is a bad way to assess memory leaks. I would like to reject this then as "that's just how Python's memory allocation works". As you say, it's not specific to this case; it comes up occasionally and it's just a matter of user education.

I don't think anything should be done about __traceback__ either -- frameworks that have this problem can work around it in various ways. Or, at least I don't see a reason to panic and roll back the feature. Maybe eventually it can be improved by adding some kind of functionality to control some details of the behavior.

--
--Guido van Rossum (python.org/~guido)
On Sun, Jun 27, 2010 at 8:33 AM, Guido van Rossum <guido@python.org> wrote:
On Sat, Jun 26, 2010 at 4:45 PM, M.-A. Lemburg <mal@egenix.com> wrote:
Also note that garbage collection will not necessarily do what the user expects: it is well possible that big amounts of memory will stay allocated as unused space in pymalloc. This is not specific to the discussed case, but still a valid user concern. Greg Hazel observed this situation in his example.
Aha. So whereas the process size ballooned, there is no actual memory leak (his example threw away the exception each time through the loop), it's just that looking at process size is a bad way to assess memory leaks. I would like to reject this then as "that's just how Python's memory allocation works". As you say, it's not specific to this case; it comes up occasionally and it's just a matter of user education.
Leak? My example does not try to demonstrate a leak. It demonstrates excessive allocation. If you collect a few times after the test the memory usage of the process does drop to a reasonable level again. In a real-world application with long-lived traceback objects and more state, this excessive allocation becomes crippling. Go ahead, add a zero to the size of that list being created in the example. Without the traceback reference the process stays stable at 17MB, with the reference it balloons to consume all of the 2GB of RAM in my laptop, causing swapping. This is similar to the observed behavior of a real application, which is completely stable and requires relatively little memory when not using traceback objects, but quickly grows to an unmanageable size with traceback objects.
I don't think anything should be done about __traceback__ either -- frameworks that have this problem can work around it in various ways. Or, at least I don't see a reason to panic and roll back the feature. Maybe eventually it can be improved by adding some kind of functionality to control some details of the behavior.
This idea is about an improvement to control some details of the behavior. Keeping __traceback__ in more cases would be nothing to "panic" about, if tracebacks were not such "unsafe" objects. I have not yet seen any way for a framework to work around the references issue without discarding the traceback object entirely and losing the ability to re-raise. -Greg
Guido van Rossum wrote:
On Sat, Jun 26, 2010 at 4:45 PM, M.-A. Lemburg <mal@egenix.com> wrote:
Also note that garbage collection will not necessarily do what the user expects: it is well possible that big amounts of memory will stay allocated as unused space in pymalloc. This is not specific to the discussed case, but still a valid user concern. Greg Hazel observed this situation in his example.
Aha. So whereas the process size ballooned, there is no actual memory leak (his example threw away the exception each time through the loop), it's just that looking at process size is a bad way to assess memory leaks. I would like to reject this then as "that's just how Python's memory allocation works". As you say, it's not specific to this case; it comes up occasionally and it's just a matter of user education.
pymalloc has gotten a lot better since it was fixed in Python 2.5 to return unused chunks of memory to the OS, but we still have the issue of fragmented arenas with cases of just a few bytes keeping 256kB (the size of an arena) allocated.
I don't think anything should be done about __traceback__ either -- frameworks that have this problem can work around it in various ways. Or, at least I don't see a reason to panic and roll back the feature. Maybe eventually it can be improved by adding some kind of functionality to control some details of the behavior.
Not necessarily roll back the feature, but an implementation that deliberately introduces circular references is not really ideal.

Since tracebacks on exceptions are rarely used by applications, I think it would be better to turn them into weak references.

The arguments against doing this in the PEP appear rather weak compared to the potential issue for non-expert Python programmers:

"""
Making the __traceback__ attribute a weak reference would avoid the problems with cyclic garbage. Unfortunately, it would make saving the Exception for later (as unittest does) more awkward, and it would not allow as much cleanup of the sys module.
"""

Special use cases that want to save the traceback for later use can always explicitly convert the traceback into a real (non-weak) reference.

I don't understand the reference to the sys module cleanup, so can't comment on that.

--
Marc-Andre Lemburg
eGenix.com
On Sun, 27 Jun 2010 19:20:07 +0200 "M.-A. Lemburg" <mal@egenix.com> wrote:
Not necessarily roll back the feature, but an implementation that deliberately introduces circular references is not really ideal.
Since tracebacks on exceptions are rarely used by applications, I think it would be better to turn them into weak references.
How do you manage to get a strong reference before the traceback object gets deleted? Besides, an API which gives some information in an unreliable manner does not seem very user-friendly to me.

I think I like the OP's idea better: allow releasing the references to local and global variables from the frames in the traceback. This is what keeps a lot of potentially large objects alive - some of which may also keep some OS resources busy.
On 28 June 2010 04:00, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sun, 27 Jun 2010 19:20:07 +0200 "M.-A. Lemburg" <mal@egenix.com> wrote:
Not necessarily roll back the feature, but an implementation that deliberately introduces circular references is not really ideal.
Since tracebacks on exceptions are rarely used by applications, I think it would be better to turn them into weak references.
How do you manage to get a strong reference before the traceback object gets deleted?
At the beginning of the 'except' block, a strong local (but hidden) reference is obtained to the traceback (if it exists). This is deleted at the end of the 'except' block.
Besides, an API which gives some information in an unreliable manner does not seem very user-friendly to me.
I think I like the OP's idea better: allow releasing the references to local and global variables from the frames in the traceback. These references keep a lot of potentially large objects alive - some of which may also keep some OS resources busy.
I agree, with a variation - keep a weak reference to the frame in the traceback, and have a way for the application to specify that it wants to retain strong references to frames (so unittest for example can guarantee access to locals and globals). Possibly a context manager could be used for this, and decorators could be used to wrap an entire method in the context manager.
A dummy frame would also be stored that contained enough info to replicate the existing stack trace (file, line number, etc). A strong reference could be obtained via the existing attribute, converted to a property, which does:
a. return the internal reference if it is not a dummy frame;
b. return the result of the weak reference if it still exists;
c. return the dummy frame reference.
I think this gives us the best of all worlds:
1. No strong reference to locals/globals in tracebacks by default;
2. Able to force strong references to frames;
3. We don't lose the ability to compose a full and complete stack trace.
Tim Delaney
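The lookup order described in steps a to c above can be sketched in plain Python. This is only an illustration of the proposal, not an existing API: `Frame`, `DummyFrame`, `TracebackEntry` and the `keep_strong` flag are hypothetical names, and `Frame` is a stand-in class because the sketch needs an object that supports weak references.

```python
import weakref
from dataclasses import dataclass


class Frame:
    """Hypothetical stand-in for a frame object (supports weak references)."""

    def __init__(self, filename, lineno, name, f_locals):
        self.filename = filename
        self.lineno = lineno
        self.name = name
        self.f_locals = f_locals


@dataclass(frozen=True)
class DummyFrame:
    """Keeps only what a plain stack trace needs: file, line, function name."""

    filename: str
    lineno: int
    name: str


class TracebackEntry:
    """Hypothetical traceback entry following the proposed lookup order."""

    def __init__(self, frame, keep_strong=False):
        self._dummy = DummyFrame(frame.filename, frame.lineno, frame.name)
        # Only hold the real frame strongly when the application asks for it.
        self._strong = frame if keep_strong else None
        self._weak = weakref.ref(frame)

    @property
    def tb_frame(self):
        if self._strong is not None:   # a. forced strong reference
            return self._strong
        live = self._weak()            # b. frame still alive elsewhere
        if live is not None:
            return live
        return self._dummy             # c. fall back to the dummy frame
```

With this shape, an entry created with the default settings stops keeping locals alive as soon as nothing else references the frame, while file name and line number remain available through the dummy.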
Antoine Pitrou wrote:
On Sun, 27 Jun 2010 19:20:07 +0200 "M.-A. Lemburg" <mal@egenix.com> wrote:
Not necessarily roll back the feature, but an implementation that deliberately introduces circular references is not really ideal.
Since tracebacks on exceptions are rarely used by applications, I think it would be better to turn them into weak references.
How do you manage to get a strong reference before the traceback object gets deleted?
IIUC, the traceback object will still be alive during processing of the except clause, so all you'd have to do is turn the weak reference into a real one. Let's assume that the weakref object is called .__traceback_weakref__ and the proxy called .__traceback__ (to assure compatibility).

    ...
    except TypeError as exc:
        # Replace the weakref object with the referenced object
        exc.__traceback__ = exc.__traceback_weakref__()
        # Set the weakref object to None to have it collected and to
        # signal this operation to other code knowing about this
        # strategy.
        exc.__traceback_weakref__ = None

BTW: I wonder why proxy objects don't provide a direct access to the weakref object they are using. That would make keeping that extra variable around unnecessary.
Besides, an API which gives some information in an unreliable manner does not seem very user-friendly to me.
The argument so far has been that most error processing happens in the except clause itself, making it unnecessary to deal with possible circular references. That is certainly true in many cases. Now under that argument, using the traceback stored on an exception outside the except clause is even less likely to be needed, so I don't follow your concern that using a weak reference is less user-friendly. Perhaps someone could highlight a use case where the traceback is needed outside the except clause ?!
I think I like the OP's idea better: allow releasing the references to local and global variables from the frames in the traceback. These references keep a lot of potentially large objects alive - some of which may also keep some OS resources busy.
It's certainly a good idea to pay extra attention to this in Python3. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 28 2010)
On Mon, 28 Jun 2010 13:14:21 +0200 "M.-A. Lemburg" <mal@egenix.com> wrote:
BTW: I wonder why proxy objects don't provide a direct access to the weakref object they are using. That would make keeping that extra variable around unnecessary.
Probably because the proxy would then have an additional attribute which isn't on the proxied object. Or, worse, it could also shadow one of the proxied object's existing attributes.
Perhaps someone could highlight a use case where the traceback is needed outside the except clause ?!
Well, it's needed if you want delayed error reporting and still display a comprehensive stack trace (rather than just the exception message). Frameworks often need this kind of behaviour; Twisted was already mentioned in this thread. But, even outside of frameworks, there are situations where you want to process a bunch of data and present all processing errors at the end. However, as the OP argued, most often you need the traceback in order to display file names and line numbers, but you don't need the attached variables (locals and globals). Regards Antoine.
Antoine Pitrou wrote:
On Mon, 28 Jun 2010 13:14:21 +0200 "M.-A. Lemburg" <mal@egenix.com> wrote:
BTW: I wonder why proxy objects don't provide a direct access to the weakref object they are using. That would make keeping that extra variable around unnecessary.
Probably because the proxy would then have an additional attribute which isn't on the proxied object. Or, worse, it could also shadow one of the proxied object's existing attributes.
That's a very weak argument, IMHO. It all depends on the naming of the attribute. Also note that the proxied object won't know anything about that attribute, so it doesn't have any side-effects. We've used such an approach on our mxProxy object for years without any problems or naming conflicts so far: http://www.egenix.com/products/python/mxBase/mxProxy/ http://www.egenix.com/products/python/mxBase/mxProxy/doc/#_Toc162774452
Perhaps someone could highlight a use case where the traceback is needed outside the except clause ?!
Well, it's needed if you want delayed error reporting and still display a comprehensive stack trace (rather than just the exception message). Frameworks often need this kind of behaviour; Twisted was already mentioned in this thread. But, even outside of frameworks, there are situations where you want to process a bunch of data and present all processing errors at the end.
I had already given that example myself, but in those cases I had in mind the stack trace is not really needed: instead, you add the relevant information to the list of errors directly from the except clause, since the error information needed to report the issues is not related to programming errors, but instead to data errors.
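As a minimal sketch of the data-error case described above: the error text is recorded inside the except clause, so neither the exception nor its traceback outlives the block. The function name `parse_records` and the message format are assumptions made for illustration only.

```python
def parse_records(lines):
    """Convert each line to an int, collecting data errors for a final report."""
    results = []
    errors = []
    for lineno, text in enumerate(lines, start=1):
        try:
            results.append(int(text))
        except ValueError as exc:
            # Store only a string; the exception object and its traceback
            # are released when the except block ends.
            errors.append(f"record {lineno}: {exc}")
    return results, errors
```

This only covers errors that are about the data; reporting a programming error usefully would still need the stack trace.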
However, as the OP argued, most often you need the traceback in order to display file names and line numbers, but you don't need the attached variables (locals and globals).
I guess all this just needs to be highlighted in the documentation to make programmers aware of the fact that they cannot just store exception objects away without considering the consequences of this first. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 28 2010)
On Mon, 28 Jun 2010 15:29:25 +0200 "M.-A. Lemburg" <mal@egenix.com> wrote:
Antoine Pitrou wrote:
On Mon, 28 Jun 2010 13:14:21 +0200 "M.-A. Lemburg" <mal@egenix.com> wrote:
BTW: I wonder why proxy objects don't provide a direct access to the weakref object they are using. That would make keeping that extra variable around unnecessary.
Probably because the proxy would then have an additional attribute which isn't on the proxied object. Or, worse, it could also shadow one of the proxied object's existing attributes.
That's a very weak argument, IMHO. It all depends on the naming of the attribute.
What name do you suggest that isn't cumbersome or awkward, and yet doesn't present any risk of conflict with attributes of the proxied object?
We've used such an approach on our mxProxy object for years without any problems or naming conflicts so far:
http://www.egenix.com/products/python/mxBase/mxProxy/ http://www.egenix.com/products/python/mxBase/mxProxy/doc/#_Toc162774452
Well, if some features of mxProxy are useful, perhaps it would be worth integrating them in the stdlib. Regards Antoine.
Antoine Pitrou wrote:
On Mon, 28 Jun 2010 15:29:25 +0200 "M.-A. Lemburg" <mal@egenix.com> wrote:
Antoine Pitrou wrote:
On Mon, 28 Jun 2010 13:14:21 +0200 "M.-A. Lemburg" <mal@egenix.com> wrote:
BTW: I wonder why proxy objects don't provide a direct access to the weakref object they are using. That would make keeping that extra variable around unnecessary.
Probably because the proxy would then have an additional attribute which isn't on the proxied object. Or, worse, it could also shadow one of the proxied object's existing attributes.
That's a very weak argument, IMHO. It all depends on the naming of the attribute.
What name do you suggest that isn't cumbersome or awkward, and yet doesn't present any risk of conflict with attributes of the proxied object?
If you want to play safe, use something like '__weakref_object__'. In mxProxy, we simply reserved all methods and attributes that start with 'proxy_' for use by the proxy object itself. That hasn't caused a conflict so far.
We've used such an approach on our mxProxy object for years without any problems or naming conflicts so far:
http://www.egenix.com/products/python/mxBase/mxProxy/ http://www.egenix.com/products/python/mxBase/mxProxy/doc/#_Toc162774452
Well, if some features of mxProxy are useful, perhaps it would be worth integrating them in the stdlib.
We mainly use mxProxy for low-level access control to objects, and as a way to implement a cleanup protocol for breaking circular references early. The weak reference feature was a later add-on and also serves as an additional way to prevent creation of circular references. All this was designed prior to Python implementing the GC protocol which now implements something similar to the cleanup protocol we have in mxProxy. Unlike the standard Python weakref implementation, mxProxy doesn't require changes to the proxy objects in order to create a weak reference. It works for all objects. I don't know why Fred used a different approach. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 28 2010)
Antoine Pitrou wrote:
"M.-A. Lemburg" <mal@egenix.com> wrote:
BTW: I wonder why proxy objects don't provide a direct access to the weakref object they are using.
Probably because the proxy would then have an additional attribute which isn't on the proxied object.
This problem could be avoided by providing a function to extract the proxied object. -- Greg
On 6/28/2010 8:39 AM, Antoine Pitrou wrote:
However, as the OP argued, most often you need the traceback in order to display file names and line numbers, but you don't need the attached variables (locals and globals).
It then seems to me that one should extract the file name and line number info one wants to save before exiting the exception clause and let the exception and traceback go on exit. Is a library function needed to make extraction easier? Perhaps this "The reason for this [deletion on exit] is that with the traceback attached to them, exceptions will form a reference cycle with the stack frame, keeping all locals in that frame alive until the next garbage collection occurs." could be strengthened into a better warning that this "That means that you have to assign the exception to a different name if you want to be able to refer to it after the except clause." may really, really not be a good idea. -- Terry Jan Reedy
On Mon, Jun 28, 2010 at 6:26 PM, Terry Reedy <tjreedy@udel.edu> wrote:
On 6/28/2010 8:39 AM, Antoine Pitrou wrote:
However, as the OP argued, most often you need the traceback in order to display file names and line numbers, but you don't need the attached variables (locals and globals).
It then seems to me that one should extract the file name and line number info one wants to save before exiting the exception clause and let the traceback exception and traceback go on exit. Is a library function needed to make extraction easier?
Unfortunately this is only half of the task. To re-raise the exception with the traceback later, a real traceback object is needed. To my knowledge there is no way to create a real traceback object from Python given only file name and line numbers. -Greg
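For readers on later Python versions: Python 3.4 added `traceback.clear_frames()` (built on `frame.clear()`), which keeps the real traceback object but drops the local variables of the frames it references. The sketch below assumes CPython 3.4 or newer; `Payload`, `failing`, `capture` and `probes` are illustrative names, and the weak reference is only there to observe that the local is released. Frames that are still executing are left untouched, and globals are not affected.

```python
import gc
import traceback
import weakref


class Payload:
    """Stand-in for a large object held in a local variable."""


probes = []  # weak references used to observe when the local is released


def failing():
    payload = Payload()
    probes.append(weakref.ref(payload))
    raise ValueError("bad input")


def capture():
    try:
        failing()
    except ValueError as exc:
        # Drop the locals of every finished frame in the traceback,
        # but keep the traceback object itself.
        traceback.clear_frames(exc.__traceback__)
        return exc


saved = capture()
gc.collect()

# ... much later: the traceback can still be formatted.
report = "".join(
    traceback.format_exception(type(saved), saved, saved.__traceback__)
)
```

Since `saved` still carries a real traceback, it can also be re-raised later with `raise saved`.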
Terry Reedy wrote:
It then seems to me that one should extract the file name and line number info one wants to save before exiting the exception clause and let the traceback exception and traceback go on exit. Is a library function needed to make extraction easier?
That would require building your own custom traceback structure that would be incompatible with any of the standard functions available for formatting and printing tracebacks. -- Greg
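For the display-only half of the task, the standard `traceback` module can carry the extracted entries itself: `traceback.extract_tb()` returns file name, line number and function name per frame, and `traceback.format_list()` renders them later. The following is a sketch under the assumption that only printing is needed; the result cannot be used to re-raise with the original traceback, and `risky`, `capture_summary` and `render` are illustrative names.

```python
import traceback


def risky():
    return 1 / 0


def capture_summary():
    """Extract printable traceback data inside the except clause."""
    try:
        risky()
    except ZeroDivisionError as exc:
        # Entries hold file name, line number, function name and source
        # text, not the frame objects or their locals.
        summary = traceback.extract_tb(exc.__traceback__)
        message = traceback.format_exception_only(type(exc), exc)
    return summary, message


def render(summary, message):
    """Rebuild the familiar traceback text from the extracted data."""
    header = ["Traceback (most recent call last):\n"]
    return "".join(header + traceback.format_list(summary) + message)
```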
On Sun, Jun 27, 2010 at 10:20, M.-A. Lemburg <mal@egenix.com> wrote:
Guido van Rossum wrote:
On Sat, Jun 26, 2010 at 4:45 PM, M.-A. Lemburg <mal@egenix.com> wrote:
Also note that garbage collection will not necessarily do what the user expects: it is well possible that big amounts of memory will stay allocated as unused space in pymalloc. This is not specific to the discussed case, but still a valid user concern. Greg Hazel observed this situation in his example.
Aha. So whereas the process size ballooned, there is no actual memory leak (his example threw away the exception each time through the loop), it's just that looking at process size is a bad way to assess memory leaks. I would like to reject this then as "that's just how Python's memory allocation works". As you say, it's not specific to this case; it comes up occasionally and it's just a matter of user education.
pymalloc has gotten a lot better since it was fixed in Python 2.5 to return unused chunks of memory to the OS, but we still have the issue of fragmented arenas with cases of just a few bytes keeping 256kB (the size of an arena) allocated.
I don't think anything should be done about __traceback__ either -- frameworks that have this problem can work around it in various ways. Or, at least I don't see a reason to panic and roll back the feature. Maybe eventually it can be improved by adding some kind of functionality to control some details of the behavior.
Not necessarily roll back the feature, but an implementation that deliberately introduces circular references is not really ideal.
But the circular reference only occurs if you store a reference outside the 'except' clause; Python 3 explicitly deletes any caught exception variable to prevent the loop.
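A small sketch of the behaviour described here, for Python 3: the name bound by `as` is deleted when the except block ends, so the exception (and its traceback) only survives if it is assigned to another name. The function and variable names are illustrative.

```python
def handle():
    try:
        1 / 0
    except ZeroDivisionError as exc:
        saved = exc  # an explicit second name survives the block

    try:
        exc  # the 'as' name was deleted at the end of the except block
    except NameError:
        name_deleted = True
    else:
        name_deleted = False
    return saved, name_deleted
```

Keeping `saved` around is exactly the case where the exception, its traceback and the referenced frames stay alive.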
Since tracebacks on exceptions are rarely used by applications, I think it would be better to turn them into weak references.
While I would be fine with that if many people saved raised exceptions outside of an 'except' clause, I doubt that happens very often, and there would be backward-compatibility issues at this point.
On Sun, Jun 27, 2010 at 7:53 AM, M.-A. Lemburg <mal@egenix.com> wrote:
He's not talking about creating traceback objects outside the core, but being able to reuse tracebacks created by the core without keeping alive a whole chain of objects.
With the question I was referring to the suggestion by Greg Ewing in which he seemed to imply that Pyrex and Cython create traceback objects.
When Python code calls into Pyrex/C code which then calls back into Python, I understand they insert dummy frames into the tracebacks to make the call stack more complete. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick Coghlan wrote:
On Sun, Jun 27, 2010 at 7:53 AM, M.-A. Lemburg <mal@egenix.com> wrote:
He's not talking about creating traceback objects outside the core, but being able to reuse tracebacks created by the core without keeping alive a whole chain of objects.
With the question I was referring to the suggestion by Greg Ewing in which he seemed to imply that Pyrex and Cython create traceback objects.
When Python code calls into Pyrex/C code which then calls back into Python, I understand they insert dummy frames into the tracebacks to make the call stack more complete.
Thanks for that bit of information. I suppose they do this for better error reporting, right ? -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 28 2010)
On Mon, Jun 28, 2010 at 9:31 PM, M.-A. Lemburg <mal@egenix.com> wrote:
Nick Coghlan wrote:
When Python code calls into Pyrex/C code which then calls back into Python, I understand they insert dummy frames into the tracebacks to make the call stack more complete.
Thanks for that bit of information. I suppose they do this for better error reporting, right ?
I believe so, but keep in mind that I've never actually used them myself, I've just seen this behaviour described elsewhere. It makes sense for them to do it though, since following a call stack through a plain C or C++ extension module can get rather confusing at times. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
M.-A. Lemburg wrote:
Nick Coghlan wrote:
When Python code calls into Pyrex/C code which then calls back into Python, I understand they insert dummy frames into the tracebacks to make the call stack more complete.
I suppose they do this for better error reporting, right ?
Yes. This is one reason I would like to be able to have traceback objects without a corresponding frame. Having to create an entire frame just to have somewhere to put the file name is very annoying. Also, being able to remove the whole frame from a traceback object seems like a cleaner and more complete way to implement what the OP wanted. -- Greg
M.-A. Lemburg <mal@...> writes:
With the above detail, I completely agree. In fact, more than that: I think we should make storing the traceback in exception.__traceback__ optional and not the default, much like .__context__ and .__cause__.
I'm not sure why you consider __context__ non-default, since it is always automatically set when it applies.
On Sat, Jun 26, 2010 at 4:23 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
ghazel@gmail.com wrote: .. I'd like to take this further and remove the need for traceback objects to refer to a frame object at all. The standard traceback printout only needs two pieces of information from the traceback, the file name and line number.
Wouldn't that make it impossible to do postmortem analysis in pdb?
participants (11)
- Alexander Belopolsky
- Antoine Pitrou
- Benjamin Peterson
- Brett Cannon
- ghazel@gmail.com
- Greg Ewing
- Guido van Rossum
- M.-A. Lemburg
- Nick Coghlan
- Terry Reedy
- Tim Delaney