Re: [Python-ideas] Shrink recursion error tracebacks (was: Have REPL print less by default)

I have been thinking, and especially after Nick’s comments, that it might be better to keep it as simple as possible to reduce the risk of errors happening while we’re printing the tracebacks. Recursive functions (that go deep enough to trigger an exception) are usually within one function, in the REPL, or by using __getattr__ / __getattribute__. Nice to have? Sure. Necessary? Don’t think as much. I’m also a very solid -1 on the idea of prompting the user to do something like full_ex() to get the full stack trace. My rationale for such a change is that we need to be extremely precise in our message, to leave absolutely no room for a different interpretation than what we mean. Your message, for instance, is ambiguous. The fact it says that calls are hidden would let me think I just lost information about the stack trace (though that’s but a wording issue). As a user, just seeing "Call _full_ex() to print full trace." would be an immediate red flag: Crap, I just lost precious debugging information to save on lines! Might be just me, but that’s my opinion anyway :) -Emanuel Example: File "<stdin>", line 1, in f File "<stdin>", line 1, in g [Mutually recursive calls hidden: f (300), g (360)] File "<stdin>", line 1, in h File "<stdin>", line 1, in f File "<stdin>", line 1, in g [Mutual-recursive calls hidden: f (103), g (200)] RuntimeError: maximum recursion depth exceeded [963 calls hidden. Call _full_ex() to print full trace.] This rule is easily modified to, "have been seen at least three times before." For functions that recurse at multiple lines, it can print out one message per line, but maybe it should count them all together in the "hidden" summary.

On 4/22/2016 10:00 PM, Émanuel Barry wrote:
I agree (I think) that the first issue and patch should be to detect and collapse n identical lines or pairs of lines in a row, where n > 3, with a message "The above pair of lines [or line] is repeated n-1 more times." This will be an optional but nice new feature giving 80% benefit for 20% work. There seems to be a consensus for this much. Lets do it.
I agree that anything beyond the first step is more dubious. I think extensions should be another proposal and possible issue. Or left to GUI wrappers of an underlying Python.
-- Terry Jan Reedy

On Fri, Apr 22, 2016 at 10:00:12PM -0400, Émanuel Barry wrote:
I’m also a very solid -1 on the idea of prompting the user to do something like full_ex() to get the full stack trace.
I'm not sure who suggested that idea. I certainly didn't.
I'm not sure who you are talking to here, or whose suggestion the following faked stack trace is. [...]
For the record, detecting subsequences of mutally recurive calls is not a trivial task. (It's not the same as cycle detection.) And I don't understand how you can get 300 calls to f and 360 calls to g in a scenario where f calls g and g calls f. Surely there would be equal number of calls to each?
I think the above over-complicates the situation. I want to deal with the low-hanging fruit that gives the biggest benefit for the least work (and least complexity!), not minimise every conceivable bit of redundancy in a stack trace. So, for the record, let me explicitly state my proposal: Stack traces should collapse blocks of repeated identical, contiguous lines (or pairs of lines, where the source code is available for display). For brevity, whenever I refer to a "line" in the stack trace, I mean either a single line of the form: File "<stdin>", line 1, in f when source code is not available, or a pair of lines of the form: File "path/to/file.py", line 1, in f line of source code when it is available. Whenever a single, continguous block of lines from the traceback consists of three or more identical lines (i.e. lines which compare equal using string equality), they should be collapsed down to: a single instance of that line followed by a message reporting the number of repetitions. For example: File "<stdin>", line 1, in f return f(arg) File "<stdin>", line 1, in f return f(arg) File "<stdin>", line 1, in f return f(arg) File "<stdin>", line 1, in f return f(arg) File "<stdin>", line 1, in f return f(arg) File "<stdin>", line 1, in f return f(arg) would be collapsed to something like this: File "<stdin>", line 1, in f return f(arg) [...previous call was repeated 5 more times...] (In practice, you are more like to see "repeated 1000 more times" than just five.) This would not be collapsed, as the line numbers are not the same: File "<stdin>", line 1, in f File "<stdin>", line 2, in f File "<stdin>", line 3, in f File "<stdin>", line 4, in f File "<stdin>", line 5, in f File "<stdin>", line 6, in f nor this: File "<stdin>", line 1, in f File "<stdin>", line 1, in g File "<stdin>", line 1, in f File "<stdin>", line 1, in g File "<stdin>", line 1, in f File "<stdin>", line 1, in g My proposal does not include any provision for collapsing chains of mutual recursion. If somebody else wants to champion that as a separate proposal, please do, but I won't be making that proposal. This will shrink the ugly and harmful huge stacktraces that we get from many accidental recursion errors, without hiding potentially useful information. -- Steve

(Sorry, my previous message was sent from the wrong address, so only Émanuel got it. I then replied to his reply, which had me reuse the wrong address. Sorry, Émanuel, for this repeat.) On Fri, Apr 22, 2016 at 10:00 PM, Émanuel Barry <vgr255@live.ca> wrote:
Simple as in, unlikely to be affected by whatever caused the exception in the first place? Here's pseudocode for my suggestion. (I assume appropriate definitions of `traceback`, `print_folded`, and `print_the_thing`. I assume `func_name` is a qualified name (e.g. `('f', '<stdin>')`).) seen = collections.Counter() # Counts number of times each (func,line_no) has been seen block = None # A block of potentially-hidden functions. prev_line_no = None # In case len(block) == 1. hidecount = 0 # Total number of hidden lines. for func_name, line_no in traceback: times = seen[func_name, line_no] += 1 if times >= 3: if block is None: block = collections.Counter() block[func_name] += 1 prev_line_no = line_no else: # This `if` can be a function which returns a hidecount, # so we don't repeat ourselves at the end of the loop. if block is not None: if len(block) == 1: # don't need to hide print_the_thing(next(block. keys()), prev_line_no) else: print_folded(block) hidecount += len(block) block = None print_the_thing(func_name, line_no) if block is not None: if len(block) == 1: print_the_thing(block[0]) else: print_folded(block) hidecount += len(block)
My hiding is more complex (can't reproduce original output exactly), so it would be important to have an obvious way to get the old behavior. Someone else can propose the wording, if the hiding strategy itself seems useful. On Fri, Apr 22, 2016 at 11:50 PM, Steven D'Aprano <steve@pearwood.info> wrote:
It depends on how you define the problem. I look for, "contiguous block of output, each line of which has been seen before." (You can set up a trivial example where g calls itself 60 times if the parameter is mod 300. Other examples might involve randomness, or perhaps it's just a really deep recursion rather than a cycle. I'm just giving an example of how you don't need to detect repeated blocks.)

On Sat, Apr 23, 2016 at 02:14:10AM -0400, Franklin? Lee wrote:
Just in case anyone missed it, here's my actual, working, code, which I now have in my PYTHONSTARTUP file. import sys import traceback from itertools import groupby TEMPLATE = " [...previous call is repeated %d times...]\n" def collapse(seq): for key, group in groupby(seq): group = list(group) if len(group) < 3: for item in group: yield item else: yield key yield TEMPLATE % (len(group)-1) def shortertb(*args): lines = traceback.format_exception(*args) sys.stderr.write(''.join(collapse(lines))) sys.excepthook = shortertb And here is an actual working example of it in action: py> import fact # Uses the obvious recursive algorithm. py> fact.fact(50000) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/steve/python/fact.py", line 3, in fact return n*fact(n-1) [...previous call is repeated 997 times...] File "/home/steve/python/fact.py", line 2, in fact if n < 1: return 1 RuntimeError: maximum recursion depth exceeded in comparison I'm not very interested in a complex, untested, incomplete, non-working chunk of pseudo-code when I have something which actually works in less than twenty lines. Especially since your version hides useful traceback information and requires the user to call a separate function to display the unmangled (and likely huge) traceback to find out what they're missing. In my version, they're not missing anything: it's a simple run-length encoding of the tracebacks, and nothing is hidden. The output is just compressed. So I'm afraid that, even if you manage to get your pseudo-code working and debugged, I'm going to vote a strong -1 on your proposal. Even if it works, I don't want lines to be hidden just because they've been seen before in some unrelated part of the traceback.
To me, it seems harmful, not useful. -- Steve

On Sat, Apr 23, 2016 at 6:13 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Just in case anyone missed it, here's my actual, working, code,
I'm not conjoined to the idea. I'm just proposing it as an alternative. I provided the pseudocode to express an idea, so that the idea could be discussed. I've already identified one glaring bug, but the quality of pseudocode itself is not really important: if there are actual implementation obstacles, they can be discussed _if_ the idea is considered useful. Remember that we're all here for the good of Python.
In a single callstack, there ARE no unrelated parts of the traceback. They've been seen before because they are part of a cycle: they started a call chain that eventually leads back to them.
In fact, it would probably only hide useful information if there is a non-trivial graph cycle. In that case, the RLE will still give the unmangled (and likely huge) traceback.

On 4/22/2016 10:00 PM, Émanuel Barry wrote:
I agree (I think) that the first issue and patch should be to detect and collapse n identical lines or pairs of lines in a row, where n > 3, with a message "The above pair of lines [or line] is repeated n-1 more times." This will be an optional but nice new feature giving 80% benefit for 20% work. There seems to be a consensus for this much. Lets do it.
I agree that anything beyond the first step is more dubious. I think extensions should be another proposal and possible issue. Or left to GUI wrappers of an underlying Python.
-- Terry Jan Reedy

On Fri, Apr 22, 2016 at 10:00:12PM -0400, Émanuel Barry wrote:
I’m also a very solid -1 on the idea of prompting the user to do something like full_ex() to get the full stack trace.
I'm not sure who suggested that idea. I certainly didn't.
I'm not sure who you are talking to here, or whose suggestion the following faked stack trace is. [...]
For the record, detecting subsequences of mutally recurive calls is not a trivial task. (It's not the same as cycle detection.) And I don't understand how you can get 300 calls to f and 360 calls to g in a scenario where f calls g and g calls f. Surely there would be equal number of calls to each?
I think the above over-complicates the situation. I want to deal with the low-hanging fruit that gives the biggest benefit for the least work (and least complexity!), not minimise every conceivable bit of redundancy in a stack trace. So, for the record, let me explicitly state my proposal: Stack traces should collapse blocks of repeated identical, contiguous lines (or pairs of lines, where the source code is available for display). For brevity, whenever I refer to a "line" in the stack trace, I mean either a single line of the form: File "<stdin>", line 1, in f when source code is not available, or a pair of lines of the form: File "path/to/file.py", line 1, in f line of source code when it is available. Whenever a single, continguous block of lines from the traceback consists of three or more identical lines (i.e. lines which compare equal using string equality), they should be collapsed down to: a single instance of that line followed by a message reporting the number of repetitions. For example: File "<stdin>", line 1, in f return f(arg) File "<stdin>", line 1, in f return f(arg) File "<stdin>", line 1, in f return f(arg) File "<stdin>", line 1, in f return f(arg) File "<stdin>", line 1, in f return f(arg) File "<stdin>", line 1, in f return f(arg) would be collapsed to something like this: File "<stdin>", line 1, in f return f(arg) [...previous call was repeated 5 more times...] (In practice, you are more like to see "repeated 1000 more times" than just five.) This would not be collapsed, as the line numbers are not the same: File "<stdin>", line 1, in f File "<stdin>", line 2, in f File "<stdin>", line 3, in f File "<stdin>", line 4, in f File "<stdin>", line 5, in f File "<stdin>", line 6, in f nor this: File "<stdin>", line 1, in f File "<stdin>", line 1, in g File "<stdin>", line 1, in f File "<stdin>", line 1, in g File "<stdin>", line 1, in f File "<stdin>", line 1, in g My proposal does not include any provision for collapsing chains of mutual recursion. If somebody else wants to champion that as a separate proposal, please do, but I won't be making that proposal. This will shrink the ugly and harmful huge stacktraces that we get from many accidental recursion errors, without hiding potentially useful information. -- Steve

(Sorry, my previous message was sent from the wrong address, so only Émanuel got it. I then replied to his reply, which had me reuse the wrong address. Sorry, Émanuel, for this repeat.) On Fri, Apr 22, 2016 at 10:00 PM, Émanuel Barry <vgr255@live.ca> wrote:
Simple as in, unlikely to be affected by whatever caused the exception in the first place? Here's pseudocode for my suggestion. (I assume appropriate definitions of `traceback`, `print_folded`, and `print_the_thing`. I assume `func_name` is a qualified name (e.g. `('f', '<stdin>')`).) seen = collections.Counter() # Counts number of times each (func,line_no) has been seen block = None # A block of potentially-hidden functions. prev_line_no = None # In case len(block) == 1. hidecount = 0 # Total number of hidden lines. for func_name, line_no in traceback: times = seen[func_name, line_no] += 1 if times >= 3: if block is None: block = collections.Counter() block[func_name] += 1 prev_line_no = line_no else: # This `if` can be a function which returns a hidecount, # so we don't repeat ourselves at the end of the loop. if block is not None: if len(block) == 1: # don't need to hide print_the_thing(next(block. keys()), prev_line_no) else: print_folded(block) hidecount += len(block) block = None print_the_thing(func_name, line_no) if block is not None: if len(block) == 1: print_the_thing(block[0]) else: print_folded(block) hidecount += len(block)
My hiding is more complex (can't reproduce original output exactly), so it would be important to have an obvious way to get the old behavior. Someone else can propose the wording, if the hiding strategy itself seems useful. On Fri, Apr 22, 2016 at 11:50 PM, Steven D'Aprano <steve@pearwood.info> wrote:
It depends on how you define the problem. I look for, "contiguous block of output, each line of which has been seen before." (You can set up a trivial example where g calls itself 60 times if the parameter is mod 300. Other examples might involve randomness, or perhaps it's just a really deep recursion rather than a cycle. I'm just giving an example of how you don't need to detect repeated blocks.)

On Sat, Apr 23, 2016 at 02:14:10AM -0400, Franklin? Lee wrote:
Just in case anyone missed it, here's my actual, working, code, which I now have in my PYTHONSTARTUP file. import sys import traceback from itertools import groupby TEMPLATE = " [...previous call is repeated %d times...]\n" def collapse(seq): for key, group in groupby(seq): group = list(group) if len(group) < 3: for item in group: yield item else: yield key yield TEMPLATE % (len(group)-1) def shortertb(*args): lines = traceback.format_exception(*args) sys.stderr.write(''.join(collapse(lines))) sys.excepthook = shortertb And here is an actual working example of it in action: py> import fact # Uses the obvious recursive algorithm. py> fact.fact(50000) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/steve/python/fact.py", line 3, in fact return n*fact(n-1) [...previous call is repeated 997 times...] File "/home/steve/python/fact.py", line 2, in fact if n < 1: return 1 RuntimeError: maximum recursion depth exceeded in comparison I'm not very interested in a complex, untested, incomplete, non-working chunk of pseudo-code when I have something which actually works in less than twenty lines. Especially since your version hides useful traceback information and requires the user to call a separate function to display the unmangled (and likely huge) traceback to find out what they're missing. In my version, they're not missing anything: it's a simple run-length encoding of the tracebacks, and nothing is hidden. The output is just compressed. So I'm afraid that, even if you manage to get your pseudo-code working and debugged, I'm going to vote a strong -1 on your proposal. Even if it works, I don't want lines to be hidden just because they've been seen before in some unrelated part of the traceback.
To me, it seems harmful, not useful. -- Steve

On Sat, Apr 23, 2016 at 6:13 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Just in case anyone missed it, here's my actual, working, code,
I'm not conjoined to the idea. I'm just proposing it as an alternative. I provided the pseudocode to express an idea, so that the idea could be discussed. I've already identified one glaring bug, but the quality of pseudocode itself is not really important: if there are actual implementation obstacles, they can be discussed _if_ the idea is considered useful. Remember that we're all here for the good of Python.
In a single callstack, there ARE no unrelated parts of the traceback. They've been seen before because they are part of a cycle: they started a call chain that eventually leads back to them.
In fact, it would probably only hide useful information if there is a non-trivial graph cycle. In that case, the RLE will still give the unmangled (and likely huge) traceback.
participants (4)
-
Franklin? Lee
-
Steven D'Aprano
-
Terry Reedy
-
Émanuel Barry