Critique of PEP 657 -- Include Fine Grained Error Locations in Tracebacks

Hi everyone,

I fully agree with the rationale for the PEP: better error messages always help. However, I think the proposed implementation could be misleading in some cases and wastes memory.

Use one position, not one-and-a-half positions
-----------------------------------------------

The main problem with PEP 657, IMO, is that it wants to present the location of an error as a pair of positions, but without the line number of the second position. Consequently it ends up with one and a half positions, not two. This does not work well for a number of reasons.

1. Errors spanning multiple lines.

Consider:

    (i1 + i2 +
     s1
    )

where i1, i2 are integers and s1 is a string. With a single location, the error points to the second `+`. PEP 657 would highlight the whole line, `i1 + i2 +`, making it unclear where the error occurred.

2. Repeated binary operations on the same line.

A single location can also be clearer when all the code is on one line:

    i1 + i2 + s1

PEP 657:

    i1 + i2 + s1
    ^^^^^^^^^^^^

Using a single location:

    i1 + i2 + s1
            ^

3. Tracking locations in the compiler.

While nicer locations for errors are great, they won't be popular if they have a negative impact on performance. Locations need to be tracked through the compiler. The simpler the location, the easier this is to do correctly without a negative performance impact. It is already tricky to do this correctly with just line numbers, because we have both source that compiles to no bytecodes (try, pass) and bytecodes that have no source (implicit return None and except cleanups).

A single location can still be presented as a whole token, since tokenizing a single line of code is easy and fast. So when presenting an error, the whole token can be highlighted (see the sketch after this message), e.g.:

    NameError
    name
    ^^^^

Compression
-----------

PEP 657 proposes that no compression be used, but also mandates the use of a lossy compression scheme (truncating all offsets over 255). I think it would be better to provide an API like PEP 626 and not restrict the internal format used. In fact, extending or replacing the API of PEP 626 seems the best approach to me.

I wouldn't worry about memory consumption. The code object layout and unmarshalling process need to be re-implemented for reduced memory use and faster startup anyway: https://github.com/markshannon/faster-cpython/blob/master/tiers.md#tier-0 A few bytes more or less in the location table(s) are inconsequential.

Cheers,
Mark.
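As a rough sketch of the whole-token highlighting Mark describes (the helper below is made up for illustration, not something from the PEP): tokenize just the offending line and underline whichever token contains the reported column.

```python
# Rough sketch (not from the PEP): given one source line and a 0-based
# column, tokenize just that line and underline the whole token under it.
import io
import tokenize

def caret_line(line, col):
    try:
        for tok in tokenize.generate_tokens(io.StringIO(line).readline):
            (srow, scol), (erow, ecol) = tok.start, tok.end
            if srow == 1 and scol <= col < ecol:
                return " " * scol + "^" * (ecol - scol)
    except tokenize.TokenError:
        pass  # incomplete line (e.g. an open bracket): fall back to one caret
    return " " * col + "^"

line = "print(undefined_name)"
print(line)
print(caret_line(line, 6))   # underlines "undefined_name" in full
```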

Hi Mark, Thanks for your useful feedback. Some comments:
1. Errors spanning multiple lines.
That is addressed. Check the latest version of the PEP: we are proposing storing the end lines as well: https://www.python.org/dev/peps/pep-0657/#specification
2. Repeated binary operations on the same line.
It has been suggested to add this on top of the previous information, but we think this is problematic, since these locations would need to be manually calculated in the compiler, while the ranges are derived from the AST ranges. Given how error-prone the old manual calculations of offsets were, we really want to avoid manual calculations here. Additionally, many users have mentioned that a single caret is difficult to read, and that is what motivates the ranges in SyntaxErrors that we introduced.
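For illustration, the ranges in question are already recorded on every expression node since Python 3.8, so the compiler only has to carry them through rather than compute caret positions by hand:

```python
# Since Python 3.8 AST nodes already carry full ranges; the compiler only
# needs to propagate them, never to compute caret positions manually.
import ast

outer = ast.parse("i1 + i2 + s1", mode="eval").body   # (i1 + i2) + s1
inner = outer.left                                    # i1 + i2

print((outer.lineno, outer.col_offset), (outer.end_lineno, outer.end_col_offset))
# (1, 0) (1, 12)
print((inner.lineno, inner.col_offset), (inner.end_lineno, inner.end_col_offset))
# (1, 0) (1, 7)
```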
3. Tracking locations in the compiler. PEP 657 proposes that no compression be used
No, PEP 657 doesn't specify the compression, which is a different thing (check the latest version). We say: "The internal storage, compression and encoding of the information is left as an implementation detail and can be changed at any point as long as the public API remains unchanged."
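As a sketch of what a consumer of that public API can look like (using co_positions() as it eventually shipped in CPython 3.11; the draft's exact spelling may differ):

```python
# Sketch of a consumer of the public API (requires a build where PEP 657 is
# implemented; in CPython 3.11 the accessor is code.co_positions()).
import sys

def f(i1, i2, s1):
    return i1 + i2 + s1

try:
    f(1, 2, "three")
except TypeError:
    tb = sys.exc_info()[2].tb_next            # frame where the add failed
    code = tb.tb_frame.f_code
    positions = list(code.co_positions())     # one entry per instruction
    line, end_line, col, end_col = positions[tb.tb_lasti // 2]
    print(line, end_line, col, end_col)       # the span of `i1 + i2 + s1`
```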
I think it would be better to provide an API like PEP 626 and not restrict the internal format used.
Indeed, that is what we are doing. Check https://www.python.org/dev/peps/pep-0657/#id10

Cheers,
Pablo Galindo Salgado

On Mon, 17 May 2021 at 14:17, Mark Shannon <mark@hotpy.org> wrote:

Aah thanks for clarifying, I see what you mean now. In cases like this where the compiler is making optimizations, I think it is perfectly fine to just elide the column information. While it would be nice to maintain accurate columns wherever possible, you shouldn't constrain improvements and optimizations based on it. The traceback machinery will simply not print out the carets in that case and everything should just work smoothly.

Python does have an Optimized mode (-O). Granted, it’s not used very often, but this would be a good use case for it. -CHB
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Ethan Furman wrote:
I don't think so, since this is executed as `((i1 + i2) + s1)`. Mark's carets look correct to me, since the second (outer) addition's LHS is the result of adding `i1` and `i2`:

```
Python 3.11.0a0 (heads/main:a42d98ed91, May 16 2021, 14:02:36) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
```
Brandt
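A quick way to see that grouping is to dump the AST for the expression:

```python
# The parser nests the additions left-to-right, so the outer BinOp's left
# operand is already the result of `i1 + i2`.
import ast

print(ast.dump(ast.parse("i1 + i2 + s1", mode="eval").body, indent=2))
# BinOp(
#   left=BinOp(
#     left=Name(id='i1', ctx=Load()),
#     op=Add(),
#     right=Name(id='i2', ctx=Load())),
#   op=Add(),
#   right=Name(id='s1', ctx=Load()))
```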

On Mon, May 17, 2021 at 6:18 AM Mark Shannon <mark@hotpy.org> wrote:
It's true this case is a bit confusing with the whole operation span highlighted, but I'm not sure the single location version is much better. I feel like a Really Good UI would, like, highlight the two operands in different colors or something, or at least underline the two separate items whose type is incompatible separately:

    TypeError: unsupported operand type(s) for +: 'int' + 'str':
    i1 + i2 + s1
    ^^^^^^^   ~~

More generally, these error messages are the kind of thing where the UI can always be tweaked to improve further, and those tweaks can make good use of any rich source information that's available.

So, here's another option to consider (a rough sketch follows at the end of this message):

- When parsing, assign each AST node a unique, deterministic id (e.g. sequentially across the AST tree from top-to-bottom, left-to-right).
- For each bytecode offset, store the corresponding AST node id in an lnotab-like table.
- When displaying a traceback, we already need to go find and read the original .py file to print source code at all. Re-parse it, and use the ids to find the original AST node, in context with full structure. Let the traceback formatter do whatever clever stuff it wants with this info.

Of course if the .py and .pyc files don't match, this might produce gibberish. We already have that problem with showing source lines, but it might be even more confusing if we get some random unrelated AST node. This could be avoided by storing some kind of hash in the code object, so that we can validate the .py file we find hasn't changed (sha512 if we're feeling fancy, crc32 if we want to save space; either way is probably fine).

This would make traceback printing more expensive, but only if you want the fancy features, and traceback printing is already expensive (it does file I/O!). Usually by the time you're rendering a traceback it's more important to optimize for human time than CPU time.

It would take less memory than PEP 657, and the same as Mark's proposal (both only track one extra integer per bytecode offset). And it would allow for arbitrarily rich traceback display.

(I guess in theory you could make this even cheaper by using it to replace lnotab, instead of extending it. But I think keeping lnotab around is a good idea, as a fallback for cases where you can't find the original source but still want some hint at location information.)

-n

--
Nathaniel J. Smith -- https://vorpus.org
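A rough sketch of the numbering-and-lookup part of this idea (illustration only; a real implementation would assign the ids in the parser and store them in the lnotab-like table):

```python
# Rough illustration of the id scheme: number nodes in a deterministic walk
# at "compile" time, renumber the re-parsed tree the same way at traceback
# time, and look the node up by id.  (CPython's ast.walk currently walks
# breadth-first, so the numbering is stable for identical source.)
import ast

def nodes_by_id(tree):
    return dict(enumerate(ast.walk(tree)))

source = "def f(i1, i2, s1):\n    return i1 + i2 + s1\n"
by_id = nodes_by_id(ast.parse(source))

node_id = 7                    # pretend this came from the code object's table
node = by_id[node_id]          # with the current walk order: the `i1 + i2 + s1` BinOp
print(type(node).__name__, (node.lineno, node.col_offset),
      (node.end_lineno, node.end_col_offset))
# BinOp (2, 11) (2, 23)
```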

Hi Nathaniel,

Thanks a lot for your suggestion! I like the idea, although I still think it is more complex than our current proposal; on the other hand it allows for much richer results, so I'm quite excited to try it out. We are going to explore it with a prototype, and if we are convinced we will update the PEP.

One thing I'm not a fan of is that tools that want to leverage this information (note that our PEP also offers this info to tools as an important aspect of it) will need to reparse the file AND search for the AST node, which also involves transforming the full tree to Python objects, which is even slower. This is unfortunately a worse API than just getting the full location given an instruction offset. Also, while displaying tracebacks may be a good scenario for the speed trade-off, it may not be for other tools. Finally, if there is no file available all this information is lost, although one could argue that it is then not extremely useful...

Regards from sunny London,
Pablo Galindo Salgado

On Tue, 18 May 2021, 01:43 Nathaniel Smith, <njs@pobox.com> wrote:

Hummmm, actually another problem with this approach: nodes are created and modified after the optimization pass, so the AST produced by the parser is not enough to reconstruct the actual information; we also need to run the optimization passes. Unfortunately, this is (by design) not done in the Python API (to preserve all possible information about the original code), so it complicates the API to get this quite a lot, as `ast.parse` is not enough to get the original tree.

On Tue, 18 May 2021 at 08:53, Pablo Galindo Salgado <pablogsal@gmail.com> wrote:
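A small demonstration of the mismatch, taking constant folding as an example of such a pass:

```python
# ast.parse keeps the original expression, but the compiler folds it before
# generating code, so the bytecode's locations can end up referring to nodes
# that a plain re-parse never produces in that shape.
import ast
import dis

src = "x = 24 * 60 * 60"
print(ast.dump(ast.parse(src).body[0].value))
# BinOp(left=BinOp(... 24 * 60 ...), op=Mult(), right=Constant(value=60))

dis.dis(compile(src, "<demo>", "exec"))
# ... LOAD_CONST ... (86400) ...   <- the multiplications are already folded
```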

Yet another problem that I found: one integer is actually not enough to assign IDs. One unsigned integer can cover 4,294,967,295 AST nodes, but it is technically possible to have more than that in a single file. While in PEP 657 we are tracking offsets that are normally very low (typically < 100) or end lines that are easily compressible (as end lines are normally equal to the start line), node IDs can be arbitrarily big. What's worse: the parser creates some AST nodes that it throws away if the parse fails (that's how PEG parsers work), so there will be a lot of IDs that never get assigned (and the parser doesn't have an easy way to know how many IDs it has thrown away). This forces us to either use something bigger than an integer (probably a size_t) or to deal with overflow.

On Tue, 18 May 2021 at 10:23, Pablo Galindo Salgado <pablogsal@gmail.com> wrote:

One integer is actually not enough to assign IDs.
Actually, disregard this particular problem. I think we could simply stop assigning IDs if we reach the overflow limit and call it a day, since you need a truly horrendous file to reach 4,294,967,295 AST nodes (I did some tests to check this).

On Tue, 18 May 2021 at 13:25, Pablo Galindo Salgado <pablogsal@gmail.com> wrote:

Ok, we have implemented a rough prototype and we have decided not to go with this, for the following reasons:

* It has almost the same storage cost as the solution we already have. Storing the node id costs 4 bytes (an integer) per instruction, while our solution needs 2+ bytes per instruction for the offsets (normally just 2, as offsets are generally < 256) plus the compressed end line, which is very small since it is almost always the same as the start line.

* It doesn't actually have more advantages. The current solution in the PEP can do exactly the same as this solution if you allow reparsing when displaying tracebacks. This is because with the start line, end line, start offset and end offset and the original file, you can extract the source that is associated with the instruction, parse it (and this is much faster because you just need to parse the tiny fragment), and then you get an AST node that you can use for whatever you want (sketched at the end of this message).

* The proposed solution forces reparsing entire files several times just to extract a single AST node. Even worse: for frames in the same file it forces reparsing them again and again, unless you complicate the implementation by adding AST caching. Even that won't be free, as it will incur quite a lot of extra memory, and this is problematic especially when displaying exceptions, when memory can be low, which the current design takes into account.

* Nodes are created and modified after the optimization pass, so the AST produced by the parser is not enough to reconstruct the actual information; we also need to run the optimization passes. Unfortunately, this is (by design) not done in the Python API (to preserve all possible information about the original code), so it complicates the API to get this quite a lot, as `ast.parse` is not enough to get the original tree.

* The implementation is quite a bit more complex than the one the PEP has, since doing it right implies hashing the source files, implementing node id propagation in the parser, and adding a node visitor to find the correct node, plus reparsing in the traceback.

* It makes the AST (both the C one and the Python one) bigger, which increases the memory cost of parsing and slightly affects performance (we measured a 4% decrease in performance when adding the new field).

* It makes the API for 3rd party tools very non-straightforward, forcing reparsing and finding the AST nodes. Even if we provide some new functionality in the ast module or similar to make this easier, it is quite a lot of overhead just to get the position information.

Even if it is possible to solve many of these problems, we think the complexity is not worth it, especially since it doesn't actually give more functionality.

On Tue, 18 May 2021 at 13:47, Pablo Galindo Salgado <pablogsal@gmail.com> wrote:
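A sketch of the "parse just the tiny fragment" route mentioned in the second point (the helper and its layout are made up for illustration):

```python
# Sketch of the "parse just the tiny fragment" route: cut the instruction's
# span out of the original source and parse it on its own.
import ast

def node_for_span(source, lineno, end_lineno, col, end_col):
    lines = source.splitlines()
    if lineno == end_lineno:
        segment = lines[lineno - 1][col:end_col]
    else:
        segment = "\n".join([lines[lineno - 1][col:],
                             *lines[lineno:end_lineno - 1],
                             lines[end_lineno - 1][:end_col]])
    # Works for plain expressions; spans that need an enclosing context
    # (yield, await, return, ...) need more care -- see below.
    return ast.parse(segment, mode="eval").body

source = "def f(i1, i2, s1):\n    return i1 + i2 + s1\n"
node = node_for_span(source, 2, 2, 11, 23)       # PEP 657-style span
print(ast.dump(node))                            # the `i1 + i2 + s1` BinOp
```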

On Tue, May 18, 2021 at 2:49 PM Pablo Galindo Salgado <pablogsal@gmail.com> wrote:
Excellent point! Do you know how reliable this is in practice, i.e. what proportion of bytecode source spans are something you can successfully pass to ast.parse? If it works it's obviously nicer, but I can't tell how often it works.

E.g. anything including return/break/continue/yield/await will fail, since those require an enclosing context to be legal. I doubt return/break/continue will raise exceptions often, but yield/await do all the time. You could kluge it by wrapping the source span in a dummy 'async def' before parsing, since that makes yield/await legal, but OTOH it makes 'yield from' and 'from X import *' illegal. I guess you could have a helper that attempts passing the string to ast.parse, and if that fails tries wrapping it in a loop/sync def/async def/etc. until one of them succeeds (rough sketch at the end of this message). Maybe that would be a useful utility to add to the traceback module? Or add a PyCF_YOLO flag that tries to make sense of an arbitrary out-of-context string.

(Are there any other bits of syntax that require specific contexts that I'm not thinking of? If __enter__/__exit__ raise an exception, then what's the corresponding span? The entire 'with' block, or just the 'with' line itself?)

-n

PS: this is completely orthogonal to PEP 657, but if you're excited about making tracebacks more readable, another piece of low-hanging fruit would be to print method __qualname__s instead of __name__s in the traceback output. The reason we don't do that now is that __qualname__ lives on the function object, but in a traceback, we can't get the function object. The traceback only has access to the code object, and the code object doesn't have __qualname__, just __name__. Probably the cleanest way to do this would be to make the traceback or code object have a pointer back to the function object. See also https://bugs.python.org/issue12857.

--
Nathaniel J. Smith -- https://vorpus.org
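A rough sketch of that retry helper (the wrapper list and its order are guesses, not a vetted design):

```python
# Rough sketch of the retry idea: try the span as-is, then inside dummy
# contexts that make context-dependent statements legal.  Multi-line spans
# would also need re-indenting before wrapping.
import ast

_WRAPPERS = [
    lambda s: s,                              # plain expression / statement
    lambda s: "def _w():\n    " + s,          # return / yield / yield from
    lambda s: "async def _w():\n    " + s,    # await / async for / async with
    lambda s: "while True:\n    " + s,        # break / continue
]

def parse_span(segment):
    for wrap in _WRAPPERS:
        try:
            return ast.parse(wrap(segment))
        except SyntaxError:
            continue
    return None    # give up; the caller falls back to plain text

print(parse_span("yield x") is not None)      # True: parsed inside a dummy def
print(parse_span("await x") is not None)      # True: dummy async def
```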

On Wed, May 19, 2021 at 7:28 PM Pablo Galindo Salgado <pablogsal@gmail.com> wrote:
Ah, nice! I guess I was confused by memories of the behavior in 3.6 and earlier, where 'await' was a pseudokeyword:

    ❯ docker run -it --rm python:3.6-alpine
Hopefully if we add more pseudokeywords in the future it won't break the ability to parse traceback spans. -n -- Nathaniel J. Smith -- https://vorpus.org

Hopefully if we add more pseudokeywords in the future it won't break the ability to parse traceback spans.
It won't, because soft keywords are now handled natively by the PEG parser (like "match" and "case" now) instead of being hacked into the tokenizer :)

On Fri, 21 May 2021 at 01:55, Nathaniel Smith <njs@pobox.com> wrote:

participants (8)
- Ammar Askar
- Brandt Bucher
- Christopher Barker
- Ethan Furman
- Larry Hastings
- Mark Shannon
- Nathaniel Smith
- Pablo Galindo Salgado