On Mon, May 17, 2021 at 6:18 AM Mark Shannon <mark@hotpy.org> wrote:
> 2. Repeated binary operations on the same line.
>
> A single location can also be clearer when all the code is on one line.
>
> i1 + i2 + s1
>
> PEP 657:
>
> i1 + i2 + s1
> ^^^^^^^^^^^^
>
> Using a single location:
>
> i1 + i2 + s1
> ^

It's true this case is a bit confusing with the whole operation span highlighted, but I'm not sure the single location version is much better. I feel like a Really Good UI would highlight the two operands in different colors or something, or at least separately underline the two items whose types are incompatible:

TypeError: unsupported operand type(s) for +: 'int' + 'str':

i1 + i2 + s1
^^^^^^^   ~~
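To make the idea concrete, here is a rough sketch of how a formatter could produce that kind of two-part underline once it has the AST node for the failing operation. The function name underline_operands is made up for illustration, and the sketch assumes a single-line, ASCII-only expression whose top-level node is a binary operation (col_offset is a UTF-8 byte offset, so non-ASCII source would need extra care).

```python
import ast


def underline_operands(line):
    """Build a marker line: '^' under the left operand, '~' under the right.

    Hypothetical helper; assumes `line` is a single ASCII expression whose
    top-level node is a binary operation.
    """
    expr = ast.parse(line, mode="eval").body
    if not isinstance(expr, ast.BinOp):
        raise ValueError("expected a binary operation")
    marks = [" "] * len(line)
    for operand, ch in ((expr.left, "^"), (expr.right, "~")):
        # col_offset / end_col_offset give the operand's span on the line.
        for col in range(operand.col_offset, operand.end_col_offset):
            marks[col] = ch
    return "".join(marks).rstrip()


line = "i1 + i2 + s1"
print(line)
print(underline_operands(line))
```

Because + is left-associative, the outermost operation here has (i1 + i2) on the left and s1 on the right, which are exactly the two items whose types clash.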

More generally, these error messages are the kind of thing where the UI can always be tweaked to improve further, and those tweaks can make good use of any rich source information that's available.

So, here's another option to consider:

- When parsing, assign each AST node a unique, deterministic id (e.g. sequentially across the tree from top-to-bottom, left-to-right).

- For each bytecode offset, store the corresponding AST node id in an lnotab-like table.

- When displaying a traceback, we already need to go find and read the original .py file to print source code at all. Re-parse it, and use the ids to find the original AST node, in context with full structure. Let the traceback formatter do whatever clever stuff it wants with this info.
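A minimal sketch of the three steps, using only the stdlib ast module. Everything here is illustrative: number_nodes, the dict standing in for the lnotab-like table, and the bytecode offset 8 are all made up, and the real compiler would obviously do this in C. The point is just that a pre-order walk gives the same ids on every parse of the same source.

```python
import ast


def number_nodes(tree):
    """Return a list where index i is the node with id i (pre-order walk).

    Hypothetical numbering scheme: top-to-bottom, left-to-right. Recursive,
    so a very deeply nested tree could hit the recursion limit.
    """
    nodes = []

    def visit(node):
        nodes.append(node)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return nodes


source = "total = i1 + i2 + s1\n"

# "Compile time": number the nodes and record the id of the outer addition.
compile_nodes = number_nodes(ast.parse(source))
node_id = next(
    i for i, n in enumerate(compile_nodes)
    if isinstance(n, ast.BinOp)
    and isinstance(n.right, ast.Name)
    and n.right.id == "s1"
)
# Stand-in for the lnotab-like table; the offset 8 is invented.
offset_to_node_id = {8: node_id}

# "Traceback time": re-parse the .py file and look the node up by id.
found = number_nodes(ast.parse(source))[offset_to_node_id[8]]
print(ast.get_source_segment(source, found.left))   # left operand text
print(ast.get_source_segment(source, found.right))  # right operand text
```

With the node in hand, the formatter has the full structure (operands, parents, spans) rather than just a column number.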

Of course if the .py and .pyc files don't match, this might produce gibberish. We already have that problem with showing source lines, but it might be even more confusing if we get some random unrelated AST node. This could be avoided by storing some kind of hash in the code object, so that we can validate the .py file we find hasn't changed (sha512 if we're feeling fancy, crc32 if we want to save space, either way is probably fine).
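The validation step could look roughly like this. The function names and the idea of keeping the digest as a bytes value next to the code object are assumptions for the sake of the sketch; only hashlib.sha512 and zlib.crc32 are real stdlib calls.

```python
import hashlib
import zlib


def source_digest(source_bytes, fancy=False):
    """Digest of the source file contents (hypothetical helper).

    fancy=True uses sha512 (64 bytes); otherwise crc32 (4 bytes).
    """
    if fancy:
        return hashlib.sha512(source_bytes).digest()
    return zlib.crc32(source_bytes).to_bytes(4, "big")


def source_matches(stored_digest, source_bytes, fancy=False):
    """True if the .py contents found on disk match what was compiled."""
    return source_digest(source_bytes, fancy) == stored_digest


compiled_from = b"x = i1 + i2 + s1\n"
stored = source_digest(compiled_from)          # saved at compile time

print(source_matches(stored, compiled_from))           # unchanged file
print(source_matches(stored, b"x = i1 + i2 + s2\n"))   # edited file
```

If the check fails, the formatter can skip the AST lookup and fall back to plain line numbers.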

This would make traceback printing more expensive, but only if you want the fancy features, and traceback printing is already expensive (it does file I/O!). Usually by the time you're rendering a traceback it's more important to optimize for human time than CPU time. It would take less memory than PEP 657, and the same as Mark's proposal (both only track one extra integer per bytecode offset). And it would allow for arbitrarily rich traceback display.

(I guess in theory you could make this even cheaper by using it to replace lnotab, instead of extending it. But I think keeping lnotab around is a good idea, as a fallback for cases where you can't find the original source but still want some hint at location information.)

-n

--
Nathaniel J. Smith -- https://vorpus.org