On Fri, Apr 27, 2018 at 5:58 AM, Chris Angelico <rosuav@gmail.com> wrote:
> On Fri, Apr 27, 2018 at 9:27 PM, Steven D'Aprano <steve@pearwood.info> wrote:
> I don't think this needs any specific compiler magic or making 'dp' a
> reserved name, but it might well be a lot easier to write if there
> were some compiler features provided to _all_ functions. For instance,
> column positions are currently available in SyntaxErrors, but not
> other exceptions:
>
> >>> x = 1
> >>> print("spam" + x)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: can only concatenate str (not "int") to str
> >>> print("spam" : x)
>   File "<stdin>", line 1
>     print("spam" : x)
>                  ^
> SyntaxError: invalid syntax
>
> Imagine if the TypeError could show a caret, pointing to the plus
> sign. That would require that a function store column positions, not
> just line numbers. I'm not sure how much overhead it would add, nor
> how much benefit you'd really get from those markers, but it would
> then be the same mechanic for exception tracebacks and for
> semi-magical functions like this.

Being able to add carets to tracebacks in general would be quite nice actually. Imagine:

Traceback (most recent call last):
  File "/tmp/blah.py", line 16, in <module>
    print(foo())
          ^^^^^
  File "/tmp/blah.py", line 6, in foo
    return bar(1) + bar(2)
                    ^^^^^^
  File "/tmp/blah.py", line 10, in bar
    return baz(2 * x) / baz(2 * x + 1)
           ^^^^^^^^^^
  File "/tmp/blah.py", line 13, in baz
    return 1 + 1 / (x - 4)
               ^^^^^^^^^^^
ZeroDivisionError: division by zero

This is how I report error messages in patsy[1], and people seem to appreciate it. It would also help Python catch up with other languages and compilers whose error reporting has gotten much friendlier in recent years (e.g., Rust, Clang).

Threading column numbers through the compiler might be tedious, but AFAICT it should be straightforward in principle. (Peephole optimizations and similar might be a bit of a puzzle, but you can do pretty crude things like saying new_span_start = min(*old_span_starts); new_span_end = max(*old_span_ends) and still get something that's at least useful, even if not necessarily 100% theoretically accurate.) The runtime overhead would be essentially zero, since this would be a static table that only gets consulted when printing tracebacks, similar to the lineno table. (Tracebacks already preserve the bytecode offset: each traceback entry records it as tb_lasti, a snapshot of the frame's f_lasti.)
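
To make the crude span-merging idea concrete, here is a minimal sketch in Python. The names Span and merge_spans are made up for illustration and are not CPython internals; the sketch assumes each instruction carries a half-open (start, end) column span and that an optimization pass replaces several instructions with one.

```python
from typing import Iterable, NamedTuple


class Span(NamedTuple):
    """Hypothetical column span for one instruction: [start, end)."""
    start: int
    end: int


def merge_spans(old_spans: Iterable[Span]) -> Span:
    """Return the smallest span that covers every span in old_spans."""
    spans = list(old_spans)
    if not spans:
        raise ValueError("need at least one span to merge")
    return Span(min(s.start for s in spans), max(s.end for s in spans))


# Two instructions folded into one; the column values are invented.
folded = merge_spans([Span(11, 16), Span(19, 20)])
print(folded)
```

The merged span may cover source text that neither original instruction touched (columns 16 to 19 here), which is the "not 100% theoretically accurate" part, but a caret line drawn from it would still point at the right neighborhood.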

So I think the main issue would be the extra memory in each code object to hold the bytecode offset -> column numbers table. We'd need some actual numbers to judge this for real, but my guess is that the gain in usability+friendliness would be easily worth it for 99% of users. The other 1% are already plotting how to add options to strip out unnecessary things like type annotations, so if it's a problem then this could be another thing for them to add to their list: leave out these tables at -OOO or whatever.
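
For a rough picture of what such a table and its lookup might look like, here is a sketch in plain Python. Everything in it is an assumption for illustration: the table layout (sorted (bytecode_offset, col_start, col_end) tuples), the helper names lookup_span and caret_line, and the offsets and columns in the sample data. It says nothing about how CPython would actually encode this or how much memory it would take.

```python
import bisect
from typing import List, Tuple

# Hypothetical layout: entries sorted by bytecode offset. Each entry applies
# from its own offset up to (but not including) the next entry's offset.
ColumnTable = List[Tuple[int, int, int]]


def lookup_span(table: ColumnTable, lasti: int) -> Tuple[int, int]:
    """Find the (col_start, col_end) span for the instruction at lasti."""
    offsets = [entry[0] for entry in table]
    index = bisect.bisect_right(offsets, lasti) - 1
    if index < 0:
        raise LookupError(f"no column info for bytecode offset {lasti}")
    _, col_start, col_end = table[index]
    return col_start, col_end


def caret_line(col_start: int, col_end: int) -> str:
    """Build the marker line printed under the source line."""
    return " " * col_start + "^" * (col_end - col_start)


# Invented data for one source line; the offsets are not real bytecode.
source = "    return bar(1) + bar(2)"
table = [(0, 11, 17), (6, 20, 26), (12, 11, 26)]

start, end = lookup_span(table, 8)  # pretend the failing offset was 8
print(source)
print(caret_line(start, end))
```

The lookup only runs when a traceback is being formatted, so the cost during normal execution is just the memory for the table itself.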

-n

[1] https://patsy.readthedocs.io/en/latest/overview.html

--
Nathaniel J. Smith -- https://vorpus.org