
Hi,

Apologies for my tardiness in doing this, but no one explicitly said it was too late to critique PEP 657...

Critique of PEP 657 (Include Fine Grained Error Locations in Tracebacks)
------------------------------------------------------------------------

First of all I want to say that I support the goal of improving error messages. IMO the PEP should be "accepted in principle". I think all of the issues below can be fixed while still supporting the general aims of the PEP.

The change from points to ranges as locations
---------------------------------------------

Because Python is a procedural language, there is an expectation that code executes in a certain order. PEP 626 (Precise line numbers for debugging and other tools) seeks to guarantee that that expectation is met.

PEP 657 describes how locations for exceptions are to be handled, but is vague on the treatment of locations for tracing, profiling and debugging.

PEP 657 proposes that locations for exceptions be treated as ranges, whereas tracing, profiling and debugging currently treat locations as points.

Either this will end in contradictions and confusion should those locations disagree, or the locations for tracing, profiling and debugging must change.

Using the start of a range as the point location for tracing may be misleading when the operation that causes an exception is on a different line within that range.

Consider this example:
https://github.com/python/cpython/blob/main/Lib/test/test_compile.py#L861

This might seem like a contrived case, but it is based on a real bug report:
https://bugs.python.org/issue39316

    1.  def load_method():
    2.      return (
    3.          o.
    4.          m(
    5.              0
    6.          )
    7.      )

Currently the call is traced on line 4. PEP 657 would change the location of the call from line 4 to the range 3-6, which would mean that the line of call is no longer traced separately (or traced several times). PEP 657 makes no mention of this change.

The PEP claims that these changes are improvements. Maybe they are, but they are quite impactful changes which the PEP glosses over. The impact on tools like coverage.py and debuggers should be made clearer. For example, how would one set a breakpoint on line 4 above?

There are other languages (e.g. jinja templates) that compile to Python AST and bytecode. These *might* produce locations that overlap, but are not nested. The behavior of tracing and debuggers needs to be described for those locations.

Backwards Compatibility
-----------------------

PEP 657 claims it is fully backwards compatible, but it cannot be both backwards compatible and consistent. There are fundamental differences between using ranges and points as locations.

Impact on startup time
----------------------

PEP 657 suggests the impact on startup would be negligible. That is not quite true. The impact on startup is probably acceptable, but a proper analysis needs to be made.

The increase in size of pyc files (~20%) puts an upper bound on the increase of startup time, but I would expect it to be much less than that, as loading files from disk is only a fraction of startup.

Currently, startup is dominated by inefficiencies in interpreter creation, unmarshalling and module loading. We plan to reduce these a lot for 3.11, so that the impact of PEP 657 on startup will be larger (as a ratio) than experiments with 3.10 suggest.

The API
-------

The C API adds three new functions, one each for the end line, start column and end column. This is either slow, as any compressed table needs to be parsed four times, or space inefficient, using an uncompressed table.

Opt-out
-------

Allowing opt-out prevents consistent compression of location data, resulting in larger pyc files for those that do not opt out. The exact semantics, in terms of error formatting, tracing, etc., are not described should the user opt out.

Summary
-------

Overall, there is nothing that blocks acceptance of the PEP in principle, but there are quite a few issues that need resolving.

Suggestions
-----------

1. Clarify, in detail, the impact on line-based tools like profilers, coverage.py and debuggers. This should include help on how to use the new APIs and where using the old APIs might result in behavioral changes.

2. Change the C API to a single function:

       int PyCode_Addr2Location(PyCodeObject *co, int addr,
                                int *startline, int *startcolumn,
                                int *endline, int *endcolumn)

3. Drop the opt-out option. If the extra information is optional, then the compression scheme must allow for that, making the code more complex and potentially less efficient. Does opting out use the start of the range, or the old line, as the location?

4. Drop the limitation on column offsets. The data needs to be compressed anyway, so allowing arbitrary column offsets is effectively free.

6. Store all location information in a single table (this applies more to the implementation than the PEP). Using four separate objects to hold the location info adds a lot of overhead for most functions.

Cheers,
Mark.
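For readers who want to see which line numbers a tracer actually reports for the `load_method` example in the message above, here is a minimal sketch using `sys.settrace`. The `Receiver` class is a hypothetical stand-in for the undefined name `o`, and the source is compiled from a string so that its line numbers match the numbering in the example. The exact sequence of line events depends on the CPython version, so no particular output is claimed.

```python
import sys

# Source laid out as in the example, so "line 4" is the `m(` line.
SOURCE = """\
def load_method():
    return (
        o.
        m(
            0
        )
    )
"""


class Receiver:
    """Hypothetical stand-in for the undefined name `o` in the example."""

    def m(self, arg):
        return arg


namespace = {"o": Receiver()}
exec(compile(SOURCE, "<load_method>", "exec"), namespace)
load_method = namespace["load_method"]

line_events = []


def tracer(frame, event, arg):
    # Record line events for load_method only; other frames are not traced.
    if frame.f_code is load_method.__code__:
        if event == "line":
            line_events.append(frame.f_lineno)
        return tracer
    return None


sys.settrace(tracer)
try:
    result = load_method()
finally:
    sys.settrace(None)

print("line events:", line_events)
```

Running this under different interpreter versions is one way to check whether the call is still reported on its own line.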

Hello Mark,

Thanks for writing this email. I do appreciate your effort and your passion for trying to improve Python and this work, but I have to say that I am a bit frustrated with how you are dealing with this and, unfortunately, I have to admit that this is absorbing too much emotional energy. On the other hand, I would love it if you could create some Pull Requests after the initial implementation is done and help us improve the base implementation with a better format and more efficient implementations.

> PEP 657 describes how locations for exceptions are to be handled, but is vague on the treatment of locations for tracing, profiling and debugging.

That is because the PEP is called "Include Fine-Grained Error Locations in Tracebacks". We are interested in fixing that particular problem and, at the request of many tool authors, we have exposed the information, as it turns out to be quite useful for them, but we don't want to specify how the information will be used or the semantics for tracing, profiling and debugging. The semantics of that information are already there (the same ones attached to AST offsets) and we are not interested in changing or altering them, just (as the PEP mentions) propagating them. We are not interested in creating more exhaustive contracts for the existing information (AST offsets).

> The impact on tools like coverage.py and debuggers should be made clearer. For example, how would one set a breakpoint on line 4 above?

There is no impact on existing tools because we are not changing previous APIs. Every API in the PEP is new, so these changes do not affect existing tools. Existing tools could use the new APIs if they wish, taking into account that these numbers have exactly the same semantics as the ones they have in the AST, which we believe is almost always what they expect. I understand that you may have a different view or disagree with how we view things, and that is fine.

> The behavior of tracing and debuggers needs to be described for those locations.

No, there is no reason we need to provide that description. That is outside the scope of the PEP: we only propagate AST offsets that can be consumed with the same semantics as the ones currently publicly available on the AST. And this is not the primary focus of the PEP, which is, again, mainly focused on "Include Fine-Grained Error Locations in Tracebacks". I will respect it if you disagree with this vision, but that is our vision and that is how we have written the PEP.

> PEP 657 claims it is fully backwards compatible, but it cannot be both backwards compatible and consistent. There are fundamental differences between using ranges and points as locations.

This PEP is fully backward compatible because it is not changing any existing API; it is only adding new APIs. People that want "points" can use the same existing APIs that have been used before without any problems, because they are there exactly as they were. It is absolutely fine if you have a different view on what is backwards compatible, but regarding our backwards compatibility policy and PEP 387, this is backwards compatible.

> The PEP 657 suggests the impact on startup would be negligible. That is not quite true.

We already answered this in the other thread you created with exhaustive benchmarks. I don't think we have anything useful to add here.

Regarding your suggestions:

> 2. Change the C API to a single function:
> int PyCode_Addr2Location(PyCodeObject *co, int addr, int *startline, int *startcolumn, int *endline, int *endcolumn)

This is very reasonable, so I think we can change the PEP and the implementation to do this.

> 3. Drop the opt-out option.
> If the extra information is optional, then the compression scheme must allow for that; making the code more complex and potentially less efficient. Does opting out use the start of the range, or the old line, as the location?

Not possible. The opt-out option is important for a lot of people and, as far as I understand, very likely a requirement for the PEP to be accepted.

> 4. Drop the limitation on column offsets.
> The data needs to be compressed anyway, so allowing arbitrary column offsets is effectively free.

The limitation can be dropped in the future without problems. This is covered in the PEP:

    "As specified previously, the underlying storage of the offsets should be considered an implementation detail, as the public APIs to obtain this values will return either C int types or Python int objects, which allows to implement better compression/encoding in the future if bigger ranges would need to be supported."

For the first implementation, we are not interested in dealing with compression or other heavy optimizations, but if you are interested you are more than welcome to improve upon it and drop the limitation in the future.

> 6. Store all location information in a single table (this applies more to the implementation than the PEP)
> Using four separate objects to hold the location info adds a lot of overhead for most functions.

The internal representation of the data is an implementation detail. Our plan is to go with the most straightforward implementation first, using separate tables, but if you see an opportunity for improvement and you have some good ideas to merge them into a single table, then you are more than welcome to make a Pull Request improving upon the initial version.

Regards from cloudy London,
Pablo Galindo Salgado

Hi Mark, Thank you for the feedback. Let me address/elaborate some of the points that Pablo touched on.
> PEP 657 proposes that locations for exceptions be treated as ranges, whereas tracing, profiling and debugging currently treat locations as points.
I don't think we're making strong claims that the full `(line, end_line, column, end_column)` should be the canonical representation for exception locations. The only concrete place we suggest their usage is in the printing of tracebacks. The information is intentionally not exposed in the exception or traceback objects as part of this. The one place where we mention non-traceback tooling using this information is coverage tools, which could report coverage at expression-level granularity. As a quick example, consider:

    x = True or f()

This line might be marked as covered by a line coverage tool, but this PEP potentially exposes extra information that might help show that the function call is not covered.
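The line-coverage limitation described above can be reproduced with a small tracer. The following is a minimal sketch, not how coverage.py itself works: `f`, `target` and the `tracer` function are hypothetical names introduced for illustration. It records which lines of `target` produce line events and whether `f` ever ran.

```python
import sys

calls = []


def f():
    # Hypothetical helper; records that it ran.
    calls.append("f")
    return False


def target():
    x = True or f()
    return x


first = target.__code__.co_firstlineno
lines_hit = set()


def tracer(frame, event, arg):
    if frame.f_code is target.__code__:
        if event == "line":
            # Store line offsets relative to the `def target()` line.
            lines_hit.add(frame.f_lineno - first)
        return tracer
    return None


sys.settrace(tracer)
try:
    value = target()
finally:
    sys.settrace(None)

print("relative lines hit:", sorted(lines_hit))
print("f was called:", bool(calls))
```

The assignment line is reported as executed even though `f()` never runs, which is the gap that expression-level position data could help a tool expose.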
> Consider this example: https://github.com/python/cpython/blob/main/Lib/test/test_compile.py#L861
And this example continues to work just as it does right now. There is no change to the tracing APIs or the existing co_lines method. I think your concern is if tracing tools switched to using PEP 657 (which they are not obligated to), but even that case works out:

    co_lines() returns (6, 8, 4) for the CALL_METHOD.
    co_positions() from PEP 657 returns (4, 6, 5, 6), corresponding to (line, end_line, column, end_column), for the CALL_METHOD.
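To compare the two views side by side, the sketch below prints both tables for the `load_method` example. It assumes `co_lines()` is available from Python 3.10 and `co_positions()` from Python 3.11, and falls back to empty lists on older interpreters. The concrete offsets and tuples vary between CPython versions, so none are asserted here.

```python
SOURCE = """\
def load_method():
    return (
        o.
        m(
            0
        )
    )
"""

namespace = {}
exec(compile(SOURCE, "<load_method>", "exec"), namespace)
code = namespace["load_method"].__code__

# co_lines() (Python 3.10+) yields (start_offset, end_offset, line) triples.
co_lines = getattr(code, "co_lines", None)
line_ranges = list(co_lines()) if co_lines else []
for start, end, line in line_ranges:
    print(f"bytecode [{start:>3}, {end:>3}) -> line {line}")

# co_positions() (Python 3.11+) yields (line, end_line, column, end_column)
# tuples, one per two-byte code unit of the bytecode.
co_positions = getattr(code, "co_positions", None)
positions = list(co_positions()) if co_positions else []
for index, pos in enumerate(positions):
    print(f"code unit {index:>2} -> {pos}")
```

The first table is what existing line-based tools consume; the second is the additional range information the PEP exposes.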
> For example, how would one set a breakpoint on line 4 above?
Just as they would right now; they don't need to change how they set breakpoints.
> PEP 657 claims it is fully backwards compatible, but it cannot be both backwards compatible and consistent.
I think there's a misunderstanding between us about what backwards compatibility means. Can you concretely explain how this PEP or its implementation would break existing tools? I understand your concerns about having two potentially conflicting APIs for source locations, but that is not a backwards compatibility problem.
> 1. Clarify, in detail, the impact on line-based tools like profilers, coverage.py and debuggers. This should include help on how to use the new APIs and where using the old APIs might result in behavioral changes.
As mentioned, we don't have an expectation for line-based tools to switch to the new API. Its primary consumer is meant to be the traceback mechanism. Usage of the old APIs will not, and must not, lead to any behavioral changes.
> 2. Change the C API to a single function: int PyCode_Addr2Location(PyCodeObject *co, int addr, int *startline, int *startcolumn, int *endline, int *endcolumn)
Thank you, this is a great suggestion.
> 3. Drop the opt-out option. If the extra information is optional, then the compression scheme must allow for that; making the code more complex and potentially less efficient. Does opting out use the start of the range, or the old line, as the location?
In the future, if a fantastic compression scheme is figured out that requires both the end lines and column information, I think it would be acceptable to make the opt-out only suppress the printing of the carets in the traceback while still maintaining the data in the code objects. This would still be backwards compatible.
> 4. Drop the limitation on column offsets. The data needs to be compressed anyway, so allowing arbitrary column offsets is effectively free.
Sure, I think there were plenty of good ideas thrown around about compression and lazy-loading in the last PEP 657 thread, so this is more of a soft limitation of the current implementation of the PEP. This limit can be increased in the future without any changes in the API or hurting backwards compatibility.
> 6. Store all location information in a single table (this applies more to the implementation than the PEP) Using four separate objects to hold the location info adds a lot of overhead for most functions.
I would like to address this point and wrap up at the same time. The PEP is meant primarily to aid with debugging and improve tracebacks. The API is designed to be very limited and to allow a lot of room for expansion and optimization. It does not make strong prescriptive claims that tools must switch to the new richer information, as your mail seems to suggest. The other smaller aspects, like the internal storage formats and not storing data when opted out, are concerns that can be addressed without making any breaking changes.
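The opt-out discussed in this thread can be observed from Python. This sketch assumes the spelling that shipped in CPython 3.11, `-X no_debug_ranges` (the thread refers to it as `-Xnodebugranges`), and a hypothetical child program `CHILD` that reports whether `co_positions()` exposes any column data. On interpreters without `co_positions()` the child prints "unsupported".

```python
import subprocess
import sys

# Child program: report which columns co_positions() exposes, if the
# running interpreter has co_positions() at all (Python 3.11+).
CHILD = """
def g(a, b):
    return a / b

positions = getattr(g.__code__, "co_positions", None)
if positions is None:
    print("unsupported")
else:
    columns = {(col, end_col) for _, _, col, end_col in positions()}
    print("no columns" if columns <= {(None, None)} else "has columns")
"""


def run_child(*interpreter_flags):
    # Run CHILD in a fresh interpreter with the given command-line flags.
    completed = subprocess.run(
        [sys.executable, *interpreter_flags, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


default_report = run_child()
opted_out_report = run_child("-X", "no_debug_ranges")
print("default:  ", default_report)
print("opted out:", opted_out_report)
```

Comparing the two reports shows what a tool can and cannot rely on when a user has opted out.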

On 6/30/2021 12:30 PM, Ammar Askar wrote:
> I don't think we're making strong claims that the full `(line, end_line, column, end_column)` should be the canonical representation for exception locations. The only concrete place we suggest their usage is in the printing of tracebacks.
sys.__excepthook__ will have access to the information, but will censor it for long lines or slices that span more than one line.
> The information is not exposed in the exception or traceback objects intentionally as part of this.
Then how will modules that customize traceback presentation, such as idlelib, be able to get the 4-tuple for a particular traceback entry? This seems like a repeat of attribute and name error name hints not being accessible from the traceback module and only accessible from Python via a workaround such as the one in idlelib.run, line 218:

    if typ in (AttributeError, NameError):
        # 3.10+ hints are not directly accessible from python (#44026).
        err = io.StringIO()
        with contextlib.redirect_stderr(err):
            sys.__excepthook__(typ, exc, tb)
        return [err.getvalue().split("\n")[-2] + "\n"]

As Pablo explained on #44026, the length hints are not part of the tb object because they are expensive to compute and are not useful when AttributeError and NameError are expected and are caught in code for flow control purposes. Therefore the hints are only computed when an uncaught exception is displayed to users.
However, the position information is already computed and it is just a matter of passing it along *all the way* to the python coder.
For slices within normal lines, the new caret line can be turned back into a position, as IDLE does now for SyntaxErrors. But that does not work when the caret represents a truncated slice.
The PEP says co_positions will be added to code objects but makes no mention of an API for accessing information within it. And that still leaves the issue of getting the code object.
Summary: the position information would be much more useful if it were added to traceback items and if the traceback functions then exposed it.
Note: the current co_lines and co_linetable seem to be undocumented -- no index entries, nothing in https://docs.python.org/3.11/reference/datamodel.html#index-56 and no docstring for co_lines. So I have no idea how these work.
Note 2: "Opt-out mechanism" says new environmental variable, new -Xnodebugranges flag. "Have a configure flag to opt out" says "we have decided to the -O flag". I presume the latter is obsolete.
-- Terry Jan Reedy
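The idlelib workaround quoted above is a fragment of a larger function. A self-contained sketch of the same technique is shown here; the wrapper name `last_excepthook_line` is hypothetical and not part of idlelib. It captures what `sys.__excepthook__` writes to stderr and keeps only the final line, which is where the 3.10+ name hints appear.

```python
import contextlib
import io
import sys


def last_excepthook_line(typ, exc, tb):
    """Return the final line that sys.__excepthook__ would print."""
    err = io.StringIO()
    # sys.__excepthook__ writes to whatever sys.stderr currently is,
    # so redirecting stderr captures the formatted traceback.
    with contextlib.redirect_stderr(err):
        sys.__excepthook__(typ, exc, tb)
    return err.getvalue().split("\n")[-2] + "\n"


try:
    undefined_name  # deliberately raises NameError
except NameError:
    message = last_excepthook_line(*sys.exc_info())

print(message, end="")
```

The indirection through captured text is what the message above objects to: the information exists, but only as formatted output.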

Hi Terry, Thanks for your message!
> Then how will modules that customizes traceback presentation, such as idlelib, be able to get the 4-tuple for a particular traceback entry?
From the exception, you can get the code object, and from the code object the extra information, using the Python API.
Example:

    >>> try:
    ...   1/0
    ... except Exception as e:
    ...   f = e
    ...
    >>> list(f.__traceback__.tb_frame.f_code.co_positions())
    [(1, 4, 1, 8), (2, 2, 3, 4), (2, 2, 5, 6), (2, 2, 3, 6), (2, 2, 3, 6),
     (2, 4, 3, 8), (2, 4, 3, 8), (None, 2, 3, 6), (3, 4, 1, 8), (3, 3, 8, 17),
     (3, 4, 1, 8), (3, 4, 1, 8), (3, 4, 1, 8), (3, 4, 1, 8), (4, 4, 7, 8),
     (4, 4, 3, 4), (4, 4, 3, 8), (4, 4, 3, 8), (4, 4, 3, 8), (4, 4, 3, 8),
     (4, 4, 3, 8), (4, 4, 3, 8), (None, 4, 3, 8), (None, 4, 3, 8),
     (None, 4, 3, 8), (None, 4, 3, 8), (None, 4, 3, 8), (3, 4, 3, 8)]

This is equivalent to (and more efficient than) having this information in the traceback itself (as it would have been duplicated and would require more changes).
We will document this a bit better with some examples. And we will make sure to add docs about this for sure :)
Pablo

Also, notice we are extending the traceback module (in Python) to support this, so you can probably leverage those changes as well and avoid having to mess with code objects yourself :)

On 6/30/2021 5:30 PM, Pablo Galindo Salgado wrote:
> Also, notice we are extending the traceback module (in Python) to support this, so you probably can also leverage those changes so you don't need to mess with code objects yourself :)
IDLE currently uses traceback.extract_tb and traceback.print_list. In between, it a) removes extra entries at both ends that result from running in IDLE, and b) adds code lines for shell entries. It does this in the user code execution process and sends the resulting string, tagged as stderr, via the socket connection to the IDLE GUI process.
What I believe I would like is to have 'line n' of each frame entry replaced with a position 4-tuple, however formatted, and no caret line. IDLE would then use the position to tag the appropriate slice of the line.
Currently, if the user right-clicks on either of the two lines of a pair, to see the line in its context in its file, IDLE opens the file in an editor if it is not open already and tags the entire line. If 'line n' were replaced with the slice info, it could instead tag that slice, either within a line or spanning multiple lines. Both would be improvements.
Please add me as nosy to any appropriate issues/PRs so I have at least an opportunity to test and comment.
-- Terry Jan Reedy

Hi Terry,
> IDLE currently uses traceback.extract_tb and traceback.print_list
Awesome, that should work out perfectly then. Our current proof-of-concept implementation augments the traceback.FrameSummary class to include `end_lineno`, `colno` and `end_colno` attributes. We will make sure to add you as a reviewer on that change so you can give it an early shot at integrating with IDLE.
Regards,
Ammar
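As an illustration of how a tool such as IDLE might consume those attributes, here is a minimal sketch. It assumes the attributes are named `end_lineno`, `colno` and `end_colno` as described above (they are present on `FrameSummary` in Python 3.11 and later) and reads them with `getattr` so the code still runs on older versions. The pseudo-filename and the `divide` function are hypothetical, and the source is registered in `linecache` so the unstripped line can be retrieved.

```python
import linecache
import traceback

SOURCE = "def divide(a, b):\n    return a / b\n"
FILENAME = "<framesummary-example>"  # hypothetical pseudo-filename

# Register the source so linecache (and thus traceback) can find the lines.
linecache.cache[FILENAME] = (len(SOURCE), None, SOURCE.splitlines(True), FILENAME)

namespace = {}
exec(compile(SOURCE, FILENAME, "exec"), namespace)

try:
    namespace["divide"](1, 0)
except ZeroDivisionError as exc:
    # extract_tb lists the oldest frame first, so the last entry is `divide`.
    summary = traceback.extract_tb(exc.__traceback__)[-1]

# The extra attributes exist on Python 3.11+; fall back to None elsewhere.
colno = getattr(summary, "colno", None)
end_colno = getattr(summary, "end_colno", None)
end_lineno = getattr(summary, "end_lineno", None)

full_line = linecache.getline(FILENAME, summary.lineno)
print("line:", summary.lineno, "end line:", end_lineno)
if colno is not None and end_colno is not None:
    # Offsets index the unstripped source line (ASCII here, so byte
    # offsets and character offsets coincide).
    print("slice:", full_line[colno:end_colno])
```

Note that `FrameSummary.line` is stripped of leading whitespace, which is why the sketch slices the line fetched from `linecache` instead.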

>> Then how will modules that customizes traceback presentation, such as idlelib, be able to get the 4-tuple for a particular traceback entry?
> From the exception, you can get the code object and from the code object the extra information using the Python API:
> Example:
>
>     >>> try:
>     ...   1/0
>     ... except Exception as e:
>     ...   f = e
>     ...
>     >>> list(f.__traceback__.tb_frame.f_code.co_positions())
>     [(1, 4, 1, 8), (2, 2, 3, 4), (2, 2, 5, 6), (2, 2, 3, 6), (2, 2, 3, 6),
>      (2, 4, 3, 8), (2, 4, 3, 8), (None, 2, 3, 6), (3, 4, 1, 8), (3, 3, 8, 17),
>      (3, 4, 1, 8), (3, 4, 1, 8), (3, 4, 1, 8), (3, 4, 1, 8), (4, 4, 7, 8),
>      (4, 4, 3, 4), (4, 4, 3, 8), (4, 4, 3, 8), (4, 4, 3, 8), (4, 4, 3, 8),
>      (4, 4, 3, 8), (4, 4, 3, 8), (None, 4, 3, 8), (None, 4, 3, 8),
>      (None, 4, 3, 8), (None, 4, 3, 8), (None, 4, 3, 8), (3, 4, 3, 8)]
>
Ah, co_positions is an access method, corresponding to the current (also undocumented) co_lines. There will be no direct access to the position table, as it will be kept private and subject to change.
The obvious first question: why 28 items, and what does the index mean? When I compile the above with 3.10.0b3, there are 29 bytecodes, so I am guessing your branch has 28 and that the first number in the tuple is the line number. But how would one know which '2' entry corresponds to the divide in '1/0'? And what do the rest of the tuple numbers mean? I don't see anything like the (2,2, 2,4) I expect for '1/0'. To be documented, as you say below.
> This is equivalent (and more efficient) to have this information in the traceback itself (as it would have been duplicated and would require more changes).
Understood and agreed.
> We would document this a bit better with some examples. And we will make sure to add docs about this for sure :)
-- Terry Jan Reedy
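On the question of which `co_positions()` entry belongs to the failing operation: one approach, sketched here under the assumption of CPython 3.11 or later, is to index the positions by the traceback's `tb_lasti`. The thread itself does not describe this step. `tb_lasti` is a byte offset into the bytecode and each position entry covers one two-byte code unit, so the index is `tb_lasti // 2`. The `divide` function and the pseudo-filename are hypothetical.

```python
import itertools

SOURCE = "def divide(a, b):\n    return a / b\n"

namespace = {}
exec(compile(SOURCE, "<positions-example>", "exec"), namespace)

try:
    namespace["divide"](1, 0)
except ZeroDivisionError as exc:
    tb = exc.__traceback__
    # Walk to the innermost traceback entry, where the exception was raised.
    while tb.tb_next is not None:
        tb = tb.tb_next

code = tb.tb_frame.f_code
co_positions = getattr(code, "co_positions", None)  # Python 3.11+

position = None
if co_positions is not None:
    # tb_lasti is a byte offset; each position entry covers one
    # two-byte code unit, hence the division by 2.
    index = tb.tb_lasti // 2
    position = next(itertools.islice(co_positions(), index, None))

print("failing instruction position:", position)
if position is not None and None not in position:
    line, end_line, col, end_col = position
    source_line = SOURCE.splitlines()[line - 1]
    print("source segment:", source_line[col:end_col])
```

The tuple order is (line, end_line, column, end_column), with columns counted as zero-based offsets into the source line.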
participants (4): Ammar Askar, Mark Shannon, Pablo Galindo Salgado, Terry Reedy