Mailman 3 Critique of PEP 657 - Python-Dev

June 30, 2021

Hi,

Apologies for my tardiness in doing this, but no one explicitly said it 
was too late to critique PEP 657...

Critique of PEP 657 (Include Fine Grained Error Locations in Tracebacks)
------------------------------------------------------------------------

First of all I want to say that I support the goal of improving error 
messages. IMO the PEP should be "accepted in principle". I think all of 
the issues below can be fixed while still supporting the general aims of 
the PEP.

The change from points to ranges as locations
---------------------------------------------

Because Python is a procedural language, there is an expectation that 
code executes in a certain order. PEP 626 (Precise line numbers for 
debugging and other tools) seeks to guarantee that that expectation is met.

PEP 657 describes how locations for exceptions are to be handled, but is 
vague on the treatment of locations for tracing, profiling and debugging.

PEP 657 proposes that locations for exceptions be treated as ranges, 
whereas tracing, profiling and debugging currently treat locations as 
points.

Either this will end in contradictions and confusion should those 
locations disagree, or the locations for tracing, profiling and 
debugging must change.

Using the start of a range as the point location for tracing may be 
misleading when the operation that causes an exception is on a different 
line within that range.

Consider this example:
https://github.com/python/cpython/blob/main/Lib/test/test_compile.py#L861

This might seem like a contrived case, but it is based on a real bug 
report https://bugs.python.org/issue39316

1.  def load_method():
2.      return (
3.          o.
4.          m(
5.              0
6.          )
7.      )

Currently the call is traced on line 4.

PEP 657 would change the location of the call from line 4 to the range 
3-6, which would mean that the line of the call is no longer traced 
separately (or is traced several times). PEP 657 makes no mention of 
this change.
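
To check which lines are reported for this function on a given 
interpreter, something like the following could be used. It is only a 
sketch: the _O class and the trace_lines helper are made up for 
illustration, and the lines reported differ between Python versions, so 
no expected output is shown.

```python
import sys

# The example function from above, without the line-number prefixes.
SOURCE = """\
def load_method():
    return (
        o.
        m(
            0
        )
    )
"""


class _O:
    """Hypothetical stand-in for the global `o` used by load_method()."""

    def m(self, arg):
        return arg


def trace_lines(source, func_name, namespace):
    """Run func_name from source and return the line numbers traced in it."""
    exec(compile(source, "<example>", "exec"), namespace)
    lines = []

    def tracer(frame, event, arg):
        # Only follow the function of interest; ignore calls it makes.
        if frame.f_code.co_name != func_name:
            return None
        if event == "line":
            lines.append(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        namespace[func_name]()
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return lines


traced = trace_lines(SOURCE, "load_method", {"o": _O()})
print(traced)
```

Each entry is a single line number, which is the "point" model that 
tracing uses today.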

The PEP claims that these changes are improvements. Maybe they are, but 
they are quite impactful changes which the PEP glosses over. The impact 
on tools like coverage.py and debuggers should be made clearer. For 
example, how would one set a breakpoint on line 4 above?

There are other languages (e.g. jinja templates) that compile to Python 
AST and bytecode. These *might* produce locations that overlap, but are 
not nested. The behavior of tracing and debuggers needs to be described 
for those locations.
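
To make "overlap, but are not nested" concrete, here is a small sketch 
that classifies two ranges. The representation (a pair of (line, column) 
tuples with an exclusive end) and the classify function are assumptions 
for illustration only; the PEP does not define them.

```python
def classify(a, b):
    """Classify two ranges given as ((line, col), (line, col)), end exclusive."""
    (a_start, a_end), (b_start, b_end) = a, b
    # Tuples compare lexicographically, so (line, col) pairs order naturally.
    if a_end <= b_start or b_end <= a_start:
        return "disjoint"
    a_contains_b = a_start <= b_start and b_end <= a_end
    b_contains_a = b_start <= a_start and a_end <= b_end
    if a_contains_b or b_contains_a:
        return "nested"
    return "overlapping"


# A call spanning lines 3-6 and an argument inside it: nested.
print(classify(((3, 8), (6, 9)), ((5, 12), (5, 13))))
# Two ranges that share lines 2-3 but neither contains the other.
print(classify(((1, 0), (3, 5)), ((2, 0), (4, 0))))
```

Code compiled from Python source should only ever produce the first two 
cases; the third is the one that needs specifying.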

Backwards Compatibility
-----------------------

PEP 657 claims it is fully backwards compatible, but it cannot be both 
backwards compatible and consistent.
There are fundamental differences between using ranges and points as 
locations.

Impact on startup time
----------------------

PEP 657 suggests the impact on startup would be negligible. That is 
not quite true. The impact on startup is probably acceptable, but a 
proper analysis needs to be made.

The increase in size of pyc files (~20%) puts an upper bound on the 
increase of startup time, but I would expect it to be much less than 
that, as loading files from disk is only a fraction of startup.

Currently, startup is dominated by inefficiencies in interpreter 
creation, unmarshalling and module loading.
We plan to reduce these a lot for 3.11, so that the impact of PEP 657 on 
startup will be larger (as a ratio) than experiments with 3.10 suggest.
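
A back-of-the-envelope sketch of that argument follows. The timings are 
invented purely for illustration, not measurements, and the model 
assumes that the time spent loading pyc files scales linearly with 
their size.

```python
def relative_startup_increase(load_time, other_time, pyc_growth=0.20):
    """Fractional startup increase if only the pyc loading time grows.

    load_time:  time spent loading pyc files (hypothetical units)
    other_time: everything else in startup (hypothetical units)
    pyc_growth: fractional growth in pyc size (~20% as discussed above)
    """
    extra = load_time * pyc_growth
    return extra / (load_time + other_time)


# Made-up numbers: loading is 2 units of a 10-unit startup.
before = relative_startup_increase(load_time=2.0, other_time=8.0)
# Same loading cost after the rest of startup has been made faster.
after = relative_startup_increase(load_time=2.0, other_time=3.0)

print(f"{before:.1%} {after:.1%}")
```

The absolute cost is the same in both cases; only its share of startup 
changes, and it can never exceed pyc_growth.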

The API
-------

The C API adds three new functions, one each for the end line, start 
column and end column. Together with the existing function for the start 
line, fetching a full location takes four calls.
This is either slow, as any compressed table needs to be parsed four 
times, or space inefficient using an uncompressed table.
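
The cost can be illustrated with a toy model. ToyLocationTable and its 
delta encoding are entirely hypothetical and are not CPython's actual 
table format; the point is only that each per-field accessor has to scan 
the compressed table from the start, whereas a combined lookup scans it 
once.

```python
class ToyLocationTable:
    """Hypothetical delta-encoded location table (not CPython's format)."""

    def __init__(self, first_line, entries):
        self.first_line = first_line
        # Each entry: (n_units, line_delta, n_extra_lines, start_col, end_col)
        self.entries = entries
        self.scans = 0  # counts how often the table is parsed

    def _scan(self, addr):
        """Walk the table from the start to find the entry covering addr."""
        self.scans += 1
        line = self.first_line
        offset = 0
        for n_units, line_delta, n_extra, start_col, end_col in self.entries:
            line += line_delta
            if offset <= addr < offset + n_units:
                return line, start_col, line + n_extra, end_col
            offset += n_units
        raise IndexError(addr)

    # Four separate accessors: one scan each.
    def addr2line(self, addr):
        return self._scan(addr)[0]

    def addr2offset(self, addr):
        return self._scan(addr)[1]

    def addr2endline(self, addr):
        return self._scan(addr)[2]

    def addr2endoffset(self, addr):
        return self._scan(addr)[3]

    # Combined accessor: a single scan returns all four fields.
    def addr2location(self, addr):
        return self._scan(addr)


table = ToyLocationTable(first_line=1, entries=[(2, 2, 0, 8, 9), (3, 0, 3, 8, 9)])

separate = (table.addr2line(3), table.addr2offset(3),
            table.addr2endline(3), table.addr2endoffset(3))
scans_separate = table.scans

table.scans = 0
combined = table.addr2location(3)
scans_combined = table.scans

print(separate == combined, scans_separate, scans_combined)
```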

Opt-out
-------

Allowing opt-out prevents consistent compression of location data, 
resulting in larger pyc files for those that do not opt out.
The exact semantics, in terms of error formatting, tracing, etc., are 
not described should the user opt out.

Summary
-------

Overall, there is nothing that blocks acceptance of the PEP in 
principle, but there are quite a few issues that need resolving.

Suggestions
-----------

1. Clarify, in detail, the impact on line-based tools like profilers, 
coverage.py and debuggers. This should include help on how to use the 
new APIs and where using the old APIs might result in behavioral changes.

2. Change the C API to a single function:
    int PyCode_Addr2Location(PyCodeObject *co, int addr,
                             int *startline, int *startcolumn,
                             int *endline, int *endcolumn)

3. Drop the opt-out option.
If the extra information is optional, then the compression scheme must 
allow for that; making the code more complex and potentially less 
efficient. Does opting out use the start of the range, or the old line, 
as the location?

4. Drop the limitation on column offsets.
The data needs to be compressed anyway, so allowing arbitrary column 
offsets is effectively free.

5. Store all location information in a single table (this applies more 
to the implementation than the PEP).
Using four separate objects to hold the location info adds a lot of 
overhead for most functions.

Cheers,
Mark.
