On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado
Although we were originally not sympathetic to it, we may need to offer an opt-out mechanism for those users who care about the impact of the overhead of the new data in pyc files and in in-memory code objects, as was suggested by some folks (Thomas, Yury, and others). For this, we could propose that the functionality will be deactivated along with the extra information when Python is executed in optimized mode (``python -O``) and therefore pyo files will not have the overhead associated with the extra required data.
Just to be clear, .pyo files have not existed for a while: https://www.python.org/dev/peps/pep-0488/.
Notice that Python already strips docstrings in this mode, so it would be "aligned" with the current mechanism of optimized mode.
This only kicks in at the -OO level.
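For a quick illustration of the existing behaviour being referenced here (a minimal example, not part of the proposal):

# docstring_demo.py -- run as: python -OO docstring_demo.py
import sys

def f():
    """This docstring is stripped at the -OO level."""

print(sys.flags.optimize)  # 0 normally, 1 under -O, 2 under -OO
print(f.__doc__)           # the docstring normally, None under -OO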
Although this complicates the implementation, it is certainly still much easier than dealing with compression (and more useful for those that don't want the feature). Notice that we also expect poor results from compression, as the offsets would be quite random (although predominantly in the range 10-120).
I personally prefer the idea of dropping the data with -OO, since if you're stripping out docstrings you're already hurting introspection capabilities in the name of memory. Or one could go as far as to introduce -Os to do -OO plus dropping this extra data. As for .pyc file size, I personally wouldn't worry about it. If someone is that space-constrained, they either aren't using .pyc files or are only shipping a single set of .pyc files under -OO and skipping source code. And .pyc files are an implementation detail of CPython, so there shouldn't be too much of a concern for other interpreters. -Brett
On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado
wrote: One last note for clarity: that's the increase in size of the whole stdlib directory; the pyc files themselves go from 28.471296 MB to 34.750464 MB, which is an increase of 22%.
On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado
wrote: Some update on the numbers. We have made a draft implementation to corroborate the numbers with some more realistic tests, and it seems that our original calculations were wrong. The actual increase in size is quite a bit bigger than previously advertised:
Using a bytes object to encode the final data and marshalling that to disk (so using uint8_t as the underlying type):
BEFORE:
❯ ./python -m compileall -r 1000 Lib > /dev/null
❯ du -h Lib -c --max-depth=0
70M Lib
70M total

AFTER:
❯ ./python -m compileall -r 1000 Lib > /dev/null
❯ du -h Lib -c --max-depth=0
76M Lib
76M total
So that's an increase of 8.56% over the original value. This is storing the start and end offsets with no compression whatsoever.
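As a quick sanity check of the arithmetic, using the values quoted in this thread:

# Whole-Lib-directory growth, from the du output above:
before, after = 70, 76
print(f"{(after - before) / before:.2%}")  # ~8.57%, matching the ~8.56% quoted

# pyc-only growth, from the figures elsewhere in this thread:
pyc_before, pyc_after = 28.471296, 34.750464
print(f"{(pyc_after - pyc_before) / pyc_before:.2%}")  # ~22.06%, i.e. the 22%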
On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado
wrote: Hi there,
We are preparing a PEP and we would like to start some early discussion about one of its main aspects.
The work we are preparing is to allow the interpreter to produce more fine-grained error messages, pointing to the source associated with the instructions that are failing. For example:
Traceback (most recent call last):
  File "test.py", line 14, in <module>
    lel3(x)
    ^^^^^^^
  File "test.py", line 12, in lel3
    return lel2(x) / 23
           ^^^^^^^
  File "test.py", line 9, in lel2
    return 25 + lel(x) + lel(x)
                ^^^^^^
  File "test.py", line 6, in lel
    return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
                         ^^^^^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not subscriptable
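For concreteness, here is a hypothetical test.py that produces an error of this shape (the original source was not included in this message, so the helper definitions and exact line numbers here are reconstructed):

x = {'z': {'x': {'y': {'z': None}}}}
b, e = 2, None

def foo(a, b, c, d):
    return d

def lel(a):
    # x['z']['x']['y']['z'] evaluates to None, so the final ['y'] fails
    return 1 + foo(a, b, c=x['z']['x']['y']['z']['y'], d=e)

def lel2(x):
    return 25 + lel(x) + lel(x)

def lel3(x):
    return lel2(x) / 23

lel3(x)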
The cost of this is having the start and end column number for every bytecode instruction, and this is what we want to discuss (there is also some stack cost to re-raise exceptions, but that's not a big problem in any case). Given that column numbers are not very big compared with line numbers, we plan to store them as unsigned chars or unsigned shorts. We ran some experiments over the standard library and found that the overhead across all pyc files is:
* If we use shorts, the total overhead is ~3% (total size 28 MB; the extra size is 0.88 MB).
* If we use chars, the total overhead is ~1.5% (total size 28 MB; the extra size is 0.44 MB).
One of the disadvantages of using chars is that we can only report columns from 1 to 255, so if an error happens in a column bigger than that we would have to exclude it (and not show the highlighting) for that frame. An unsigned short would allow values from 0 to 65535.
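As a sketch of what the unsigned-char scheme could look like (illustrative only, not the actual implementation; the 255 sentinel for "no location" is our assumption here):

import struct

NO_COLUMN = 255  # assumed sentinel meaning "column information unavailable"

def encode_columns(pairs):
    """Encode (start_col, end_col) pairs, one per bytecode instruction."""
    out = bytearray()
    for start, end in pairs:
        if start >= NO_COLUMN or end >= NO_COLUMN:
            # Too wide for a uint8_t: drop the location for this instruction.
            start = end = NO_COLUMN
        out += struct.pack("BB", start, end)
    return bytes(out)

def decode_columns(data):
    for start, end in struct.iter_unpack("BB", data):
        yield None if start == NO_COLUMN else (start, end)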
Unfortunately these numbers are not easily compressible, as every instruction would have very different offsets.
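A rough intuition check for why we do not expect much from compression, assuming near-random offsets in the quoted range:

import random, zlib

random.seed(0)
data = bytes(random.randint(10, 120) for _ in range(10_000))
ratio = len(zlib.compress(data)) / len(data)
print(f"{ratio:.2f}")  # roughly 0.85-0.9: very little gain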
There is also the possibility of not doing this based on some build flag or when using -O, to allow users to opt out. But given that these numbers can be quite useful to other tools (coverage tools, tracers, profilers and such), adding conditional logic in many places would complicate the implementation considerably and would potentially reduce the usability of those tools, so we prefer not to have the conditional logic. We believe this extra cost is very much worth the better error reporting, but we understand and respect other points of view.
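On the tools point: if the information were exposed on code objects via an accessor (hypothetical at the time of writing; this is what eventually shipped in CPython 3.11 as code.co_positions()), a tracer or coverage tool could consume it like this, with entries degrading to None when the data has been stripped:

def dump_positions(func):
    # co_positions() yields (lineno, end_lineno, col_offset, end_col_offset)
    # tuples, one per bytecode instruction; entries may be None.
    for pos in func.__code__.co_positions():
        print(pos)

def div(a, b):
    return a / b

dump_positions(div)  # requires a Python with per-instruction locations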
Does anyone see a better way to encode this information **without complicating the implementation a lot**? What are people's thoughts on the feature?
Thanks in advance,
Regards from cloudy London, Pablo Galindo Salgado