We are preparing a PEP and would like to start some early discussion
about one of its main aspects.
The work we are preparing allows the interpreter to produce more
fine-grained error messages that point to the source associated with
the instructions that are failing. For example:
Traceback (most recent call last):
  File "test.py", line 14, in <module>
  File "test.py", line 12, in lel3
    return lel2(x) / 23
  File "test.py", line 9, in lel2
    return 25 + lel(x) + lel(x)
  File "test.py", line 6, in lel
    return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
TypeError: 'NoneType' object is not subscriptable
The cost of this is storing the start and end column numbers for every
bytecode instruction, and this is what we want to discuss (there is also
some stack cost to re-raise exceptions, but that is not a big problem in
any case). Given that column numbers are small compared with line
numbers, we plan to store them as unsigned chars or unsigned shorts. We
ran some experiments over the standard library and found that the
overhead across all pyc files is:
* If we use shorts, the total overhead is ~3% (total size 28 MB and the
extra size is 0.88 MB).
* If we use chars, the total overhead is ~1.5% (total size 28 MB and the
extra size is 0.44 MB).
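To illustrate where the factor of two between the two options comes from,
here is a minimal sketch. It is not the actual pyc layout: the helper
`pack_columns` and the sample column pairs are made up for the example, and
it simply assumes one (start, end) pair per instruction stored back to back.

```python
import struct

def pack_columns(pairs, fmt):
    """Pack (start_col, end_col) pairs into bytes.

    fmt is a struct code: "B" for unsigned char, "H" for unsigned short.
    The "<" prefix disables alignment padding, so the size is exact.
    """
    flat = [col for pair in pairs for col in pair]
    return struct.pack(f"<{len(flat)}{fmt}", *flat)

# Hypothetical column ranges for three instructions.
pairs = [(11, 18), (4, 27), (15, 44)]

as_chars = pack_columns(pairs, "B")   # 2 bytes per instruction
as_shorts = pack_columns(pairs, "H")  # 4 bytes per instruction

print(len(as_chars), len(as_shorts))
```

Under that assumption the cost is 2 bytes per instruction with chars and 4
bytes with shorts, which matches the 0.44 MB versus 0.88 MB measurements.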
One of the disadvantages of using chars is that we can only report
columns from 1 to 255, so if an error happens in a column bigger than
that we would have to exclude it (and not show the highlighting) for
that frame. Unsigned shorts would allow the values to go from 0 to
65535.
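A minimal sketch of the fallback for out-of-range columns, under some
assumptions that are ours for the example and not part of the proposal:
the value 0 is reserved to mean "no column information", columns are
1-based with an inclusive end, and the highlighting is drawn with carets.
The names `NO_COLUMN`, `encode_columns` and `highlight` are hypothetical.

```python
NO_COLUMN = 0  # assumption: 0 is reserved as "no column information"

def encode_columns(start_col, end_col, limit=255):
    """Return the pair to store, or the sentinel pair if it does not fit."""
    if 1 <= start_col <= end_col <= limit:
        return (start_col, end_col)
    return (NO_COLUMN, NO_COLUMN)

def highlight(start_col, end_col):
    """Return a marker line for the range, or None when there is no info."""
    if start_col == NO_COLUMN:
        return None
    return " " * (start_col - 1) + "^" * (end_col - start_col + 1)

# A range that fits in an unsigned char is stored and highlighted.
stored = encode_columns(5, 8)
print(highlight(*stored))

# A range past column 255 is dropped, so this frame gets no highlighting.
dropped = encode_columns(250, 300)
print(highlight(*dropped))
```

With `limit=65535` the same check would cover the unsigned short case.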
Unfortunately these numbers are not easily compressible, as every
instruction would have very different offsets.
There is also the possibility of not doing this based on some build flag
or when using -O, to allow users to opt out. However, these numbers can
be quite useful to other tools such as coverage measuring tools, tracers,
profilers and the like, and adding conditional logic in many places would
complicate the implementation considerably and would potentially reduce
the usability of those tools, so we prefer not to have the conditional
logic. We believe this extra cost is very much worth the better error
reporting, but we understand and respect other points of view.
Does anyone see a better way to encode this information **without
complicating the implementation a lot**? What are people's thoughts on the
Thanks in advance,
Regards from cloudy London,
Pablo Galindo Salgado