On 2021-05-07 22:45, Pablo Galindo Salgado wrote:
Hi there,
We are preparing a PEP and we would like to start some early discussion about one of the main aspects of the PEP.
The work we are preparing would allow the interpreter to produce more fine-grained error messages that point to the source code associated with the failing instructions. For example:
Traceback (most recent call last):
  File "test.py", line 14, in <module>
    lel3(x)
    ^^^^^^^
  File "test.py", line 12, in lel3
    return lel2(x) / 23
           ^^^^^^^
  File "test.py", line 9, in lel2
    return 25 + lel(x) + lel(x)
                ^^^^^^
  File "test.py", line 6, in lel
    return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
                         ^^^^^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not subscriptable
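Each highlight in that traceback is fully determined by one (start, end) column pair for the failing instruction, which is why that pair is what has to be stored. A minimal sketch of turning such a pair into the caret line, assuming 0-based, end-exclusive character offsets into the raw source line; `render_caret_line` is a hypothetical helper, not an interpreter API:

```python
def render_caret_line(source_line: str, start_col: int, end_col: int) -> str:
    """Return the source line followed by carets under [start_col, end_col).

    Hypothetical helper: columns are assumed to be 0-based character offsets
    into the raw line (indentation included), with end_col exclusive.
    """
    if not (0 <= start_col < end_col <= len(source_line)):
        raise ValueError("column range falls outside the source line")
    # Pad up to the start column, then one caret per highlighted character.
    carets = " " * start_col + "^" * (end_col - start_col)
    return f"{source_line}\n{carets}"


line = "    return lel2(x) / 23"
print(render_caret_line(line, 11, 18))  # highlights "lel2(x)"
```

Only two small integers per instruction are needed; the rest comes from the source file at display time.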
The cost of this is storing start and end column numbers for every bytecode instruction, and this is what we want to discuss (there is also some stack cost to re-raise exceptions, but that is not a big problem in any case). Given that column numbers are small compared with line numbers, we plan to store them as unsigned chars or unsigned shorts. We ran some experiments over the standard library and found the following overhead across all pyc files:
* If we use shorts, the total overhead is ~3% (total size 28 MB; the extra size is 0.88 MB).
* If we use chars, the total overhead is ~1.5% (total size 28 MB; the extra size is 0.44 MB).
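The two figures differ by exactly a factor of two because a short takes two bytes per column where a char takes one. A sketch of that packing using the standard `array` module ('B' is unsigned char, 'H' is unsigned short); the column pairs below are made up for illustration, and the percentages simply recompute the overhead from the sizes quoted above:

```python
from array import array

# Hypothetical (start_col, end_col) pairs for a handful of instructions.
columns = [(4, 11), (11, 18), (16, 22), (25, 46)]


def pack_columns(pairs, typecode):
    """Flatten (start, end) pairs into raw bytes of the given array typecode.

    'B' stores each column in 1 byte; 'H' typically uses 2 bytes. Values that
    do not fit the type raise OverflowError.
    """
    flat = array(typecode)
    for start, end in pairs:
        flat.extend((start, end))
    return flat.tobytes()


as_chars = pack_columns(columns, "B")
as_shorts = pack_columns(columns, "H")
print(len(as_chars), len(as_shorts))  # typically: 8 16

# Recompute the overhead from the measured stdlib sizes.
TOTAL_PYC_MB = 28.0
for label, extra_mb in (("chars", 0.44), ("shorts", 0.88)):
    print(f"{label}: {extra_mb / TOTAL_PYC_MB:.1%} extra")  # 1.6%, then 3.1%
```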
One of the disadvantages of using chars is that we can only report columns from 1 to 255, so if an error happens in a column beyond that, we would have to exclude it (and not show the highlighting) for that frame. Unsigned shorts allow values from 0 to 65535.
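A minimal sketch of that fallback, assuming (as the 1 to 255 range suggests) that 0 is reserved to mean "no column information". `encode_columns` is a hypothetical name, not an interpreter API; the `max_col` parameter lets the same check describe the unsigned short case:

```python
from typing import Optional, Tuple

MAX_CHAR_COL = 255     # largest value an unsigned char can hold
MAX_SHORT_COL = 65535  # largest value an unsigned short can hold


def encode_columns(start: int, end: int,
                   max_col: int = MAX_CHAR_COL) -> Optional[Tuple[int, int]]:
    """Return (start, end) if both fit in 1..max_col, otherwise None.

    Assumption: 0 is reserved as "unknown", so valid columns start at 1.
    None means the frame is printed without any highlighting.
    """
    if 1 <= start <= end <= max_col:
        return (start, end)
    return None


print(encode_columns(26, 46))                 # fits in a char
print(encode_columns(250, 270))               # too wide for chars: no highlight
print(encode_columns(250, 270, MAX_SHORT_COL))  # fits in a short
```

The trade-off is per frame: with chars, a single wide expression only loses its own highlight, not the rest of the traceback.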
[snip]

How common are lines longer than 255 characters, anyway?

One thought: could the stored column position not include the indentation? Would that help?
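Both questions can be measured directly. A line longer than 255 bytes is necessary for any column past 255, so counting such lines gives an upper bound on how many lines could lose their highlighting. The sketch below measures in UTF-8 bytes, since that is how the `ast` module reports `col_offset`; `strip_indent=True` models storing columns relative to the indentation. The function names are hypothetical:

```python
import sys
from pathlib import Path
from typing import Iterable, Iterator, Tuple


def iter_source_lines(paths: Iterable[str]) -> Iterator[str]:
    """Yield every line of every given source file."""
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        yield from text.splitlines()


def count_long_lines(lines: Iterable[str], limit: int = 255,
                     strip_indent: bool = False) -> Tuple[int, int]:
    """Return (lines longer than `limit` UTF-8 bytes, total lines).

    With strip_indent=True, leading spaces and tabs are ignored, modelling
    columns stored relative to the indentation.
    """
    n_long = n_total = 0
    for line in lines:
        if strip_indent:
            line = line.lstrip(" \t")
        n_total += 1
        if len(line.encode("utf-8")) > limit:
            n_long += 1
    return n_long, n_total


if __name__ == "__main__":
    # Usage: python count_long_lines.py Lib/**/*.py
    lines = list(iter_source_lines(sys.argv[1:]))
    print("raw columns:     ", count_long_lines(lines))
    print("indent-relative: ", count_long_lines(lines, strip_indent=True))
```

Running both variants over the same files shows how much of the problem comes from deep indentation rather than from genuinely long expressions.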