[Python-Dev] Simplify lnotab? (AST branch update)

Fri Oct 14 03:55:20 CEST 2005

At 02:25 PM 10/14/2005 +1300, Greg Ewing wrote:
>Phillip J. Eby wrote:
>
> > +1.  I'd be especially interested in lifting the current requirement
> > that line ranges and byte ranges both increase monotonically.  Even
> > better if the lines for a particular piece of code don't have to all
> > come from the same file.
>
>How about an array of:
>
>    +----------------+----------------+----------------+
>    | bytecode index |     file no.   |    line no.    |
>    +----------------+----------------+----------------+
>
>Entries are sorted by bytecode index, with each entry
>applying from that bytecode position up to the position
>of the next entry. The file no. indexes a tuple of file
>names attached to the code object. All entries are 32-bit
>integers.

The file number could be 16-bit - I don't see a use case for referring to 
65,000 different filenames.  ;)  But that doesn't save much space.
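
For concreteness, a lookup against the quoted table might look roughly like
the sketch below.  The names (lookup, table, filenames) and the sample
entries are made up for illustration; nothing here is an existing CPython
API.

```python
from bisect import bisect_right

# Hypothetical data: (bytecode index, file no., line no.), sorted by
# bytecode index, plus the tuple of file names the file no. indexes into.
filenames = ("main.py", "template.txt")
table = [
    (0, 0, 1),
    (6, 0, 2),
    (12, 1, 40),   # this stretch of bytecode came from another file
    (20, 0, 3),    # line numbers are free to go back down again
]


def lookup(table, filenames, offset):
    """Return (filename, line) for the entry covering a bytecode offset."""
    starts = [entry[0] for entry in table]
    # Each entry applies from its own index up to the next entry's index,
    # so we want the last entry whose index is <= offset.
    i = bisect_right(starts, offset) - 1
    if i < 0:
        raise ValueError("offset precedes the first table entry")
    _, file_no, line_no = table[i]
    return filenames[file_no], line_no
```

Because only the bytecode index has to be sorted, neither the line numbers
nor the files need to increase monotonically.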

Anyway, in the common case, this scheme will use 10 more bytes per line of 
Python code, which translates to a megabyte or so for the standard 
library.  I definitely like the simplicity, but a meg's a meg.  A more 
compact scheme is possible, by using two tables - a bytecode->line number
table, and a line number->file table.  In the single-file case, you can
omit the second table, and the first table then only uses 6 more bytes per 
line than we're currently using.  Not fantastic, but probably more acceptable.
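
The byte counts above can be checked with a little arithmetic.  This sketch
assumes the current lnotab costs 2 bytes per line in the common case (one
byte for the bytecode increment, one for the line increment); the variable
names are just for illustration.

```python
import struct

# "<" selects standard sizes with no padding between fields.
flat_entry = struct.calcsize("<III")             # index, file no., line no.
flat_entry_16bit_file = struct.calcsize("<IHI")  # same, with 16-bit file no.
line_only_entry = struct.calcsize("<II")         # index, line no. only

# Assumption: one lnotab entry per line, two one-byte increments each.
LNOTAB_ENTRY = 2

extra_flat = flat_entry - LNOTAB_ENTRY
extra_two_table = line_only_entry - LNOTAB_ENTRY

print(extra_flat, extra_two_table, flat_entry_16bit_file)
```

Shrinking the file number to 16 bits only takes a 12-byte entry down to 10,
which is why it doesn't save much.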

If you have to encode multiple files, you just offset their line numbers by 
the size of the other files, and put entries in the line->file table to 
match.  When computing the line number, you subtract the matching entry in 
the line->file table to get the actual line number within that file.
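
A rough sketch of the two-table lookup, assuming a 100-line main.py followed
by a second file whose line numbers are offset by 100.  The function and
table names, and the choice to store each file's offset in the line->file
table, are illustrative guesses rather than a worked-out design.

```python
from bisect import bisect_left, bisect_right

# bytecode index -> combined line number, sorted by bytecode index
line_table = [(0, 1), (6, 2), (12, 140), (20, 3)]

# combined line number -> file: (offset, filename), sorted by offset.
# A file's offset is the total size of the files placed before it.
file_table = [(0, "main.py"), (100, "template.txt")]


def resolve(offset, line_table, file_table=None, default_file="<code>"):
    """Map a bytecode offset to (filename, line number within that file)."""
    starts = [start for start, _ in line_table]
    i = bisect_right(starts, offset) - 1
    if i < 0:
        raise ValueError("offset precedes the first table entry")
    combined = line_table[i][1]

    # Single-file case: the second table is omitted entirely.
    if not file_table:
        return default_file, combined

    # Find the last file whose offset is strictly below the combined line,
    # then subtract that offset to recover the real line number.
    bases = [base for base, _ in file_table]
    j = bisect_left(bases, combined) - 1
    if j < 0:
        raise ValueError("combined line number must be at least 1")
    base, filename = file_table[j]
    return filename, combined - base
```

With these tables, bytecode offset 13 maps to combined line 140, which
resolves to line 40 of template.txt.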