On 6/04/20 4:48 am, Guido van Rossum wrote:
> There's no need to worry about this: in almost all cases the error
> indicator points to the same spot in the source code as with the old
> parser.
I'm curious about how that works. From the description in the PEP,
it seems that none of the individual parsing functions can report
an error, because there might be another branch higher up that
succeeds. Does it keep track of the maximum distance it got through
the source or something like that?
I guess you could call it that. There is a small layer of abstraction between the actual tokenizer (which cannot go back) and the generated parser functions. This abstraction buffers tokens. When a parser function wants a token, it calls into this abstraction, which either satisfies the request from its buffer or, if no lookahead is left in the buffer, calls the actual tokenizer. When a parser function fails, it calls into the abstraction layer to back up to a previous point (which I call the "mark").
When an error bubbles all the way up, we report a SyntaxError pointing to the farthest token that the abstraction has buffered (self.pos in the blog post).
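
To make the mechanism above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the actual generated parser: the class name BufferedTokens, the methods next_token and farthest, the toy grammar, and the "END" sentinel token are all made up for this example. Only the ideas of a token buffer, a "mark" to back up to, and reporting the farthest buffered token come from the description above.

```python
class BufferedTokens:
    """Hypothetical buffering layer between a forward-only token
    source and backtracking parser functions."""

    def __init__(self, token_source):
        self._source = iter(token_source)  # can only move forward
        self._buffer = []                  # every token fetched so far
        self.pos = 0                       # index of the next token to hand out

    def mark(self):
        # A "mark" is just the current position in the buffer.
        return self.pos

    def reset(self, mark):
        # Back up; the buffered tokens stay available for re-reading.
        self.pos = mark

    def next_token(self):
        # Call the real token source only when the buffer has no lookahead left.
        if self.pos == len(self._buffer):
            self._buffer.append(next(self._source))
        token = self._buffer[self.pos]
        self.pos += 1
        return token

    def farthest(self):
        # (index, token) of the farthest token ever buffered, or None.
        if not self._buffer:
            return None
        return len(self._buffer) - 1, self._buffer[-1]


def is_name(token):
    return token.isidentifier()


def parse_sum(tokens):
    """Toy grammar (an assumption for this sketch):
    expr: NAME '+' NAME | NAME '-' NAME
    """
    for op in ("+", "-"):
        mark = tokens.mark()
        if (is_name(tokens.next_token())
                and tokens.next_token() == op
                and is_name(tokens.next_token())):
            return True
        tokens.reset(mark)  # this alternative failed: back up and try the next
    return False


if __name__ == "__main__":
    tokens = BufferedTokens(["a", "+", "*", "END"])
    if not parse_sum(tokens):
        index, token = tokens.farthest()
        print(f"syntax error at token {index}: {token!r}")
```

Neither alternative reports an error on its own. The first gets as far as the "*" before failing, the second fails earlier at the "+", and only once both have failed does the caller look at the farthest token that was ever buffered.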