On Sun, Apr 5, 2020 at 5:16 PM Greg Ewing firstname.lastname@example.org wrote:
> On 6/04/20 4:48 am, Guido van Rossum wrote:
> > There's no need to worry about this: in almost all cases the error indicator points to the same spot in the source code as with the old parser.
> I'm curious about how that works. From the description in the PEP, it seems that none of the individual parsing functions can report an error, because there might be another branch higher up that succeeds. Does it keep track of the maximum distance it got through the source or something like that?
I guess you could call it that. There is a small layer of abstraction between the actual tokenizer (which cannot go back) and the generated parser functions. This abstraction buffers tokens. When a parser function wants a token, it calls into this abstraction, which either satisfies the request from its buffer or, if no buffered lookahead is left, calls the actual tokenizer. When a parser function fails, it calls into the abstraction layer to back up to a previous point (which I call the "mark").
(A simplified version of this layer is shown in my blog post, https://medium.com/@gvanrossum_83706/building-a-peg-parser-d4869b5958fb -- the class Tokenizer.)
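For readers who don't want to click through, here is a minimal sketch of such a buffering layer. It is written in the spirit of the blog post's class Tokenizer but is not copied from it or from CPython's generated parser; it assumes the standard library `tokenize` module as the underlying one-way tokenizer, and the method names `mark`, `reset`, `peek_token` and `get_token` are illustrative.

```python
import io
import tokenize


class Tokenizer:
    """Buffers tokens so parser functions can back up (illustrative sketch)."""

    def __init__(self, tokengen):
        self.tokengen = tokengen  # the real tokenizer: a one-way iterator
        self.tokens = []          # every token fetched so far
        self.pos = 0              # index of the next token to hand out

    def mark(self):
        # Remember the current position so a failed alternative can back up.
        return self.pos

    def reset(self, pos):
        # Back up to a position previously returned by mark().
        self.pos = pos

    def peek_token(self):
        # Call the real tokenizer only when the buffer has no lookahead left.
        if self.pos == len(self.tokens):
            self.tokens.append(next(self.tokengen))
        return self.tokens[self.pos]

    def get_token(self):
        token = self.peek_token()
        self.pos += 1
        return token


tok = Tokenizer(tokenize.generate_tokens(io.StringIO("x = 1\n").readline))
start = tok.mark()
first = tok.get_token()          # the NAME token 'x'
tok.get_token()                  # the OP token '='
tok.reset(start)                 # a failed alternative backs up to the mark
assert tok.get_token() is first  # served from the buffer, not re-tokenized
```

Backing up is just an integer assignment; the real tokenizer never has to rewind because every token it produced stays in `self.tokens`.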
When an error bubbles all the way up, we report a SyntaxError pointing to the farthest token that the abstraction has buffered (self.tokens[-1] in the blog post; self.pos has been reset by backtracking by then).
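To make the error-reporting rule concrete, here is a small self-contained sketch. The two-alternative grammar, the `Parser` class and the `parse()` function are hypothetical, invented for illustration, and are not how CPython's generated parser is structured; only the idea of blaming the farthest buffered token comes from the description above.

```python
import io
import tokenize


class Tokenizer:
    # Minimal buffering tokenizer over the stdlib tokenize module.
    def __init__(self, source):
        self.tokengen = tokenize.generate_tokens(io.StringIO(source).readline)
        self.tokens = []  # every token fetched so far
        self.pos = 0      # next token to hand out

    def mark(self):
        return self.pos

    def reset(self, pos):
        self.pos = pos

    def get_token(self):
        if self.pos == len(self.tokens):
            self.tokens.append(next(self.tokengen))
        token = self.tokens[self.pos]
        self.pos += 1
        return token


class Parser:
    # Hypothetical toy grammar:
    #   statement: NAME '=' NUMBER NEWLINE | NAME '(' ')' NEWLINE
    def __init__(self, tokenizer):
        self.tok = tokenizer

    def expect(self, string):
        # Consume one token if its text matches; otherwise back up and fail.
        pos = self.tok.mark()
        token = self.tok.get_token()
        if token.string == string:
            return token
        self.tok.reset(pos)
        return None

    def expect_type(self, toktype):
        # Same as expect(), but matches on the token type.
        pos = self.tok.mark()
        token = self.tok.get_token()
        if token.type == toktype:
            return token
        self.tok.reset(pos)
        return None

    def statement(self):
        pos = self.tok.mark()
        # Alternative 1: assignment of a number.
        if (self.expect_type(tokenize.NAME) and self.expect("=")
                and self.expect_type(tokenize.NUMBER)
                and self.expect_type(tokenize.NEWLINE)):
            return "assignment"
        self.tok.reset(pos)
        # Alternative 2: call with no arguments.
        if (self.expect_type(tokenize.NAME) and self.expect("(")
                and self.expect(")") and self.expect_type(tokenize.NEWLINE)):
            return "call"
        self.tok.reset(pos)
        return None  # no single function reports an error; it just fails


def parse(source):
    tok = Tokenizer(source)
    result = Parser(tok).statement()
    if result is None:
        # Every alternative failed. tok.pos is back at 0, so blame the
        # farthest token ever buffered instead.
        bad = tok.tokens[-1]
        row, col = bad.start
        raise SyntaxError(f"invalid syntax at {bad.string!r} (line {row}, column {col})")
    return result
```

For `parse("x = y\n")`, the first alternative reads as far as `y` before failing and the second gives up at `=`, so the buffer ends at `y` and the error points there, while `tok.pos` would have pointed back at `x`.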