[Python-ideas] Hooking between lexer and parser

Mon Jun 8 05:18:36 CEST 2015

On Jun 7, 2015, at 19:52, Nick Coghlan <ncoghlan at gmail.com> wrote:
> 
> Andrew is essentially only proposing relatively minor tweaks to the
> API of the existing tokenizer module to make it more iterable based
> and less file based (while still preserving the file based APIs).

Plus a patch to the existing ast module to allow it to handle tokenizers from Python as well as from C. The tokenizer tweaks themselves are just to make that easier (and to make the tokenizer a little simpler to use even if you don't feed its output directly to the parser).
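
To make the "iterable based" part concrete, here is a minimal sketch that uses only the stdlib tokenize module as it exists today, not the proposed patch. The helper name tokens_from_lines is made up for illustration. It wraps an arbitrary iterable of lines in the readline-style callable that generate_tokens expects, and then round-trips the tokens through text, since stock Python has no way to hand a token stream straight to the parser:

```python
import tokenize


def tokens_from_lines(lines):
    """Tokenize any iterable of str source lines (hypothetical helper).

    tokenize.generate_tokens wants a readline-style callable, so adapt the
    iterator and return "" once it is exhausted to signal end of input.
    """
    it = iter(lines)
    return tokenize.generate_tokens(lambda: next(it, ""))


lines = ["x = 1\n", "y = x + 2\n"]
toks = list(tokens_from_lines(lines))

# Any filtering or rewriting of the token stream would happen here.
names = [t.string for t in toks if t.type == tokenize.NAME]

# Without a parser hook, the tokens have to go back to source text before
# they can be compiled.
source = tokenize.untokenize(toks)
namespace = {}
exec(compile(source, "<tokens>", "exec"), namespace)
```

The untokenize-then-compile step at the end is the detour that a parser accepting tokens from Python would remove.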

(It surprised me that the C-level tokenizer actually can take C strings and string objects rather than file objects, but once you think about how the high-level C API stuff like being able to exec a single line must work, it's pretty obvious why that was added...)

> Eugene Toder's and Dave Malcolm's patches from a few years ago make
> the existing AST -> bytecode section of the toolchain easier to modify
> and experiment with (and are ideas worth exploring for 3.6 if anyone
> is willing and able to invest the time to bring them back up to date).

I got a chance to take a look at this, and, while it seems completely orthogonal to what I'm trying to do, it also seems very cool. If someone got the patches up to date for the trunk and fixed the minor issues raised in the last review (both of which look pretty simple), what are the chances of getting them reviewed for 3.6? (I realize this is probably a better question for the issue tracker or the -dev list than buried in the middle of a barely-relevant -ideas thread, but I'm on my phone here, and you brought it up. :)