[Python-ideas] Hooking between lexer and parser

Nick Coghlan ncoghlan at gmail.com
Sun Jun 7 07:59:13 CEST 2015

On 7 June 2015 at 08:52, Andrew Barnert via Python-ideas
<python-ideas at python.org> wrote:
> Also, if we got my change, I could write code that cleanly hooks parsing in
> 3.6+, but uses the tokenize/untokenize hack for 2.7 and 3.5, so people can
> at least use it, and all of the relevant and complicated code would be
> shared between the two versions. With your change, I'd have to write code
> that was completely different for 3.6+ than what I could backport, meaning
> I'd have to write, debug, and maintain two completely different
> implementations. And again, for no benefit.

I don't think I've said this explicitly yet, but I'm +1 on the idea of
making it easier to "hack the token stream". As Andrew has noted, there
are two reasons this is an interesting level to work at for certain
kinds of modifications:

1. The standard Python tokeniser has already taken care of converting
the byte stream into Unicode code points, and the code point stream
into tokens (including replacing leading whitespace with the
structural INDENT/DEDENT tokens)

2. You get to work with a linear stream of tokens, rather than a
precomposed tree of AST nodes that you have to traverse and keep consistent
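
To make those two points concrete, here is a minimal sketch using the
existing stdlib tokenize module (not any new hook). The sample source
string and the helper name token_names are just illustrations:

```python
import io
import tokenize

# Illustrative input: one indented block, so the tokenizer has to emit
# the structural INDENT/DEDENT tokens.
source = "if x:\n    y = 1\n"

def token_names(text):
    """Return the token type names the stdlib tokenizer yields for text."""
    readline = io.StringIO(text).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]

names = token_names(source)
# names is a flat list: the indentation shows up as INDENT and DEDENT
# entries in the stream rather than as nesting in a tree.
```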

If all you're wanting to do is token rewriting, or to push the token
stream over a network connection in preference to pushing raw source
code or fully compiled bytecode, a bit of refactoring of the existing
tokeniser/compiler interface to be less file based and more iterable
based could make that easier to work with.
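
As a rough sketch of what iterable-based token rewriting looks like with
today's tools, the following uses the tokenize/untokenize round trip
mentioned above, not the proposed refactoring. The "nil" keyword and the
names rewrite_nil and compile_rewritten are hypothetical, chosen only for
illustration:

```python
import io
import tokenize

def rewrite_nil(tokens):
    """Yield (type, string) pairs, replacing the made-up name `nil` with `None`."""
    for tok in tokens:
        if tok.type == tokenize.NAME and tok.string == "nil":
            yield (tokenize.NAME, "None")
        else:
            # Drop position info so untokenize() regenerates the spacing itself.
            yield (tok.type, tok.string)

def compile_rewritten(source, filename="<rewritten>"):
    """Tokenize source, rewrite the token stream, and compile the result."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    # The round trip through text is the "hack": the compiler only accepts
    # source text or an AST today, so the tokens must be turned back into a str.
    new_source = tokenize.untokenize(rewrite_nil(tokens))
    return compile(new_source, filename, "exec")

namespace = {}
exec(compile_rewritten("value = nil\n"), namespace)
```

The rewriting step is an ordinary generator over tokens, so it composes
with other filters. The untokenize() call is the part that a more
iterable-based tokeniser/compiler interface would make unnecessary.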


Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
