[Python-ideas] Hooking between lexer and parser

Robert Collins robertc at robertcollins.net
Mon Jun 8 05:10:17 CEST 2015

On 8 June 2015 at 14:56, Neil Girdhar <mistersheik at gmail.com> wrote:
> On Sun, Jun 7, 2015 at 10:52 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

>> However, if you're specifically wanting to work on an "ideal parser
>> API", then the reference interpreter for a 24 year old established
>> language *isn't* the place to do it - the compromises necessitated by
>> the need to align with an extensive existing ecosystem will actively
>> work against your goal for a clean, minimalist structure. That's thus
>> something better developed *independently* of CPython, and then
>> potentially considered at some point in the future when it's better
>> established what concrete benefits it would offer over the status quo
>> for both the core development team and Python's end users.
> That's not what I'm doing.  All I'm suggesting is that changes to Python
> that *preclude* the "ideal parser API" be avoided.  I'm not trying to make
> the ideal API happen today.  I'm just keeping the path to that rosy future
> free of obstacles.

I've used that approach in projects before, and in hindsight I realise
that I caused significant disruption doing that. The reason boils down
to this: without consensus that the rosy future is all of:
 - the right future
 - worth doing eventually
 - more important to reach than solving the problems that appear on
   the way

then you end up frustrating folk that have problems now, without
actually adding value for anyone: the project is left holding out for
a future that [worst case, fails all three tests] might not be right,
might not be worth doing, and is less important than the actual
problems whose solutions it is blocking.

In this particular case, given Nick's comments about why we change the
guts here, I'd say that 'worth doing eventually' does not have
consensus, and I personally think that anything that is 'in the
indefinite future' is almost never more important than problems
affecting people today, because it's a possible future benefit vs a
current cost.
There's probably an economics theorem to describe that, but I'm not an
economist :)

Pragmatically, I propose that the existing structure already has
significant friction around any move to a unified (but still
multi-pass I presume) parser infrastructure, and so adding a small
amount of friction for substantial benefits will not substantially
impact the future work.

Concretely: a multi-stage parser with unified language for both lexer
and parser should be quite amenable to calling out to a legacy token
hook, without onerous impact. Failing that, we can follow the
deprecation approach when someone finds we can't do that, and after a
reasonable time remove the old hook. But right now, I think the onus
is on you to show that a shim wouldn't be possible, rather than
refusing to support adding a tokeniser hook because a shim isn't
*obviously possible*.
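
As a rough illustration of the kind of token hook under discussion
(a sketch only: `apply_token_hook` and `rename_hook` are hypothetical
names, not an existing or proposed CPython API, and the round trip
through source text via the stdlib `tokenize` module merely stands in
for a real lexer-to-parser hand-off):

```python
import io
import tokenize


def tokens_of(source):
    """Lex source text into a list of tokenize.TokenInfo tuples."""
    return list(tokenize.generate_tokens(io.StringIO(source).readline))


def apply_token_hook(source, hook):
    """Hypothetical shim: lex, let the hook rewrite the tokens, re-join."""
    rewritten = hook(tokens_of(source))
    # Passing only (type, string) pairs makes untokenize ignore positions,
    # so the hook is free to insert, replace, or drop tokens.
    return tokenize.untokenize((tok[0], tok[1]) for tok in rewritten)


def rename_hook(tokens):
    """Example hook: replace the name `answer` with the literal 42."""
    for tok in tokens:
        if tok.type == tokenize.NAME and tok.string == "answer":
            yield (tokenize.NUMBER, "42")
        else:
            yield tok


# The hook sits between lexing and compilation; the parser never sees
# the original `answer` token.
new_source = apply_token_hook("x = answer + 1\n", rename_hook)
namespace = {}
exec(compile(new_source, "<hooked>", "exec"), namespace)
```

The point of the sketch is only that a hook with a "tokens in, tokens
out" shape does not care how the tokens were produced, which is why a
unified lexer/parser could plausibly still feed it through a shim.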


Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud
