[Python-Dev] Next-to-last wart in Python syntax.

Thomas Wouters thomas@xs4all.net
Wed, 14 Mar 2001 14:58:50 +0100


On Wed, Mar 14, 2001 at 02:09:51PM +0100, Fredrik Lundh wrote:
> finn wrote:

> > >>>> spam(class=1)
> > >{'class': 1}
> > >>>> spam(print=1)
> > >{'print': 1}
> > 
> > Exactly.

> how hard would it be to fix this in CPython?  can it be
> done in time for 2.1?  (Thomas?)
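For comparison, here is a small sketch of how the quoted calls fare in a current CPython. It assumes `spam` is a function that simply returns its keyword arguments (a hypothetical stand-in for the one in the quoted session). `class` is still rejected by the parser. `print` stopped being a keyword in Python 3, so that call now compiles. Unpacking a dict sidesteps the parser entirely, because the names are just strings.

```python
def spam(**kwargs):
    # Hypothetical stand-in for the quoted 'spam': return the keyword arguments.
    return kwargs

def compiles(source):
    """Return True if CPython's parser accepts `source` as an expression."""
    try:
        compile(source, "<test>", "eval")
        return True
    except SyntaxError:
        return False

print(compiles("spam(class=1)"))   # False: 'class' is a reserved keyword
print(compiles("spam(print=1)"))   # True in Python 3, where print is a function
print(spam(**{"class": 1}))        # {'class': 1}
```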

Well, Monday night my jetlag hit very badly (I flew back on the night from
Saturday to Sunday) and caused me to skip an entire night of sleep. I spent
part of that breaking my brain over the parser :) I have no experience with
parsers or parser-writing, by the way, so this doesn't come easily to me, and
I have no clue how this is solved in other parsers.

I seriously doubt it can be done for 2.1, unless someone who knows parsers
well can deliver an extended version of the current parser well before the
next beta. Replacing our current parser with something less limited would be
too big a change to slip in right before 2.1.

Fixing the current parser is possible, but not straightforward. As far as I
can figure out, the parser first breaks the file up into elements and then
classifies them; if an element cannot be classified, it is left as a bare
word for the subsequent passes to catch, either as a valid identifier in a
valid context or as a syntax error.

I guess it should be possible to hack the parser so it accepts other
statements where it expects an identifier, and then treats those statements
as strings. But you can't just accept all statements: some will be needed to
bracket the identifier, or you get weird behaviour when you say 'def ()'.
So you need to maintain a list of acceptable statements and try each of
those... My guess is that it's possible; I just haven't figured out how to
do it yet. Is there some way to force a certain 'ordering' on the keywords
(their symbolic numbers as #defined in graminit.h)?

Another solution would be to do it explicitly in Grammar. I posted an
attempt at that before, but it hurts. It can be done in two ways, both of
which hurt for different reasons :) For example,

funcdef: 'def' NAME parameters ':' suite

can be changed into

funcdef: 'def' nameorkw parameters ':' suite
nameorkw: NAME | 'def' | 'and' | 'pass' | 'print' | 'return' | ...

or into

funcdef: 'def' (NAME | 'def' | 'and' | 'pass' | 'print' | ...) parameters ':' suite

The first means changing the places that currently accept a NAME, which
means that every place where the compiler does STR(node) has to be checked.
There are a *lot* of those, and it isn't directly obvious whether they expect
node to be a NAME, actually know that it is, or merely think they know that.
STR() could be made to detect 'nameorkw' node types and return the STR() of
the first child in that case, but that's really an ugly hack.
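A Python model of the STR() workaround just described. In CPython, STR() is a C macro over the parser's node struct; here a hypothetical `Node` dataclass and the type labels `NAME`, `NAMEORKW` and `KEYWORD` stand in for those, and `node_str` plays the role of STR(). The point is that every caller keeps working, at the cost of a special case buried in the accessor.

```python
from dataclasses import dataclass, field

# Hypothetical node-type labels standing in for pgen's numeric types.
NAME = "NAME"
NAMEORKW = "nameorkw"
KEYWORD = "KEYWORD"

@dataclass
class Node:
    type: str
    string: str = ""
    children: list = field(default_factory=list)

def node_str(node):
    """Model of STR(): unwrap a 'nameorkw' node to its first child."""
    if node.type == NAMEORKW:
        node = node.children[0]
    return node.string

plain = Node(NAME, "spam")
wrapped = Node(NAMEORKW, children=[Node(KEYWORD, "class")])
print(node_str(plain), node_str(wrapped))
```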

The second way is even more of an ugly hack, but it doesn't require any
changes to the parser. It just requires making the Grammar look like random
garbage :) Of course, we could keep the grammar the way it is and preprocess
it before feeding it to the parser, extracting all keywords dynamically and
sneakily replacing NAME with (NAME | keywords)... hmm... that might actually
be workable. It would still be a hack, though.
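A rough sketch of that preprocessing step, run on a hypothetical four-line excerpt of a grammar in the Grammar file's notation. It assumes every quoted alphabetic literal is a keyword (so `'('` and `':'` are skipped) and rewrites each whole-word `NAME` into an alternation. The helper name `add_keywords_to_name` is illustrative.

```python
import re

# Hypothetical excerpt in the Grammar file's notation.
GRAMMAR = """\
funcdef: 'def' NAME parameters ':' suite
parameters: '(' [varargslist] ')'
pass_stmt: 'pass'
print_stmt: 'print' test
"""

def add_keywords_to_name(grammar):
    """Replace each NAME token reference with (NAME | 'kw1' | 'kw2' | ...)."""
    # Quoted literals that look like identifiers are taken to be keywords.
    keywords = sorted(set(re.findall(r"'([A-Za-z_]\w*)'", grammar)))
    alternation = "(NAME | " + " | ".join(f"'{kw}'" for kw in keywords) + ")"
    # \b keeps rule names such as dotted_name from being touched.
    return re.sub(r"\bNAME\b", alternation, grammar)

print(add_keywords_to_name(GRAMMAR))
```

One caveat: pgen builds an LL(1) parser, and the rewritten rules may no longer be LL(1). For example, a statement starting with 'print' could then begin either print_stmt or an expression, so the generator might reject the result.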

Now-for-something-easy--meetings!-ly y'rs ;)
-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!