[Python-ideas] Tighten up the formal grammar and parsing a bit?

Mon May 15 08:29:33 EDT 2017

On Mon, May 15, 2017 at 07:38:29PM +1000, Hugh Fisher wrote:

> I wrote this little Python program using CPython 3.5.2. It's ...
> interesting ... that we apparently don't need comments or pass
> statements any more. 

I'm not sure what you mean by "any more". The code you give works, 
unchanged, all the way back to Python 2.0, when augmented assignment was 
added. If you replace the 

    x += 1

with 

    x = x + 1

it works all the way back to Python 1.5 and probably even older. Python 
has (more or less) always supported arbitrary expressions as statements, 
so this is not new. This is a feature, not a bug: supporting expressions 
as statements is necessary for expressions like:

    alist.sort()

and other expressions with side-effects. Unfortunately, that means that 
pointless expressions like:

    42

are also legal.
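You can check that the grammar treats a useful method call and a pointless literal the same way by asking the `ast` module how each line parses. This is a minimal sketch; `statement_kinds` is a made-up helper name, not part of any library:

```python
import ast

def statement_kinds(source):
    """Return the AST node class name of each top-level statement."""
    return [type(stmt).__name__ for stmt in ast.parse(source).body]

# Both the method call and the bare literal are expression statements (ast.Expr).
print(statement_kinds("alist = [3, 1, 2]\nalist.sort()\n42\n"))  # ['Assign', 'Expr', 'Expr']

# The call is only worth writing for its side effect: sort() works in place
# and returns None.
alist = [3, 1, 2]
result = alist.sort()
print(result, alist)  # None [1, 2, 3]
```

At the grammar level there is no distinction between the two, which is why banning one would mean teaching the parser about side effects.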

In recent versions, the compiler has a peephole optimizer that removes 
at least some constant expressions:

# Python 3.5

py> block = """x = 1
... 'some string'
... 100
... y = 2
... """
py> code = compile(block, '', 'exec')
py> from dis import dis
py> dis(code)
  1           0 LOAD_CONST               0 (1)
              3 STORE_NAME               0 (x)

  4           6 LOAD_CONST               1 (2)
              9 STORE_NAME               1 (y)
             12 LOAD_CONST               2 (None)
             15 RETURN_VALUE
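The same check can be done programmatically rather than by eyeballing the `dis` output. This is a sketch: which bytecode CPython emits is an implementation detail that varies between versions, so it only looks at whether the two bare constants show up anywhere in the compiled instructions:

```python
import dis

block = """x = 1
'some string'
100
y = 2
"""
code = compile(block, '', 'exec')

# Collect the argument value of every instruction the compiler emitted.
argvals = [ins.argval for ins in dis.get_instructions(code)]

# The bare constants on lines 2 and 3 should not appear in the bytecode,
# while the names being assigned (x and y) still do.
print('some string' in argvals, 100 in argvals)
print('x' in argvals, 'y' in argvals)
```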

There's also a (weak) convention that bare string literals are intended 
as pseudo-comments. That's especially handy with triple-quoted strings, 
since they can comment out multiple lines.
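For instance, here is a small sketch of that idiom, a triple-quoted string used to disable a line inside a function. The function name `total` is purely illustrative. The string is just an expression statement with no effect, so the "commented out" line never runs:

```python
def total(items):
    result = 0
    for item in items:
        result += item
    """
    Temporarily disabled:
    result *= 2
    """
    return result

print(total([1, 2, 3]))  # 6, not 12: the doubling line is inside the string
```

Note that the string still has to be indented to match the surrounding block, which is one way it differs from a real `#` comment.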

> Anyone else think it might be worth tightening up
> the grammar definition and parser a bit?

Not me.

In fact, I'd go further than just saying "I don't think it is 
worthwhile". I'll say that treating bare strings as pseudo-comments is 
a positive feature worth keeping. Tightening up the grammar to prohibit 
that would be a bad thing.

There's an argument to be made that bare expressions like:

    100

are pointless, but it isn't a strong argument. In practice, it isn't 
really a common source of errors, and as far as efficiency goes, the 
peephole optimizer solves that.

And it's easy to get the rules wrong. For instance, at first I thought 
that a bare name lookup like:

    x

could be safely optimized away, or prohibited, but it can't. It is true 
that a successful name lookup will do nothing, but not all lookups are 
successful:

    try:
        next
    except NameError:
        # Python version is too old
        def next(iterator):
            return iterator.next()

If we prohibit bare name lookups, that will break a lot of working code.
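A quick way to confirm that CPython keeps a bare name lookup rather than discarding it is to compile one and inspect the bytecode. This is a sketch; `LOAD_NAME` is the opcode current CPython uses for module-level name lookups, which is an implementation detail rather than a language guarantee, and `some_undefined_name` is just a placeholder:

```python
import dis

# Compile a module whose only statement is a bare name lookup.
code = compile("some_undefined_name\n", "<demo>", "exec")

# The compiler keeps the lookup: it emits a real LOAD_NAME instruction.
opnames = [ins.opname for ins in dis.get_instructions(code)]
print('LOAD_NAME' in opnames)

# Executing it has an observable effect when the name doesn't exist.
raised = False
try:
    exec(code, {})
except NameError:
    raised = True
print(raised)
```

That observable NameError is exactly what the `try: next` idiom relies on, so an optimizer that dropped the lookup would change behaviour.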

I suppose it is possible that a *sufficiently intelligent* compiler 
could recognise bare expressions that have no side-effects, and prohibit 
them, and that this might prevent some rare, occasional errors:

    x #= 1  # oops I meant +=

but honestly, I don't see that this is a good use of developers' time. 
It adds complexity to the language, risks false positives, and in my 
opinion is the sort of thing that is better flagged by a linter, not 
prohibited by the interpreter.
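To illustrate what "flagged by a linter" might look like, here is a rough sketch of such a check built on the `ast` module. The function `find_pointless_statements` and its rules are hypothetical, not taken from any existing linter: it skips bare strings (pseudo-comments and docstrings) and `...` placeholders, reports bare non-string constants, and only *warns* about bare names, since a lookup may be deliberate:

```python
import ast

def find_pointless_statements(source):
    """Return sorted (line, message) pairs for expression statements that look pointless."""
    warnings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Expr):
            continue
        value = node.value
        if isinstance(value, ast.Constant):
            if isinstance(value.value, str) or value.value is Ellipsis:
                continue  # pseudo-comment, docstring or placeholder: leave alone
            warnings.append((node.lineno, "constant expression has no effect"))
        elif isinstance(value, ast.Name):
            # A lookup can raise NameError, so this is only a warning.
            warnings.append((node.lineno, f"bare name {value.id!r} may have no effect"))
    return sorted(warnings)

sample = """x = 1
x #= 1  # oops I meant +=
42
'a pseudo-comment'
alist = [3, 1, 2]
alist.sort()
"""
for line, message in find_pointless_statements(sample):
    print(f"line {line}: {message}")
```

Run over the `try: next` snippet, this would also warn about the bare `next`, which is the false positive described above. A warning the developer can ignore is a reasonable price for that; a syntax error is not.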

-- 
Steve