[Python-Dev] PEP 30XZ: Simplified Parsing

Jim Jewett jimjjewett at gmail.com
Mon Apr 30 05:29:25 CEST 2007


PEP: 30xz
Title: Simplified Parsing
Version: $Revision$
Last-Modified: $Date$
Author: Jim J. Jewett <JimJJewett at gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 29-Apr-2007
Post-History: 29-Apr-2007


Abstract

    Python initially inherited its parsing from C.  While this has
    been generally useful, some remnants have been less useful for
    Python and should be eliminated:

    + Implicit string concatenation

    + Line continuation with "\"

    + 034 as an octal number (== decimal 28).  Note that this is
      listed only for completeness; the decision to raise an
      Exception for leading zeros has already been made in the
      context of PEP XXX, about adding a binary literal.


Rationale for Removing Implicit String Concatenation

    Implicit string concatenation can lead to confusing, or even
    silent, errors. [1]

        def f(arg1, arg2=None): pass

        f("abc" "def")  # forgot the comma, no warning ...
                        # silently becomes f("abcdef", None)

    or, using the scons build framework,

        sourceFiles = [
        'foo.c',
        'bar.c',
        #...many lines omitted...
        'q1000x.c']

    It's a common mistake to leave off a comma, and then scons complains
    that it can't find 'foo.cbar.c'.  This is pretty bewildering behavior
    even if you *are* a Python programmer, and not everyone here is.
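
    As a minimal sketch of both failure modes under the current rules
    (that is, before this proposal is applied), the following can be
    run as-is.  The function f comes from the example above; the names
    merged_call and source_files are chosen here only for illustration,
    and the file list is shortened.

```python
def f(arg1, arg2=None):
    """Return the arguments as received, so the merge is visible."""
    return (arg1, arg2)

# Two arguments were intended, but the comma is missing.
merged_call = f("abc" "def")

# Three file names were intended, but the comma after 'foo.c' is missing.
source_files = [
    'foo.c'
    'bar.c',
    'q1000x.c',
]

print(merged_call)     # ('abcdef', None)
print(source_files)    # ['foo.cbar.c', 'q1000x.c']
```

    Neither statement produces a warning; the error surfaces only
    later, when the merged value is used.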

    Note that in C, the implicit concatenation is more justified; there
    is no other way to join strings without (at least) a function call.

    In Python, strings are objects that support the + operator
    (through __add__), so it is possible to write:

        "abc" + "def"

    Because these are literals, this addition can still be optimized
    away by the compiler.
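
    One way to check this claim on a given interpreter is to inspect
    the constants stored in the compiled code object.  This is a
    sketch that relies on a CPython implementation detail (constant
    folding and the co_consts attribute); other implementations may
    behave differently.  The helper name constants_of is hypothetical.

```python
def constants_of(expression):
    """Compile an expression and return the constants in its code object."""
    return compile(expression, "<example>", "eval").co_consts

# Adjacent literals are joined while the source is parsed.
implicit = constants_of('"abc" "def"')

# On CPython, the explicit + on two literals is folded at compile time.
explicit = constants_of('"abc" + "def"')

print(implicit)
print(explicit)
```

    If "abcdef" appears among the constants in both cases, the explicit
    form costs nothing extra at run time on that interpreter.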

    Guido indicated [2] that this change should be handled by a PEP,
    because there were a few edge cases with other string operators,
    such as %.  The resolution is to treat them the same as today.

        ("abc %s def" + "ghi" % var)  # fails like today.
                                      # raises TypeError because of
                                      # precedence.  (% before +)

        ("abc" + "def %s ghi" % var)  # works like today; precedence makes
                                      # the optimization more difficult to
                                      # recognize, but does not change the
                                      # semantics.

        ("abc %s def" + "ghi") % var  # works like today, because of
                                      # precedence:  () before %
                                      # CPython compiler can already
                                      # add the literals at compile-time.
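
    The three cases can be checked directly.  This is a small sketch
    in which var is given a hypothetical value ("X") and the results
    are stored in the illustrative names case1, case2 and case3.

```python
var = "X"  # hypothetical value, used only for illustration

# Case 1: % binds tighter than +, so "ghi" % var is evaluated first.
# "ghi" has no conversion specifier, so the formatting fails.
try:
    case1 = "abc %s def" + "ghi" % var
except TypeError as exc:
    case1 = "TypeError: %s" % exc

# Case 2: % applies only to the right-hand literal, then + joins.
case2 = "abc" + "def %s ghi" % var

# Case 3: the parentheses force the + first, then % formats the result.
case3 = ("abc %s def" + "ghi") % var

print(case1)
print(case2)   # abcdef X ghi
print(case3)   # abc X defghi
```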


Rationale for Removing Explicit Line Continuation

    A terminal "\" indicates that the logical line is continued on the
    following physical line (after whitespace).

    Note that a non-terminal "\" does not have this meaning, even if the
    only additional characters are invisible whitespace.  (Python depends
    heavily on *visible* whitespace at the beginning of a line; it does
    not otherwise depend on *invisible* terminal whitespace.)  Adding
    whitespace after a "\" will typically cause a syntax error rather
    than a silent bug, but it still isn't desirable.
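
    The effect of invisible trailing whitespace can be shown with the
    built-in compile().  This is a sketch under current rules; the
    helper compiles and the two source strings are hypothetical names
    introduced for the example.

```python
def compiles(source):
    """Return True if the source compiles, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

# The backslash is the last character on the physical line.
clean = "total = 1 + \\\n    2\n"

# A single space follows the backslash; the two look identical on screen.
trailing = "total = 1 + \\ \n    2\n"

print(compiles(clean))      # True
print(compiles(trailing))   # False
```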

    The reason to keep "\" is that occasionally code looks better with
    a "\" than with a () pair.

        assert True, (
            "This Paren is goofy")

    But realistically, that paren is no worse than a "\".  The only
    advantage of "\" is that it is slightly more familiar to users of
    C-based languages.  These languages also all support line
    continuation with (), so reading code will not be a problem, and
    there will be one fewer rule to learn for people entirely new to
    programming.
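
    As a brief illustration that the two spellings are interchangeable
    today, the sketch below executes the same statement written both
    ways.  The names with_backslash, with_parens and run are
    hypothetical and exist only for this example.

```python
# The same assignment, continued with a backslash and with parentheses.
with_backslash = "total = 1 + \\\n    2\n"
with_parens = "total = (1 +\n    2)\n"

def run(source):
    """Execute the source in a fresh namespace and return 'total'."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"]

print(run(with_backslash))   # 3
print(run(with_parens))      # 3
```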


Rationale for Removing Implicit Octal Literals

    This decision should be covered by PEP ???, on numeric literals.
    It is mentioned here only for completeness.

    C treats integers beginning with "0" as octal, rather than decimal.
    Historically, Python has inherited this usage.  This has caused
    quite a few annoying bugs for people who forgot the rule, and
    tried to line up their constants.

        a = 123
        b = 024   # really only 20, because octal
        c = 245

    In Python 3.0, the second line will raise a SyntaxError because
    of the ambiguity.  The line should instead be written in one of
    the following ways:

        b = 24    # PEP 8
        b =  24   # columns line up, for quick scanning
        b = 0t24  # really did want an Octal!
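
    The arithmetic and the new error can be checked as follows.  This
    is a sketch that assumes a Python 3 interpreter.  It uses
    int(text, 8) for the octal value so that it does not depend on
    which literal prefix the numeric-literals PEP finally adopts; the
    helper compiles is a hypothetical name.

```python
def compiles(source):
    """Return True if the source compiles, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

# "24" read as octal is 2*8 + 4, which is 20 in decimal.
octal_value = int("24", 8)

# int() in base 10 ignores the leading zero in a string.
decimal_value = int("024")

print(octal_value)               # 20
print(decimal_value)             # 24
print(compiles("b = 024\n"))     # False on Python 3
print(compiles("b = 24\n"))      # True
print(compiles("b =  24\n"))     # True
```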


References

    [1] Implicit String Concatenation, Jewett, Orendorff
        http://mail.python.org/pipermail/python-ideas/2007-April/000397.html

    [2] PEP 12, Sample reStructuredText PEP Template, Goodger, Warsaw
        http://www.python.org/peps/pep-0012

    [3] http://www.opencontent.org/openpub/



Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

