[Python-checkins] r55172 - peps/trunk/pep-3126.txt

guido.van.rossum python-checkins at python.org
Mon May 7 20:05:28 CEST 2007


Author: guido.van.rossum
Date: Mon May  7 20:05:23 2007
New Revision: 55172

Modified:
   peps/trunk/pep-3126.txt
Log:
New version, Raymond is now co-author.


Modified: peps/trunk/pep-3126.txt
==============================================================================
--- peps/trunk/pep-3126.txt	(original)
+++ peps/trunk/pep-3126.txt	Mon May  7 20:05:23 2007
@@ -2,111 +2,380 @@
 Title: Remove Implicit String Concatenation
 Version: $Revision$
 Last-Modified: $Date$
-Author: Jim J. Jewett <JimJJewett at gmail.com>
+Author: Jim J. Jewett <JimJJewett at gmail.com>,
+        Raymond D. Hettinger <python at rcn.com>
 Status: Draft
 Type: Standards Track
-Content-Type: text/plain
+Content-Type: text/x-rst
 Created: 29-Apr-2007
-Post-History: 29-Apr-2007, 30-Apr-2007
+Post-History: 29-Apr-2007, 30-Apr-2007, 07-May-2007
 
 
 Abstract
+========
 
-    Python initially inherited its parsing from C.  While this has
-    been generally useful, there are some remnants which have been
-    less useful for python, and should be eliminated.
+Python inherited many of its parsing rules from C.  While this has
+been generally useful, there are some individual rules which are less
+useful for python, and should be eliminated.
 
-    This PEP proposes to eliminate Implicit String concatenation
-    based on adjacency of literals.
+This PEP proposes to eliminate implicit string concatenation based
+only on the adjacency of literals.
 
-    Instead of
+Instead of::
 
-        "abc" "def" == "abcdef"
+    "abc" "def" == "abcdef"
 
-    authors will need to be explicit, and add the strings
+authors will need to be explicit, and either add the strings::
 
-        "abc" + "def" == "abcdef"
+    "abc" + "def" == "abcdef"
 
+or join them::
 
-Rationale for Removing Implicit String Concatenation
+    "".join(["abc", "def"]) == "abcdef"
 
-    Implicit String concatentation can lead to confusing, or even
-    silent, errors.
 
-        def f(arg1, arg2=None): pass
+Motivation
+==========
 
-        f("abc" "def")  # forgot the comma, no warning ...
-                        # silently becomes f("abcdef", None)
+One goal for Python 3000 should be to simplify the language by
+removing unnecessary features.  Implicit string concatenation should
+be dropped in favor of existing techniques. This will simplify the
+grammar and simplify a user's mental picture of Python.  The latter is
+important for letting the language "fit in your head".  A large group
+of current users do not even know about implicit concatenation.  Of
+those who do know about it, a large portion never use it or habitually
+avoid it. Of those who both know about it and use it, very few could
+state with confidence the implicit operator precedence and under what
+circumstances it is computed when the definition is compiled versus
+when it is run.
 
-    or, using the scons build framework,
+
+History or Future
+-----------------
+
+Many Python parsing rules are intentionally compatible with C.  This
+is a useful default, but Special Cases need to be justified based on
+their utility in Python.  We should no longer assume that python
+programmers will also be familiar with C, so compatibility between
+languages should be treated as a tie-breaker, rather than a
+justification.
+
+In C, implicit concatenation is the only way to join strings without
+using a (run-time) function call to store into a variable.  In Python,
+the strings can be joined (and still recognized as immutable) using
+more standard Python idioms, such ``+`` or ``"".join``.
+
+
+Problem
+-------
+
+Implicit String concatentation leads to tuples and lists which are
+shorter than they appear; this is turn can lead to confusing, or even
+silent, errors.  For example, given a function which accepts several
+parameters, but offers a default value for some of them::
+
+    def f(fmt, *args):
+        print fmt % args
+
+This looks like a valid call, but isn't::
+
+    >>> f("User %s got a message %s",
+          "Bob"
+          "Time for dinner")
+
+    Traceback (most recent call last):
+      File "<pyshell#8>", line 2, in <module>
+        "Bob"
+      File "<pyshell#3>", line 2, in f
+        print fmt % args
+    TypeError: not enough arguments for format string
+
+
+Calls to this function can silently do the wrong thing::
+
+    def g(arg1, arg2=None):
+        ...
+
+    # silently transformed into the possibly very different
+    # g("arg1 on this linearg2 on this line", None)
+    g("arg1 on this line"
+      "arg2 on this line")
+
+To quote Jason Orendorff [#Orendorff]
+
+    Oh.  I just realized this happens a lot out here.  Where I work,
+    we use scons, and each SConscript has a long list of filenames::
 
         sourceFiles = [
-        'foo.c'
-        'bar.c',
-        #...many lines omitted...
-        'q1000x.c']
-
-    It's a common mistake to leave off a comma, and then scons complains
-    that it can't find 'foo.cbar.c'.  This is pretty bewildering behavior
-    even if you *are* a Python programmer, and not everyone here is.  [1]
-
-    Note that in C, the implicit concatenation is more justified; there
-    is no other way to join strings without (at least) a function call.
-
-    In Python, strings are objects which support the __add__ operator;
-    it is possible to write:
-
-        "abc" + "def"
-
-    Because these are literals, this addition can still be optimized
-    away by the compiler.  (The CPython compiler already does.  [2])
-
-    Guido indicated [2] that this change should be handled by PEP, because
-    there were a few edge cases with other string operators, such as the %.
-    (Assuming that str % stays -- it may be eliminated in favor of
-    PEP 3101 -- Advanced String Formatting.  [3] [4])
-    
-    The resolution is to treat them the same as today.
-
-        ("abc %s def" + "ghi" % var)  # fails like today.
-                                      # raises TypeError because of
-                                      # precedence.  (% before +)
-        
-        ("abc" + "def %s ghi" % var)  # works like today; precedence makes
-                                      # the optimization more difficult to
-                                      # recognize, but does not change the
-                                      # semantics.
-
-        ("abc %s def" + "ghi") % var  # works like today, because of
-                                      # precedence:  () before % 
-                                      # CPython compiler can already 
-                                      # add the literals at compile-time.
-    
-    
+            'foo.c'
+            'bar.c',
+            #...many lines omitted...
+            'q1000x.c']
+
+    It's a common mistake to leave off a comma, and then scons
+    complains that it can't find 'foo.cbar.c'.  This is pretty
+    bewildering behavior even if you *are* a Python programmer,
+    and not everyone here is.
+
+
+Solution
+========
+
+In Python, strings are objects and they support the __add__ operator,
+so it is possible to write::
+
+    "abc" + "def"
+
+Because these are literals, this addition can still be optimized away
+by the compiler; the CPython compiler already does so.
+[#rcn-constantfold]_
+
+Other existing alternatives include multiline (triple-quoted) strings,
+and the join method::
+
+    """This string
+       extends across
+       multiple lines, but you may want to use something like
+       Textwrap.dedent
+       to clear out the leading spaces
+       and/or reformat.
+    """
+
+
+    >>> "".join(["empty", "string", "joiner"]) == "emptystringjoiner"
+    True
+
+    >>> " ".join(["space", "string", "joiner"]) == "space string joiner"
+
+    >>> "\n".join(["multiple", "lines"]) == "multiple\nlines" == (
+    """multiple
+    lines""")
+    True
+
+
+Concerns
+========
+
+
+Operator Precedence
+-------------------
+
+Guido indicated [#rcn-constantfold]_ that this change should be
+handled by PEP, because there were a few edge cases with other string
+operators, such as the %.  (Assuming that str % stays -- it may be
+eliminated in favor of PEP 3101 -- Advanced String Formatting.
+[#PEP3101]_ [#elimpercent]_)
+
+The resolution is to use parentheses to enforce precedence -- the same
+solution that can be used today::
+
+    # Clearest, works today, continues to work, optimization is
+    # already possible.
+    ("abc %s def" + "ghi") % var
+
+    # Already works today; precedence makes the optimization more
+    # difficult to recognize, but does not change the semantics.
+    "abc" + "def %s ghi" % var
+
+as opposed to::
+
+    # Already fails because modulus (%) is higher precedence than
+    # addition (+)
+    ("abc %s def" + "ghi" % var)
+
+    # Works today only because adjacency is higher precedence than
+    # modulus.  This will no longer be available.
+    "abc %s" "def" % var
+
+    # So the 2-to-3 translator can automically replace it with the
+    # (already valid):
+    ("abc %s" + "def") % var
+
+
+Long Commands
+-------------
+
+    ... build up (what I consider to be) readable SQL queries [#skipSQL]_::
+
+        rows = self.executesql("select cities.city, state, country"
+                               "    from cities, venues, events, addresses"
+                               "    where cities.city like %s"
+                               "      and events.active = 1"
+                               "      and venues.address = addresses.id"
+                               "      and addresses.city = cities.id"
+                               "      and events.venue = venues.id",
+                               (city,))
+
+Alternatives again include triple-quoted strings, ``+``, and ``.join``::
+
+    query="""select cities.city, state, country
+                 from cities, venues, events, addresses
+                 where cities.city like %s
+                   and events.active = 1"
+                   and venues.address = addresses.id
+                   and addresses.city = cities.id
+                   and events.venue = venues.id"""
+
+    query=( "select cities.city, state, country"
+          + "    from cities, venues, events, addresses"
+          + "    where cities.city like %s"
+          + "      and events.active = 1"
+          + "      and venues.address = addresses.id"
+          + "      and addresses.city = cities.id"
+          + "      and events.venue = venues.id"
+          )
+
+    query="\n".join(["select cities.city, state, country",
+                     "    from cities, venues, events, addresses",
+                     "    where cities.city like %s",
+                     "      and events.active = 1",
+                     "      and venues.address = addresses.id",
+                     "      and addresses.city = cities.id",
+                     "      and events.venue = venues.id"])
+
+    # And yes, you *could* inline any of the above querystrings
+    # the same way the original was inlined.
+    rows = self.executesql(query, (city,))
+
+
+Regular Expressions
+-------------------
+
+Complex regular expressions are sometimes stated in terms of several
+implicitly concatenated strings with each regex component on a
+different line and followed by a comment.  The plus operator can be
+inserted here but it does make the regex harder to read.  One
+alternative is to use the re.VERBOSE option.  Another alternative is
+to build-up the regex with a series of += lines::
+
+    # Existing idiom which relies on implicit concatenation
+    r = ('a{20}'  # Twenty A's
+         'b{5}'   # Followed by Five B's
+         )
+
+    # Mechanical replacement
+    r = ('a{20}'  +# Twenty A's
+         'b{5}'   # Followed by Five B's
+         )
+
+    # already works today
+    r = '''a{20}  # Twenty A's
+           b{5}   # Followed by Five B's
+        '''                 # Compiled with the re.VERBOSE flag
+
+    # already works today
+    r = 'a{20}'   # Twenty A's
+    r += 'b{5}'   # Followed by Five B's
+
+
+Internationalization
+--------------------
+
+Some internationalization tools -- notably xgettext -- have already
+been special-cased for implicit concatenation, but not for Python's
+explicit concatenation. [#barryi8]_
+
+These tools will fail to extract the (already legal)::
+
+    _("some string" +
+      " and more of it")
+
+but often have a special case for::
+
+    _("some string"
+      " and more of it")
+
+It should also be possible to just use an overly long line (xgettext
+limits messages to 2048 characters [#xgettext2048]_, which is less
+than Python's enforced limit) or triple-quoted strings, but these
+solutions sacrifice some readability in the code::
+
+    # Lines over a certain length are unpleasant.
+    _("some string and more of it")
+
+    # Changing whitespace is not ideal.
+    _("""Some string
+         and more of it""")
+    _("""Some string
+    and more of it""")
+    _("Some string \
+    and more of it")
+
+I do not see a good short-term resolution for this.
+
+
+Transition
+==========
+
+The proposed new constructs are already legal in current Python, and
+can be used immediately.
+
+The 2 to 3 translator can be made to mechanically change::
+
+    "str1" "str2"
+    ("line1"  #comment
+     "line2")
+
+into::
+
+    ("str1" + "str2")
+    ("line1"   +#comments
+     "line2")
+
+If users want to use one of the other idioms, they can; as these
+idioms are all already legal in python 2, the edits can be made
+to the original source, rather than patching up the translator.
+
+
+Open Issues
+===========
+
+Is there a better way to support external text extraction tools, or at
+least ``xgettext`` [#gettext]_ in particular?
+
+
 References
+==========
+
+..  [#Orendorff] Implicit String Concatenation, Orendorff
+    http://mail.python.org/pipermail/python-ideas/2007-April/000397.html
+
+..  [#rcn-constantfold] Reminder: Py3k PEPs due by April, Hettinger,
+    van Rossum
+    http://mail.python.org/pipermail/python-3000/2007-April/006563.html
+
+..  [#PEP3101] PEP 3101, Advanced String Formatting, Talin
+    http://www.python.org/peps/pep-3101.html
+
+..  [#elimpercent] ps to question Re: Need help completing ABC pep,
+    van Rossum
+    http://mail.python.org/pipermail/python-3000/2007-April/006737.html
+
+..  [#skipSQL] (email Subject) PEP 30XZ: Simplified Parsing, Skip,
+    http://mail.python.org/pipermail/python-3000/2007-May/007261.html
 
-    [1] Implicit String Concatenation, Jewett, Orendorff
-        http://mail.python.org/pipermail/python-ideas/2007-April/000397.html
+..  [#barryi8] (email Subject) PEP 30XZ: Simplified Parsing
+    http://mail.python.org/pipermail/python-3000/2007-May/007305.html
 
-    [2] Reminder: Py3k PEPs due by April, Hettinger, van Rossum
-        http://mail.python.org/pipermail/python-3000/2007-April/006563.html
+..  [#gettext] GNU gettext manual
+    http://www.gnu.org/software/gettext/
 
-    [3] PEP 3101, Advanced String Formatting, Talin
-        http://www.python.org/peps/pep-3101.html
+..  [#xgettext2048] Unix man page for xgettext -- Notes section
+    http://www.scit.wlv.ac.uk/cgi-bin/mansec?1+xgettext
 
-    [4] ps to question Re: Need help completing ABC pep, van Rossum
-        http://mail.python.org/pipermail/python-3000/2007-April/006737.html
 
 Copyright
+=========
 
     This document has been placed in the public domain.
 
 
 
-Local Variables:
-mode: indented-text
-indent-tabs-mode: nil
-sentence-end-double-space: t
-fill-column: 70
-coding: utf-8
-End:
+..
+    Local Variables:
+    mode: indented-text
+    indent-tabs-mode: nil
+    sentence-end-double-space: t
+    fill-column: 70
+    coding: utf-8
+    End:


More information about the Python-checkins mailing list