[Python-checkins] r55172 - peps/trunk/pep-3126.txt
guido.van.rossum
python-checkins at python.org
Mon May 7 20:05:28 CEST 2007
Author: guido.van.rossum
Date: Mon May 7 20:05:23 2007
New Revision: 55172
Modified:
peps/trunk/pep-3126.txt
Log:
New version, Raymond is now co-author.
Modified: peps/trunk/pep-3126.txt
==============================================================================
--- peps/trunk/pep-3126.txt (original)
+++ peps/trunk/pep-3126.txt Mon May 7 20:05:23 2007
@@ -2,111 +2,380 @@
Title: Remove Implicit String Concatenation
Version: $Revision$
Last-Modified: $Date$
-Author: Jim J. Jewett <JimJJewett at gmail.com>
+Author: Jim J. Jewett <JimJJewett at gmail.com>,
+ Raymond D. Hettinger <python at rcn.com>
Status: Draft
Type: Standards Track
-Content-Type: text/plain
+Content-Type: text/x-rst
Created: 29-Apr-2007
-Post-History: 29-Apr-2007, 30-Apr-2007
+Post-History: 29-Apr-2007, 30-Apr-2007, 07-May-2007
Abstract
+========
- Python initially inherited its parsing from C. While this has
- been generally useful, there are some remnants which have been
- less useful for python, and should be eliminated.
+Python inherited many of its parsing rules from C. While this has
+been generally useful, there are some individual rules which are less
+useful for Python, and should be eliminated.
- This PEP proposes to eliminate Implicit String concatenation
- based on adjacency of literals.
+This PEP proposes to eliminate implicit string concatenation based
+only on the adjacency of literals.
- Instead of
+Instead of::
- "abc" "def" == "abcdef"
+ "abc" "def" == "abcdef"
- authors will need to be explicit, and add the strings
+authors will need to be explicit, and either add the strings::
- "abc" + "def" == "abcdef"
+ "abc" + "def" == "abcdef"
+or join them::
-Rationale for Removing Implicit String Concatenation
+ "".join(["abc", "def"]) == "abcdef"
- Implicit String concatentation can lead to confusing, or even
- silent, errors.
- def f(arg1, arg2=None): pass
+Motivation
+==========
- f("abc" "def") # forgot the comma, no warning ...
- # silently becomes f("abcdef", None)
+One goal for Python 3000 should be to simplify the language by
+removing unnecessary features. Implicit string concatenation should
+be dropped in favor of existing techniques. This will simplify the
+grammar and simplify a user's mental picture of Python. The latter is
+important for letting the language "fit in your head". A large group
+of current users do not even know about implicit concatenation. Of
+those who do know about it, a large portion never use it or habitually
+avoid it. Of those who both know about it and use it, very few could
+state with confidence the implicit operator precedence and under what
+circumstances it is computed when the definition is compiled versus
+when it is run.
- or, using the scons build framework,
+
+History or Future
+-----------------
+
+Many Python parsing rules are intentionally compatible with C. This
+is a useful default, but Special Cases need to be justified based on
+their utility in Python. We should no longer assume that Python
+programmers will also be familiar with C, so compatibility between
+languages should be treated as a tie-breaker, rather than a
+justification.
+
+In C, implicit concatenation is the only way to join strings without
+using a (run-time) function call to store into a variable. In Python,
+the strings can be joined (and still recognized as immutable) using
+more standard Python idioms, such as ``+`` or ``"".join``.
+
+
+Problem
+-------
+
+Implicit string concatenation leads to tuples and lists which are
+shorter than they appear; this in turn can lead to confusing, or even
+silent, errors. For example, given a function which accepts a format
+string followed by a variable number of arguments::
+
+    def f(fmt, *args):
+        print fmt % args
+
+This looks like a valid call, but isn't::
+
+ >>> f("User %s got a message %s",
+ "Bob"
+ "Time for dinner")
+
+ Traceback (most recent call last):
+ File "<pyshell#8>", line 2, in <module>
+ "Bob"
+ File "<pyshell#3>", line 2, in f
+ print fmt % args
+ TypeError: not enough arguments for format string
+
+
+Calls to this function can silently do the wrong thing::
+
+    def g(arg1, arg2=None):
+        ...
+
+ # silently transformed into the possibly very different
+ # g("arg1 on this linearg2 on this line", None)
+ g("arg1 on this line"
+ "arg2 on this line")
+
+To quote Jason Orendorff [#Orendorff]_:
+
+ Oh. I just realized this happens a lot out here. Where I work,
+ we use scons, and each SConscript has a long list of filenames::
sourceFiles = [
- 'foo.c'
- 'bar.c',
- #...many lines omitted...
- 'q1000x.c']
-
- It's a common mistake to leave off a comma, and then scons complains
- that it can't find 'foo.cbar.c'. This is pretty bewildering behavior
- even if you *are* a Python programmer, and not everyone here is. [1]
-
- Note that in C, the implicit concatenation is more justified; there
- is no other way to join strings without (at least) a function call.
-
- In Python, strings are objects which support the __add__ operator;
- it is possible to write:
-
- "abc" + "def"
-
- Because these are literals, this addition can still be optimized
- away by the compiler. (The CPython compiler already does. [2])
-
- Guido indicated [2] that this change should be handled by PEP, because
- there were a few edge cases with other string operators, such as the %.
- (Assuming that str % stays -- it may be eliminated in favor of
- PEP 3101 -- Advanced String Formatting. [3] [4])
-
- The resolution is to treat them the same as today.
-
- ("abc %s def" + "ghi" % var) # fails like today.
- # raises TypeError because of
- # precedence. (% before +)
-
- ("abc" + "def %s ghi" % var) # works like today; precedence makes
- # the optimization more difficult to
- # recognize, but does not change the
- # semantics.
-
- ("abc %s def" + "ghi") % var # works like today, because of
- # precedence: () before %
- # CPython compiler can already
- # add the literals at compile-time.
-
-
+ 'foo.c'
+ 'bar.c',
+ #...many lines omitted...
+ 'q1000x.c']
+
+ It's a common mistake to leave off a comma, and then scons
+ complains that it can't find 'foo.cbar.c'. This is pretty
+ bewildering behavior even if you *are* a Python programmer,
+ and not everyone here is.
+
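The two failure modes described above can be reproduced with a short sketch. It assumes an interpreter that still accepts implicit concatenation and uses Python 3 ``print`` syntax; the file names and the function ``g`` are illustrative only.

```python
# Hypothetical file list modelled on the SConscript example; the comma
# after 'foo.c' is missing on purpose.
source_files = [
    'foo.c'
    'bar.c',
    'baz.c',
]

# The list looks like it has three entries but only has two.
print(len(source_files))    # 2
print(source_files[0])      # foo.cbar.c


def g(arg1, arg2=None):
    """Return the arguments unchanged so the call can be inspected."""
    return (arg1, arg2)


# The missing comma merges both literals into arg1; arg2 keeps its default.
result = g("arg1 on this line"
           "arg2 on this line")
print(result)               # ('arg1 on this linearg2 on this line', None)
```

Neither case raises an error or a warning at the point of the mistake, which is what makes it hard to track down.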
+
+Solution
+========
+
+In Python, strings are objects and they support the ``+`` operator
+(through ``__add__``), so it is possible to write::
+
+ "abc" + "def"
+
+Because these are literals, this addition can still be optimized away
+by the compiler; the CPython compiler already does so.
+[#rcn-constantfold]_
+
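One way to check the constant-folding claim on a given interpreter is to compile the expression and look at the constants stored in the resulting code object. This is only a sketch: whether the folded value appears in ``co_consts`` is a CPython implementation detail, so the result is printed rather than assumed.

```python
# Compile an explicit concatenation of two literals without running it.
code = compile('s = "abc" + "def"', "<example>", "exec")

# If the compiler folded the addition, the combined string is stored
# as a constant and no addition happens at run time.
folded = "abcdef" in code.co_consts
print("folded at compile time:", folded)

# Whether or not folding happened, the run-time result is the same.
namespace = {}
exec(code, namespace)
print(namespace["s"])
```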
+Other existing alternatives include multiline (triple-quoted) strings,
+and the join method::
+
+ """This string
+ extends across
+ multiple lines, but you may want to use something like
+ textwrap.dedent
+ to clear out the leading spaces
+ and/or reformat.
+ """
+
+
+ >>> "".join(["empty", "string", "joiner"]) == "emptystringjoiner"
+ True
+
+ >>> " ".join(["space", "string", "joiner"]) == "space string joiner"
+ True
+
+ >>> "\n".join(["multiple", "lines"]) == "multiple\nlines" == (
+ """multiple
+ lines""")
+ True
+
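The triple-quoted alternative keeps the source indentation inside the string unless it is removed. A minimal sketch of the ``textwrap.dedent`` approach mentioned above, with made-up sample text:

```python
import textwrap

# The backslash after the opening quotes suppresses the leading newline;
# dedent() then strips the indentation common to all lines.
text = textwrap.dedent("""\
    This string
    extends across
    multiple lines
    """)

print(repr(text))    # 'This string\nextends across\nmultiple lines\n'
```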
+
+Concerns
+========
+
+
+Operator Precedence
+-------------------
+
+Guido indicated [#rcn-constantfold]_ that this change should be
+handled by PEP, because there were a few edge cases with other string
+operators, such as the %. (Assuming that str % stays -- it may be
+eliminated in favor of PEP 3101 -- Advanced String Formatting.
+[#PEP3101]_ [#elimpercent]_)
+
+The resolution is to use parentheses to enforce precedence -- the same
+solution that can be used today::
+
+ # Clearest, works today, continues to work, optimization is
+ # already possible.
+ ("abc %s def" + "ghi") % var
+
+ # Already works today; precedence makes the optimization more
+ # difficult to recognize, but does not change the semantics.
+ "abc" + "def %s ghi" % var
+
+as opposed to::
+
+ # Already fails because modulus (%) is higher precedence than
+ # addition (+)
+ ("abc %s def" + "ghi" % var)
+
+ # Works today only because adjacency is higher precedence than
+ # modulus. This will no longer be available.
+ "abc %s" "def" % var
+
+ # So the 2-to-3 translator can automatically replace it with the
+ # (already valid):
+ ("abc %s" + "def") % var
+
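The four cases above can be checked directly. The sketch below assumes an interpreter that still accepts implicit concatenation; ``var`` is an arbitrary sample value.

```python
var = "X"

# Parentheses force the addition to happen before the formatting.
print(("abc %s def" + "ghi") % var)     # abc X defghi

# % binds tighter than +, so only the second literal is formatted.
print("abc" + "def %s ghi" % var)       # abcdef X ghi

# Adjacency binds tighter than %, so the merged literal is formatted.
print("abc %s" "def" % var)             # abc Xdef

# Without parentheses, "ghi" % var is evaluated first and fails because
# "ghi" contains no conversion specifier.
try:
    "abc %s def" + "ghi" % var
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```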
+
+Long Commands
+-------------
+
+ ... build up (what I consider to be) readable SQL queries [#skipSQL]_::
+
+ rows = self.executesql("select cities.city, state, country"
+ " from cities, venues, events, addresses"
+ " where cities.city like %s"
+ " and events.active = 1"
+ " and venues.address = addresses.id"
+ " and addresses.city = cities.id"
+ " and events.venue = venues.id",
+ (city,))
+
+Alternatives again include triple-quoted strings, ``+``, and ``.join``::
+
+ query="""select cities.city, state, country
+ from cities, venues, events, addresses
+ where cities.city like %s
+ and events.active = 1
+ and venues.address = addresses.id
+ and addresses.city = cities.id
+ and events.venue = venues.id"""
+
+ query=( "select cities.city, state, country"
+ + " from cities, venues, events, addresses"
+ + " where cities.city like %s"
+ + " and events.active = 1"
+ + " and venues.address = addresses.id"
+ + " and addresses.city = cities.id"
+ + " and events.venue = venues.id"
+ )
+
+ query="\n".join(["select cities.city, state, country",
+ " from cities, venues, events, addresses",
+ " where cities.city like %s",
+ " and events.active = 1",
+ " and venues.address = addresses.id",
+ " and addresses.city = cities.id",
+ " and events.venue = venues.id"])
+
+ # And yes, you *could* inline any of the above querystrings
+ # the same way the original was inlined.
+ rows = self.executesql(query, (city,))
+
+
+Regular Expressions
+-------------------
+
+Complex regular expressions are sometimes stated in terms of several
+implicitly concatenated strings with each regex component on a
+different line and followed by a comment. The plus operator can be
+inserted here but it does make the regex harder to read. One
+alternative is to use the re.VERBOSE option. Another alternative is
+to build up the regex with a series of += lines::
+
+ # Existing idiom which relies on implicit concatenation
+ r = ('a{20}' # Twenty A's
+ 'b{5}' # Followed by Five B's
+ )
+
+ # Mechanical replacement
+ r = ('a{20}' +# Twenty A's
+ 'b{5}' # Followed by Five B's
+ )
+
+ # already works today
+ r = '''a{20} # Twenty A's
+ b{5} # Followed by Five B's
+ ''' # Compiled with the re.VERBOSE flag
+
+ # already works today
+ r = 'a{20}' # Twenty A's
+ r += 'b{5}' # Followed by Five B's
+
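A short check that the explicit ``+`` form and the ``re.VERBOSE`` form match the same input. It is a sketch using the pattern from the examples above; the sample text is made up.

```python
import re

# Explicit concatenation of the two regex components.
pattern_plus = ('a{20}' +   # Twenty A's
                'b{5}')     # Followed by five B's

# With re.VERBOSE, whitespace in the pattern is ignored and '#' starts
# a comment that runs to the end of the line.
pattern_verbose = '''a{20}  # Twenty A's
                     b{5}   # Followed by five B's
                  '''

sample = 'a' * 20 + 'b' * 5

match_plus = re.fullmatch(pattern_plus, sample)
match_verbose = re.fullmatch(pattern_verbose, sample, re.VERBOSE)
print(match_plus is not None, match_verbose is not None)    # True True
```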
+
+Internationalization
+--------------------
+
+Some internationalization tools -- notably xgettext -- have already
+been special-cased for implicit concatenation, but not for Python's
+explicit concatenation. [#barryi8]_
+
+These tools will fail to extract the (already legal)::
+
+ _("some string" +
+ " and more of it")
+
+but often have a special case for::
+
+ _("some string"
+ " and more of it")
+
+It should also be possible to just use an overly long line (xgettext
+limits messages to 2048 characters [#xgettext2048]_, which is less
+than Python's enforced limit) or triple-quoted strings, but these
+solutions sacrifice some readability in the code::
+
+ # Lines over a certain length are unpleasant.
+ _("some string and more of it")
+
+ # Changing whitespace is not ideal.
+ _("""Some string
+ and more of it""")
+ _("""Some string
+ and more of it""")
+ _("Some string \
+ and more of it")
+
+I do not see a good short-term resolution for this.
+
+
+Transition
+==========
+
+The proposed new constructs are already legal in current Python, and
+can be used immediately.
+
+The 2 to 3 translator can be made to mechanically change::
+
+ "str1" "str2"
+ ("line1" #comment
+ "line2")
+
+into::
+
+ ("str1" + "str2")
+ ("line1" +#comment
+ "line2")
+
+If users want to use one of the other idioms, they can; as these
+idioms are all already legal in Python 2, the edits can be made
+to the original source, rather than patching up the translator.
+
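The mechanical rewrite can be sketched with the standard ``tokenize`` module. This is not the actual 2-to-3 translator: ``make_concatenation_explicit`` is a hypothetical helper written for illustration, it assumes plain string literals (not f-strings), and it does not preserve the original spacing.

```python
import io
import tokenize


def make_concatenation_explicit(source):
    """Wrap each run of adjacent string literals in parentheses and
    join the literals with '+'. Hypothetical helper, not part of 2to3."""
    readline = io.StringIO(source).readline
    tokens = [(tok.type, tok.string)
              for tok in tokenize.generate_tokens(readline)]
    skippable = (tokenize.COMMENT, tokenize.NL)
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i][0] != tokenize.STRING:
            out.append(tokens[i])
            i += 1
            continue
        run = [tokens[i]]       # tokens making up this run of literals
        count = 1               # number of string literals in the run
        j = i + 1
        while True:
            # Look past comments and line breaks for another literal.
            k = j
            while k < len(tokens) and tokens[k][0] in skippable:
                k += 1
            if k < len(tokens) and tokens[k][0] == tokenize.STRING:
                run.extend(tokens[j:k])
                run.append((tokenize.OP, "+"))
                run.append(tokens[k])
                count += 1
                j = k + 1
            else:
                break
        if count > 1:
            # Parentheses keep the precedence that adjacency had.
            out.append((tokenize.OP, "("))
            out.extend(run)
            out.append((tokenize.OP, ")"))
        else:
            out.extend(run)
        i = j
    return tokenize.untokenize(out)


converted = make_concatenation_explicit('"abc %s" "def" % "X"\n')
print(eval(converted.strip()))    # abc Xdef
```

Wrapping the whole run in parentheses is what keeps ``"abc %s" "def" % var`` equivalent after the rewrite, since ``%`` would otherwise bind to the last literal only.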
+
+Open Issues
+===========
+
+Is there a better way to support external text extraction tools, or at
+least ``xgettext`` [#gettext]_ in particular?
+
+
References
+==========
+
+.. [#Orendorff] Implicit String Concatenation, Orendorff
+ http://mail.python.org/pipermail/python-ideas/2007-April/000397.html
+
+.. [#rcn-constantfold] Reminder: Py3k PEPs due by April, Hettinger,
+ van Rossum
+ http://mail.python.org/pipermail/python-3000/2007-April/006563.html
+
+.. [#PEP3101] PEP 3101, Advanced String Formatting, Talin
+ http://www.python.org/peps/pep-3101.html
+
+.. [#elimpercent] ps to question Re: Need help completing ABC pep,
+ van Rossum
+ http://mail.python.org/pipermail/python-3000/2007-April/006737.html
+
+.. [#skipSQL] (email Subject) PEP 30XZ: Simplified Parsing, Skip,
+ http://mail.python.org/pipermail/python-3000/2007-May/007261.html
- [1] Implicit String Concatenation, Jewett, Orendorff
- http://mail.python.org/pipermail/python-ideas/2007-April/000397.html
+.. [#barryi8] (email Subject) PEP 30XZ: Simplified Parsing
+ http://mail.python.org/pipermail/python-3000/2007-May/007305.html
- [2] Reminder: Py3k PEPs due by April, Hettinger, van Rossum
- http://mail.python.org/pipermail/python-3000/2007-April/006563.html
+.. [#gettext] GNU gettext manual
+ http://www.gnu.org/software/gettext/
- [3] PEP 3101, Advanced String Formatting, Talin
- http://www.python.org/peps/pep-3101.html
+.. [#xgettext2048] Unix man page for xgettext -- Notes section
+ http://www.scit.wlv.ac.uk/cgi-bin/mansec?1+xgettext
- [4] ps to question Re: Need help completing ABC pep, van Rossum
- http://mail.python.org/pipermail/python-3000/2007-April/006737.html
Copyright
+=========
This document has been placed in the public domain.
-Local Variables:
-mode: indented-text
-indent-tabs-mode: nil
-sentence-end-double-space: t
-fill-column: 70
-coding: utf-8
-End:
+..
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ coding: utf-8
+ End: