PEP 30XZ: Simplified Parsing

PEP: 30xz
Title: Simplified Parsing
Version: $Revision$
Last-Modified: $Date$
Author: Jim J. Jewett <JimJJewett@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 29-Apr-2007
Post-History: 29-Apr-2007

Abstract

Python initially inherited its parsing from C. While this has been generally useful, there are some remnants which have been less useful for Python, and should be eliminated.

+ Implicit String concatenation

+ Line continuation with "\"

+ 034 as an octal number (== decimal 28). Note that this is listed only for completeness; the decision to raise an Exception for leading zeros has already been made in the context of PEP XXX, about adding a binary literal.

Rationale for Removing Implicit String Concatenation

Implicit String concatenation can lead to confusing, or even silent, errors. [1]

    def f(arg1, arg2=None): pass

    f("abc" "def")  # forgot the comma, no warning ...
                    # silently becomes f("abcdef", None)

or, using the scons build framework,

    sourceFiles = [
        'foo.c',
        'bar.c',
        #...many lines omitted...
        'q1000x.c']

It's a common mistake to leave off a comma, and then scons complains that it can't find 'foo.cbar.c'. This is pretty bewildering behavior even if you *are* a Python programmer, and not everyone here is.

Note that in C, the implicit concatenation is more justified; there is no other way to join strings without (at least) a function call.

In Python, strings are objects which support the __add__ operator; it is possible to write:

    "abc" + "def"

Because these are literals, this addition can still be optimized away by the compiler.

Guido indicated [2] that this change should be handled by PEP, because there were a few edge cases with other string operators, such as the %. The resolution is to treat them the same as today.

    ("abc %s def" + "ghi" % var)  # fails like today.
                                  # raises TypeError because of
                                  # precedence.  (% before +)

    ("abc" + "def %s ghi" % var)  # works like today; precedence makes
                                  # the optimization more difficult to
                                  # recognize, but does not change the
                                  # semantics.

    ("abc %s def" + "ghi") % var  # works like today, because of
                                  # precedence:  () before %
                                  # CPython compiler can already
                                  # add the literals at compile-time.

Rationale for Removing Explicit Line Continuation

A terminal "\" indicates that the logical line is continued on the following physical line (after whitespace).

Note that a non-terminal "\" does not have this meaning, even if the only additional characters are invisible whitespace. (Python depends heavily on *visible* whitespace at the beginning of a line; it does not otherwise depend on *invisible* terminal whitespace.) Adding whitespace after a "\" will typically cause a syntax error rather than a silent bug, but it still isn't desirable.

The reason to keep "\" is that occasionally code looks better with a "\" than with a () pair.

    assert True, (
        "This Paren is goofy")

But realistically, that paren is no worse than a "\". The only advantage of "\" is that it is slightly more familiar to users of C-based languages. These same languages all also support line continuation with (), so reading code will not be a problem, and there will be one less rule to learn for people entirely new to programming.

Rationale for Removing Implicit Octal Literals

This decision should be covered by PEP ???, on numeric literals. It is mentioned here only for completeness.

C treats integers beginning with "0" as octal, rather than decimal. Historically, Python has inherited this usage. This has caused quite a few annoying bugs for people who forgot the rule, and tried to line up their constants.

    a = 123
    b = 024   # really only 20, because octal
    c = 245

In Python 3.0, the second line will instead raise a SyntaxError, because of the ambiguity. Instead, the line should be written in one of the following ways:

    b = 24    # PEP 8
    b =  24   # columns line up, for quick scanning
    b = 0t24  # really did want an Octal!

References

[1] Implicit String Concatenation, Jewett, Orendorff
    http://mail.python.org/pipermail/python-ideas/2007-April/000397.html

[2] PEP 12, Sample reStructuredText PEP Template, Goodger, Warsaw
    http://www.python.org/peps/pep-0012

[3] http://www.opencontent.org/openpub/

Copyright

This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:
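The PEP's remark about whitespace after a trailing "\" can be checked directly. The following is only a sketch, assuming a current Python 3 interpreter (which postdates this thread); it feeds small source strings to the built-in compile(), and the helper name `compiles` is made up for illustration.

```python
def compiles(source):
    """Return True if `source` compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

# A backslash that is the very last character on the line continues it.
clean_backslash = "total = 1 + \\\n    2\n"

# One invisible space after the backslash breaks the continuation.
trailing_space = "total = 1 + \\ \n    2\n"

# Inside parentheses, trailing whitespace is harmless.
parenthesized = "total = (1 + \n    2)\n"

print(compiles(clean_backslash))  # True
print(compiles(trailing_space))   # False
print(compiles(parenthesized))    # True
```

The failure is loud (a SyntaxError) rather than silent, which matches the PEP's own wording.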
Jim Jewett wrote:
Rationale for Removing Implicit String Concatenation
Implicit String concatenation can lead to confusing, or even silent, errors. [1]

    def f(arg1, arg2=None): pass

    f("abc" "def")  # forgot the comma, no warning ...
                    # silently becomes f("abcdef", None)

or, using the scons build framework,

    sourceFiles = [
        'foo.c',
        'bar.c',
        #...many lines omitted...
        'q1000x.c']
Since your first example omits the comma, I think this one should, too.

    sourceFiles = [
        'foo.c'
        'bar.c',
        #...many lines omitted...
        'q1000x.c']

That is, either both examples should show an error, or both examples should work, but point out how easy it is to make an error.

Eric.
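For readers who want to see the failure mode rather than take it on faith, here is a small sketch of the missing-comma case. The file names come from the PEP's example; the variable names `intended` and `typo` are made up for illustration.

```python
# Intended: three separate file names.
intended = [
    'foo.c',
    'bar.c',
    'q1000x.c']

# Typo: the comma after 'foo.c' is missing, so the two adjacent
# literals are joined into one string at compile time.
typo = [
    'foo.c'
    'bar.c',
    'q1000x.c']

print(len(intended), intended)  # 3 ['foo.c', 'bar.c', 'q1000x.c']
print(len(typo), typo)          # 2 ['foo.cbar.c', 'q1000x.c']
```

No warning is issued for the second list, which is how a build tool ends up looking for 'foo.cbar.c'.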
Jim Jewett wrote:
Rationale for Removing Implicit String Concatenation
Implicit String concatenation can lead to confusing, or even silent, errors. [1]

    def f(arg1, arg2=None): pass

    f("abc" "def")  # forgot the comma, no warning ...
                    # silently becomes f("abcdef", None)
Implicit string concatenation is massively useful for creating long strings in a readable way though:

    call_something("first part\n"
                   "second line\n"
                   "third line\n")

I find it an elegant way of building strings and would be sad to see it go. Adding trailing '+' signs is ugly.

Michael Foord
On 5/2/07, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
Implicit string concatenation is massively useful for creating long strings in a readable way though:
call_something("first part\n" "second line\n" "third line\n")
I find it an elegant way of building strings and would be sad to see it go. Adding trailing '+' signs is ugly.
You'll still have textwrap.dedent::

    call_something(dedent('''\
        first part
        second line
        third line
        '''))

And using textwrap.dedent, you don't have to remember to add the \n at the end of every line.

STeVe
--
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy
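A quick check that the two spellings produce the same string, as a sketch only. The names `implicit` and `dedented` are made up for illustration; note that the "\" after the opening quotes sits inside the string literal, where it suppresses the first newline.

```python
from textwrap import dedent

# Adjacent literals, each carrying its own "\n".
implicit = ("first part\n"
            "second line\n"
            "third line\n")

# Triple-quoted text; dedent() strips the common leading whitespace
# and the whitespace-only last line.
dedented = dedent('''\
    first part
    second line
    third line
    ''')

print(implicit == dedented)  # True
```

The dedent form does its work at run time, whereas adjacent literals are joined when the code is compiled.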
Steven Bethard wrote:
On 5/2/07, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
Implicit string concatenation is massively useful for creating long strings in a readable way though:
call_something("first part\n" "second line\n" "third line\n")
I find it an elegant way of building strings and would be sad to see it go. Adding trailing '+' signs is ugly.
You'll still have textwrap.dedent::
    call_something(dedent('''\
        first part
        second line
        third line
        '''))
And using textwrap.dedent, you don't have to remember to add the \n at the end of every line.
But if you don't want the EOLs? Example from some code of mine:

    raise MakeError("extracting '%s' in '%s' did not create the "
                    "directory that the Python build will expect: "
                    "'%s'" % (src_pkg, dst_dir, dst))

I use this kind of thing frequently. Don't know if others consider it bad style.

Trent

--
Trent Mick
trentm at activestate.com
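One practical consequence for code like this: adjacent literals are joined before "%" is applied, but "+" binds more loosely than "%", so a mechanical conversion needs extra parentheses. A minimal sketch, with made-up sample values for src_pkg, dst_dir and dst:

```python
# Hypothetical sample values, for illustration only.
src_pkg, dst_dir, dst = "pkg.tar.gz", "/tmp/build", "/tmp/build/pkg"

# Implicit concatenation: the three literals form one format string.
implicit = ("extracting '%s' in '%s' did not create the "
            "directory that the Python build will expect: "
            "'%s'" % (src_pkg, dst_dir, dst))

# Explicit "+": the joined literals must be parenthesized before "%".
explicit = ("extracting '%s' in '%s' did not create the " +
            "directory that the Python build will expect: " +
            "'%s'") % (src_pkg, dst_dir, dst)

print(implicit == explicit)  # True

# Without the extra parentheses, "%" applies to the last literal only,
# which has one placeholder for three values.
try:
    "extracting '%s' in '%s' " + "'%s'" % (src_pkg, dst_dir, dst)
    failed = False
except TypeError:
    failed = True

print(failed)  # True
```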
On Wed, May 02, 2007 at 04:42:09PM +0100, Michael Foord wrote:
Implicit string concatenation is massively useful for creating long strings in a readable way though:
This PEP doesn't seem very well-argued: "It's a common mistake to leave off a comma, and then scons complains that it can't find 'foo.cbar.c'." Yes, and then you say "oh, right!" and add the missing comma; problem fixed! The whole cycle takes about a minute. Is this really an issue worth fixing? --amk
On Wed, May 02, 2007 at 01:53:01PM -0400, A.M. Kuchling wrote:
On Wed, May 02, 2007 at 04:42:09PM +0100, Michael Foord wrote:
Implicit string concatenation is massively useful for creating long strings in a readable way though:
This PEP doesn't seem very well-argued: "It's a common mistake to leave off a comma, and then scons complains that it can't find 'foo.cbar.c'." Yes, and then you say "oh, right!" and add the missing comma; problem fixed! The whole cycle takes about a minute. Is this really an issue worth fixing?
The 'cycle' can also generally be avoided via a few good habits:

    sourceFiles = [
        'foo.c',
        'bar.c',
        #...many lines omitted...
        'q1000x.c']

That's the original example provided; each file is on a separate line, so it's a bit easier to tell what changed if you're reviewing the delta. That said, doing

    sourceFiles = [
        'foo.c',
        'bar.c',
        #...many lines omitted...
        'q1000x.c',
    ]

is (in my experience) a fair bit better; you *can* have the trailing comma without any ill effects, plus shifting the ']' to a separate line is less noisy delta-wise for the usual "add another string to the end of the list".

Personally, I'm -1 on nuking implicit string concatenation; the examples provided for the 'why' aren't that strong in my experience, and the forced shift to concatenation is rather annoying when you're dealing with code limits (80 char limit for example):

    dprint("depends level cycle: %s: "
           "dropping cycle for %s from %s" %
           (cur_frame.atom, datom, cur_frame.current_pkg),
           "cycle")

Converting that over isn't hard, but it's a great way to inadvertently bite yourself in the butt. Triple quote isn't usually much of an option in such a case either, since you don't want the newlines coming through.

~harring
At 10:34 AM 5/2/2007 -0700, Trent Mick wrote:
But if you don't want the EOLs? Example from some code of mine:
    raise MakeError("extracting '%s' in '%s' did not create the "
                    "directory that the Python build will expect: "
                    "'%s'" % (src_pkg, dst_dir, dst))
I use this kind of thing frequently. Don't know if others consider it bad style.
Well, I do it a lot too; don't know if that makes it good or bad, though. :) I personally don't see a lot of benefit to changing the lexical rules for Py3K, however. The hard part of lexing Python is INDENT/DEDENT (and the associated unbalanced parens rule), and none of these proposals suggest removing *that*. Overall, this whole thing seems like a bikeshed to me.
On Wednesday 02 May 2007, Trent Mick wrote:
    raise MakeError("extracting '%s' in '%s' did not create the "
                    "directory that the Python build will expect: "
                    "'%s'" % (src_pkg, dst_dir, dst))
I use this kind of thing frequently. Don't know if others consider it bad style.
I do this too; this is a good way to have a simple human-readable message without doing weird things about extraneous newlines or strange indentation.

-1 on removing implicit string catenation.

  -Fred

--
Fred L. Drake, Jr. <fdrake at acm.org>
On 4/30/07, Jim Jewett <jimjjewett@gmail.com> wrote:
Python initially inherited its parsing from C. While this has been generally useful, there are some remnants which have been less useful for python, and should be eliminated.
+ Implicit String concatenation
+ Line continuation with "\"
I don't know if I can vote, but if I could I'd be -1 on this. Can't say I'm using continuation often, but there's one case when I'm using it and I'd like to continue using it:

    #!/usr/bin/env python
    """\
    Usage: some-tool.py [arguments...]

    Does this and that based on its arguments"""

    if condition:
        print __doc__
        sys.exit(1)

This way usage immediately stands out much better, without any unnecessary newlines.

Best regards,
Alexey.
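The effect of that leading "\" can be shown in isolation. This is a sketch only; here the backslash-newline sits inside a string literal rather than between tokens of a statement, and the variable names are made up for illustration.

```python
# Backslash right after the opening quotes: no leading newline.
with_backslash = """\
Usage: some-tool.py [arguments...]

Does this and that based on its arguments"""

# Same text without the backslash: the string starts with "\n".
without_backslash = """
Usage: some-tool.py [arguments...]

Does this and that based on its arguments"""

print(with_backslash.startswith("Usage:"))         # True
print(without_backslash == "\n" + with_backslash)  # True
```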
On May 2, 2007, at 2:51 PM, Phillip J. Eby wrote:
At 10:34 AM 5/2/2007 -0700, Trent Mick wrote:
But if you don't want the EOLs? Example from some code of mine:
    raise MakeError("extracting '%s' in '%s' did not create the "
                    "directory that the Python build will expect: "
                    "'%s'" % (src_pkg, dst_dir, dst))
I use this kind of thing frequently. Don't know if others consider it bad style.
Well, I do it a lot too; don't know if that makes it good or bad, though. :)
I just realized that changing these lexical rules might have an adverse effect on internationalization. Or it might force more lines to go over the 79 character limit.

The problem is that

    _("some string"
      " and more of it")

is not the same as

    _("some string" +
      " and more of it")

because the latter won't be extracted by tools like pygettext (I'm not sure about standard gettext). You would either have to teach pygettext and maybe gettext about this construct, or you'd have to use something different. Triple quoted strings are probably not so good because you'd have to still backslash the trailing newlines. You can't split the strings up into sentence fragments because that makes some translations impossible.

Someone ease my worries here.

-Barry
On May 2, 2007, at 3:23 PM, Alexey Borzenkov wrote:
On 4/30/07, Jim Jewett <jimjjewett@gmail.com> wrote:
Python initially inherited its parsing from C. While this has been generally useful, there are some remnants which have been less useful for python, and should be eliminated.
+ Implicit String concatenation
+ Line continuation with "\"
I don't know if I can vote, but if I could I'd be -1 on this. Can't say I'm using continuation often, but there's one case when I'm using it and I'd like to continue using it:
    #!/usr/bin/env python
    """\
    Usage: some-tool.py [arguments...]

    Does this and that based on its arguments"""

    if condition:
        print __doc__
        sys.exit(1)
This way usage immediately stands out much better, without any unnecessary new lines.
Me too, all the time.

-Barry
    Trent> But if you don't want the EOLs? Example from some code of mine:

    Trent>      raise MakeError("extracting '%s' in '%s' did not create the "
    Trent>                      "directory that the Python build will expect: "
    Trent>                      "'%s'" % (src_pkg, dst_dir, dst))

    Trent> I use this kind of thing frequently. Don't know if others
    Trent> consider it bad style.

I use it all the time. For example, to build up (what I consider to be) readable SQL queries:

    rows = self.executesql("select cities.city, state, country"
                           " from cities, venues, events, addresses"
                           " where cities.city like %s"
                           " and events.active = 1"
                           " and venues.address = addresses.id"
                           " and addresses.city = cities.id"
                           " and events.venue = venues.id",
                           (city,))

I would be disappointed if string literal concatenation went away.

Skip
Please add my -1 to the chorus here, for the same reasons already expressed. Cheers, Mark
I fully support the removal of implicit string concatenation (explicit is better than implicit; there's only one way to do it). I also fully support the removal of backslashes for line continuation of statements (same reasons). (I mean this as distinct from line continuation within a string; that's a separate issue.) -- ?!ng
FWIW, I'm -1 on both proposals too. I like implicit string literal concatenation and I really can't see what we gain from backslash continuation removal.

Georg
-- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
Georg Brandl wrote:
FWIW, I'm -1 on both proposals too. I like implicit string literal concatenation and I really can't see what we gain from backslash continuation removal.
Georg
-1 on removing them also. I find they are helpful. It could be made optional in block headers that end with a ':'. It's optional, (just more white space), in parenthesized expressions, tuples, lists, and dictionary literals already.
    >>> [1,\
    ... 2,\
    ... 3]
    [1, 2, 3]

    >>> (1,\
    ... 2,\
    ... 3)
    (1, 2, 3)

    >>> {1:'a',\
    ... 2:'b',\
    ... 3:'c'}
    {1: 'a', 2: 'b', 3: 'c'}

The rule would be that any keyword that starts a block (class, def, if, elif, with, ... etc.), up to an unused (for anything else) colon, would always evaluate to be a single line, whether or not it has parentheses or line continuations in it. These can never be multi-line statements as far as I know.

The backslash would still be needed in console input.

The following inconsistency still bothers me, but I suppose it's an edge case that doesn't cause problems.

    >>> print r"hello world\"
      File "<stdin>", line 1
        print r"hello world\"
                            ^
    SyntaxError: EOL while scanning single-quoted string

    >>> print r"hello\
    ... world"
    hello\
    world

In the first case, it's treated as a continuation character even though it's not at the end of a physical line. So it gives an error.

In the second case, it's accepted as a continuation character, *and* a '\' character at the same time. (?)

Cheers,
Ron
Ron Adam wrote:
The following inconsistency still bothers me, but I suppose it's an edge case that doesn't cause problems.
    >>> print r"hello world\"
      File "<stdin>", line 1
        print r"hello world\"
                            ^
    SyntaxError: EOL while scanning single-quoted string
In the first case, it's treated as a continuation character even though it's not at the end of a physical line. So it gives an error.
No, that is unrelated to line continuation. The \" is an escape sequence, therefore there is no double-quote to end the string literal.

--
Benji York
http://benjiyork.com
Benji York wrote:
Ron Adam wrote:
The following inconsistency still bothers me, but I suppose it's an edge case that doesn't cause problems.
    >>> print r"hello world\"
      File "<stdin>", line 1
        print r"hello world\"
                            ^
    SyntaxError: EOL while scanning single-quoted string
In the first case, it's treated as a continuation character even though it's not at the end of a physical line. So it gives an error.
No, that is unrelated to line continuation. The \" is an escape sequence, therefore there is no double-quote to end the string literal.
Are you sure?
    >>> print r'\"'
    \"
It's just a '\' here. These are raw strings if you didn't notice. Cheers, Ron
Ron Adam schrieb:
Benji York wrote:
Ron Adam wrote:
The following inconsistency still bothers me, but I suppose it's an edge case that doesn't cause problems.
    >>> print r"hello world\"
      File "<stdin>", line 1
        print r"hello world\"
                            ^
    SyntaxError: EOL while scanning single-quoted string
In the first case, it's treated as a continuation character even though it's not at the end of a physical line. So it gives an error.
No, that is unrelated to line continuation. The \" is an escape sequence, therefore there is no double-quote to end the string literal.
Are you sure?
    >>> print r'\"'
    \"
It's just a '\' here.
These are raw strings if you didn't notice.
It's all in the implementation. The tokenizer takes it as an escape sequence -- it doesn't special-case raw strings -- the AST builder (parsestr() in ast.c) doesn't.

Georg
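The behaviour described above can be observed from Python itself. The following is a minimal sketch for current Python 3 (the thread used Python 2 print statements); the helper name `compiles` is made up for the illustration.

```python
def compiles(source):
    """Return True if `source` is a complete, valid Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True

# In a raw string the backslash stays in the value ...
two_chars = r'\"'
print(len(two_chars))                    # 2

# ... but the tokenizer still treats \" as "this quote does not end the
# string", so a raw string cannot end in a single backslash.
print(compiles('r"hello world\\"'))      # False: source text is r"hello world\"
print(compiles('r"hello world\\\\"'))    # True:  source text is r"hello world\\"

# Backslash-newline inside a raw string keeps both characters.
continued = eval('r"hello\\\nworld"')
print(continued == 'hello\\\nworld')     # True
```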
Barry Warsaw writes:
The problem is that
_("some string" " and more of it")
is not the same as
_("some string" + " and more of it")
Are you worried about translators? The gettext functions themselves will just see the result of the operation. The extraction tools like xgettext do fail, however. Translating the above to

    # The problem is that

    gettext("some string" " and more of it")

    # is not the same as

    gettext("some string" + " and more of it")

and invoking "xgettext --force-po --language=Python test.py" gives

    # SOME DESCRIPTIVE TITLE.
    # Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
    # This file is distributed under the same license as the PACKAGE package.
    # FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
    #
    #, fuzzy
    msgid ""
    msgstr ""
    "Project-Id-Version: PACKAGE VERSION\n"
    "Report-Msgid-Bugs-To: \n"
    "POT-Creation-Date: 2007-05-03 23:32+0900\n"
    "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
    "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
    "Language-Team: LANGUAGE <LL@li.org>\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=CHARSET\n"
    "Content-Transfer-Encoding: 8bit\n"

    #: test.py:3
    msgid "some string and more of it"
    msgstr ""

    #: test.py:8
    msgid "some string"
    msgstr ""

BTW, it doesn't work for the C equivalent, either.
You would either have to teach pygettext and maybe gettext about this construct, or you'd have to use something different.
Teaching Python-based extraction tools about it isn't hard: just make sure that you slurp in the whole argument, and eval it. If what you get isn't a string, throw an exception. xgettext will be harder, since it apparently does not do this, nor does it even know enough to error or warn on syntax it doesn't handle within gettext()'s argument.
On May 3, 2007, at 10:40 AM, Stephen J. Turnbull wrote:
Barry Warsaw writes:
The problem is that
_("some string" " and more of it")
is not the same as
_("some string" + " and more of it")
Are you worried about translators? The gettext functions themselves will just see the result of the operation. The extraction tools like xgettext do fail, however.
Yep, sorry, it is the extraction tools I'm worried about.
Teaching Python-based extraction tools about it isn't hard: just make sure that you slurp in the whole argument, and eval it. If what you get isn't a string, throw an exception. xgettext will be harder, since it apparently does not do this, nor does it even know enough to error or warn on syntax it doesn't handle within gettext()'s argument.
IMO, this is a problem. We can make the Python extraction tool work, but we should still be very careful about breaking 3rd party tools like xgettext, since other projects may be using such tools.

-Barry
On Thursday 03 May 2007 15:40, Stephen J. Turnbull wrote:
Teaching Python-based extraction tools about it isn't hard, just make sure that you slurp in the whole argument, and eval it.
We generate our component documentation based on going through the AST generated by compiler.ast, finding doc strings (and other strings in other known/expected locations), and then formatting using docutils.

Eval'ing the file isn't always going to work due to imports relying on libraries that may need to be installed. (This is especially the case with Kamaelia because we tend to wrap libraries for usage as components in a convenient way.) We've also specifically moved away from importing the file or eval'ing things because of this issue. It makes it easier to have docs built on a random machine with not too much installed on it.

You could special case "12345" + "67890" as a compile-time constructor and jiggle things such that by the time it came out of the parser it looked like "1234567890", but I don't see what that has to gain over the current form (which doesn't look like an expression). I also think that's a rather nasty version.

On the flip side, if we're eval'ing an expression to get a docstring, there would be great temptation to extend that to be a doc-object, e.g. using dictionaries, etc. as well for more specific docs. Is that wise? I don't know :)

Michael.
--
Kamaelia project lead
http://kamaelia.sourceforge.net/Home
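The approach described above (reading docstrings from the syntax tree instead of importing the file) can be sketched with the standard `ast` module of current Python 3, used here as a stand-in for the Python 2 `compiler.ast` package mentioned in the message. The function name `collect_docstrings` and the sample source are hypothetical.

```python
import ast

# Sample source that could not be imported here: the library it needs
# is deliberately one that is not installed.
SAMPLE = '''"""Module docs."""
import library_that_is_not_installed

class Component:
    """Wraps a library as a component."""

    def run(self):
        """Start the component."""
'''

def collect_docstrings(source):
    """Map each module/class/function name to its docstring, without importing."""
    tree = ast.parse(source)
    docs = {"<module>": ast.get_docstring(tree)}
    for node in ast.walk(tree):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            docs[node.name] = ast.get_docstring(node)
    return docs

docs = collect_docstrings(SAMPLE)
print(docs["Component"])
```

Because the source is only parsed and never executed, the missing import does not matter.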
Barry Warsaw writes:
IMO, this is a problem. We can make the Python extraction tool work, but we should still be very careful about breaking 3rd party tools like xgettext, since other projects may be using such tools.
But _("some string" + " and more of it") is already legal Python, and xgettext is already broken for it. Arguably, xgettext's implementation of -L Python should be execve ("pygettext", argv, environ); <wink>
On May 3, 2007, at 12:41 PM, Stephen J. Turnbull wrote:
Barry Warsaw writes:
IMO, this is a problem. We can make the Python extraction tool work, but we should still be very careful about breaking 3rd party tools like xgettext, since other projects may be using such tools.
But
_("some string" + " and more of it")
is already legal Python, and xgettext is already broken for it.
Yep, but the idiom that *gettext accepts is used far more often. If that's outlawed then the tools /have/ to be taught the alternative.
Arguably, xgettext's implementation of -L Python should be
execve ("pygettext", argv, environ);
<wink>
Ouch. :)

-Barry
Michael Sparks writes:
We generate our component documentation based on going through the AST generated by compiler.ast, finding doc strings (and other strings in other known/expected locations), and then formatting using docutils.
Are you talking about I18N and gettext? If so, I'm really lost ....
You could special case "12345" + "67890" as a compile-time constructor and jiggle things such that by the time it came out of the parser it looked like "1234567890", but I don't see what that has to gain over the current form.
I'm not arguing it's a gain, simply that it's a case that *should* be handled by extractors of translatable strings anyway, and if it were, there would not be an I18N issue in this PEP. It *should* be handled because this is just constant folding. Any half-witted compiler does it, and programmers expect their compilers to do it. pygettext and xgettext are (very special) compilers. I don't see why that expectation should be violated just because the constants in question are translatable strings. I recognize that for xgettext implementing that in C for languages as disparate as Lisp, Python, and Perl (all of which have string concatenation operators) is hard, and to the extent that xgettext is recommended by 9 out of 10 translators, we need to worry about how long it's going to take for xgettext to get fixed (because it *is* broken in this respect, at least for Python).
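To make the constant-folding argument concrete, here is a minimal sketch of a Python-based extractor that folds `+` between string literals without calling eval. It is not how pygettext or xgettext are implemented; it uses the `ast` module of current Python 3 (3.8 or later for `ast.Constant`), and the names `fold_string` and `extract_messages` are invented for the example.

```python
import ast

def fold_string(node):
    """Return the value of a constant string expression, or None."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        # Adjacent literals ("a" "b") already arrive here as one constant.
        return node.value
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        left = fold_string(node.left)
        right = fold_string(node.right)
        if left is not None and right is not None:
            return left + right
    return None

def extract_messages(source, names=("_", "gettext")):
    """Return sorted (line, message) pairs for calls such as _("...")."""
    messages = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in names
                and node.args):
            text = fold_string(node.args[0])
            if text is None:
                raise ValueError(
                    f"line {node.lineno}: argument is not a constant string")
            messages.append((node.lineno, text))
    return sorted(messages)

example = (
    '_("some string" " and more of it")\n'
    'gettext("some string" + " and more of it")\n'
)
for line, message in extract_messages(example):
    print(line, repr(message))
```

Under these assumptions both spellings yield the same msgid, and anything that is not a constant string is reported rather than silently truncated.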
On 5/3/07, Georg Brandl <g.brandl@gmx.net> wrote:
These are raw strings if you didn't notice.
It's all in the implementation. The tokenizer takes it as an escape sequence -- it doesn't special-case raw strings -- the AST builder (parsestr() in ast.c) doesn't.
FWIW, it wasn't designed this way so as to be easy to implement. It was designed this way because the overwhelming use case is regular expressions, where one needs to be able to escape single and double quotes -- the re module unescapes \" and \' when it encounters them. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
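A small sketch of the regular-expression use case mentioned above, in current Python 3: the raw string keeps the backslash, and the `re` module then reads `\"` as a literal double quote. The function name `quoted_parts` is hypothetical.

```python
import re

def quoted_parts(text):
    """Return the substrings that appear between double quotes in `text`."""
    # The raw string passes \" to re unchanged; re treats it as a plain quote.
    return re.findall(r'\"([^\"]*)\"', text)

print(quoted_parts('say "hello" and "goodbye"'))
```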
Michael Foord wrote:
Jim Jewett wrote:
PEP: 30xz
Title: Simplified Parsing
Version: $Revision$
Last-Modified: $Date$
Author: Jim J. Jewett <JimJJewett@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 29-Apr-2007
Post-History: 29-Apr-2007
Abstract
Python initially inherited its parsing from C. While this has been generally useful, there are some remnants which have been less useful for python, and should be eliminated.
+ Implicit String concatenation
+ Line continuation with "\"
+ 034 as an octal number (== decimal 28). Note that this is listed only for completeness; the decision to raise an Exception for leading zeros has already been made in the context of PEP XXX, about adding a binary literal.
Rationale for Removing Implicit String Concatenation
Implicit String concatenation can lead to confusing, or even silent, errors. [1]

    def f(arg1, arg2=None): pass

    f("abc" "def")  # forgot the comma, no warning ...
                    # silently becomes f("abcdef", None)
Implicit string concatenation is massively useful for creating long strings in a readable way though:
call_something("first part\n" "second line\n" "third line\n")
I find it an elegant way of building strings and would be sad to see it go. Adding trailing '+' signs is ugly.
Currently at least possible, though doubtless some people won't like the left-hand alignment, is

    call_something("""\
    first part
    second part
    third part
    """)

Alas if the proposal to remove the continuation backslash goes through this may not remain available to us.

I realise that the arrival of Py3 means all these are up for grabs, but don't think any of them are really warty enough to require removal.

I take the point that octal constants are counter-intuitive and wouldn't be too disappointed by their removal. I still think Icon had the right answer there in allowing an explicit decimal radix in constants, so 16 as a binary constant would be 10000r2, or 10r16. IIRC it still allowed 0x10 as well (though Tim may shoot me down there).

regards
 Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
------------------ Asciimercial ---------------------
Get on the web: Blog, lens and tag your way to fame!!
holdenweb.blogspot.com squidoo.com/pythonology
tagged items: del.icio.us/steve.holden/python
All these services currently offer free registration!
-------------- Thank You for Reading ----------------
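For comparison with the octal and radix remarks above, this is how the same values are spelled in Python 3 as it was eventually released. The sketch only uses built-ins; the helper name `is_valid_source` is made up for the illustration.

```python
def is_valid_source(source):
    """Return True if `source` compiles as Python 3 code."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

# A leading zero is no longer silently octal: it is rejected outright.
print(is_valid_source("b = 024"))     # False
print(is_valid_source("b = 0o24"))    # True

# Explicit prefixes, or int() with a base, state the radix.
print(0o24)                           # 20
print(0b10000, 0x10)                  # 16 16
print(int("10000", 2), int("10", 16)) # 16 16
```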
skip@pobox.com wrote:
Trent> But if you don't want the EOLs? Example from some code of mine:
Trent> raise MakeError("extracting '%s' in '%s' did not create the "
Trent>                 "directory that the Python build will expect: "
Trent>                 "'%s'" % (src_pkg, dst_dir, dst))

Trent> I use this kind of thing frequently. Don't know if others
Trent> consider it bad style.
I use it all the time. For example, to build up (what I consider to be) readable SQL queries:
    rows = self.executesql("select cities.city, state, country"
                           " from cities, venues, events, addresses"
                           " where cities.city like %s"
                           " and events.active = 1"
                           " and venues.address = addresses.id"
                           " and addresses.city = cities.id"
                           " and events.venue = venues.id",
                           (city,))

I would be disappointed if string literal concatenation went away.
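One practical difference between the two spellings is operator precedence, as the PEP text itself notes: adjacent literals are joined before `%` is applied, whereas an explicit `+` needs parentheses. A minimal Python 3 sketch, with a shortened message and placeholder argument values that are not from the thread:

```python
args = ("pkg.tar", "build", "build/pkg")

# Implicit concatenation joins the literals before % is applied.
implicit = ("extracting '%s' in '%s' did not create the "
            "directory '%s'" % args)

# With an explicit +, parentheses are needed so that % sees the whole string.
explicit = ("extracting '%s' in '%s' did not create the "
            + "directory '%s'") % args

print(implicit == explicit)   # True

def unparenthesized():
    # % binds tighter than +, so only the last literal gets formatted.
    return ("extracting '%s' in '%s' did not create the "
            + "directory '%s'" % args)

try:
    unparenthesized()
except TypeError as exc:
    print("TypeError:", exc)
```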
Triple-quoted strings are much easier here, and SQL is insensitive to the newlines and additional spaces. Why not just use

    rows = self.executesql("""select cities.city, state, country
                              from cities, venues, events, addresses
                              where cities.city like %s
                              and events.active = 1
                              and venues.address = addresses.id
                              and addresses.city = cities.id
                              and events.venue = venues.id""",
                           (city,))

It also gives you better error messages from most database back-ends. I realise it makes the constants slightly longer, but if that's an issue I'd have thought people would want to indent code with tabs and not spaces.

regards
 Steve
Steven Bethard a écrit :
On 5/2/07, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
Implicit string concatenation is massively useful for creating long strings in a readable way though:
call_something("first part\n" "second line\n" "third line\n")
I find it an elegant way of building strings and would be sad to see it go. Adding trailing '+' signs is ugly.
You'll still have textwrap.dedent::
    call_something(dedent('''\
        first part
        second line
        third line
        '''))
And using textwrap.dedent, you don't have to remember to add the \n at the end of every line.
STeVe
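A runnable version of the textwrap.dedent suggestion, as a small sketch for Python 3. `call_something` is only a placeholder that returns its argument so the result can be inspected.

```python
from textwrap import dedent

def call_something(text):
    """Placeholder for the function in the thread; just returns its input."""
    return text

result = call_something(dedent('''\
    first part
    second line
    third line
    '''))

# The common leading whitespace is removed and each line keeps its newline.
print(repr(result))
```

The backslash after the opening quotes suppresses the initial newline, which is the continuation use discussed earlier in the thread.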
Maybe we could have a "dedent" literal that would remove the first newline and all indentation so that you can just write:

    call_something( d'''
        first part
        second line
        third line
        ''' )

Cheers
Baptiste
On 5/4/07, Baptiste Carvello <baptiste13@altern.org> wrote:
maybe we could have a "dedent" literal that would remove the first newline and all indentation so that you can just write:
    call_something( d'''
        first part
        second line
        third line
        ''' )
Surely

    from textwrap import dedent as d

is close enough?

-Mike
Mike Klaas <mike.klaas@gmail.com> wrote:
On 5/4/07, Baptiste Carvello <baptiste13@altern.org> wrote:
maybe we could have a "dedent" literal that would remove the first newline and all indentation so that you can just write:
    call_something( d'''
        first part
        second line
        third line
        ''' )
Surely
from textwrap import dedent as d
is close enough?
Apart from it happening at run time rather than compile time. -- Nick Craig-Wood <nick@craig-wood.com> -- http://www.craig-wood.com/nick
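The run-time versus compile-time distinction can be checked by inspecting compiled code objects. The sketch below is for current CPython 3; `compiled` is a hypothetical helper, and `d` is never actually called because the code is compiled but not executed.

```python
def compiled(source):
    """Compile `source` as a module and return its code object."""
    return compile(source, "<example>", "exec")

# Adjacent literals are joined by the compiler: one constant in the code object.
implicit = compiled('text = "first part\\n" "second line\\n"')
print("first part\nsecond line\n" in implicit.co_consts)    # True

# A call to d(...) is only a name lookup plus a call at run time; the
# compiler stores the original, still-indented string.
runtime = compiled('text = d("  first part\\n  second line\\n")')
print("d" in runtime.co_names)                               # True
print("first part\nsecond line\n" in runtime.co_consts)     # False
```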
>> Surely >> >> from textwrap import dedent as d >> >> is close enough? Nick> Apart from it happening at run time rather than compile time. And as someone else pointed out, what if you don't want each chunk of text terminated by a newline? Skip
Mark Hammond wrote:
Please add my -1 to the chorus here, for the same reasons already expressed.
Another -1 here - while I agree there are benefits to removing backslash continuations and string literal concatenation, I don't think they're significant enough to justify the hassle of making it happen. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org
participants (26)
- A.M. Kuchling
- Alexey Borzenkov
- Baptiste Carvello
- Barry Warsaw
- Benji York
- Brian Harring
- Eric V. Smith
- Fred L. Drake, Jr.
- Georg Brandl
- Guido van Rossum
- Jim Jewett
- Ka-Ping Yee
- Mark Hammond
- Michael Foord
- Michael Sparks
- Mike Klaas
- Nick Coghlan
- nick@craig-wood.com
- Phillip J. Eby
- Ron Adam
- skip@pobox.com
- Stephen J. Turnbull
- Stephen J. Turnbull
- Steve Holden
- Steven Bethard
- Trent Mick