String literal concatenation & docstrings
A c.l.p question about docstring formatting got me curious about something.

http://www.python.org/doc/2.3.4/ref/string-catenation.html states that:

    Multiple adjacent string literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation. Thus, "hello" 'world' is equivalent to "helloworld".

This isn't quite true, since the following doesn't work:

    def some_func():
        """Doc string line 1 (the only line, surprisingly)\n"""
        """Doc string line 2, except it isn't."""

It seems like an odd quirk that the compile-time concatenation of string literals doesn't work for docstrings. I had a bit of a trawl through the docs and the archive with Google, but couldn't find anything that stated whether this behaviour was deliberate or accidental.

So, can anyone satisfy my idle curiosity as to whether this was a deliberate design choice, or an accident of the implementation?

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@email.com | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.skystorm.net
Nick Coghlan writes:
A c.l.p question about docstring formatting got me curious about something.
http://www.python.org/doc/2.3.4/ref/string-catenation.html states that: Multiple adjacent string literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation. Thus, "hello" 'world' is equivalent to "helloworld".
This isn't quite true, since the following doesn't work:
    def some_func():
        """Doc string line 1 (the only line, surprisingly)\n"""
        """Doc string line 2, except it isn't."""
I haven't actually checked or anything rash like that, but I'd imagine the answer is something like: The two strings are separate statements as far as the parser is concerned, and the "concatenating adjacent strings" thing only happens within an expression. You can do this:
    >>> "con"\
    ... "cat"
    'concat'
So, can anyone satisfy my idle curiosity as to whether this was a deliberate design choice, or an accident of the implementation?
Well, it surprises me not at all.

Cheers,
mwh

--
ARTHUR: Why should he want to know where his towel is?
FORD: Everybody should know where his towel is.
ARTHUR: I think your head's come undone.
        -- The Hitch-Hikers Guide to the Galaxy, Episode 7
Michael Hudson wrote:
I haven't actually checked or anything rash like that, but I'd imagine the answer is something like: The two strings are separate statements as far as the parser is concerned, and the "concatenating adjacent strings" thing only happens within an expression.
That would certainly be a sensible explanation. The only time I've ever actually made use of the feature is when assigning a long string, and even then only rarely (I'm more likely to use triple quotes and left align the whole thing)
You can do this:
    >>> "con"\
    ... "cat"
    'concat'
Which actually does work for combining multiple strings into a single docstring.
So, can anyone satisfy my idle curiosity as to whether this was a deliberate design choice, or an accident of the implementation?
Well, it surprises me not at all.
I think the key distinction I'd missed was that in the doc string case, the two strings were actually separate statements. Once that distinction is noted, the behaviour is, as you say, unsurprising. It also makes it obvious why escaping the newline has the effect it does.

Cheers,
Nick.
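The distinction Nick describes can be checked directly. The following is a minimal sketch, assuming a current Python 3 interpreter rather than the 2.3/2.4 releases under discussion; the function names `two_statements` and `escaped_newline` are made up for illustration.

```python
def two_statements():
    """Doc string line 1 (the only line, surprisingly)\n"""
    """Doc string line 2, except it isn't."""


def escaped_newline():
    """Doc string line 1\n""" \
    """Doc string line 2."""


# Two separate statements: only the first literal becomes the docstring;
# the second is an expression statement whose value is discarded.
print(repr(two_statements.__doc__))

# Escaped newline: both literals sit in one expression, so they are
# concatenated before the result is stored as the docstring.
print(repr(escaped_newline.__doc__))
```

The second function relies on the backslash making the newline insignificant, which is the same mechanism as the `"con"\` example quoted above.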
Nick Coghlan wrote:
Michael Hudson wrote:
I haven't actually checked or anything rash like that, but I'd imagine the answer is something like: The two strings are separate statements as far as the parser is concerned, and the "concatenating adjacent strings" thing only happens within an expression.
That would certainly be a sensible explanation. The only time I've ever actually made use of the feature is when assigning a long string, and even then only rarely (I'm more likely to use triple quotes and left align the whole thing)
And the sensible explanation is correct. Just checked out the compiler and the string concatenation (in parsestrplus()) takes a node and then proceeds to concatenate all of its children that are strings right in a row starting at child 0. With statements this won't trigger anything since the statements will only have the string as their child, unlike an expression, which will just have all the string pieces. [SNIP]
I think the key distinction I'd missed was that in the doc string case, the two strings were actually separate statements. Once that distinction is noted, the behaviour is, as you say, unsurprising. It also makes it obvious why escaping the newline has the effect it does.
Should probably change the wording on that unless people actually want the literal string concatenation to work with statements (docstrings seem like the only place that would be reasonable) unless you want to start allowing print statements to have a string part span multiple lines. =) -Brett
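Brett's reading of the compiler (adjacent string children of one node are joined, while separate statements each hold a single string) can be illustrated with the modern `ast` module. This is only a sketch under stated assumptions: it uses Python 3.8+ (`ast.Constant`), not the `parsestrplus()` code path of the 2004 compiler, and the helper `string_statements` and the sample sources are hypothetical.

```python
import ast

# Two string literals on separate lines: two statements.
SEPARATE = '''
def f():
    "line 1"
    "line 2"
'''

# Same literals joined by a backslash continuation: one expression.
JOINED = '''
def f():
    "line 1" \\
    "line 2"
'''


def string_statements(source):
    """Return the values of the bare string statements in f's body."""
    func = ast.parse(source).body[0]
    return [
        stmt.value.value
        for stmt in func.body
        if isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Constant)
        and isinstance(stmt.value.value, str)
    ]


separate = string_statements(SEPARATE)
joined = string_statements(JOINED)
print(separate)  # two statements, one string each
print(joined)    # one statement holding the concatenated string
```

In the first case the parser never sees the two literals as neighbours inside one expression, so there is nothing for the concatenation step to join.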
On Fri, 26 Nov 2004 11:56:05 -0800, Brett C. wrote:
Should probably change the wording on that unless people actually want the literal string concatenation to work with statements (docstrings seem like the only place that would be reasonable) unless you want to start allowing print statements to have a string part span multiple lines. =)
It means that:

    print "this line continues"
          "on the next line"

does not work, while the following works:

    a = "this line continues"
        "on the next line"

Kind of weird, but anyway, that's not a common idiom. One more reason to use triple-quoted-strings when printing long strings.

--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: carribeiro@gmail.com
mail: carribeiro@yahoo.com
Like anything, if you need to wrap a statement around multiple lines, you surround it in ()'s. Now the question is why does:

    >>> def foo():
    ...     ("""blah"""
    ...     """fejlfe""")
    ...     pass
    ...
    >>> help(foo)

not show that as the doc string? Just because it has () doesn't mean it evaluates to anything other than a string as far as I know.

Carlos Ribeiro wrote:
On Fri, 26 Nov 2004 11:56:05 -0800, Brett C. wrote:
Should probably change the wording on that unless people actually want the literal string concatenation to work with statements (docstrings seem like the only place that would be reasonable) unless you want to start allowing print statements to have a string part span multiple lines. =)
It means that:
    print "this line continues"
          "on the next line"
does not work, while the following works:
    a = "this line continues"
        "on the next line"
Kind of weird, but anyway, that's not a common idiom. One more reason to use triple-quoted-strings when printing long strings.
[Carlos Ribeiro]
while the following works:
    a = "this line continues"
        "on the next line"
Are you sure about that? Doesn't work on my machine:

    $ cat x.py
    a = "this line continues "
        "on the next line"
    $ python x.py
      File "x.py", line 2
        "on the next line"
        ^
    SyntaxError: invalid syntax

If you add a trailing backslash, it does work:

    $ cat x2.py
    a = "this line continues " \
        "on the next line"
    print a
    $ python x2.py
    this line continues on the next line
Kind of weird
Not weird at all ;-) -- David Goodger http://python.net/~goodger
It means that:
    print "this line continues"
          "on the next line"
does not work, while the following works:
    a = "this line continues"
        "on the next line"
As has been pointed out already, it doesn't.

The right way to look at this is not using the statement/expression distinction, but to look at whether the newline between the two literals is significant to the parser or not. A significant newline ends a statement; an insignificant one is equivalent to a space. The rule is that newlines (outside string quotes anyway) are significant unless either escaped with \, or contained within matching parentheses.

So if you want to print or assign a long string broken across multiple lines without having to use triple-quoted strings (which have their own set of issues), you can write this:

    print ("this line continues"
           " on the next line")

Note that I fixed a common buglet in your example: the words "continues" and "on" should be separated by a space. Not important for the example, of course, but (having used this idiom and similar ones a lot) something to keep in mind when splitting a long string across a line this way.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
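The newline rule described above can be sketched in a few lines. This assumes Python 3 (so `print` is a function and is left out of the comparison); the variable names are illustrative only, and the unindented second line in `plain_source` is a choice made here so the source compiles as two statements.

```python
# Significant newline: the assignment ends at the end of the first line,
# and the second literal is a separate statement whose value is discarded.
plain_source = 'a = "this line continues"\n"on the next line"\n'
namespace = {}
exec(plain_source, namespace)
plain = namespace["a"]

# Newline escaped with a backslash: the literals are adjacent in one
# expression and are concatenated.
escaped = "this line continues" \
    " on the next line"

# Newline inside matching parentheses: also insignificant.
wrapped = ("this line continues"
           " on the next line")

print(repr(plain))
print(repr(escaped))
print(repr(wrapped))
```

Note the leading space in `" on the next line"`: concatenation inserts nothing between the pieces, which is the buglet pointed out above.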
Guido van Rossum wrote:
It means that:
    print "this line continues"
          "on the next line"
does not work, while the following works:
    a = "this line continues"
        "on the next line"
As has been pointed out already, it doesn't.
The right way to look at this is not using the statement/expression distinction, but to look at whether the newline between the two literals is significant to the parser or not. A significant newline ends a statement; an insignificant one is equivalent to a space. The rule is that newlines (outside string quotes anyway) are significant unless either escaped with \, or contained within matching parentheses.
So how is this for new wording? "Multiple adjacent string literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation as long as the newline separating them is not significant to the parser."
So how is this for new wording?
"Multiple adjacent string literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation as long as the newline separating them is not significant to the parser."
I'm not sure it needs clarifying; it's the reference manual after all, not a tutorial. I'd rather let the grammar speak for itself; there's no ambiguity in that, and the words are just there to clarify the *semantics*.
"Multiple adjacent string literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation as long as the newline separating them is not significant to the parser."
I'm not sure it needs clarifying; it's the reference manual after all, not a tutorial.
Right. Over-clarification results in docs that read like the instructions for the holy hand grenade ;-) Raymond
Guido van Rossum wrote:
I'm not sure it needs clarifying; it's the reference manual after all, not a tutorial.
I'd rather let the grammar speak for itself; there's no ambiguity in that, and the words are just there to clarify the *semantics*.
Also, I'll freely admit that I made the mistake of reading the "string literal concatenation" bit out of context. Reading the preceding section on string literals in general first (which includes the relevant snippet from the grammar) probably would have made it clearer what was going on.

http://www.python.org/dev/doc/devel/ref/strings.html

Cheers,
Nick.

I guess there was a reason K&R used ';' after all. . .
Right. Over-clarification results in docs that read like the instructions for the holy hand grenade ;-)
Well said.
Except that now I can't find the adjacent string literals in the grammar any more! I'm looking at http://www.python.org/dev/doc/devel/ref/grammar.txt

The path goes from primary to atom to literal to stringliteral (and from there on into lexical detail) and nowhere does the grammar show that multiple string literals are allowed. Adding a single + after stringliteral in the expansion for literal would fix this.

Once that is fixed, we could probably reduce the text of the offending section somewhat to use the phrase "where allowed by the grammar" and skip the mentioning of different quoting conventions or intervening whitespace.
Guido van Rossum wrote:
Right. Over-clarification results in docs that read like the instructions for the holy hand grenade ;-)
Well said.
Except that now I can't find the adjacent string literals in the grammar any more!
I'm looking at http://www.python.org/dev/doc/devel/ref/grammar.txt
The path goes from primary to atom to literal to stringliteral (and from there on into lexical detail) and nowhere does the grammar show that multiple string literals are allowed. Adding a single + after stringliteral in the expansion for literal would fix this [SNIP]
But if you look at Grammar/Grammar you will notice that atom goes to STRING+ which should cover this. Is that grammar.txt file generated from Grammar/Grammar or is it done by hand? -Brett
But if you look at Grammar/Grammar you will notice that atom goes to STRING+ which should cover this.
Of course, otherwise it wouldn't work! :-)
Is that grammar.txt file generated from Grammar/Grammar or is it done by hand?
By hand. The reference manual has more detail (Grammar/Grammar doesn't say anything about what a STRING is, or other literals) and presents some rules in a more readable form which wouldn't work for Grammar/Grammar given the LL1 constraints of pgen. It also chooses more readable names. But the translation process is fallible.
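The `STRING+` point can be seen from the outside with the standard `tokenize` and `ast` modules. This is a rough sketch assuming a current Python 3 interpreter, whose grammar file differs from the pgen-era Grammar/Grammar discussed here: the tokenizer reports each literal as its own STRING token, and the join only happens once the parser accepts them as a single atom.

```python
import ast
import io
import tokenize

source = '"hello" \'world\''

# The tokenizer sees two separate STRING tokens.
tokens = tokenize.generate_tokens(io.StringIO(source).readline)
string_tokens = [tok.string for tok in tokens if tok.type == tokenize.STRING]

# The parser accepts the run of STRING tokens as one atom and the
# resulting value is the concatenation.
value = ast.literal_eval(source)

print(string_tokens)
print(repr(value))
```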
participants (8)
- Brett C.
- Carlos Ribeiro
- David Goodger
- Guido van Rossum
- Michael Hudson
- Nick Coghlan
- ort
- Raymond Hettinger