Multi-line strings that respect indentation

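On several occasions I have run into code that will do something like the following with a multiline string:

    def some_func():
        x, y = process_something()

        val = """
    <xml>
      <myThing>
        <val>%s</val>
        <otherVal>%s</otherVal>
      </myThing>
    </xml>
    """ % (x, y)

        return val
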
To me, this is rather ugly because it messes up the indentation of some_func(). Suppose we could have a multiline string that, when started on a line indented four spaces, ignores the first four spaces on each line of the literal when creating the actual string. In this example, I will use four quotes to start such a string, though I think the syntax for this could vary. It would be something like this:

    def some_func():

That way, the indentation in the function would be preserved, making everything easy to scan, and the indentation in the output would not suffer. What do you all think?

Daniel da Silva writes:
We do.

    from textwrap import dedent

    def some_func():
        x, y = process_something()
        val = dedent("""\
            <xml>
              <myThing>
                <val>%s</val>
                <otherVal>%s</otherVal>
              </myThing>
            </xml>
            """) % (x, y)
        return val

I don't think the function call is ugly enough to fix with syntax.
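
For anyone who wants to run the dedent approach as posted, here is a minimal sketch. Passing `x` and `y` in as parameters is an assumption made so the example is self-contained (the thread's `process_something()` is never defined), and the XML is shortened.

```python
from textwrap import dedent


def some_func(x, y):
    # x and y are parameters here only for illustration; the thread gets
    # them from an undefined process_something().
    # The backslash after the opening quotes suppresses the leading newline.
    val = dedent("""\
        <xml>
          <val>%s</val>
          <otherVal>%s</otherVal>
        </xml>
        """) % (x, y)
    return val


result = some_func(1, 2)
print(result)
```

The common eight-space margin is removed, while the two-space nesting inside the XML survives because it is not shared by every line.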

On Fri, 05 Nov 2010 10:10:14 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
It's just reversing the point of view. The right thing would be to have Python code correctly aligned on indentation _by default_. And possibly provide an alternative for cases (which?) where one would want indent not taken into account.

Denis
-- -- -- -- -- -- --
vit esse estrany ☣
spir.wikidot.com

spir writes:
It's just reversing the point of view. The right thing would be to have Python code correctly aligned on indentation _by default_.
This is plausible but I don't necessarily agree that it's the right thing. If I'm writing a program which writes structured text like XML, which is conventionally indented, I actually prefer

        print("""\
    <xml>
      A really uninteresting XML document.
    </xml>
    """)

to

        print("""\
        <xml>
          A really uninteresting XML document.
        </xml>
        """)

For one thing, in many realistic cases it's probably actually formatted like this:

        print("""\
    <xml>
    """)
        print("""\
      A really uninteresting XML document.
    """)
        print("""\
    </xml>
    """)

and it would be a real PITA trying to get that right.

On Fri, Nov 5, 2010 at 11:10 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
I do use the textwrap.dedent workaround myself, but I think it is sufficiently flawed for a proper fix to be worth considering:

1. It doesn't work for docstrings (as Tal pointed out)
2. It postpones until runtime an operation that could fairly easily be carried out at compile time instead

Note that a method on str objects fixes neither of those problems, so isn't much of a gain from this point of view.

A new string prefix would handle the task nicely, though. I personally like "d for dedent", with all d-strings (even single-quoted ones) being implicitly multiline, as the colour for that particular bikeshed:

    def some_func():
        x, y = process_something()
        val = d"\
            <xml>
              <myThing>
                <val>%s</val>
                <otherVal>%s</otherVal>
              </myThing>
            </xml>
            " % (x, y)
        return val

I'm no more than +0 on the idea though. It strikes me as an awful lot of effort in implementing, documenting and promoting the idea for something that provides at best a minimal improvement in aesthetics and performance.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
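
The two flaws listed above can be seen in a small experiment. This is only a sketch; the function names `with_call` and `with_literal` are made up for the illustration.

```python
from textwrap import dedent


def with_call():
    # A call expression is an ordinary statement, so nothing is stored in
    # __doc__, and dedent() would run again on every call of with_call().
    dedent("""\
        Looks like a docstring, but is not one.
        """)


def with_literal():
    """Only a bare string literal in first position becomes __doc__."""


print(with_call.__doc__)     # None
print(with_literal.__doc__)
```

Because the dedent call sits inside the function body, it is evaluated at run time, which is the second flaw in the list.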

On 11/5/2010 10:45 AM, Nick Coghlan wrote:
This does:

    def f(x):
        "Am I a docstring\n"\
        "even though I start in pieces?\n"\
        "Oh, x is a dummy param\n"
        pass

    print(f.__doc__)
    # prints 3 lines, but not without '\' escape at line ends
2. It postpones until runtime an operation that could fairly easily be carried out at compile time instead
Above is. Not that I like the extra fluff needed to make it work. It makes a 'd' for dedent literal prefix look inviting.

--
Terry Jan Reedy

On Fri, Nov 5, 2010 at 10:37 AM, Terry Reedy <tjreedy@udel.edu> wrote:
Can the parser be changed to do automagic joining on that, the same way that it auto-joins ("a" "b")? I can't imagine any scenario where anyone would rely on starting the line after a docstring with a string literal, since the only plausible run time effect it might have would be to intern some strings for later. Even "string".method() wouldn't have any effect without something like x = in front of it.

-- Carl Johnson

"Carl M. Johnson" <cmjohnson.mailinglist@gmail.com> writes:
There's no need to change the parser, when what you suggest works fine in existing Python *and* is easier to maintain without the trailing backslashes:

    def f(x):
        ("Am I a docstring\n"
         "even though I start in pieces?\n"
         "Oh, x is a dummy param\n")
        pass

    print(f.__doc__)

But that's still crap. I think the existing situation works too: the existing docstring processors already know to handle docstrings according to <URL:http://www.python.org/dev/peps/pep-0257/#id20>.

Introducing more workarounds for multi-line strings, when the existing tools appear to work fine, gets a -0.5 from me.

--
 \     “Fox News gives you both sides of every story: the President’s |
  `\   side and the Vice President’s side.” —Steven Colbert, 2006-04-29 |
_o__)                                                                  |
Ben Finney
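
As an illustration of what "existing docstring processors" do, the standard library's `inspect.getdoc` applies PEP 257 style whitespace handling. A minimal sketch, with an example function invented for the purpose:

```python
import inspect


def f(x):
    """Summary line.

        Indented detail line.
    Body line.
    """


# getdoc() strips the common indentation of the lines after the first
# and drops trailing blank lines, while relative indentation is kept.
cleaned = inspect.getdoc(f)
print(cleaned)
```

The detail line keeps its extra four spaces relative to the body line, so nested structure in a docstring is not flattened.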

I like the idea of having d""" as the marker, as I've often wanted this for writing out custom HTML or source code.

But would the level of dedent be based on the level of indent of the code line containing the d""" marker? If so, would you want a parsing/syntax error if the string violated the indentation levels of the surrounding code? (The dedent would have to take place post-parsing, since the grammar knows nothing about INDENT tokens inside strings, so it's not truly a "parser" error.)

The easiest would be to do the equivalent of dedent and take the longest common whitespace length. But the purist in me would kinda hope that if, for example, the code was indented to column 8 but the text was all at column 12, the resulting string would have 4 leading spaces on each line.

Jared

On 5 Nov 2010, at 07:45, Nick Coghlan wrote:

On 05/11/2010 20:43, Jared Grubb wrote:
[snip]

You could also add an 'indent' method to strings:

    text.indent(4)

would indent each line by 4 spaces

    text.indent(-4)

would dedent each line by 4 spaces; an exception would be raised if a line started with fewer than 4 spaces, unless that line contained only spaces.

    text.indent(4, '>')

would indent each line by 4 * '>'.
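
The proposed method does not exist on str, but its described behaviour can be sketched as a plain function. Everything here is hypothetical: the name `indent`, the `fill` parameter, the choice of ValueError, and the decision to reduce spaces-only lines to empty lines when dedenting are assumptions, since the post does not specify them.

```python
def indent(text, amount, fill=" "):
    """Hypothetical stand-in for the proposed str.indent method."""
    lines = text.splitlines(keepends=True)
    if amount >= 0:
        # Positive amounts prepend `amount` copies of `fill` to every line.
        prefix = fill * amount
        return "".join(prefix + line for line in lines)

    width = -amount
    result = []
    for line in lines:
        body = line.rstrip("\r\n")
        if body.strip(" ") == "":
            # Lines holding only spaces are exempt from the check;
            # assumption: they are reduced to their bare line ending.
            result.append(line[len(body):])
        elif body.startswith(" " * width):
            result.append(line[width:])
        else:
            raise ValueError(
                "line has fewer than %d leading spaces: %r" % (width, body)
            )
    return "".join(result)


print(indent("a\nb\n", 4, ">"))
print(indent("    a\n\n      b\n", -4))
```

Unlike textwrap.dedent, the negative form removes a fixed number of columns and fails loudly when a line is under-indented.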

Jared Grubb writes:
-1

You could probably come up with rules to get this right for the parser, but how can it possibly guess exactly what any given user means by

    def amethod (self):
        self.bmethod_has_fleece_as_white_as_snow("""\
            Mary had a little lamb
            little lamb little lamb
            The lamb may be a little small
            But this poem is just plain lame.""")

Is that supposed to be flush left, indented 4 spaces, indented 8 spaces, or indented 12 spaces? I think the dedent rule (longest common whitespace prefix) makes the most sense.

I would never use a quadruple quote for this if it were ever implemented. A better precedent would be a letter prefix to the quotes, similar to what we do for raw strings and bytes constants today. m""" perhaps.

I'm -0.5 on this. I don't think it is a necessary feature thanks to textwrap.dedent. But the reason I'm not -1 is that using dedent incurs a runtime cost to process the larger string constant to generate the desired one. The best way around this is to declare your large string constants at the module level, where you don't need to worry about dedent at all.

-gps

On Thu, Nov 4, 2010 at 6:10 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
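
The module-level suggestion might look like the sketch below. The constant name `XML_TEMPLATE` and the parameters of `some_func` are assumptions for the example, and the XML is shortened.

```python
# At module level the literal can sit flush left without disturbing any
# function's indentation, and no dedent call is needed at run time.
XML_TEMPLATE = """\
<xml>
  <val>%s</val>
  <otherVal>%s</otherVal>
</xml>
"""


def some_func(x, y):
    # The function body stays cleanly indented and only does the formatting.
    return XML_TEMPLATE % (x, y)


print(some_func(1, 2))
```

The trade-off is that the template now lives away from the code that uses it.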

Daniel da Silva <ddasilva@umd.edu> writes:
The standard library time machine to the rescue::

    import textwrap

    def some_func():
        x, y = process_something()
        val = textwrap.dedent("""
            <xml>
              <myThing>
                <val>%s</val>
                <otherVal>%s</otherVal>
              </myThing>
            </xml>
            """) % (x, y)
        return val

I use this technique very often in my code. I'm surprised that it doesn't have a wider share of Python programmers. Spread the knowledge!

--
 \     “I don't want to live peacefully with difficult realities, and |
  `\   I see no virtue in savoring excuses for avoiding a search for |
_o__)  real answers.” —Paul Z. Myers, 2009-09-12 |
Ben Finney

Daniel da Silva wrote:
Please no. Three quotes is large enough. Also, four quotes currently is legal: it is a triple-quoted string that begins with a quotation mark. You would be changing that behaviour and likely breaking code.

I don't think we need syntax for this, but if we do, I'd prefer to add a prefix similar to the r"" or u"" syntax. Perhaps w"" to normalise whitespace?

But as I said, I don't think we need syntax for this. I'd be happy if textwrap.dedent() became a built-in string method.

    def some_func():
        x, y = process_something()
        val = """
            <xml>
              <myThing>
                <val>%s</val>
                <otherVal>%s</otherVal>
              </myThing>
            </xml>
            """.dedent() % (x, y)
        return val

--
Steven
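
The point about four quotes already being legal is easy to check. In this small sketch the variable name `s` is arbitrary:

```python
# Four quote characters in a row: the first three open a triple-quoted
# string, and the fourth is simply the first character of its content.
s = """"quoted" text"""

print(s)
print(len(s))
```

Any existing code that starts a triple-quoted string with a quotation mark would change meaning under the four-quote proposal.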

spir writes:
(Does dedent() base its behaviour on first (non-empty) line?)
No, it chooses the longest common whitespace prefix in the whole string. I'm not sure what it does with blank lines. A quick test ... in Python 2.5.5, it ignores them for the purpose of determining the dedent. That is, you can have any amount of whitespace on a blank line, but it doesn't affect the dedent computed.
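
The same quick test can be repeated on a current Python. This is a sketch with made-up sample text; the whitespace-only lines carry fewer spaces than the margin on purpose.

```python
from textwrap import dedent

# Lines 2 and 4 contain only spaces (two and one), fewer than the
# four-space margin shared by the non-blank lines.
text = "    first\n  \n        second\n \n    third\n"

result = dedent(text)
print(repr(result))
```

The whitespace-only lines do not reduce the computed margin, and they come out as empty lines in the result.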

On Thu, 4 Nov 2010 20:21:37 -0400 Daniel da Silva <ddasilva@umd.edu> wrote:
I'm +++ for this. Even more in python because one has no choice about indentation. But I posted the same proposal some time ago (together with a note that there is no need for """...""" around multiline strings), and it had no apparent success.

Denis
-- -- -- -- -- -- --
vit esse estrany ☣
spir.wikidot.com

Participants (15):
- Ben Finney
- Carl M. Johnson
- Daniel da Silva
- Eric Smith
- Gregory P. Smith
- Guido van Rossum
- Jared Grubb
- MRAB
- Nick Coghlan
- Raymond Hettinger
- spir
- Stephen J. Turnbull
- Steven D'Aprano
- Tal Einat
- Terry Reedy