triple-quoted strings and indentation

Hi all,

twice in one day I read about the problems of triple-quoted strings and indentation (once on Stack Overflow, once on this list). Python is well known for its readability and its use of indentation to this end. But with triple-quoted strings, nice indentation is not possible without post-processing the resulting string.

Problem
=======

Most often, the desired result of
some_string = """Hello ... World."""
is simply Hello and World. on two lines without leading whitespace, rather than a string whose second line keeps the indentation it has in the source file.

Idea
=====

What about using a string flag to indicate that the triple-quoted string is to be trimmed? Like:
some_string = t"""Hello ... World."""
This would blend in with the 'u' and 'r' flags that already exist.

The triple-quoted string is trimmed to remove all whitespace up to the column where the first line of the string started, OR all common whitespace of the subsequent lines if those lines start in a column before the first line. The second rule makes it possible to also write:
some_string = t"""Hello ... World."""

Pros
=====

The advantages over textwrap.dedent are:

1) textwrap.dedent only removes whitespace common to ALL lines, so to achieve the desired result, one has to add an additional newline
2) Also, it does not work if one actually does want some common whitespace before all lines:
gives again Hello World, which is not what I wanted. But
some_string = t""" Hello ... World."""
would give Hello World.
3) And finally, to quote a post from earlier today: "I know about textwrap.dedent, but having to use a Python function call to code a literal has always made me uncomfortable."

Problems
=========

Common indentation style for triple-quoted strings (as far as I know) is
foo = """blubber ... bla""" (align to first quote-char)
but with this auto-trimming, it would look better to use
foo = t"""blubber ... bar""" (align to first char after triple quotes)
The other style would still work, though, as long as one does not want to preserve leading whitespace. Maybe the t flag could also cause a leading and trailing newline to be removed, so that
would also result in Hello World.

Maybe something like this has been proposed before; please be kind if it is old hat.

Mat
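
For readers who want to check the textwrap.dedent behavior described in points 1) and 2) of this post, here is a minimal sketch. The function name make_message and the sample strings are illustrative assumptions, not taken from the original message.

```python
import textwrap


def make_message():
    # The text starts right after the opening quotes, so the first line
    # carries no leading whitespace while the second line does.
    return """Hello
              World."""


raw = make_message()

# dedent() removes only whitespace common to ALL lines. The first line has
# none, so the common margin is empty and the string comes back unchanged.
same = textwrap.dedent(raw)

# Usual workaround: start the text on its own line. The backslash keeps the
# extra newline out of the resulting string.
fixed = textwrap.dedent("""\
    Hello
    World.""")

# Point 2): whitespace that is common to every line is always removed, so
# dedent() cannot be used to keep a deliberate common indent.
flattened = textwrap.dedent("  Hello\n  World.")
```

With these definitions, same is identical to raw, while fixed and flattened both come out as the two unindented lines.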

On Wed, May 11, 2011 at 06:44:04PM +0200, Matthias Lehmann wrote:
> What about the idea, to use a string-flag to indicate, that the triple-quoted string is to be trimmed. Like:

PEP 295 http://www.python.org/dev/peps/pep-0295/ was rejected in 2002.

Oleg.
--
Oleg Broytman http://phdru.name/ phd@phdru.name
Programmers don't die, they just GOSUB without RETURN.

> PEP 295 http://www.python.org/dev/peps/pep-0295/ was rejected in 2002.
> Oleg.

Oh, thanks for the link. I was almost sure that something like that had been proposed before; sorry I didn't thoroughly search the PEPs beforehand.

I still think that indentation of triple-quoted strings is a wart of the language - a small one, but still a wart. But it's been discussed and rejected before, probably with good reasons.

Mat

On Thu, May 12, 2011 at 09:24:53AM +0200, Matthias Lehmann wrote:
> PEP 295 http://www.python.org/dev/peps/pep-0295/ was rejected in 2002.

My opinion is:

-- I don't think it's a wart;
-- If it's a wart it's quite small;
-- It's very easy to fix by calling dedent();
-- Fixing it by changing the language means changing the language for very little gain; changing the language must not be done lightly.

Oleg.

On 5/11/2011 12:44 PM, Matthias Lehmann wrote:
Three partial solutions:

1. Strings are constants. Define them at the top of the module, in global scope. I remember seeing this promoted as a good coding practice once -- easy to find, modify, translate.

text = '''\
LIne 1
linklnlsf 2
and finally, we are done.
'''

I would consider this for strings, at least long strings, displayed to end-users, with mnemonic names.

2. For doc strings, especially for top level classes, do not worry.

def whip_up(**args):
    '''Return some delicious munchies made from inputs.

    The keyword values should be edible and preferably yummy.
    Whip_up will do the best it can with what you give it.
    '''

Having help(whip_up) print

Return some delicious munchies made from inputs.

The keyword values should be edible and preferably yummy.
Whip_up will do the best it can with what you give it.

is not a problem. It might even be a virtue.

3. 'It is not necessarily so bad.' I have a test function with several tests that compare an expected string, given as a literal, to actual output captured with StringIO. Since this is a test_main in the file, run with __name__ == '__main__', I do not want to put the strings in the main part of the file (1 above). At first, the following bothered me.

    expected = '''\
Line 1
Line 2
'''

I like Python's indentation! But it does not bother me so much anymore. IDLE colors the literals green, so they can be semi-ignored. Having the full screen width available can be a plus.

If I were using textwrap.dedent much, I might give it a short nickname like 'de' that would be visible when I want to see it but ignorable when I do not.

If one wants a custom dedent rule, like the one you described, write a custom function.

--
Terry Jan Reedy

On Wed, May 11, 2011 at 3:40 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
> Wild idea: make the unary + operator on strings do textwrap.dedent() on them.

Wouldn't the unary - operator make more sense since it's removing spaces? But I would prefer that it use a slightly friendlier form of dedent:

    def dedent_for_literal(s):
        if s and s[0] == '\n':
            s = s[1:]
        if s and s[-1] == '\n':
            s = s[:-1]
        return textwrap.dedent(s)

That said, is this such a wart on the language that it's worth changing?

--- Bruce
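
As a quick check of what the dedent_for_literal helper from this message does, here is a runnable sketch with the missing import added. The sample literal and the variable name text are assumptions made for illustration.

```python
import textwrap


def dedent_for_literal(s):
    # Drop at most one leading and one trailing newline, then remove the
    # whitespace common to all remaining lines.
    if s and s[0] == '\n':
        s = s[1:]
    if s and s[-1] == '\n':
        s = s[:-1]
    return textwrap.dedent(s)


# The closing quotes sit at column 0 here, so the literal ends in '\n'.
text = dedent_for_literal("""
    Hello
    World.
""")
```

The literal can be written with a line break after the opening quotes and the text indented to match the surrounding code; the helper removes both the framing newlines and the shared indent.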

> Wild idea: make the unary + operator on strings do textwrap.dedent() on them.

The disadvantage compared to a string flag is that this unary operator has no knowledge of the current indentation level within the code. So although the two solutions look similar in code,

x = +""" foo bar""" vs x = t""" foo bar"""

the results are different:

foo bar vs foo bar

Matthias Lehmann writes:
Oh, I thought you were referring to the indentation within the string (on the first line), not where the string begins. Sorry! But I think there's real trouble here, because there are different styles of indentation, as we've seen. You'd have to enforce one for triple-quoted strings, but that's likely to conflict with many developers' ideas about the matter. That's really not something the parser should be doing ....

On Fri, May 13, 2011 at 9:14 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
If this feature were to be added, we would surely want to ignore the indentation on the first line regardless of the previous line, since it shouldn't depend on whether I use two- or four-space indents:

    fun_func(-"""
        multiple
        lines
        """)
    # ^^^^ don't want these spaces in my string

but unless we force people to follow the convention that you must have a line break after the opening """, we would need to ignore indentation starting with the second line for people who use this style:

    fun_func(-"""foo
                 bar
                 more""")

Now personally, I'd probably follow that first style, but if this were a language feature I wouldn't think it should only work for one style. Here's pseudo-code:

    if s[0] == '\n':  # style = first case above
        strip first character and
        strip indentation starting with first line
    else if s[0] == ' ':
        strip indentation starting with first line  # style = """\
    else:
        strip indentation starting with second line  # style = second case above

--- Bruce
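
The three-case pseudo-code above can be sketched as an ordinary function. The name strip_literal_indent is hypothetical, and reading "strip indentation" as "remove the common leading whitespace via textwrap.dedent" is an assumption; the message does not spell out the exact rule.

```python
import textwrap


def strip_literal_indent(s):
    """Hypothetical helper following the three-case pseudo-code."""
    if not s:
        return s
    if s[0] == '\n':
        # Style 1: line break right after the opening quotes.
        # Drop that newline, then dedent from the first remaining line.
        return textwrap.dedent(s[1:])
    if s[0] == ' ':
        # Style 2: the literal was opened with a backslash after the quotes,
        # so the text itself starts with indentation.
        return textwrap.dedent(s)
    # Style 3: text starts right after the quotes; only the second and
    # later lines carry source indentation.
    first, sep, rest = s.partition('\n')
    return first + sep + textwrap.dedent(rest)


style1 = strip_literal_indent("\n    multiple\n    lines\n    ")
style2 = strip_literal_indent("    multiple\n    lines")
style3 = strip_literal_indent("foo\n    bar\n    more")
```

In the first call the whitespace-only last line (the indentation before the closing quotes) is reduced to an empty line by dedent, so the result ends with a single newline.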

On 13.05.2011 19:17, Bruce Leban wrote:
The prototype code for trimming a triple-quoted string as I proposed was:

    def trim(start_column, lines):
        """
        start_column: start-column of first line of the triple-quoted string
        lines: the lines of the string
        """
        n = start_column
        for line in lines[1:]:
            m = get_index_of_first_non_whitespace_char(line)
            n = min(n, m)
        result = []
        if len(lines[0]) > 0:
            # the first line has no source indentation to strip
            result.append(lines[0])
        for line in lines[1:]:
            result.append(line[n:])
        if len(lines[-1]) == 0:
            result = result[:-1]
        return '\n'.join(result)

The crux is to have the start_column available to the function; everything else could be done with just a function. With this, the following indentation styles are possible:

func(t"""foo bar more""")
func(t"""foo bar more""")
func(t""" foo bar more """)

All this would be possible with a function, too. The start_column is really only needed to support cases like this:

func(t""" keep white space """)
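
To try the prototype outside the parser, the following self-contained sketch passes start_column by hand. The helper first_non_whitespace_index stands in for the undefined get_index_of_first_non_whitespace_char, and skipping whitespace-only lines when computing the margin is an added assumption; neither is from the original message.

```python
def first_non_whitespace_index(line):
    # Stand-in for the prototype's undefined helper (assumption).
    return len(line) - len(line.lstrip())


def trim(start_column, lines):
    # Margin: the column of the first line, or less if a later line
    # starts further to the left.
    n = start_column
    for line in lines[1:]:
        if line.strip():  # assumption: blank lines do not limit the margin
            n = min(n, first_non_whitespace_index(line))
    result = []
    if len(lines[0]) > 0:
        result.append(lines[0])
    for line in lines[1:]:
        result.append(line[n:])
    if len(lines[-1]) == 0:
        result = result[:-1]
    return '\n'.join(result)


# Text starts at column 4; the second line is indented two columns deeper,
# and those two extra spaces survive.
kept = trim(4, ["Hello", "      World."])

# Later lines start left of the first line's column: the common margin wins.
aligned = trim(20, ["Hello", "    World."])

# Line break after the opening quotes, closing quotes on their own line.
framed = trim(10, ["", "    foo", "    bar", ""])
```

The first call is the case that needs start_column: without knowing where the first line began, there is no way to tell that two of the six leading spaces on the second line are intentional.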

I have an idea of my own concerning multi-line strings.

Many of the problems of triple-quoted strings stem from the fact that they're trying to be expressions that sit in-line with the rest of the code. As we've seen with all the attempts to fit multi-line function bodies into lambdas, that doesn't really work.

So instead of a multi-line string *expression*, I think we need a *statement*.

    string advertisement:
        | Python Egg Incubator!
        |
        | Hatch your eggs in half the time. Get yours
        | today for only $39.99!

--
Greg

On 05/11/2011 05:22 PM, Greg Ewing wrote:
If in the above, '|' is used as the start of a line-terminated string, it would be a nicer way of typing...

    string advertisement:
        " Python Egg Incubator!\n"
        "\n"
        " Hatch your eggs in half the time. Get yours\n"
        " today for only $39.99!\n"

I think that would only require a small patch to tokenizer.c. It would result in a blank line being added to the end of the paragraph, but maybe that's not so bad.

The hard parts are finding the best symbol ('|' is already used), and whether or not to try to handle raw and byte strings would be a concern as well. We don't want to allow quotes to go unterminated, as that is usually an error that needs to be caught.

Whether or not it's desirable to do this is another thing. ;-)

Cheers,
Ron

Ron Adam wrote:
No, the idea is that a newline wouldn't be added to the last line. If you wanted that, you would have to add an empty line at the end:

    string foo:
        | This line ends with a newline.
        |

> The hard parts are finding the best symbol, '|' is already used,

In a different context, though. There shouldn't be any ambiguity. I'd much rather use '|' than anything else, because it makes such a nice vertical boundary line.

--
Greg
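
The newline rule described in this reply can be emulated on plain strings. This is only a sketch of the semantics, not of the proposed syntax: the function name join_bar_lines is hypothetical, and taking everything after the '|' verbatim (including the space that follows it) is an assumption.

```python
def join_bar_lines(block):
    """Hypothetical emulation of the proposed '|' string statement."""
    parts = []
    for line in block.splitlines():
        stripped = line.lstrip()
        if not stripped.startswith('|'):
            raise ValueError("every line must start with '|'")
        # Assumption: the text after the bar is taken exactly as written.
        parts.append(stripped[1:])
    # Lines are joined with newlines; nothing is appended after the last one.
    return '\n'.join(parts)


no_newline = join_bar_lines("""\
    | This line does not end with a newline.""")

with_newline = join_bar_lines("""\
    | This line ends with a newline.
    |""")
```

Under this reading, a trailing bar on its own line contributes an empty final line, which is what produces the closing newline.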

On 11 May 2011 17:44, Matthias Lehmann <mat@matlehmann.de> wrote:
As the writer of that comment, I'd like to add a -1 to this proposal :-) My intent was to point out that I'm willing to have indentation oddities rather than use dedent. In my view, the problem isn't important enough to warrant extra syntax. Sorry, :-) Paul.

On 12 May 2011 11:32, Matthias Lehmann <mat@matlehmann.de> wrote:
> I didn't mean to misuse your comment - I hope this is not your perception.

Not at all. I understood your message, just wanted to clarify the thinking behind my original statement. Your quote was entirely fair.

Paul.

participants (8)
- Bruce Leban
- Greg Ewing
- Matthias Lehmann
- Oleg Broytman
- Paul Moore
- Ron Adam
- Stephen J. Turnbull
- Terry Reedy