multiline string notation
Hello,

multiline string

By recently studying a game scripting language (*), and designing a toy language of mine, I realised the following two facts that may be relevant for python as well:

-1- no need for a separate multiline string notation

A single string format can handle text including newlines, without any syntactic or parsing (**) issue: a string notation just ends with the second quote. I have no idea why python introduced that distinction (and would like to know); possibly for historical reasons? The only advantage of """...""" seems to be that this format allows literal quotes in strings; am I right on this?

-2- trimming of indentation

On my computer, calling the following function:

def write():
    if True:
        print """To be or not to be,
        that is the question."""

results in the following output:

|To be or not to be,
|        that is the question.

This is certainly not the programmer's intent. To get what is expected, one should write instead:

def write():
    if True:
        print """To be or not to be,
that is the question."""

...which distorts the visual presentation of code by breaking correct indentation. To have a multiline text written on multiple lines and preserve indentation, one needs to use more complicated forms like:

def write():
    if True:
        print "To be or not to be,\n" + \
              "that is the question."

(Actually, the '+' can be omitted here, but this fact is not commonly known.)

My project uses a visual structure à la python (and no curly braces). Indentation is removed by the parser from the significant part of code even inside strings (and also comments). This lets the programmer preserve a clean source outline while still writing multiline text simply as is. In other words, the following routine would work as you guess (':' is the assignment sign):

write : action
    if true
        terminal.write "To be or not to be,
        that is the question."

I imagine the python parser replaces indentation with block-delimiting tokens (analogous in role to C braces).
My language's parser thus has a preprocessing phase that would transform the piece of code above to:

write : action {
if true {
terminal.write "To be or not to be,
that is the question."
}
}

The preprocessing routine is actually easier than it would be with python rules, since one can trim indents systematically, without any exception for strings (and comments).

Thank you for reading,
Denis

(*) namely WML, the scripting language of the game called Wesnoth
(**) This is true for 1-pass parsers (like PEG), as well as for 2-pass ones (with a separate lexical phase).

-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com
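The preprocessing phase Denis describes can be sketched in a few lines of Python. This is a hypothetical illustration, not code from his project: the name `indent_to_braces` is invented here, and the sketch assumes spaces-only indentation, an unindented first line, and (as in his example) string continuation lines sitting at the same indentation as the statement that opens the string.

```python
def indent_to_braces(source):
    """Hypothetical sketch of an indentation-to-braces preprocessing pass.

    Leading whitespace is trimmed from *every* line, including lines that
    continue a string literal, as proposed above (Python does not do this).
    Assumes spaces-only indentation and an unindented first line.
    """
    out = []
    levels = [0]  # stack of currently open indentation widths
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines carry no structure
        indent = len(raw) - len(raw.lstrip(" "))
        if indent > levels[-1]:
            if not out:
                raise ValueError("the first line must not be indented")
            levels.append(indent)
            out[-1] += " {"  # a deeper line opens a block on the previous line
        while indent < levels[-1]:
            levels.pop()
            out.append("}")  # a shallower line closes each deeper block
        out.append(raw.strip())
    while len(levels) > 1:  # close blocks still open at end of input
        levels.pop()
        out.append("}")
    return "\n".join(out)


src = (
    "write : action\n"
    "    if true\n"
    '        terminal.write "To be or not to be,\n'
    '        that is the question."\n'
)
print(indent_to_braces(src))
# write : action {
# if true {
# terminal.write "To be or not to be,
# that is the question."
# }
# }
```

In this sketch the string's second line must stay at the statement's indentation: indented more deeply, it would be read as opening a new block, which is one concrete form of the tokenisation trade-off raised later in the thread.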
On Tue, 28 Sep 2010 10:27:07 +0200 spir <denis.spir@gmail.com> wrote:
> Hello,
>
> multiline string
>
> By recently studying a game scripting language (*), and designing a toy language of mine, I realised the following two facts that may be relevant for python as well:
>
> -1- no need for a separate multiline string notation
>
> A single string format can handle text including newlines, without any syntactic or parsing (**) issue: a string notation just ends with the second quote. I have no idea why python introduced that distinction (and would like to know); possibly for historical reasons? The only advantage of """...""" seems to be that this format allows literal quotes in strings; am I right on this?
No, you're not. The ' form allows literal "'s, and vice versa. The reason for the triple-quoted string is to allow simple multi-line string literals. The reason you want both single and multi-line string literals is so the parser can properly flag the error line when you forget to terminate the far more common single-line literal. Not as important now that nearly everything does syntax coloring, but still a nice feature.
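Both of Mike's points can be checked in Python 3 with the built-in `compile()`. This small sketch shows that each single-line quote form can contain the other kind of quote, and that an unterminated single-line literal is reported on the line where it starts. The wording of the error message differs between Python versions, so only the line number is inspected; `error_line` is an illustrative helper, not a standard function.

```python
# Each single-line quote style can hold the other quote character literally.
a = 'He said "hi"'
b = "It's fine"
assert '"' in a and "'" in b

# A triple-quoted literal may span several lines.
c = """line one
line two"""
assert c.count("\n") == 1


def error_line(source):
    """Compile `source` and return the line number reported by SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc.lineno
    raise AssertionError("expected a SyntaxError")


# A forgotten closing quote on a one-line string is reported on that line,
# not at the end of the file.
print(error_line('x = "unterminated\ny = 1\nz = 2\n'))  # 1
```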
> -2- trimming of indentation
>
> On my computer, calling the following function:
>
> def write():
>     if True:
>         print """To be or not to be,
>         that is the question."""
>
> results in the following output:
>
> |To be or not to be,
> |        that is the question.
>
> This is certainly not the programmer's intent. To get what is expected, one should write instead:
>
> def write():
>     if True:
>         print """To be or not to be,
> that is the question."""
>
> ...which distorts the visual presentation of code by breaking correct indentation. To have a multiline text written on multiple lines and preserve indentation, one needs to use more complicated forms like:
>
> def write():
>     if True:
>         print "To be or not to be,\n" + \
>               "that is the question."
>
> (Actually, the '+' can be omitted here, but this fact is not commonly known.)
And in 3.x, where print is a function instead of a statement, it could be (leaving off the optional "+"):

def write():
    if True:
        print("To be or not to be,\n"
              "that is the question.")

So -1 for this idea.

<mike
--
Mike Meyer <mwm@mired.org>		http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
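Mike's point about leaving off the optional `+` rests on Python's implicit concatenation of adjacent string literals. A quick Python 3 check that the two spellings produce the same string (the variable names are only for illustration):

```python
# The explicit version from the original post.
with_plus = "To be or not to be,\n" + \
            "that is the question."

# Adjacent literals are concatenated at compile time; inside parentheses
# no backslash continuation is needed.
implicit = ("To be or not to be,\n"
            "that is the question.")

assert with_plus == implicit
print(implicit)
# To be or not to be,
# that is the question.
```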
These two questions are ones where good arguments can be made in both directions.

Having explicit notation for multi-line strings is primarily a benefit for readability and error detection. The readability benefit is that it flags to the reader that the next string literal may cover several lines. As Mike noted, the error detection benefit is that the parser can more readily detect a missing end-quote from a normal string instead of inadvertently treating the entire rest of the file as part of the string and giving a relatively useless error regarding EOF while parsing a string.

Stripping leading whitespace even inside strings is potentially convenient for the programmer, but breaks the tokenisation stream. String literals are meant to be atomic. Having the parser digging inside them to declare certain whitespace to not be part of the string despite its presence in the source code is certainly a valid design choice a language could make when defining its grammar, but would actually be a fairly significant change for Python.

For Python, these two rules are a case of "status quo wins a stalemate". Changing Python's behaviour in this area would be difficult and time-consuming for negligible benefit, so it really isn't worth doing.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia
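Denis's guess about block-delimiting tokens and Nick's point about atomic string literals can both be observed with the standard `tokenize` module. In this Python 3 sketch, indentation shows up as `INDENT`/`DEDENT` tokens, while the triple-quoted literal is a single `STRING` token whose text includes the indentation of its second line; stripping that whitespace would require the tokenizer to reach inside the token.

```python
import io
import token
import tokenize

# The example from the original post, with print() as in Python 3.
SOURCE = '''def write():
    if True:
        print("""To be or not to be,
        that is the question.""")
'''

toks = list(tokenize.generate_tokens(io.StringIO(SOURCE).readline))

# Indentation becomes block-delimiting INDENT / DEDENT tokens...
indents = sum(1 for t in toks if t.type == token.INDENT)
dedents = sum(1 for t in toks if t.type == token.DEDENT)
print(indents, dedents)  # 2 2

# ...while the triple-quoted literal is a single STRING token that keeps the
# eight spaces in front of its second line.
literal = next(t for t in toks if t.type == token.STRING)
print(repr(literal.string))
```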
> -2- trimming of indentation
>
> On my computer, calling the following function:
>
> def write():
>     if True:
>         print """To be or not to be,
>         that is the question."""
>
> results in the following output:
>
> |To be or not to be,
> |        that is the question.
>
> This is certainly not the programmer's intent. To get what is expected, one should write instead:
>
> def write():
>     if True:
>         print """To be or not to be,
> that is the question."""
>
> ...which distorts the visual presentation of code by breaking correct indentation. To have a multiline text written on multiple lines and preserve indentation, one needs to use more complicated forms like:
>
> def write():
>     if True:
>         print "To be or not to be,\n" + \
>               "that is the question."
>
> (Actually, the '+' can be omitted here, but this fact is not commonly known.)
Have you heard of textwrap.dedent()? I usually would write this as:

import textwrap

def write():
    if True:
        print textwrap.dedent("""\
        To be or not to be,
        that is the question.""")

- Tal
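A Python 3 rendering of Tal's suggestion, next to the raw literal for comparison. The functions return the text instead of printing it only so the results are easy to check, and their names are illustrative. The backslash after the opening quotes keeps the literal from starting with a newline, so `textwrap.dedent()` sees the same eight-space prefix on every line and removes it. `inspect.cleandoc()`, another standard-library function (written for docstrings), handles the original layout without the backslash: it strips the first line's leading whitespace separately and removes the common indentation of the remaining lines.

```python
import inspect
import textwrap


def raw_text():
    if True:
        return """To be or not to be,
        that is the question."""


def dedented_text():
    if True:
        # The backslash keeps the literal from starting with a newline,
        # so every line shares the same indentation for dedent() to strip.
        return textwrap.dedent("""\
        To be or not to be,
        that is the question.""")


def cleaned_text():
    if True:
        # cleandoc() trims the first line on its own and removes the
        # indentation shared by the remaining lines.
        return inspect.cleandoc("""To be or not to be,
        that is the question.""")


print(repr(raw_text()))  # the eight spaces are still inside the string
print(dedented_text())
print(cleaned_text())
```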
participants (4)

- Mike Meyer
- Nick Coghlan
- spir
- Tal Einat