Alternate quote delimiters

While watching the various issues about raw strings and quotes being discussed, I decided to look into other ways to resolve string delimiter collisions that might also improve Python's string handling in general. And I figured I'd let it get shot full of holes here first. ;-)

Here's a nice overview of the issue with various examples:

http://en.wikipedia.org/wiki/String_literal

*Note: The wiki Python raw string example has a trailing backslash, which would cause an error.

r"The Windows path is C:\Foo\Bar\Baz\"

In the case of raw strings, the escaped quotes are only needed so that the tokenizer can locate the end of the string. The backslashes remain in the string. The end of the string is the first unescaped quote of the same type as the starting quote.

>>> s = r"\"Hello World\""
>>> print s
\"Hello World\"

This allows you to enter backslash + quote pairs into a string, but ...

- This doesn't help in entering only quotes. You still need to either use a different quote character than the one in the string or not use raw strings.

- A minor side effect is that you can't have a raw string end with a single backslash, as in the incorrect wiki example above. Other languages which use escape characters in raw strings have the same issue.

- If you have a long string with multiple quotes, as you can in regular expressions, the increased number of \ characters can make the regular expression more difficult to read and can add up to more characters than needed.

- Another minor issue I've found is that doctests that test quoted strings may have multiple nested docstrings, which may require careful selection of quote characters.

The problem arises because the start and end delimiters are the same: the start of the nested string ends the top-level string. The current solution is to pick different quote characters or to escape the quote characters in the nested strings.

So how about a multiple quoting solution? The examples of this in the wiki page are of the form...
qq^I said, "Can you hear me?"^
qq@I said, "Can you hear me?"@
qq§I said, "Can you hear me?"§

But that doesn't fit well with Python's use of quotes. A variation of it would, though: add a 'q' string prefix, similar to the 'u' and 'r' prefixes, that takes the first character inside the quotes as an additional delimiter. The ending quote will then need to have that same character preceding it.

q"^I said, "Can you hear me?"^"
q"""|I said, "Can you hear me?"|"""

The vertical bar is part of the quote here, not part of the string.

rq"^The Windows path is C:\Foo\Bar\Baz\^"

This example will work as expected.

Because the beginnings and endings of the strings are not identical, it's possible that they can allow nesting.

rq"""^
    rq"""^ This is a nested string. ^"""
    """ Another nested string. """
^"""

The most useful feature of this would be in temporarily commenting out large blocks of Python code. Currently this doesn't work well if the block contains triple-quoted docstrings.

Another option might be to designate certain quote delimiters for special purposes.

Dedented strings.

q"""<
    This is a
    Dedented Paragraph.
<"""

Commented-out source code.

q"""#
def foo(bar):
    """ A bar checker. """
    return bar is True
#"""

ReST-formatted strings.

rq""":
Bullet lists:

- This is item 1
- This is item 2

- Bullets are "-", "*" or "+".
  Continuing text must be aligned after the bullet and whitespace.

Note that a blank line is required before the first item and after the last, but is optional between items.
:"""

This use would require some way to preserve its quoting type so it can later be used to render the text. (Any ideas?)

Cheers,
Ron
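To make the proposed end-of-string rule concrete, here is a minimal sketch of how a scanner might locate a q-string. Nothing here exists in Python: `scan_q_string` is a hypothetical helper, it assumes the prefix is `q` or `rq`, it returns the body verbatim without any escape processing, and it stops at the first delimiter + quote it finds, so it does not handle nested strings that reuse the same delimiter.

```python
def scan_q_string(source: str, start: int = 0) -> tuple[str, int]:
    """Hypothetical scanner for the proposed q"<d>...<d>" literal.

    Assumptions (not part of any real Python grammar):
    - source[start:] begins with a prefix made of 'q' and optionally 'r',
      then ' or " (single or triple), then one delimiter character <d>;
    - the literal ends at the first <d> immediately followed by the quote.
    Returns (verbatim body, index just past the closing quote).
    """
    i = start
    # Consume the string prefix letters ('q', 'rq', ...).
    while i < len(source) and source[i] in "rq":
        i += 1
    if "q" not in source[start:i]:
        raise ValueError("not a q-string")
    if i >= len(source) or source[i] not in "'\"":
        raise ValueError("expected a quote after the prefix")
    q = source[i]
    # Triple quotes are checked first, as with ordinary Python strings.
    quote = q * 3 if source.startswith(q * 3, i) else q
    i += len(quote)
    if i >= len(source):
        raise ValueError("missing delimiter character")
    closer = source[i] + quote          # e.g. '^"' or '|"""'
    body_start = i + 1
    end = source.find(closer, body_start)
    if end == -1:
        raise ValueError("unterminated q-string")
    return source[body_start:end], end + len(closer)
```

With the first example from the message, `scan_q_string('q"^I said, "Can you hear me?"^"')` returns the body `I said, "Can you hear me?"` together with the index just past the closing quote; the embedded double quotes never end the literal because they are not preceded by `^`.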

On 5/11/07, Ron Adam <rrr@ronadam.com> wrote:
Here come the first shots. ;-)
Ick. This will make Python code harder to parse (I wonder whether the current parser could even do what you propose), and it isn't that much of an improvement in ease of expression. Also, this seems way too Perlish for my tastes. There should be only one way to do quoting, but practicality beat purity for quotes and regexes. However, allowing arbitrary quoting characters seems like overkill.
A block commenting syntax may be useful; however, I don't like this proposal because of the previous point. Also, most Python editors let you comment out a block pretty easily, with a command that adds the appropriate #s.
I'm neutral on these.
Seems like a little much to have syntax for ReST. Is it used frequently enough to justify adding this? - Chris Rebert

Chris Rebert wrote:
Great! ;-)
The tokenizer can be modified to convert any of these to standard or raw strings accordingly. It just needs to find the beginning and the end. I haven't yet looked into what will be needed to pass the delimiter to the compiler. So in its simplest form, it's just a bit of preprocessing.
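As a rough illustration of the "just a bit of preprocessing" idea, the sketch below rewrites q-strings into ordinary literals before Python sees the source. It rests on loudly stated assumptions: `Q_STRING` and `preprocess_q_strings` are hypothetical names, the regular expression only recognizes an optional `r` followed by `q`, every body is kept verbatim (as if raw, so escapes in a plain `q` string are not processed), and q-strings inside comments or other string literals are not protected. A real implementation would belong in the tokenizer.

```python
import re

# Hypothetical q-string syntax: optional r, then q, then a quote (triple quotes
# tried first), then one delimiter character; the literal ends at delimiter + quote.
Q_STRING = re.compile(r"""(?<!\w)([rR]?[qQ])("{3}|'{3}|"|')(.)(.*?)\3\2""", re.DOTALL)


def preprocess_q_strings(source: str) -> str:
    """Rewrite every q-string in `source` as an ordinary Python literal.

    Simplifications: bodies are kept verbatim (as if raw), and q-strings that
    appear inside comments or other string literals are not skipped.
    """
    # repr() produces a valid Python literal for whatever text the body holds.
    return Q_STRING.sub(lambda m: repr(m.group(4)), source)


print(preprocess_q_strings('x = q"^I said, "Can you hear me?"^"'))
```

Here the source `x = q"^I said, "Can you hear me?"^"` becomes `x = 'I said, "Can you hear me?"'`, which ordinary Python can execute; choosing `repr()` sidesteps the question of which quote characters the converted literal should use.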
Then limit it to a smaller set of characters. Which ones do you suggest? Perl uses the forms above that I didn't choose. Those don't even look like strings to me, which is why I didn't even consider them. Almost every other solution has problems of some sort; this is the only solution I liked that did not have any. Like anything new, it may take a little getting used to before it seems like Python rather than something tacked on.
True, and that is what I do. However, most non-Python editors need macros programmed into them to do this, if they support macros at all; if they don't, you must fall back to manually adding #'s to each line.
The choices of delimiters in these cases need not be carved in stone. It can be an informal standard as well.
It's not really syntax for reST in this case. It's more of a very general method to notate or tag a string. There may be other uses for that as well.
- Chris Rebert
Cheers, Ron

... doctest that do test on quoted strings may have multiple nested doc strings ...
q"^I said, "Can you hear me?"^"
q"""|I said, "Can you hear me?"|"""
The vertical bar is part of the quote here, not part of the string.
This should work, but I can't help wondering if it is too complicated. What if the character were limited to the opening of a bracketed-pair, such as {[( Or is that just as bad, and less flexible to boot? }]}
rq"^The Windows path is C:\Foo\Bar\Baz\^"
This example will work as expected.
How serious is this problem? r"The Windows path is C:\Foo\Bar\Baz\X"[:-1] is awkward, but ... how much complexity do we want to incur avoiding it? -jJ
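To help weigh Jim's question, here is a short check of the current behavior (run against Python 3 semantics, not the 2007 interpreters in this thread) together with the workarounds available today. The variable names are illustrative only.

```python
# A raw string cannot end in a single backslash: the final \" is read as an
# escaped quote, so the literal never terminates and compilation fails.
bad_source = 'r"The Windows path is C:\\Foo\\Bar\\Baz\\"'
compile_failed = False
try:
    compile(bad_source, "<example>", "eval")
except SyntaxError:
    compile_failed = True

# Three ways to get the intended value with today's syntax.
sliced = r"The Windows path is C:\Foo\Bar\Baz\X"[:-1]  # the [:-1] hack
joined = r"The Windows path is C:\Foo\Bar\Baz" "\\"   # adjacent literals are concatenated
escaped = "The Windows path is C:\\Foo\\Bar\\Baz\\"     # ordinary string, every backslash doubled

print(compile_failed)               # True
print(sliced == joined == escaped)  # True
print(escaped)                      # The Windows path is C:\Foo\Bar\Baz\ (with trailing backslash)
```

The adjacent-literal form avoids the throwaway character of the slicing hack, though it is arguably no more beautiful.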

Jim Jewett wrote:
If we are not concerned about meta uses, then we probably only need one. I prefer the non-directional symbols, as they don't suggest the string is another container type. Raymond's information attribute PEP might fulfill the meta uses I had in mind. (Don't know until we see it.)
Since Guido gave the nod to Martin for starting a PEP to remove escapes from raw strings completely, this one may no longer be an issue. ;-)

On another note, it may be useful to have, or be able to use, an alternative escape character.

e&"The Windows path is C:\Foo\Bar\Baz\X\"     # '&' is escape, not '\'
e&"I said, &"Can you hear me now?&""

The '&' is used much less frequently than the '\'. Raw strings wouldn't use it. ;-)

The advantage of this is that many other languages use '\' as an escape introducer, so being able to use '&' can avoid escape-character clashes if you are using Python to generate code or scripts where '\' is used frequently.

And then we can have the ('\' + chr) or ('&' + chr) pairs always be an escape sequence. Currently '\' by itself isn't always an escape character, and when the character following it is not special, it's evaluated differently than, for example, how re evaluates it. Alternatively, it might be good if Python raised an error on an escape sequence it doesn't recognize, just as it does with the percent character.

(Just looking for the little things that can be cleaned up a tiny bit; I'll leave the big Base Classes and Super issues to the experts. ;-)

Cheers,
Ron
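The alternative-escape idea can be prototyped as a decoding step over the text between the quotes. This is only a sketch of one possible reading of the proposal: `decode_alt_escapes` is a hypothetical helper, the set of recognized escapes (the two quote characters, a doubled `&`, `&n` and `&t`) is an assumption, and it follows Ron's suggestion of raising an error on unrecognized escapes instead of keeping them verbatim. Backslashes pass through as ordinary characters.

```python
def decode_alt_escapes(body: str, esc: str = "&") -> str:
    """Decode a string body that uses `esc` instead of a backslash as its escape.

    Hypothetical semantics for the proposed e& prefix: backslashes are ordinary
    characters, and an unrecognized escape raises ValueError (today a backslash
    followed by an unknown character is silently kept as-is).
    """
    # Assumed escape table; the real set would be a design decision.
    simple = {'"': '"', "'": "'", esc: esc, "n": "\n", "t": "\t"}
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != esc:
            out.append(ch)  # includes backslashes, which are not special here
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("string ends with a lone escape character")
        nxt = body[i + 1]
        if nxt not in simple:
            raise ValueError(f"unrecognized escape sequence {esc}{nxt}")
        out.append(simple[nxt])
        i += 2
    return "".join(out)


print(decode_alt_escapes('I said, &"Can you hear me now?&"'))  # I said, "Can you hear me now?"
```

Applied to the body of the first e& example, the Windows path comes back unchanged, trailing backslash included, because only `&` starts an escape.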

Jim Jewett wrote:
This would be better, I think, as then nested occurrences of the bracket chars could be skipped without having to pick a different character for each level of quoting. It could even be restricted to just one kind of bracket if desired without losing anything.
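Greg's point about skipping nested bracket pairs can be shown with a simple depth counter. A minimal sketch under stated assumptions: `scan_bracket_q_string` and `PAIRS` are hypothetical, only a plain `q` prefix is handled, the delimiter must be one of `(`, `[` or `{`, and only the chosen bracket pair is counted (other bracket kinds in the body are ignored).

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}


def scan_bracket_q_string(source: str, start: int = 0) -> tuple[str, int]:
    """Hypothetical q"(...)" scanner where the delimiter must be an opening bracket.

    Nested occurrences of the same bracket pair are counted, so the literal ends
    only at the closing bracket that balances the opener and is followed by the
    quote. Returns (body, index just past the closing quote).
    """
    if not source.startswith("q", start):
        raise ValueError("not a q-string")
    i = start + 1
    if i >= len(source) or source[i] not in "'\"":
        raise ValueError("expected a quote after the prefix")
    q = source[i]
    quote = q * 3 if source.startswith(q * 3, i) else q
    i += len(quote)
    if i >= len(source) or source[i] not in PAIRS:
        raise ValueError("delimiter must be an opening bracket")
    open_ch, close_ch = source[i], PAIRS[source[i]]
    body_start = i = i + 1
    depth = 1  # the opening delimiter itself
    while i < len(source):
        ch = source[i]
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth == 0:
                # The balancing bracket must be immediately followed by the quote.
                if not source.startswith(quote, i + 1):
                    raise ValueError("unbalanced bracket before closing quote")
                return source[body_start:i], i + 1 + len(quote)
        i += 1
    raise ValueError("unterminated q-string")
```

For example, `q"(outer q"(inner)" text)"` yields the body `outer q"(inner)" text`: the inner `)"` does not end the literal because the depth is still 1 at that point, so no second delimiter character is needed for the nested level.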
If you use os.path.join and friends to manipulate your paths (as you should!) you will hardly ever have to deal with a path containing a trailing backslash in the first place. So I wouldn't worry about it much. -- Greg
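A small illustration of Greg's advice, assuming a Windows-style path. `ntpath` is the module that `os.path` refers to on Windows; importing it directly lets the example run on any OS. Joining with an empty final component is one way to get a trailing separator without writing a backslash at the end of a literal, and `pathlib` (added to the standard library well after this thread) offers a similar route.

```python
import ntpath  # the Windows flavour of os.path, importable on any platform
from pathlib import PureWindowsPath

# Building the path from components means no literal has to end in a backslash.
joined = ntpath.join("C:\\", "Foo", "Bar", "Baz")
print(joined)  # C:\Foo\Bar\Baz

# An empty last component appends a separator, if one is really needed.
with_trailing = ntpath.join(joined, "")
print(with_trailing)  # C:\Foo\Bar\Baz\ (with trailing separator)

# pathlib accepts forward slashes and renders Windows separators.
print(str(PureWindowsPath("C:/Foo/Bar/Baz")))  # C:\Foo\Bar\Baz
```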

On 5/13/07, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
When learning Python, new users often get bitten by this. Simple learning exercises often leave os.path for later, and use simpler hard-coded file paths instead. When teaching Python, this is one more sharp edge I'm forced to warn new users about. I also hear people complaining about this when using Python for simple automation scripts, where hard-coded paths are common, and using os.path can easily double the amount of code and its complexity for no extra functionality. (Yes, the [:-1] hack works, but it's ugly, and Python should be beautiful.) I'm sure the "completely remove escapes from raw strings" PEP will solve this issue, and I'm looking forward to it. - Tal

participants (5)
- Chris Rebert
- Greg Ewing
- Jim Jewett
- Ron Adam
- Tal Einat