Smart/Curly Quote Marks and cPython

Hello everyone, I want to start small and ask about smart/curly quote marks (” vs "). Although most languages do not support these characters as quotation marks, I believe that cPython should, if possible. I'm willing to write the patch, of course, but I wanted to ask about this change, if it has come up before, and if there are any compatibility issues that I'm not seeing here. Thank you, -Ryan Birmingham

I was thinking of using them only as possible quote characters, as students and beginners seem to have difficulties due to this quote-mismatch error. That OSX has smart quotes enabled by default makes this a worthwhile consideration, in my opinion. -Ryan Birmingham On 22 October 2016 at 01:34, Ethan Furman <ethan@stoneleaf.us> wrote:

Interesting idea. +1 from me; probably can be as simple as just having the tokenizer interpret curly quotes as the ASCII (straight) version of itself (in other words, " and the two curly versions of that would all produce the same token, and same for single quotes, eliminating any need for additional changes further down the chain). This would help with copying and pasting code snippets from a source that may have auto-formatted the quotes without the original author realizing it. On Sat, Oct 22, 2016 at 1:46 AM Ryan Birmingham <rainventions@gmail.com> wrote:
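As a rough illustration of the idea above (treating curly quotes as their straight ASCII counterparts), here is a minimal Python sketch that rewrites the four English curly quotes before the source is compiled. It is not CPython's tokenizer; the names CURLY_TO_STRAIGHT and straighten_quotes are hypothetical, and a real change would operate on tokens rather than on raw text.

```python
# Hypothetical pre-processing step for illustration; not part of CPython.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
})


def straighten_quotes(source: str) -> str:
    """Replace the four English curly quotes with straight ASCII quotes."""
    return source.translate(CURLY_TO_STRAIGHT)


# A line as it might arrive after being pasted from a word processor.
pasted = "s = \u2018abcd\u2019"
namespace = {}
exec(compile(straighten_quotes(pasted), "<pasted>", "exec"), namespace)
print(namespace["s"])
```

Note that a blind text substitution like this also rewrites curly quotes that appear inside existing string literals, so it changes the meaning of code that uses them deliberately.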

On Sat, Oct 22, 2016 at 06:13:35AM +0000, Jonathan Goble wrote:
There's a lot more than two. At least nineteen (including the ASCII ones): 〝〞〟"'"'«»‘’‚‛“”„‟‹›
Personally, I think that we should not encourage programmers to take a lazy, slap-dash attitude to coding. Precision is important to programmers, and there is no limit to how imprecise users can be. Should we also guard against people accidentally using prime marks or ornaments (dingbats): ′″‴‵‶‷ ❛❜❝❞❮❯ as well? If not, what makes them different from other accidents of careless programmers?
I don't think we should be trying to guess what programmers mean, nor do I think that we should be encouraging programmers to use word processors for coding. Use the right tool for the right job, and even Notepad is better for the occasional programmer than Microsoft Office or LibreOffice. Programming is hard, requiring precision and care, and we don't do beginners any favours by making it easy for them to be imprecise and careless.
I would be happy to see improved error messages for smart quotes:
py> s = ‘abcd’
  File "<stdin>", line 1
    s = ‘abcd’
             ^
SyntaxError: invalid character in identifier
(especially in IDLE), but I'm very dubious about the idea of using typographical quote marks for strings. At the very least, Python should not lead the way here. Let some other language experiment with this first, and see what happens. Python is a mature, established language, not an experimental language.
Of course, there's nothing wrong with doing an experimental branch of Python supporting this feature, to see what happens. But that doesn't mean we should impose it as an official language rule. -- Steve
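To make the "there's a lot more than two" point concrete, the following sketch asks Python's unicodedata module for the official names of the distinct quote-like characters in the list quoted above. The helper name describe is hypothetical, and the candidate string is only the set of characters as they survive in this archive.

```python
import unicodedata

# Distinct quote-like characters from the list quoted above, written as
# escapes so the source stays unambiguous.
candidates = (
    "\u301d\u301e\u301f"            # CJK double prime quotation marks
    "\"'"                           # ASCII quotation mark and apostrophe
    "\u00ab\u00bb"                  # double angle quotation marks
    "\u2018\u2019\u201a\u201b"      # single curly/low/reversed quotes
    "\u201c\u201d\u201e\u201f"      # double curly/low/reversed quotes
    "\u2039\u203a"                  # single angle quotation marks
)


def describe(chars):
    """Return (code point, Unicode name) pairs for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in chars]


for code, name in describe(candidates):
    print(code, name)
```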

Per the comments in this thread, I believe that a better error message for this case would be a reasonable way to fix the use case around this issue. It can be difficult to notice that your quotes are curved if you don't know that's what you're looking for. -Ryan Birmingham On 22 October 2016 at 03:16, Steven D'Aprano <steve@pearwood.info> wrote:

On 22 October 2016 at 17:36, Ryan Birmingham <rainventions@gmail.com> wrote:
Looking for particular Unicode confusables when post-processing SyntaxErrors seems like a reasonable idea to me - that's how we ended up implementing the heuristic that reports "Missing parenthesis in call to print" when folks attempt to run Python 2 code under Python 3. At the moment, tokenizer and parser errors are some of the most beginner-hostile ones we offer, since we don't have any real context when raising them - it's just a naive algorithm saying "This isn't the text I expected to see next". By contrast, later in the code generation pipeline, we have more information about what the user was trying to do, and can usually offer better errors. What Guido pointed out when I was working on the "print" heuristic is that we actually get a second go at this: the *exception constructor* usually has access to the text that the tokenizer or parser couldn't handle, and since it isn't on the critical performance path for anything, we can afford to invest some time in looking for common kinds of errors and try to nudge folks in a better direction when we think they've tripped over one of them. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
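The post-processing idea can be sketched in pure Python, outside the interpreter: catch the SyntaxError, look at the offending line, and add a hint if it contains typographic quotes. This is only an assumption about how such a heuristic might look, not the mechanism used for the "print" message; quote_hint and CONFUSABLE_QUOTES are hypothetical names.

```python
# Hypothetical helper for illustration; not CPython's implementation.
CONFUSABLE_QUOTES = {
    "\u2018": "'", "\u2019": "'",   # single curly quotes
    "\u201c": '"', "\u201d": '"',   # double curly quotes
}


def quote_hint(source, filename="<input>"):
    """Return a hint if a SyntaxError line contains curly quotes, else None."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        line = exc.text
        if not line and exc.lineno:
            # Fall back to the original source if the exception has no text.
            lines = source.splitlines()
            if 0 < exc.lineno <= len(lines):
                line = lines[exc.lineno - 1]
        found = sorted({ch for ch in (line or "") if ch in CONFUSABLE_QUOTES})
        if found:
            shown = ", ".join(f"{ch!r} (U+{ord(ch):04X})" for ch in found)
            return (f"{exc.msg}; line {exc.lineno} contains typographic "
                    f"quotes {shown}. Did you mean straight quotes?")
    return None


print(quote_hint("s = \u2018abcd\u2019\n"))
```

Because the check only runs after compilation has already failed, it costs nothing for valid code, which matches the point about the exception path not being performance critical.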

On 10/22/2016 12:32 PM, Nick Coghlan wrote:
(Continuing my response to Steven saying "improved error messages ... (especially in IDLE)") IDLE compile()s and exec()s user code within separate try-except blocks, the latter usually being in a separate process. Runtime tracebacks and exceptions are sent back to IDLE's Shell to be printed just as in a console (except for colorizing). Compile errors are handled differently. Tracebacks are tossed after extracting the file, line, and column (the last from the ^ marker). The latter are used to tag text with a red background. For shell input, the exception is printed normally. For editor input, it is displayed in a messagebox over the editor window. My point is that IDLE already intercepts exceptions and, for SyntaxErrors, does simple modifications (hopefully enhancements) *in Python*. So it could be an easy place to prototype, in Python, more advanced enhancements. Experimental enhancements could be made optional, and could supplement rather than replace the original message. They could also be added and modified in bugfix releases. I will say more about explaining exceptions better in another post. -- Terry Jan Reedy
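The split between compile-time and run-time handling described above can be sketched as follows. This is not IDLE's code: run_source is a hypothetical function, it runs everything in one process, and it simply returns tuples where IDLE would tag text or show a message box.

```python
def run_source(source, filename="<demo>"):
    """Compile and execute source, reporting the two failure kinds separately."""
    try:
        code = compile(source, filename, "exec")
    except SyntaxError as exc:
        # Compile errors: keep only the location and message, which is the
        # information a front end needs to highlight the offending text.
        return ("syntax", exc.lineno, exc.offset, exc.msg)
    try:
        exec(code, {})
    except Exception as exc:
        # Runtime errors: reported separately, as a console would.
        return ("runtime", type(exc).__name__, str(exc))
    return ("ok",)


print(run_source("x = 1"))
print(run_source("1/0"))
print(run_source("s = \u2018abcd\u2019"))
```

Having the SyntaxError branch in plain Python is what makes it a convenient place to experiment with friendlier messages.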

On 10/22/2016 3:16 AM, Steven D'Aprano wrote:
I would be happy to see improved error messages for smart quotes:
The above *is* the improved (and regressed) 3.6 version ;-) In 3.5.2 (on Windows):
(Mangling of the echoed code line is Windows specific.) The improvement is the more specific error message. The regression is the placement of the caret at the end instead of under the initial '‘'. To verify that Python is not actually pointing at '’', remove it.
(Recent 3.6 changes in the encodings used on Windows remove the code mangling in this echoed line.)
(especially in IDLE),
What do you have in mind? Patches would be considered. I will continue this in response to Nick's post about 9 hours ago. -- Terry Jan Reedy

On Sat, Oct 22, 2016 at 01:17:58AM -0400, Ryan Birmingham wrote:
Hello everyone,
I want to start small and ask about smart/curly quote marks (” vs ").
Which curly quotes are you going to support? There's Dutch, of course: „…” ‚…’ But how about … ?
- English ‘…’ “…”
- French « … » “…”
- Swiss «…» ‹…›
- Hebrew „…” ‚…’
- Hungarian „…” »…«
- Icelandic „…“ ‚…‘
- Japanese 「…」 『…』
- Polish „…” «…» »…«
- Swedish ”…” ’…’ »…» »…«
to mention only a few. I think it would be unfair to all the non-Dutch programmers if we only supported Dutch quotation marks, but as you can see, supporting the full range of internationalised curly quotes is difficult.
Although most languages do not support these characters as quotation marks, I believe that cPython should, if possible.
You say "most" -- do you know which programming languages support typographical quotation marks for strings? It would be good to see a survey of which languages support this feature, and how they cope with the internationalisation problem. I think this is likely to be just too hard. There's a reason why programming has standardized on the lowest common denominator for quotation marks '' "" and occasionally `` as well. -- Steve

The quotes I intended in this email are just “, ‘, ”, and ’ where the encoding is appropriate. Internationalization was not the intent of this. I do believe that you have a good point with supporting common quotes in other languages, but I believe that such a change would be large enough to consider a PEP. I am aware that there are other Unicode characters, even in English, with the Quotation_Mark character property, but this proposed change aims to solve the problem caused when editors, mail clients, web browsers, and operating systems over-zealously replace straight quotes with these typographical characters. -Ryan Birmingham On 22 October 2016 at 02:35, Steven D'Aprano <steve@pearwood.info> wrote:

On Sat, Oct 22, 2016 at 5:49 PM, Ryan Birmingham <rainventions@gmail.com> wrote:
A programming editor shouldn't mangle your quotes, and a word processor sucks for editing code anyway, so I'd rule those out. When does an operating system change your quotes? It's really just mail and web where these kinds of issues happen. Any web site that's actually designed for code is, like a programmer's editor, going to be quote-safe; and it's not hard to configure a mail client to not mess with you. How strong is this use-case, really? ChrisA

On 22 October 2016 at 08:17, Chris Angelico <rosuav@gmail.com> wrote:
While I agree that it's important for new programmers to learn precision, there are a lot of environments where smart quotes get accidentally inserted into code.
* Pasting code into MS Word documents for reference (even if you then format the code as visibly code, the smart quote translation has already happened). That's remarkably common in the sorts of environments I deal in, where code gets quoted in documents, and then later copied out to be reused.
* Tutorial/example material prepared by non-programmers, again using tools that are too "helpful" in auto-converting to smart quotes.
So in my experience this problem is pretty common. However, I view it as a chance to teach correct use of quotes in programming, rather than something to gloss over or "do what I mean" with. -1 from me. Paul

On Sat, Oct 22, 2016 at 10:09 PM, Paul Moore <p.f.moore@gmail.com> wrote:
One of my students remarked that she had a lot of trouble trying to maintain a notes file, because she couldn't decide whether to use a word processor (with a spell checker) or a code editor (with automatic indentation and syntax highlighting). Still, I think the solution would be to have code editors grow facilities for working with text, rather than word processors grow facilities for working with code, or programming languages grow features for coping with word processors.
* Tutorial/example material prepared by non-programmers, again using tools that are too "helpful" in auto-converting to smart quotes.
Definite learning moment for the person preparing the tutorial. If you were writing a tutorial for Russian speakers and just wrote everything using the Latin alphabet, nobody would say "we should teach Russian people to use the alphabet that my editor uses"; code has its own rules, and if you're writing about code, you should learn how to write it appropriately.
Agreed. Maybe the upshot of this will be a python-list thread recommending some editors that handle both code and screed well - that would be a worthwhile thread IMO. ChrisA

On Sat, Oct 22, 2016 at 4:09 AM, Paul Moore <p.f.moore@gmail.com> wrote:
there are a lot of environments where smart quotes get accidentally inserted into code.
* Tutorial/example material prepared by non-programmers, again using tools that are too "helpful" in auto-converting to smart quotes.
indeed -- I once did a whole set of python class slides in LaTeX -- really nice format, etc.... but in the process from LaTeX to PDF, I ended up with stuff that looked like code, but if you copied and pasted it the quotes were wrong -- but only sometimes -- I got pretty used to fixing it, but was still stymied once in a while, and it was pretty painful for my students... I think the "better error message" option is the way to go, however. At least until we all have better Unicode support in all our tools.... -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

Chris Barker writes:
I don't think "better Unicode support" helps with confusables in programming languages that value TOOWTDI. OK, we already have 4 kinds of quoting in Python which suggests that TOOWTDI doesn't apply to quoting, but I think that's a bit naive. Given the frequency with which quotes appear in strings, and the fact that English quotation marks can't nest but rarely need to nest more than once, use of both "" and '' with identical semantics to make one level of nesting convenient and readable was plausible. The use of triple quotes for block quoting again has arguments for it. You can think that these were experiments with "meh" results[1], but I don't think it's appropriate to say that therefore TOOWTDI doesn't apply to quote marks. As a general rule, I think use of confusables in new syntax (eg, double curly quotes = f"") runs into "Syntax shall not look like grit on Tim's screen". OTOH, better Unicode support should (cautiously) be used to support new operators and syntax subject to TOOWTDI and other considerations of Pythonicity.
Footnotes: [1] Personally, I immediately liked the triple quotes, because the (Emacs) Lisp convention of allowing literal newline characters in all strings caused a number of small annoyances. I also quickly evolved a personal convention where single quotes indicate "string as protocol constant" (eg, where today we'd use enums), while double quotes indicate "arbitrary text content". But those are both obviously YMMV evaluations.

On Mon, Oct 24, 2016 at 7:00 PM, Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
that was kind of a throwaway comment, but I think it's a LONG way out, but ideally, the OWTDI would be "curly quotes". The fact that in ASCII, a single quote and an apostrophe are the same, and that there is no distinction between opening and closing quotes is unfortunate. But it will be a LONG time before we'll all have text editors that can easily let us type that many different characters... and even more time before backward compatibility concerns are alleviated -- probably around the time I can have a snowball fight in the Bad Place. So let's just stick with what we have, eh?
[1] Personally, I immediately liked the triple quotes,
Me too -- I find myself using them in text email messages and the like -- not sure if non-pythonistas get it, but no one has complained yet. -CHB

On 25 October 2016 at 23:50, Chris Barker <chris.barker@noaa.gov> wrote:
Yes from readability POV, curly quotes would make sense, and better than many other options, eg. «these». Also from POV of parser this could be beneficial to have opening/closing char (or not?). This only means that those chars should be in ASCII ideally. Which is not the case. And IMO not that now code should allow all characters. Mikhail

On 26 October 2016 at 00:53, Mikhail V <mikhailwas@gmail.com> wrote:
Extended ASCII
145 ‘ Left single quotation mark
146 ’ Right single quotation mark
147 “ Left double quotation mark
148 ” Right double quotation mark
149 • Bullet
150 – En dash
151 — Em dash
152 ˜ Small tilde
So we all must repent now and get back to 8-bit characters.

This is a nice summary of quotation marks used in various languages: https://en.wikipedia.org/wiki/Quotation_mark#Specific_language_features On Tue, Oct 25, 2016 at 9:37 PM, Mikhail V <mikhailwas@gmail.com> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Wed, Oct 26, 2016 at 03:37:54AM +0200, Mikhail V wrote:
Extended ASCII
There are over 200 different, mutually incompatible, so-called "extended ASCII" code pages and encodings. And of course it is ludicrous to think that you can fit all the world's characters into only 8 bits. There are more than 40,000 just from China alone, which makes it impossible to fit even into 16 bits.
So we all must repent now and get back to 8-bit characters.
Please stop wasting everyone's time trying to set the clock back to the 1980s. -- Steve
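The incompatibility between "extended ASCII" code pages is easy to demonstrate: the byte values 145 to 148 from the table quoted earlier only mean curly quotes under one particular 8-bit encoding. The sketch below decodes the same four bytes with three codecs from the standard library; the choice of codecs is just for illustration.

```python
# The same four byte values, decoded under three different 8-bit encodings.
raw = bytes([145, 146, 147, 148])

for codec in ("cp1252", "latin-1", "cp437"):
    decoded = raw.decode(codec)
    print(codec, [f"U+{ord(ch):04X}" for ch in decoded])
```

Under Windows-1252 these bytes are the four curly quotes; under Latin-1 they are C1 control characters; under the old IBM PC code page 437 they are accented letters.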

On 27 October 2016 at 01:13, Steven D'Aprano <steve@pearwood.info> wrote:
In 1980 I was not even born. It would be an interesting experience to set the clock to a time when you did not exist 8-\. And what is so bad in having, say, 2 tables: 1) what is now considered as standard unicode 2) a table with characters that are reasonably valuable and cover 99% of all programming, communication and typography in latin script ??? And where did I say I want to fit all possible chars in 8-bit? All possible chars = infinite amount of chars. Mikhail

On Wed, Oct 26, 2016 at 5:10 PM, Mikhail V <mikhailwas@gmail.com> wrote:
I think it's called latin-1. And I think you've mentioned numpy - there was a discussion a while back about having a one-byte-per-character string type (the existing ones are 4 byte unicode and kinda-sort-py2-string/bytes dtype); perhaps you might want to revive that conversation. -CHB

On 27 October 2016 at 03:51, Chris Barker <chris.barker@noaa.gov> wrote:
Yep, double quotes , dashes and bullets are very valuable both for typography and code (which to the largest part is the same) So if just blank out this maximalistic BS: ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö And add few good bullets/blocks, probably arrows, then it would be a reasonable set to use for most cases. Mikhail

On Thu, Oct 27, 2016 at 2:06 PM, Mikhail V <mikhailwas@gmail.com> wrote:
You've missed out a half a dozen characters needed by Turkish or Hungarian, and completely missed the point that the Latin script is *NOT SUFFICIENT* for Python. If you want to argue that we should restrict the world to 256 characters, go blog somewhere and let people ignore you there, rather than ignoring you here. Unicode is here to stay. ChrisA

On 27 October 2016 at 06:24, Chris Angelico <rosuav@gmail.com> wrote:
So you need umlauts to describe an algorithm and to explain yourself in turkish? Cool story. Poor uncle Garamond spins in his coffin... So what about curly quotes? This would make at least some sense, regardless of unicode. Mikhail

On Thu, Oct 27, 2016, at 14:28, Mikhail V wrote:
Why do you need 26 letters? The Romans didn't have so many. Hawaiian gets by with half as many - even if you count the accented vowels and the ʻokina it's still only 18. Why upper and lower case? Do we *really* need digits, can't we just use the first ten letters? Allowing each language to use its own alphabet, even if any of them may be inefficient and all of them together certainly are, is the only reasonable place to draw the line.

On 27 October 2016 at 21:40, Random832 <random832@fastmail.com> wrote:
Hi Random, Yes that is what I am trying to tell, but some paint a "bigot" of me. So there is no contradiction here. You know your "local" script and you know Latin. So it belongs to my human right if I want to choose a more effective one, so since Latin is most effective now, I take it. Simply like I take a wheel without defects and with tight pressure in the tyre. I don't have emotions or sadness that I will forget my strange old letters. And if we return to the problem of a universal communication "kind of standard", then what is the sense in taking a defective wheel? I am not the one to allow or disallow anything, but I respect the works of Garamond and his predecessors who made it possible for me to read without pain in the eyes, and I disrespect attempts to ruin it. And believe me, it is *very* easy to ruin it all by putting in umlauts and accents, just like putting stones in the tyre. Mikhail

On 27.10.2016 20:28, Mikhail V wrote:
So what about curly quotes? This would make at least some sense, regardless of unicode.
-1. This would break code using curly quotes in string literals, break existing Python IDEs and parsers. BTW: I have yet to find a keyboard which allows me to enter such quotes. I think you simply have to accept that MS Word is not a supported editor for Python applications ;-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Oct 27 2016)
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/

On 27 October 2016 at 21:51, M.-A. Lemburg <mal@egenix.com> wrote:
Hehe :) For me, putting them in is as simple as having this in my vimrc config:
    inoremap <C-o> <C-V>147
    inoremap <C-p> <C-V>148
Currently I don't get code from outer applications so I type them in, so for new code it will not cause much problems. For old code I think it is not so infeasible to batch convert to the new format. AND you know, even in VIM with its spartanic "Courier New" monowidth font, those quotes look sooo much better, that I really want it. And in my code there are tons of quotes in concatenating strings for console commands. So I am +1 on this, but of course I cannot deny that it is a very "uncomfortable" change in general. Mikhail

I was thinking of using them only as possibly quotes characters, as students and beginners seem to have difficulties due to this quote-mismatch error. That OSX has smart quotes enabled by default makes this a worthwhile consideration, in my opinion. -Ryan Birmingham On 22 October 2016 at 01:34, Ethan Furman <ethan@stoneleaf.us> wrote:

Interesting idea. +1 from me; probably can be as simple as just having the tokenizer interpret curly quotes as the ASCII (straight) version of itself (in other words, " and the two curly versions of that would all produce the same token, and same for single quotes, eliminating any need for additional changes further down the chain). This would help with copying and pasting code snippets from a source that may have auto-formatted the quotes without the original author realizing it. On Sat, Oct 22, 2016 at 1:46 AM Ryan Birmingham <rainventions@gmail.com> wrote:

On Sat, Oct 22, 2016 at 06:13:35AM +0000, Jonathan Goble wrote:
There's a lot more than two. At least nineteen (including the ASCII ones): 〝〞〟"'"'«»‘’‚‛“”„‟‹›
Personally, I think that we should not encourage programmers to take a lazy, slap-dash attitude to coding. Precision is important to programmers, and there is no limit to how imprecise users can be. Should we also guard against people accidentally using prime marks or ornaments (dingbats): ′″‴‵‶‷ ❛❜❝❞❮❯ as well? If not, what makes them different from other accidents of careless programmers? I don't think we should be trying to guess what programmers mean, nor do I think that we should be encouraging programmers to use word processors for coding. Use the right tool for the right job, and even Notepad is better for the occasional programmer than Microsoft Office or LibreOffice. Programming is hard, requiring precision and care, and we don't do beginners any favours by making it easy for them to be imprecise and careless. I would be happy to see improved error messages for smart quotes: py> s = ‘abcd’ File "<stdin>", line 1 s = ‘abcd’ ^ SyntaxError: invalid character in identifier (especially in IDLE), but I'm very dubious about the idea of using typographical quote marks for strings. At the very least, Python should not lead the way here. Let some other language experiment with this first, and see what happens. Python is a mature, established language, not an experimental language. Of course, there's nothing wrong with doing an experimental branch of Python supporting this feature, to see what happens. But that doesn't mean we should impose it as an official language rule. -- Steve

Per the comments in this thread, I believe that a better error message for this case would be a reasonable way to fix the use case around this issue. It can be difficult to notice that your quotes are curved if you don't know that's what you're looking for. -Ryan Birmingham On 22 October 2016 at 03:16, Steven D'Aprano <steve@pearwood.info> wrote:

On 22 October 2016 at 17:36, Ryan Birmingham <rainventions@gmail.com> wrote:
Looking for particular Unicode confusables when post-processing SyntaxErrors seems like a reasonable idea to me - that's how we ended up implementing the heuristic that reports "Missing parenthesis in call to print" when folks attempt to run Python 2 code under Python 3. At the moment, tokenizer and parser errors are some of the most beginner-hostile ones we offer, since we don't have any real context when raising them - it's just a naive algorithm saying "This isn't the text I expected to see next". By contrast, later in the code generation pipeline, we have more information about what the user was trying to do, and can usually offer better errors. What Guido pointed out when I was working on the "print" heuristic is that we actually get a second go at this: the *exception constructor* usually has access to the text that the tokenizer or parser couldn't handle, and since it isn't on the critical performance path for anything, we can afford to invest some time in looking for common kinds of errors and try to nudge folks in a better direction when we think they've tripped over one of them. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 10/22/2016 12:32 PM, Nick Coghlan wrote:
(Continuing my response to Steven saying "improved error messages ... (especially in IDLE)") IDLE compiles()s and exec()s user code within separate try-except blocks, the latter usually being in a separate processes. Runtime tracebacks and exceptions are sent back to IDLE's Shell to be printed just as in a console (except for colorizing). Compile errors are handled differently. Tracebacks are tossed after extracting the file, line, and column (the last from the ^ marker). The latter are used to tag text with a red background. For shell input, the exception is printed normally. For editor input, it is displayed in a messagebox over the editor window. My point is that IDLE already intercepts exceptions and, for SyntaxErrors, does simple modifications (hopefully enhancements) *in Python*. So it could be an easy place to prototype, in Python, more advanced enhancements. Experimental enhancements could be made optional, and could supplement rather than replace the original message. They could also be added and modified in bugfix releases. I will say more about explaining exceptions better in another post. -- Terry Jan Reedy

On 10/22/2016 3:16 AM, Steven D'Aprano wrote:
I would be happy to see improved error messages for smart quotes:
The above *is* the improved (and regressed) 3.6 version ;-) In 3.5.2 (on Windows):
(Mangling of the echoed code line is Windows specific.) The improvement is the more specific error message. The regression is the placement of the caret at the end instead of under the initial '‘'. To verify that Python is not actually pointing at '’', remove it.
(recent 3.6 changes in encodings used on Windows removes code mangling in this echoed line.)
(especially in IDLE),
What do you have in mind? Patches would be considered. I will continue this in response to Nick's post about 9 hours ago. -- Terry Jan Reedy

On Sat, Oct 22, 2016 at 01:17:58AM -0400, Ryan Birmingham wrote:
Hello everyone,
I want to start small and ask about smart/curly quote marks (” vs ").
Which curly quotes are you going to support? There's Dutch, of course: „…” ‚…’ But how about … ? - English ‘…’ “…” - French « … » “…” - Swiss «…» ‹…› - Hebrew „…” ‚…’ - Hungarian „…” »…« - Icelandic „…“ ‚…‘ - Japanese 「…」 『…』 - Polish „…” «…» »…« - Swedish ”…” ’…’ »…» »…« to mention only a few. I think it would be unfair to all the non-Dutch programmers if we only supported Dutch quotation marks, but as you can see, supporting the full range of internationalised curly quotes is difficult.
Although most languages do not support these characters as quotation marks, I believe that cPython should, if possible.
You say "most" -- do you know which programming languages support typographical quotation marks for strings? It would be good to see a survey of which languages support this feature, and how they cope with the internationalisation problem. I think this is likely to be just too hard. There's a reason why programming has standardized on the lowest common denominator for quotation marks '' "" and occasionally `` as well. -- Steve

The quotes I intended in this email are just “ ‘ ” , and ’ where the encoding is appropriate. Internationalization was not the intent of this. I do believe that you have a good point with supporting common quotes in other languages, but I believe that such a change would be large enough to consider a PEP. I am aware that there are other unicode characters, even in English with the Quotation_Mark character property, but this proposed change aims to solve the problem caused when editors, mail clients, web browsers, and operating systems over-zealously replacing straight quotes with these typographical characters. -Ryan Birmingham On 22 October 2016 at 02:35, Steven D'Aprano <steve@pearwood.info> wrote:

On Sat, Oct 22, 2016 at 5:49 PM, Ryan Birmingham <rainventions@gmail.com> wrote:
A programming editor shouldn't mangle your quotes, and a word processor sucks for editing code anyway, so I'd rule those out. When does an operating system change your quotes? It's really just mail and web where these kinds of issues happen. Any web site that's actually designed for code is, like a programmer's editor, going to be quote-safe; and it's not hard to configure a mail client to not mess with you. How strong is this use-case, really? ChrisA

On 22 October 2016 at 08:17, Chris Angelico <rosuav@gmail.com> wrote:
While I agree that it's important for new programmers to learn precision, there are a lot of environments where smart quotes get accidentally inserted into code. * Pasting code into MS Word documents for reference (even if you then format the code as visibly code, the smart quote translation has already happened). That's remarkably common in the sorts of environments I deal in, where code gets quoted in documents, and then later copied out to be reused. * Tutorial/example material prepared by non-programmers, again using tools that are too "helpful" in auto-converting to smart quotes. So in my experience this problem is pretty common. However, I view it as a chance to teach correct use of quotes in programming, rather than something to gloss over or "do what I mean" with. -1 from me. Paul

On Sat, Oct 22, 2016 at 10:09 PM, Paul Moore <p.f.moore@gmail.com> wrote:
One of my students remarked that she had a lot of trouble trying to maintain a notes file, because she couldn't decide whether to use a word processor (with a spell checker) or a code editor (with automatic indentation and syntax highlighting). Still, I think the solution would be to have code editors grow facilities for working with text, rather than word processors grow facilities for working with code, or programming languages grow features for coping with word processors.
* Tutorial/example material prepared by non-programmers, again using tools that are too "helpful" in auto-converting to smart quotes.
Definite learning moment for the person preparing the tutorial. If you were writing a tutorial for Russian speakers and just wrote everything using the Latin alphabet, nobody would say "we should teach Russian people to use the alphabet that my editor uses"; code has its own rules, and if you're writing about code, you should learn how to write it appropriately.
Agreed. Maybe the upshot of this will be a python-list thread recommending some editors that handle both code and screed well - that would be a worthwhile thread IMO. ChrisA

On Sat, Oct 22, 2016 at 4:09 AM, Paul Moore <p.f.moore@gmail.com> wrote:
there are a lot of environments where smart quotes get accidentally inserted into code.
* Tutorial/example material prepared by non-programmers, again using tools that are too "helpful" in auto-converting to smart quotes.
Indeed -- I once did a whole set of Python class slides in LaTeX -- really nice format, etc. But in the process from LaTeX to PDF, I ended up with stuff that looked like code, but if you copied and pasted it the quotes were wrong -- but only sometimes. I got pretty used to fixing it, but was still stymied once in a while, and it was pretty painful for my students... I think the "better error message" option is the way to go, however. At least until we all have better Unicode support in all our tools.... -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
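
The "better error message" option mentioned in this thread could look roughly like the sketch below. It is only an illustration, not how CPython reports this error: `SMART_QUOTES`, `find_smart_quotes` and `describe` are hypothetical names, and only the four most common typographic quotes are covered.

```python
# Hypothetical helper (not part of CPython): locate typographic quotes in
# source text and suggest the ASCII quote the author probably meant.
SMART_QUOTES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}


def find_smart_quotes(source):
    """Return (line, column, char, suggestion) for every curly quote found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SMART_QUOTES:
                hits.append((lineno, col, ch, SMART_QUOTES[ch]))
    return hits


def describe(source):
    """Format one human-readable hint per curly quote."""
    return [
        f"line {lineno}, column {col}: found {ch!r} (U+{ord(ch):04X}); "
        f"did you mean {suggestion!r}?"
        for lineno, col, ch, suggestion in find_smart_quotes(source)
    ]


for message in describe("s = \u2018abcd\u2019\n"):
    print(message)
```

A check like this leaves the language unchanged: the curly quotes are still rejected, but the message names the offending character instead of complaining about an identifier.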

Chris Barker writes:
I don't think "better Unicode support" helps with confusables in programming languages that value TOOWTDI. OK, we already have 4 kinds of quoting in Python which suggests that TOOWTDI doesn't apply to quoting, but I think that's a bit naive. Given the frequency with which quotes appear in strings, and the fact that English quotation marks can't nest but rarely need to nest more than once, use of both "" and '' with identical semantics to make one level of nesting convenient and readable was plausible. The use of triple quotes for block quoting again has arguments for it. You can think that these were experiments with "meh" results[1], but I don't think it's appropriate to say that therefore TOOWTDI doesn't apply to quote marks. As a general rule, I think use of confusables in new syntax (eg, double curly quotes = f"") runs into "Syntax shall not look like grit on Tim's screen". OTOH, better Unicode support should (cautiously) be used to support new operators and syntax subject to TOOWTDI and other considerations of Pythonicity.
Footnotes:
[1] Personally, I immediately liked the triple quotes, because the (Emacs) Lisp convention of allowing literal newline characters in all strings caused a number of small annoyances. I also quickly evolved a personal convention where single quotes indicate "string as protocol constant" (eg, where today we'd use enums), while double quotes indicate "arbitrary text content". But those are both obviously YMMV evaluations.

On Mon, Oct 24, 2016 at 7:00 PM, Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
That was kind of a throwaway comment. I think it's a LONG way out, but ideally the OWTDI would be "curly quotes". The fact that in ASCII a single quote and an apostrophe are the same, and that there is no distinction between opening and closing quotes, is unfortunate. But it will be a LONG time before we all have text editors that easily let us type that many different characters... and even more time before backward compatibility concerns are alleviated -- probably around the time I can have a snowball fight in the Bad Place. So let's just stick with what we have, eh?
[1] Personally, I immediately liked the triple quotes,
Me too -- I find myself using them in text email messages and the like -- not sure if non-Pythonistas get it, but no one has complained yet. -CHB

On 25 October 2016 at 23:50, Chris Barker <chris.barker@noaa.gov> wrote:
Yes, from a readability POV, curly quotes would make sense, and they are better than many other options, e.g. «these». From the parser's POV it could also be beneficial to have distinct opening and closing characters (or not?). This only means that those characters should ideally be in ASCII, which is not the case. And IMO it does not mean that code should now allow all characters. Mikhail

On 26 October 2016 at 00:53, Mikhail V <mikhailwas@gmail.com> wrote:
Extended ASCII
145  ‘  Left single quotation mark
146  ’  Right single quotation mark
147  “  Left double quotation mark
148  ”  Right double quotation mark
149  •  Bullet
150  –  En dash
151  —  Em dash
152  ˜  Small tilde
So we all must repent now and get back to 8-bit characters.
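
For reference, the positions 145 to 152 listed here match the Windows-1252 code page; that reading is an assumption, since the post only says "Extended ASCII". The sketch below decodes those byte values with Python's `cp1252` codec and shows that the same byte under Latin-1 is a control character rather than a quote. `cp1252_table` is a hypothetical helper name.

```python
import unicodedata


def cp1252_table(codes=range(145, 153)):
    """Map each byte value to (byte, Unicode code point, character name) under cp1252."""
    rows = []
    for code in codes:
        ch = bytes([code]).decode("cp1252")
        rows.append((code, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return rows


for row in cp1252_table():
    print(*row)

# The same byte value decoded as Latin-1 is a C1 control character.
latin1_char = bytes([145]).decode("latin-1")
print(repr(latin1_char), unicodedata.category(latin1_char))
```

The meaning of a byte above 127 depends entirely on which 8-bit code page is assumed, which is the practical difficulty with any "extended ASCII" table.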

This is a nice summary of quotation marks used in various languages: https://en.wikipedia.org/wiki/Quotation_mark#Specific_language_features On Tue, Oct 25, 2016 at 9:37 PM, Mikhail V <mikhailwas@gmail.com> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Wed, Oct 26, 2016 at 03:37:54AM +0200, Mikhail V wrote:
Extended ASCII
There are over 200 different, mutually incompatible, so-called "extended ASCII" code pages and encodings. And of course it is ludicrous to think that you can fit all the world's characters into only 8 bits. There are more than 40,000 just from China alone, which makes it impossible to fit into 16-bits.
So we all must repent now and get back to 8-bit characters.
Please stop wasting everyone's time trying to set the clock back to the 1980s. -- Steve

On 27 October 2016 at 01:13, Steven D'Aprano <steve@pearwood.info> wrote:
In 1980 I was not even born. It would be an interesting experience to set the clock to a time when you did not exist 8-\. And what is so bad about having, say, 2 tables: 1) what is now considered standard Unicode, and 2) a table with characters that are reasonably valuable and cover 99% of all programming, communication and typography in Latin script? And where did I say I want to fit all possible chars in 8 bits? All possible chars = an infinite number of chars. Mikhail

On Wed, Oct 26, 2016 at 5:10 PM, Mikhail V <mikhailwas@gmail.com> wrote:
I think it's called latin-1. And I think you've mentioned numpy -- there was a discussion a while back about having a one-byte-per-character string type (the existing ones are 4-byte Unicode and a kinda-sorta-py2-string/bytes dtype). Perhaps you might want to revive that conversation. -CHB

On 27 October 2016 at 03:51, Chris Barker <chris.barker@noaa.gov> wrote:
Yep, double quotes, dashes and bullets are very valuable both for typography and for code (which for the most part are the same). So if we just blank out this maximalistic BS: ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö and add a few good bullets/blocks, probably arrows, then it would be a reasonable set to use for most cases. Mikhail

On Thu, Oct 27, 2016 at 2:06 PM, Mikhail V <mikhailwas@gmail.com> wrote:
You've missed out half a dozen characters needed by Turkish or Hungarian, and completely missed the point that the Latin script is *NOT SUFFICIENT* for Python. If you want to argue that we should restrict the world to 256 characters, go blog somewhere and let people ignore you there, rather than ignoring you here. Unicode is here to stay. ChrisA
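
The point about Turkish and Hungarian can be checked directly. The sketch below is only an illustration: `encodable` is a hypothetical helper, and the two letter sets are examples chosen here, not an exhaustive list from the thread.

```python
def encodable(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


TURKISH = "ĞğİıŞş"    # Turkish letters that have no slot in Latin-1
HUNGARIAN = "ŐőŰű"    # Hungarian double-acute vowels, also absent from Latin-1

for label, letters in (("Turkish", TURKISH), ("Hungarian", HUNGARIAN)):
    missing = [ch for ch in letters if not encodable(ch, "latin-1")]
    print(label, "letters missing from latin-1:", "".join(missing))
    print(label, "fits in utf-8:", encodable(letters, "utf-8"))
```

Each of these letters fits in some 8-bit code page, but not in the same one as the Western European accents, which is why a single 256-character table cannot serve every Latin-script language at once.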

On 27 October 2016 at 06:24, Chris Angelico <rosuav@gmail.com> wrote:
So you need umlauts to describe an algorithm and to explain yourself in Turkish? Cool story. Poor uncle Garamond spins in his coffin... So what about curly quotes? This would make at least some sense, regardless of Unicode. Mikhail

On Thu, Oct 27, 2016, at 14:28, Mikhail V wrote:
Why do you need 26 letters? The Romans didn't have so many. Hawaiian gets by with half as many - even if you count the accented vowels and the ʻokina it's still only 18. Why upper and lower case? Do we *really* need digits, can't we just use the first ten letters? Allowing each language to use its own alphabet, even if any of them may be inefficient and all of them together certainly are, is the only reasonable place to draw the line.

On 27 October 2016 at 21:40, Random832 <random832@fastmail.com> wrote:
Hi Random, Yes, that is what I am trying to say, but some paint me as a "bigot". So there is no contradiction here. You know your "local" script and you know Latin. So it is my human right to choose the more effective one, and since Latin is the most effective now, I take it. Simply like I take a wheel without defects and with proper pressure in the tyre. I feel no emotion or sadness that I will forget my strange old letters. And if we return to the problem of a universal communication "kind of standard", then what is the sense in taking a defective wheel? I am not the one to allow or disallow anything, but I respect the works of Garamond and his predecessors, who made it possible for me to read without pain in my eyes, and I disrespect attempts to ruin it. And believe me, it is *very* easy to ruin it all by adding umlauts and accents, just like putting stones in the tyre. Mikhail

On 27.10.2016 20:28, Mikhail V wrote:
So what about curly quotes? This would make at least some sense, regardless of unicode.
-1. This would break code using curly quotes in string literals, break existing Python IDEs and parsers. BTW: I have yet to find a keyboard which allows me to enter such quotes. I think you simply have to accept that MS Word is not a supported editor for Python applications ;-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Oct 27 2016)
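
The compatibility concern about string literals can be made concrete with a small sketch. It assumes the simplest possible design, where every curly quote is rewritten to its straight equivalent before compilation; `naive_normalize` and `compiles` are hypothetical helpers, and a real tokenizer change might behave differently.

```python
# Translation table for the naive "treat curly quotes as straight" approach.
TABLE = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single curly quotes
    "\u201c": '"', "\u201d": '"',   # double curly quotes
})


def naive_normalize(source):
    """Rewrite every curly quote to its ASCII equivalent, with no context."""
    return source.translate(TABLE)


def compiles(source):
    """Return True if the source compiles as a Python module."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


pasted = "s = \u201cabcd\u201d\n"              # pasted from a word processor
existing = 's = "She said \u201chi\u201d"\n'   # valid code with curly quotes as data

print("pasted:  ", compiles(pasted), "->", compiles(naive_normalize(pasted)))
print("existing:", compiles(existing), "->", compiles(naive_normalize(existing)))
```

Under this assumption the pasted snippet starts to compile, but the existing literal stops compiling, because the curly quotes it contained as data now terminate the string early.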

On 27 October 2016 at 21:51, M.-A. Lemburg <mal@egenix.com> wrote:
Hehe :) For me, putting them in is as simple as having this in my vimrc config: inoremap <C-o> <C-V>147 inoremap <C-p> <C-V>148 Currently I don't get code from outside applications, so I type the quotes in myself, and for new code this would not cause many problems. For old code I think it is not so infeasible to batch-convert to the new format. AND you know, even in Vim with its spartan "Courier New" monospaced font, those quotes look sooo much better that I really want it. And in my code there are tons of quotes in concatenated strings for console commands. So I am +1 on this, but of course I cannot deny that it is a very "uncomfortable" change in general. Mikhail
participants (17)
- Chris Angelico
- Chris Barker
- David Mertz
- Ethan Furman
- Jonathan Goble
- M.-A. Lemburg
- Michel Desmoulin
- Mikhail V
- Ned Batchelder
- Nick Coghlan
- Paul Moore
- Random832
- Ryan Birmingham
- Stephen J. Turnbull
- Steven D'Aprano
- Sven R. Kunze
- Terry Reedy