Allowing non-ASCII bracket and quote characters in source code
Perhaps the time isn't ripe for this, and perhaps it never will be, but UTF-8 seems to be handled by just about everything these days. I suspect this is a crazy suggestion, but on the other hand perhaps people looking back from 2100 will think "It was crazy that they stuck exclusively with ASCII syntax characters for so long after Unicode was widely available".
Sometimes when you have quite a few levels of brackets, and there is more than one of the same type, might it be better to allow variants of each type of bracket character? Unicode provides bold, double, small, superscript, subscript, and white-filled (hollow) variants of round, square and curly brackets, and top- and bottom-ticked variants of square brackets.
Perhaps not enough platforms will be able to display them? And entering them may be fiddly, although programmable editors and IDEs could let you type the standard characters and pick variants on a round-robin basis within each expression.
But it might be handled better in the display in editors and IDEs (perhaps syntax-coloring brackets by their depth). Or some might say not to write such deeply nested bracketed expressions.
Implementation parsers could simply translate all the brackets to the base types, or they could treat them as equivalent but distinct and check that the open and close brackets match, which might catch a few errors.
There are also other quote characters available, such as the guillemets traditionally used in French. There's absolutely no need to use such things, but people working on code that will be used internally in, say, French or Quebecois organizations might welcome it.
OK, shoot this one down now :-)
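The two parser strategies in the original post (translate variants to the base brackets, or treat them as distinct and require exact matching) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the variant table (white/hollow ⦅⦆ ⟦⟧ ⦃⦄ and superscript ⁽⁾) is a hypothetical choice for illustration, and the checker scans raw characters rather than real Python tokens, so it does not skip brackets inside strings or comments.

```python
# Hypothetical variant table: each variant bracket maps to its ASCII base.
VARIANT_TO_BASE = {
    "\u2985": "(", "\u2986": ")",  # ⦅ ⦆ LEFT/RIGHT WHITE PARENTHESIS
    "\u207d": "(", "\u207e": ")",  # ⁽ ⁾ SUPERSCRIPT LEFT/RIGHT PARENTHESIS
    "\u27e6": "[", "\u27e7": "]",  # ⟦ ⟧ MATHEMATICAL LEFT/RIGHT WHITE SQUARE BRACKET
    "\u2983": "{", "\u2984": "}",  # ⦃ ⦄ LEFT/RIGHT WHITE CURLY BRACKET
}

# Every opening bracket (base or variant) and the exact closer it requires.
CLOSER_FOR = {
    "(": ")", "[": "]", "{": "}",
    "\u2985": "\u2986", "\u207d": "\u207e",
    "\u27e6": "\u27e7", "\u2983": "\u2984",
}
CLOSERS = set(CLOSER_FOR.values())


def normalize(source: str) -> str:
    """Strategy 1: translate every variant bracket to its base bracket."""
    return source.translate(str.maketrans(VARIANT_TO_BASE))


def check_brackets(source: str) -> None:
    """Strategy 2: variants are distinct, so each closer must match its own opener.

    Raises SyntaxError on a mismatched, unexpected, or unclosed bracket.
    """
    stack: list[tuple[str, int]] = []  # (opening character, offset)
    for offset, ch in enumerate(source):
        if ch in CLOSER_FOR:
            stack.append((ch, offset))
        elif ch in CLOSERS:
            if not stack:
                raise SyntaxError(f"unexpected {ch!r} at offset {offset}")
            opener, start = stack.pop()
            if CLOSER_FOR[opener] != ch:
                raise SyntaxError(
                    f"{ch!r} at offset {offset} does not match {opener!r} at offset {start}"
                )
    if stack:
        opener, start = stack[-1]
        raise SyntaxError(f"{opener!r} at offset {start} was never closed")


src = "spam\u2985eggs(1), \u27e6cheese\u27e7\u2986"  # spam⦅eggs(1), ⟦cheese⟧⦆
check_brackets(src)    # passes: every closer matches its own opener
print(normalize(src))  # spam(eggs(1), [cheese])
```

Running the strict check first and then normalizing would give the "equivalent but distinct" behaviour: the extra bracket kinds catch mismatches, while the rest of the parser only ever sees ASCII brackets.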
Here's a better idea: use parens for all levels of nesting, as currently, but convince your editor to substitute various other bracket-ish glyphs for the different levels of nesting. I haven't done that, but I use the conceal plugin with Vim to do similar things. Up to some fixed, finite level of nesting, a regex would do that.
On Mon, Jan 17, 2022 at 3:20 PM John Sturdy <jcg.sturdy@gmail.com> wrote:
Perhaps the time isn't ripe for this, and perhaps it never will be, but UTF8 seems to be handled by just about everything these days. I suspect this is a crazy suggestion, but on the other hand perhaps people looking back from 2100 will think "It was crazy that they stuck exclusively with ASCII syntax characters for so long after Unicode was widely available".
Sometimes when you have quite a few levels of brackets, and there is more than one of the same type, might it be better to allow variants of each type of bracket character? Unicode provides bold, double, small, superscript, subscript, and white-filled (hollow) variants of round, square and curly brackets, and top- and bottom-ticked variants of square brackets.
Perhaps not enough platforms will be able to display them? And entering them may be fiddly, although programmable editors and IDEs could let you type the standard characters but pick variants on a round-robin basis within each expression.
But it might be handled better in the display in editors and IDEs (perhaps syntax coloring brackets by their depth). Or some might say not to write such deeply nested bracketed expressions.
Implementation parsers could simply translate all the brackets to the base types, or they could treat them as equivalent but distinct, and check that the open and close brackets match, which might catch a few errors.
There are also other quote characters available, such as the guillemets traditionally used in French. There's absolutely no need to use such things, but people working on code that will be used internally in, say, French or Quebecois organizations might welcome it.
OK, shoot this one down now :-)
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/NIJX6Q...
Code of Conduct: http://python.org/psf/codeofconduct/
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
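David Mertz's suggestion above (keep plain parentheses in the file and let the editor show a different glyph per nesting level) amounts to a display-only transform. A minimal sketch, assuming an arbitrary, hypothetical glyph cycle; it uses a depth counter instead of a regex, so it is not limited to a fixed depth, and like any character-level pass it does not skip strings or comments.

```python
# Hypothetical display glyphs, cycled by nesting depth: (), ⦅⦆, ⟮⟯, ⁽⁾.
DEPTH_GLYPHS = [
    ("(", ")"),
    ("\u2985", "\u2986"),  # ⦅ ⦆
    ("\u27ee", "\u27ef"),  # ⟮ ⟯
    ("\u207d", "\u207e"),  # ⁽ ⁾
]


def render_by_depth(source: str) -> str:
    """Return a display copy of `source` with parentheses swapped per depth.

    The stored source keeps ASCII parens; only the rendered text changes.
    A closing paren with no open partner is left unchanged.
    """
    out = []
    depth = 0
    for ch in source:
        if ch == "(":
            out.append(DEPTH_GLYPHS[depth % len(DEPTH_GLYPHS)][0])
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            out.append(DEPTH_GLYPHS[depth % len(DEPTH_GLYPHS)][1])
        else:
            out.append(ch)
    return "".join(out)


print(render_by_depth("spam(1, eggs(2, cheese(3, fe(), fi(), fo())))"))
```

Because nothing is written back to the file, everyone else reading the code still sees ordinary parentheses, which sidesteps the typing and font questions raised later in the thread.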
On Mon, Jan 17, 2022 at 3:22 PM John Sturdy <jcg.sturdy@gmail.com> wrote:
But it might be handled better in the display in editors and IDEs (perhaps syntax coloring brackets by their depth).
VSCode already supports this out of the box. Just search "bracket pair" in the settings and check (tick) the box for Bracket Pair Colorization. Attached is an example of what it looks like with four levels of nesting by default (using one of my old personal projects for the sake of example).
I would be more sympathetic to this idea if:
1. I knew how to easily type all those brackets on the keyboard, without having to use a GUI character picker.
2. I had a guarantee that all of the bracket characters would be both available and easily distinguishable in any typeface I used.
On Mon, Jan 17, 2022 at 03:27:33PM -0500, David Mertz, Ph.D. wrote:
Here's a better idea: Use parens for all the levels of nesting, as currently, but convince your editor to substitute various other bracket-ish things for levels of nesting.
This is approximately what I do, except that I use color to match bracket pairs (with a system commonly called "rainbow parentheses", though there are various implementations for various editors and these can do more than just parentheses). It is very easy to determine which brackets match where. - DLD
On 2022-01-18 at 12:07:15 +1100, Steven D'Aprano <steve@pearwood.info> wrote:
I would be more sympathetic to this idea if:
1. I knew how to easily type all those brackets on the keyboard, without having to use a GUI character picker.
That's between you and your OS. I believe all the major ones have ways to enter arbitrary characters and/or switch keyboards without a GUI (I use Linux, and I know for sure that it does). Someone on this list keeps saying that everyone wants to know how to play the piano, but that no one wants to put in the hard work to learn to play the piano. ;-) I don't know the entire Unicode database, but I have learned the code points I use often (and how to type them without a GUI picker). The Unicode database clearly marks open and close parentheses, and even contains enough data (on initial examination) to match up the pairs (that doesn't help you easily type them, but it does help utility programs, GUI or otherwise, help you pick an appropriate one at the right time).
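The claim about the Unicode database can be checked with Python's standard `unicodedata` module, which reports each character's general category: `Ps` for open punctuation and `Pe` for close punctuation. As far as I know, `unicodedata` does not expose the Bidi_Paired_Bracket property (published in BidiBrackets.txt), so this sketch pairs brackets with a name heuristic, swapping LEFT for RIGHT. That heuristic is an assumption of this example and will miss pairs that are named differently.

```python
import sys
import unicodedata
from typing import Optional


def closing_partner(opener: str) -> Optional[str]:
    """Guess the closing bracket for an opening one by swapping LEFT/RIGHT in its name.

    Heuristic only: returns None when the name contains no LEFT, or when the
    swapped name does not exist or is not close punctuation (category 'Pe').
    """
    name = unicodedata.name(opener, "")
    if "LEFT" not in name:
        return None
    try:
        partner = unicodedata.lookup(name.replace("LEFT", "RIGHT"))
    except KeyError:
        return None
    return partner if unicodedata.category(partner) == "Pe" else None


def bracket_pairs() -> dict[str, str]:
    """Map every 'Ps' character whose partner the heuristic can find."""
    pairs = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Ps":
            partner = closing_partner(ch)
            if partner is not None:
                pairs[ch] = partner
    return pairs


pairs = bracket_pairs()
print(len(pairs), "opening brackets paired by name")
print(pairs["\u27ee"], pairs["\u3010"])  # ⟯ 】
```

The exact count depends on the Unicode version bundled with your Python, so treat it as indicative only.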
2. I had a guarantee that all of the bracket characters would be both available and easily distinguishable in any typeface I used.
You don't have that guarantee now, unless you check for all those things yourself ("easily distinguishable" is subjective) before you use a typeface. *only half a wink*
On Tue, Jan 18, 2022 at 1:15 AM Steven D'Aprano <steve@pearwood.info> wrote:
I would be more sympathetic to this idea if:
2. I had a guarantee that all of the bracket characters would be both available and easily distinguishable in any typeface I used.
I don't think the "distinguishable" part matters that much; if the different variants of the same type of bracket look the same, we're just back to what we see now. However, it's a problem if they're just displayed as "unknown character" markers, and you can't tell which kind they are, nor whether they're opening or closing. I suspect "rainbow parentheses" in editors is probably going to be a better approach.
On Tue, Jan 18, 2022 at 07:40:40AM -0700, 2QdxY4RzWzUUiLuE@potatochowder.com wrote:
On 2022-01-18 at 12:07:15 +1100, Steven D'Aprano <steve@pearwood.info> wrote:
I would be more sympathetic to this idea if:
1. I knew how to easily type all those brackets on the keyboard, without having to use a GUI character picker.
That's between you and your OS.
Is there any other Python syntax which is likewise OS-dependent? To be clear, using non-ASCII strings or identifiers is certainly supported, and that's fine, but beyond that narrow use, are there Python keywords or operators or other syntactic forms that require people to learn a different, OS-dependent input method for those forms? I think the answer is no, but I am privileged enough to use a US-ASCII keyboard. Maybe there are people using, oh I don't know, Bulgarian keyboards, who can type *nearly all* of Python syntax just fine on their keyboard but have to use a special input method for typing decorators and bitwise operators.
I believe all the major ones have ways to enter arbitrary characters and/or switch keyboards without a GUI (I use Linux, and I know for sure that it does).
Okay. Without looking it up, how would *you* type ⟮ U+27EE "Mathematical Left Flattened Parenthesis"? On your honour now, don't look it up. Do you think that the majority of Python programmers, especially beginners, will be keen to memorize a dozen or so key combinations to write parentheses and brackets?
Someone on this list keeps saying that everyone wants to know how to play the piano, but that no one wants to put in the hard work to learn to play the piano. ;-)
I read close to every email on this list and I've never seen anyone use that phrase before. Maybe I've missed it. Or maybe you're thinking of another list?
2. I had a guarantee that all of the bracket characters would be both available and easily distinguishable in any typeface I used.
You don't have that guarantee now, unless you check for all those things yourself ("easily distinguishable" is subjective) before you use a typeface. *only half a wink*
Indeed. That is my point. Let me make it a little more clear then.
In this wonderful world of the Internet era, where code collaboration, sharing of open-source software, and forums where people can post code for public viewing, how do I know that the parentheses and brackets I choose will be visible to all my readers?
If I post code asking for help "why am I getting a SyntaxError here?"
spam(1, eggs(2, cheese(3, fe(), fi(), fo())))
using fancy Unicode parentheses, how do I know that the people I am asking for help will see what I see, instead of
spam 1, eggs 2, cheese 3, fe , fi , fo
or
spam□1, eggs□2, cheese□3, fe□□, fi□□, fo□□□□□
or similar?
We have to live in the world we have, not the world we want. Given the state of Unicode support in typefaces, editors, IDEs, web forum software, etc, what percentage of the time do you think that using fancy parentheses would *enhance* communication and collaboration, rather than degrade it?
-- Steve
On Wed, Jan 19, 2022 at 9:47 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Tue, Jan 18, 2022 at 07:40:40AM -0700, 2QdxY4RzWzUUiLuE@potatochowder.com wrote:
On 2022-01-18 at 12:07:15 +1100, Steven D'Aprano <steve@pearwood.info> wrote:
I would be more sympathetic to this idea if:
1. I knew how to easily type all those brackets on the keyboard, without having to use a GUI character picker.
That's between you and your OS.
Is there any other Python syntax which is likewise OS-dependent?
Not sure about Python, but C and C++ have digraphs and trigraphs as alternatives for certain symbols, specifically because some OS/keyboard/language combinations may not be able to easily type the originals. Over time, a particular set of symbols seems to have settled in as the standard ones that most programmers can use, such that anyone who can't easily type them will learn how to work around that problem. For example, REXX supports two spellings of its negation operator: REVERSE SOLIDUS "\" and NOT SIGN "¬", but almost nobody uses the latter. Would love to hear from people whose keyboards lack these characters.
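For comparison, C's nine trigraphs are a purely textual substitution done before tokenizing, which is the same "translate to the base character" strategy suggested at the start of the thread. (Trigraphs were removed from C++ in C++17. Digraphs such as `<:` and `%>` are recognized as alternative tokens rather than substituted, and are not covered here.) A minimal Python sketch of the replacement pass:

```python
import re

# The nine trigraphs from the C standard and the characters they stand for.
TRIGRAPHS = {
    "=": "#", "(": "[", "/": "\\", ")": "]", "'": "^",
    "<": "{", "!": "|", ">": "}", "-": "~",
}
TRIGRAPH_RE = re.compile(r"\?\?([=(/)'<!>-])")


def replace_trigraphs(source: str) -> str:
    """Replace every ??x trigraph with its single-character equivalent."""
    return TRIGRAPH_RE.sub(lambda m: TRIGRAPHS[m.group(1)], source)


print(replace_trigraphs("??=define ARR(a, i) a??(i??)"))  # #define ARR(a, i) a[i]
```

A sequence such as `???=` becomes `?#`, since only the last three characters form a trigraph.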
Okay. Without looking it up, how would *you* type ⟮ U+27EE "Mathematical Left Flattened Parenthesis"? On your honour now, don't look it up.
Be careful: don't give people the codepoint number in these challenges, because a lot of input systems let you enter any character by keying in the codepoint. Here's a better challenge: Type five unique open parenthesis signs, without looking up their key sequences or codepoints.
Do you think that the majority of Python programmers, especially beginners, will be keen to memorize a dozen or so key combinations to write parentheses and brackets?
One crucial question is whether the different types of brackets would have semantic meaning or not.
1) No semantic difference: the various characters are all absolutely equivalent to ( or ). This is the easiest and safest, but also kinda useless.
2) Real, crucial semantic difference: unique symbols that have special meaning (like using a pair of non-ASCII brackets to surround a frozenset literal).
3) Optional semantic difference: 【1, 2, 3】 is exactly the same as (1, 2, 3), but 【1, 2, 3) would be an error.
In the first case, nobody needs to learn input methods, since they can just use other types of bracket no problem. But this is useless. You can get nearly all the same benefit by using the same brackets everywhere and then getting your editor to colour them for you.
The second definitely forces people to learn the symbols, although if the use-cases are sufficiently rare, it might not be a big deal to copy and paste them ("you can make frozenset literals by using these symbols, or just use the word frozenset").
The third is an interesting hybrid, and if Python supported it, I would definitely make use of it in a few places. There are times when it'd be nice to be able to mark a specific call or something, thus creating a boundary within a sea of close parentheses. Trouble is, that kind of code shows up more in my JavaScript than in my Python, so it wouldn't be all that helpful :) But still, there would be places I'd use it in Python.
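Option 2 can be prototyped today as a source-to-source translation before handing the code to Python. This is a toy under loud assumptions: using ⦃ ⦄ (U+2983/U+2984) as frozenset delimiters is a hypothetical choice for illustration, not something proposed in this thread, and the plain string replacement does not understand strings or comments.

```python
# Hypothetical syntax for this sketch only: ⦃ ... ⦄ is a frozenset literal.
FROZENSET_OPEN, FROZENSET_CLOSE = "\u2983", "\u2984"  # ⦃ ⦄


def translate_frozenset_literals(source: str) -> str:
    """Rewrite ⦃a, b⦄ as frozenset({a, b}) so ordinary Python can run it."""
    return source.replace(FROZENSET_OPEN, "frozenset({").replace(FROZENSET_CLOSE, "})")


code = "\u2983 1, 2, \u2983 3 \u2984 \u2984"  # ⦃ 1, 2, ⦃ 3 ⦄ ⦄
translated = translate_frozenset_literals(code)
print(translated)  # frozenset({ 1, 2, frozenset({ 3 }) })
value = eval(translated)
```

Conveniently, an empty ⦃⦄ becomes `frozenset({})`, which is the empty frozenset because iterating an empty dict yields nothing.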
Someone on this list keeps saying that everyone wants to know how to play the piano, but that no one wants to put in the hard work to learn to play the piano. ;-)
I read close to every email on this list and I've never seen anyone use that phrase before. Maybe I've missed it. Or maybe you're thinking of another list?
Maybe it's one of the people who are banned from the list, and is only seen on the newsgroup? I haven't seen that turn of phrase either.
In this wonderful world of the Internet era, where code collaboration, sharing of open-source software, and forums where people can post code for public viewing, how do I know that the parentheses and brackets I choose will be visible to all my readers?
If I post code asking for help "why am I getting a SyntaxError here?"
spam(1, eggs(2, cheese(3, fe(), fi(), fo())))
using fancy Unicode parentheses, how do I know that the people I am asking for help will see what I see, instead of
spam 1, eggs 2, cheese 3, fe , fi , fo
or
spam□1, eggs□2, cheese□3, fe□□, fi□□, fo□□□□□
or similar?
If a popular language makes use of a particular set of characters, there would be a strong push to support those characters in code editors. I think it's not a problem to expand that set occasionally, as long as there's enough justification for it.
We have to live in the world we have, not the world we want. Given the state of Unicode support in typefaces, editors, IDEs, web forum software, etc, what percentage of the time do you think that using fancy parentheses would *enhance* communication and collaboration, rather than degrade it?
Font choice is already an important consideration with code sharing. Can we get a list of the most popular fonts for reading code in (whether in editors, code review tools, collaboration tools, etc)? I grabbed this by peeking at GitHub's commit view: "ui-monospace,SFMono-Regular,SF Mono,Menlo,Consolas,Liberation Mono,monospace". The last one is a fallback; I'm not sure what ui-monospace is, but it's probably a generic alias that says "whatever the user has selected as the monospace font in the UI".
Editors and IDEs - if they don't support Unicode and some of the popular encodings (mainly UTF-8), they're not going to be able to adequately render Python code anyway, so I don't have a problem with blaming the tool there.
Web forum software - sigh. That one is unsolvable, since there are so many of them. But maybe there are a few syntax highlighter modules that can be checked?
With collaboration tools that aren't code-specific, the current state of affairs is *already* terrible. Sidebar chat usually drops newlines and indentation, and often mangles quotes and spaces too. With code-specific tools, they're usually going to be IDE-like or editor-like, so I would say that they need proper Unicode support just the same.
So ultimately, it's a font issue. How do we survey fonts to see which ones support which characters?
ChrisA
On Tue, Jan 18, 2022, 5:46 PM Steven D'Aprano
Okay. Without looking it up, how would *you* type ⟮ U+27EE "Mathematical Left Flattened Parenthesis"? On your honour now, don't look it up.
You've kind of given away the game. I'd press "shift-ctrl-u 2 7 e e <enter>". I'm not even at my computer, but on my tablet. Now, I'm not arguing to add these. Those keystrokes are more cumbersome, and if you hadn't just told me, I'd have no idea of that codepoint.
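Inside Python itself, the codepoint or the official character name is enough, with no OS input method at all: the `\u` and `\N{...}` string escapes and `unicodedata.lookup`/`unicodedata.name` all work from plain ASCII. (The shift-ctrl-u sequence above is the hex-entry method offered by many Linux desktops; other platforms differ.) This only helps inside string literals, though; brackets used as syntax would still have to be typed as the literal characters.

```python
import unicodedata

# Three equivalent ways to get U+27EE in Python source, all typed in ASCII.
by_escape = "\u27ee"
by_name_escape = "\N{MATHEMATICAL LEFT FLATTENED PARENTHESIS}"
by_lookup = unicodedata.lookup("MATHEMATICAL LEFT FLATTENED PARENTHESIS")

assert by_escape == by_name_escape == by_lookup
print(f"U+{ord(by_escape):04X}", unicodedata.name(by_escape))
# U+27EE MATHEMATICAL LEFT FLATTENED PARENTHESIS
```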
On Wed, Jan 19, 2022 at 10:12:23AM +1100, Chris Angelico wrote:
Not sure about Python, but C and C++ have digraphs and trigraphs as alternatives for certain symbols, specifically because some OS/keyboard/language combinations may not be able to easily type the originals.
I believe that those C digraphs date back to days when ASCII was not a guaranteed lowest common denominator, and there were computers that did not support characters such as braces {}. And then C++ just inherited them from C. Pascal had the same thing: comments were either {comment} or (*comment*) specifically because in the 1970s there were lots of computers and OSes that did not support braces. It wasn't until the early 80s that the ASCII character set became more or less universally supported in the English-speaking world.
Okay. Without looking it up, how would *you* type ⟮ U+27EE "Mathematical Left Flattened Parenthesis"? On your honour now, don't look it up.
Be careful: don't give people the codepoint number in these challenges, because a lot of input systems let you enter any character by keying in the codepoint.
So David has made it clear.
Here's a better challenge: Type five unique open parenthesis signs, without looking up their key sequences or codepoints.
Yes :-) -- Steve
On Wed, Jan 19, 2022 at 11:10 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Jan 19, 2022 at 10:12:23AM +1100, Chris Angelico wrote:
Not sure about Python, but C and C++ have digraphs and trigraphs as alternatives for certain symbols, specifically because some OS/keyboard/language combinations may not be able to easily type the originals.
I believe that those C digraphs date back to days when ASCII was not a guaranteed lowest common denominator, and there were computers that did not support characters such as braces {}. And then C++ just inherited them from C.
C++ added some of its own, but I believe the justification was the same. It's not just about ASCII though; it's about keyboards and input methods, and I'm unsure when that stopped being a concern. Maybe it still hasn't, in some places in the world. Worth noting: In many places in the world, ASCII Latin letters are not the ones used for normal text. What's the normal way to write language keywords and builtins? Do people switch keyboard layouts? Is there a standard way to quickly enter a Python keyword and have it transliterated? Would love to hear from someone who uses Python in Russian, Greek, Korean, Arabic, or any other non-Latin language.
Pascal had the same thing: comments were either {comment} or (*comment*) specifically because in the 1970s there were lots of computers and OSes that did not support braces. It wasn't until the early 80s that the ASCII character set became more or less universally supported in the English-speaking world.
Huh. I remember the (*comment*) style being the dominant one. Didn't know it was due to charset limitations. ChrisA
On 2022-01-19 00:02, Steven D'Aprano wrote:
On Wed, Jan 19, 2022 at 10:12:23AM +1100, Chris Angelico wrote:
Not sure about Python, but C and C++ have digraphs and trigraphs as alternatives for certain symbols, specifically because some OS/keyboard/language combinations may not be able to easily type the originals.
I believe that those C digraphs date back to days when ASCII was not a guaranteed lowest common denominator, and there were computers that did not support characters such as braces {}. And then C++ just inherited them from C.
Pascal had the same thing: comments were either {comment} or (*comment*) specifically because in the 1970s there were lots of computers and OSes that did not support braces. It wasn't until the early 80s that the ASCII character set became more or less universally supported in the English-speaking world.
Okay. Without looking it up, how would *you* type ⟮ U+27EE "Mathematical Left Flattened Parenthesis"? On your honour now, don't look it up.
Be careful: don't give people the codepoint number in these challenges, because a lot of input systems let you enter any character by keying in the codepoint.
So David has made it clear.
Here's a better challenge: Type five unique open parenthesis signs, without looking up their key sequences or codepoints.
Yes :-)
"Icon" is/was an interesting language. It did automatic conversions such as between numbers and strings. The problem with that was that it needed more operators. For example: "+" for addition and "++" for union; "|" for alternation, "||" for string concatenation and "|||" for list concatenation; "=" for numerical equals, "==" for string equals and "===" for value equals. It can be difficult to remember what each of a set of similar operators is used for. Python already has 3 pairs with (), [] and {}, and 2 of them have more than one use.
Steven D'Aprano writes:
Here's a better challenge: Type five unique open parenthesis signs, without looking up their key sequences or codepoints.
Yes :-)
Asa meshi mae (and if you know what that means -- the White Queen does -- you also know why this is trivial): ([{（〔［｛〈《【 Took 10 seconds, and that long only because I missed and typed the closing paren several times (they come in pairs rather than all the opens first and then the closes, so a glissando wouldn't work). Even if you nuke the compats and the angle brackets that's still 5 without hardly trying. This, friends, is why you should learn Japanese. ;-)
Steve, too
On 2022-01-18 6:12 p.m., Chris Angelico wrote:
3) Optional semantic difference: 【1, 2, 3】 is exactly the same as (1, 2, 3), but 【1, 2, 3) would be an error.
What does it say about the viability of this idea that until the second part of that sentence, I thought it would be equivalent to [1, 2, 3]? Alexandre
On Thu, Jan 20, 2022 at 1:47 AM Alexandre Brault <alexandre.brault@mapgears.com> wrote:
On 2022-01-18 6:12 p.m., Chris Angelico wrote:
3) Optional semantic difference: 【1, 2, 3】 is exactly the same as (1, 2, 3), but 【1, 2, 3) would be an error.
What does it say about the viability of this idea that until the second part of that sentence, I thought it would be equivalent to [1, 2, 3]?
Not a lot. If a proposal like this were to go ahead, some of the brackets would be defined to be equivalent to (), others to [], and still others to {}. But the concern about balancing them would be the same. (It might be that they're all defined as equivalent to (), but in any case, there would be a clear definition.) Prior to this sort of thing being in the language, it can be confusing, but if it were added, it wouldn't be any less confusing than anything else that needs to be learned - for instance, 01 is not equivalent to 1, and in fact is an error. ChrisA
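Chris's aside about `01` is easy to confirm: in Python 3 a decimal integer literal with a leading zero is rejected at compile time, while `0`, `00` and the explicit octal spelling `0o1` are accepted. A small check using the built-in `compile`:

```python
def compiles(expr: str) -> bool:
    """Return True if `expr` is a syntactically valid Python expression."""
    try:
        compile(expr, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


for literal in ["1", "01", "0o1", "00"]:
    print(literal, compiles(literal))
```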
On 20/01/22 3:45 am, Alexandre Brault wrote:
On 2022-01-18 6:12 p.m., Chris Angelico wrote:
3) Optional semantic difference: 【1, 2, 3】 is exactly the same as (1, 2, 3), but 【1, 2, 3) would be an error.
What does it say about the viability of this idea that until the second part of that sentence, I thought it would be equivalent to [1, 2, 3]?
Those particular brackets are really confusing because they're half square and half round. I would expect it to mean [(1, 2, 3)]. -- Greg
On Wed, Jan 19, 2022 at 2:12 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Those particular brackets are really confusing because they're half square and half round.
And THAT is why this is a bad idea. Frankly, depending on the font and the screen and my bad old eyes, it's hard enough to tell the three brackets that we have apart from each other. -CHB
I would expect it to mean [(1, 2, 3)].
-- Greg
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On 20/01/22 12:52 pm, Christopher Barker wrote:
On Wed, Jan 19, 2022 at 2:12 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Those particular brackets are really confusing because they're half square and half round.
And THAT is why this is a bad idea.
It doesn't mean it's a bad idea in general, just that we would have to be careful what kinds of brackets we allow. -- Greg
On 2022-01-20 23:08, Greg Ewing wrote:
On 20/01/22 12:52 pm, Christopher Barker wrote:
On Wed, Jan 19, 2022 at 2:12 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Those particular brackets are really confusing because they're half square and half round.
And THAT is why this is a bad idea.
It doesn't mean it's a bad idea in general, just that we would have to be careful what kinds of brackets we allow.
We already are: we just allow (), [], and {}.
-- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
participants (13)
- 2QdxY4RzWzUUiLuE@potatochowder.com
- Alexandre Brault
- Brendan Barnwell
- Chris Angelico
- Christopher Barker
- David Lowry-Duda
- David Mertz, Ph.D.
- Greg Ewing
- John Sturdy
- Jonathan Goble
- MRAB
- Stephen J. Turnbull
- Steven D'Aprano