Coding using Unicode
Hi all,
What would you think if we could write our code using Unicode? It would be especially useful for scientific programming (we could use Greek letters), and it could also be nice to use emojis for some variables. I don't see any bad consequences (apart from people using misleading characters, but that's already possible). I don't know how hard it would be to make that switch, but it feels like it could be nice.
Cheers
On 7/15/2019 7:34 AM, Adrien Ricocotam wrote:
Hi all, What would you think if we could write our code using Unicode? [...]
You can use a subset of unicode for identifiers. See https://www.python.org/dev/peps/pep-3131/ and https://docs.python.org/3/reference/lexical_analysis.html#identifiers Eric
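A minimal sketch of what this means in practice, assuming Python 3. Greek letters are Unicode letters, so they work as variable names. An emoji such as 🔥 is a symbol rather than a letter, so the parser rejects it as a name. The decay-rate formula is only an illustration.

```python
import math

# Greek letters are Unicode letters, so Python 3 accepts them as names.
λ = 0.5   # illustrative decay rate
t = 2.0
N = math.exp(-λ * t)
print(N)

# An emoji is a symbol, not a letter, so it cannot be used as a name.
try:
    compile("🔥 = 1", "<example>", "exec")
    emoji_accepted = True
except SyntaxError:
    emoji_accepted = False
print("emoji name accepted:", emoji_accepted)
```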
On Mon, Jul 15, 2019 at 9:37 PM Adrien Ricocotam wrote:
Hi all, What would you think if we could write our code using Unicode? [...]
You can! Just make sure you're using Python 3.x. ChrisA
Oh, OK!
I tried with some Unicode characters (🔥) but it didn't work. So it's only a subset, as described in the PEPs?
What about extending it?
On Mon, Jul 15, 2019 at 13:41, Chris Angelico wrote:
You can! Just make sure you're using Python 3.x.
ChrisA
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/Z5DDNC...
Code of Conduct: http://python.org/psf/codeofconduct/
On 7/15/2019 7:43 AM, Adrien Ricocotam wrote:
Oh, OK! I tried with some Unicode characters (🔥) but it didn't work. So it's only a subset, as described in the PEPs?
Correct.
What about extending it?
The PEP has a rationale about why it works like it does. If you want to extend it, you should be prepared to address the issues in the PEP. Your proposal would need to become a PEP itself. Eric
On Jul 15, 2019, at 04:43, Adrien Ricocotam wrote:
Oh, OK! I tried with some Unicode characters (🔥) but it didn't work. So it's only a subset, as described in the PEPs? What about extending it?
I'm pretty sure that the docs explain that the subset of characters that Python allows in identifiers is exactly the one Unicode recommends that languages allow in identifiers (except possibly in the lowest 128 characters, where any character allowed in Python 2.x is still allowed in 3.x, even if Unicode says otherwise). This makes Python compatible with a whole lot of other languages, and with language-agnostic tools and protocols that have similar notions of "identifier". It also means Python can leave all the bikeshedding arguments to the Unicode committee instead of having to hash out the same arguments here. And it means Python automatically stays in sync with Unicode as they add new identifier characters, just by upgrading to the newer version of Unicode, instead of having to go over the whole set of new characters each time to decide which ones should be identifiers.
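One way to see which rule a character falls under, without reading the Unicode tables, is to ask the interpreter itself. The sketch below is only an illustration and not part of any proposal. It uses `str.isidentifier()`, which applies the language's identifier definition, to separate characters that may start a name from those that may only continue one. It also prints `unicodedata.unidata_version`, the Unicode version of the running interpreter's tables. The helper name `identifier_role` is made up for this example.

```python
import unicodedata

def identifier_role(ch: str) -> str:
    """Classify one character by where Python allows it in a name (hypothetical helper)."""
    if ch.isidentifier():
        return "start"          # may begin an identifier
    if ("a" + ch).isidentifier():
        return "continue"       # allowed only after the first character
    return "not allowed"

print("Unicode database version:", unicodedata.unidata_version)
for ch in ["A", "π", "_", "7", "\u0301", "🔥", "€"]:
    # U+0301 is COMBINING ACUTE ACCENT, a combining mark.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "?"), identifier_role(ch))
```

Because the answer comes from the interpreter's own Unicode data, upgrading Python to a newer Unicode version can change it without any change to this code.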
On 15/07/2019 12:34, Adrien Ricocotam wrote:
it could also be nice to use emojis for some variables
For values of "nice" I personally find horrifying :-) Seriously though, the PEP defines valid characters for names by their unicode categories (plus a few special cases for backward compatibility). You'll have a stronger argument if you can show extra categories that should be allowed. (I'm using the term "character" loosely, don't all start!) -- Rhodri James *-* Kynesim Ltd
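Here is a rough sketch of the category rule PEP 3131 describes. Letters (Lu, Ll, Lt, Lm, Lo) and letter numbers (Nl), plus the underscore, may start a name. Non-spacing and spacing combining marks (Mn, Mc), decimal digits (Nd) and connector punctuation (Pc) may additionally continue one. The sketch deliberately ignores NFKC normalization and the Other_ID_Start/Other_ID_Continue special cases, so it can disagree with `str.isidentifier()` on rare characters. `approx_is_identifier` is a hypothetical helper.

```python
import unicodedata

# General categories PEP 3131 lists for ID_Start, and the extras ID_Continue adds.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}

def approx_is_identifier(name: str) -> bool:
    """Approximate PEP 3131 using general categories only (no NFKC, no Other_ID_*)."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    if first != "_" and unicodedata.category(first) not in START_CATEGORIES:
        return False
    return all(unicodedata.category(ch) in CONTINUE_CATEGORIES for ch in rest)

samples = ["λ_1", "_private", "café", "π", "🔥", "x🔥", "1x", ""]
for s in samples:
    print(repr(s), approx_is_identifier(s), s.isidentifier())

# An emoji is a symbol ("So"), which no identifier category includes.
print(unicodedata.category("🔥"))
```

An argument for extending the allowed set would, in these terms, mean naming which extra categories (such as `So`) should be added, and why.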
On Mon, Jul 15, 2019 at 01:34:02PM +0200, Adrien Ricocotam wrote:
Hi all, What would you think if we could write our code using unicode ? It would be especially useful for scientific programming (we could use the greek letters),
We've been able to do that since about 2007. https://www.python.org/dev/peps/pep-3131/ In the future, before making suggestions for new features, you should do some research into what is already possible, and whether it has already been suggested before: https://duckduckgo.com/?q=python+unicode+identifiers
it could also be nice to use emojis for some variables.
I doubt that. Variable names should be meaningful, not "smiley face" or "eggplant".
I don't see any bad consequences
(1) For many people, it is very difficult to type non-ASCII identifiers in their editors.
(2) For many people, support for many non-ASCII identifiers is poor. They will see a series of boxes, something like this:
□□ □□□□ = □□□□□□□.□□□□□(□□□□□□, □□□□□□□)
(3) Unicode allows us to play games like this:
py> А = 1
py> print(А)
1
py> A = 2
py> print(A)
2
py> Α = 3
py> print(Α)
3
py> Α - 1 == А + 1
True
Do you see what I did there?
(4) Not only are confusables, well, confusing, but they can be used for phishing and other attacks. http://unicode.org/reports/tr36/tr36-8.html
Of course there are ASCII confusables too, such as O 0 and I l (depending on the font you use), but Unicode adds hundreds of confusables.
-- Steven
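The trick in (3) can be checked directly. This sketch assumes the three letters are U+0410 CYRILLIC CAPITAL LETTER A, U+0041 LATIN CAPITAL LETTER A and U+0391 GREEK CAPITAL LETTER ALPHA. It shows that they stay three distinct names even after the NFKC normalization Python applies to identifiers. The `looks_mixed` check is a crude, hypothetical heuristic based on the first word of each letter's Unicode name. It is not the mixed-script detection specified in Unicode TR39.

```python
import unicodedata

CYRILLIC_A, LATIN_A, GREEK_ALPHA = "\u0410", "A", "\u0391"
letters = (CYRILLIC_A, LATIN_A, GREEK_ALPHA)

for ch in letters:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# NFKC (applied by the parser to identifiers) does not merge them.
normalized = {unicodedata.normalize("NFKC", ch) for ch in letters}
print(len(normalized), "distinct names")

# Re-run the interactive session: three separate variables.
ns = {}
exec(f"{CYRILLIC_A} = 1\n{LATIN_A} = 2\n{GREEK_ALPHA} = 3", ns)
print(ns[GREEK_ALPHA] - 1 == ns[CYRILLIC_A] + 1)

def scripts_in(identifier: str) -> set:
    """Crude script guess: the first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in identifier if ch.isalpha()}

def looks_mixed(identifier: str) -> bool:
    """Flag identifiers whose letters seem to come from more than one script."""
    return len(scripts_in(identifier)) > 1

# The second name ends in U+0435 CYRILLIC SMALL LETTER IE, not a Latin 'e'.
print(looks_mixed("value"), looks_mixed("valu\u0435"))
```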
In the future, before making suggestions for new features, you should do some research into what is already possible, and whether it has already been suggested before
I did some, but couldn't find anything apart from using Unicode in strings. I didn't mention it in this mail, but I did in my first attempt, which I sent to python-ideas-owners by mistake. I didn't know about the term "identifier", thanks for this.
I doubt that. Variable names should be meaningful, not "smiley face" or "eggplant".
We could argue that having a happy or a sad smiley is more meaningful in if statements. I don't agree with this, but it doesn't feel bad to me either. For emojis, it just makes the code more colorful and a bit friendlier in some cases. And I feel like having fun, beautiful-looking code is a sufficient argument by itself. But I get why you don't agree, and that's why I submitted the idea :)
For many people, support for many non-ASCII identifiers is poor. They will see a series of boxes, something like this
If we add this in Python 3.9+, IMO people will be up to date and will use proper editors. I know the community is conservative, but things can change too.
Unicode allows us to play games like this
We can already do this (https://github.com/satwikkansal/wtfpython#-skipping-lines), so it's not a problem to me. It is a problem, but not one related to Unicode.
Thanks for the feedback
On 7/15/19 8:54 AM, Steven D'Aprano wrote:
□□ □□□□ = □□□□□□□.□□□□□(□□□□□□, □□□□□□□)
I call foul. At least tentatively. For the moment. http://www.unicode.org/reports/tr31/ and http://www.unicode.org/reports/tr39/ specifically exclude private use characters, like U+E24C, from identifiers. And that said, while I am in favor of using unicode *as appropriate*, I agree that some of the drawbacks outweigh some of the benefits. I can enter all code points with my keyboard, but until there's better display support and fonts designed to disambiguate the confusables, I'll use non-ASCII identifiers very carefully. As has happened before, on this list, I am happy to learn otherwise. Dan
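Dan's point can be checked from Python. In this small sketch, U+E24C, the code point he mentions, turns out to be in the private use area (general category `Co`). Both `str.isidentifier()` and the parser reject it as the first character of a name and after a letter.

```python
import unicodedata

pua = "\ue24c"   # U+E24C, a private use code point

print(unicodedata.category(pua))      # "Co" means "Other, private use"
print(pua.isidentifier())             # cannot start a name
print(("x" + pua).isidentifier())     # cannot continue one either

try:
    compile(f"{pua} = 1", "<example>", "exec")
    accepted = True
except SyntaxError:
    accepted = False
print("accepted by the parser:", accepted)
```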
On Mon, 15 Jul 2019 at 14:33, Dan Sommers <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
On 7/15/19 8:54 AM, Steven D'Aprano wrote:
□□ □□□□ = □□□□□□□.□□□□□(□□□□□□, □□□□□□□)
I call foul. At least tentatively. For the moment.
That was a demo (he used private area characters to ensure getting the
square box substitute character). The point is that someone with the
wrong font installed, or a limited terminal app, can get this sort of
output with entirely legal characters - and anyway the comment was
made to explain why *extending* the list of allowed characters was bad
(so what's legal right now is not relevant).
On Mon, 15 Jul 2019 at 14:13, Adrien Ricocotam wrote:
We can already do this (https://github.com/satwikkansal/wtfpython#-skipping-lines), so it's not a problem to me. It is a problem, but not one related to Unicode.
That's *exactly* the issue of confusable characters, which is a Unicode issue. So I don't see how you can say it's "not related to Unicode". It's not directly related to *changing* which Unicode characters are allowed in identifiers - that much is true (at least partially, it's quite possible that changing the list would result in having more confusables, so increasing the risk) - but that's not what you claimed. Paul
Adrien - please take note that, since you already wrote about "everybody could update their environment and editors" to support Unicode, things like what you want (emojis in identifiers) can be supported at the programming-editor level (and by plug-ins and extensions for those) without preventing anyone else from working on your codebase.
You can just work on an extension for your favorite editor that would transform certain escape sequences into proper emojis. If these escapes are themselves valid identifiers, there is no stopping you, and whatever enthusiast community you can raise, from having fun with the looks of "pyemojicode". That would still allow people outside that community to interoperate with your code, and all of the tools that use the static source would still work.
So, all you need is an extension to replace, at display time, things like:
EMO_fire_ -> 🔥
EMO_heart -> 🖤
And so on. With a browser extension, or a site that acts as a proxy to code hosting like GitHub/Bitbucket, enthusiasts could even see these characters in online listings. (The escape sequence could be less intrusive as well, your call; it would also help with getting those symbols into the code in the first place.)
On Mon, 15 Jul 2019 at 10:47, Paul Moore wrote:
That's *exactly* the issue of confusable characters, which is a Unicode issue. [...]
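This is a minimal sketch of the display-time substitution described in the message, assuming the `EMO_` prefix used there. The mapping table and the name `render_for_display` are hypothetical. The stored source keeps plain ASCII identifiers, so it still compiles and other tools see ordinary names. Only the rendered text shows emoji. An editor extension would apply the same replacement inside the editor view instead of to a string.

```python
import re

# Hypothetical table mapping ASCII escape identifiers to emoji.
EMOJI_FOR = {
    "EMO_fire_": "\U0001F525",   # FIRE
    "EMO_heart": "\U0001F5A4",   # BLACK HEART
}

# An EMO_ escape is an ordinary identifier starting at a word boundary.
_ESCAPE = re.compile(r"\bEMO_\w+")

def render_for_display(source: str) -> str:
    """Return source with known EMO_* names swapped for emoji.

    Unknown EMO_* names are left unchanged; the stored source is never modified.
    """
    return _ESCAPE.sub(lambda m: EMOJI_FOR.get(m.group(0), m.group(0)), source)

source = "EMO_heart = EMO_fire_ + 1"
compile(source, "<example>", "exec")   # still valid Python: plain identifiers
print(render_for_display(source))
```

Keeping the mapping outside the source is the design point. Someone without the extension still sees, and can type, `EMO_fire_`.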
It's easy: just use vim (with the conceal plugin)! I haven't changed anything other than keywords and built-ins, but the plugin is happy to replace any other sequence or pattern.
On Mon, Jul 15, 2019 at 9:26 AM Joao S. O. Bueno wrote:
Adrien - please take note that, since you already wrote about "everybody could update their environment and editors" to support Unicode, things like what you want (emojis in identifiers) can be supported at the programming-editor level (and by plug-ins and extensions for those) without preventing anyone else from working on your codebase.
You can just work on an extension for your favorite editor that would transform certain escape sequences into proper emojis. If these escapes are themselves valid identifiers, there is no stopping you, and whatever enthusiast community you can raise, from having fun with the looks of "pyemojicode". That would still allow people outside that community to interoperate with your code, and all of the tools that use the static source would still work.
So, all you need is an extension to replace, at display time, things like:
EMO_fire_ -> 🔥
EMO_heart -> 🖤
And so on. With a browser extension, or a site that acts as a proxy to code hosting like GitHub/Bitbucket, enthusiasts could even see these characters in online listings. (The escape sequence could be less intrusive as well, your call; it would also help with getting those symbols into the code in the first place.)
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
participants (10)
- Adrien Ricocotam
- Andrew Barnert
- Chris Angelico
- Dan Sommers
- David Mertz
- Eric V. Smith
- Joao S. O. Bueno
- Paul Moore
- Rhodri James
- Steven D'Aprano