Mailman 3 Coding using Unicode - Python-ideas


Coding using Unicode


Adrien Ricocotam

15 Jul 2019

6:34 a.m.

Hi all, What would you think if we could write our code using Unicode? It would be especially useful for scientific programming (we could use Greek letters), and it could also be nice to use emojis for some variables. I don't see any bad consequences (apart from people using misleading characters, but that's already possible). I don't know how hard it would be to make that switch, but it feels like it could be nice. Cheers


Eric V. Smith

15 Jul

6:40 a.m.

On 7/15/2019 7:34 AM, Adrien Ricocotam wrote:

...

Hi all, What would you think if we could write our code using unicode ? It would be especially useful for scientific programming (we could use the greek letters), it could also be nice to use emojis for some variables. I don't see any bad consequences (apart from people that would use misleading characters but that's already possible). I don't know how hard it is to do that switch but it feels like it could be nice;

You can use a subset of unicode for identifiers. See https://www.python.org/dev/peps/pep-3131/ and https://docs.python.org/3/reference/lexical_analysis.html#identifiers Eric
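A minimal sketch of what that subset already permits, assuming Python 3 and a UTF-8 source file. The variable names are made up for illustration; `str.isidentifier()` is the standard way to ask whether a string is a valid identifier.

```python
import math

# Greek letters are letters, so they are valid identifier characters.
π = math.pi
Δt = 0.5
ω = 2 * π / Δt  # angular frequency for a period of Δt

print(ω)

# str.isidentifier() reports whether a string is a valid identifier.
print("ω".isidentifier())           # a letter: accepted
print("\U0001F525".isidentifier())  # 🔥 is a symbol, not a letter: rejected
```

The emoji is written as an escape here only so the example does not depend on the reader's font.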

Chris Angelico

6:40 a.m.

On Mon, Jul 15, 2019 at 9:37 PM Adrien Ricocotam wrote:

...

Hi all, What would you think if we could write our code using unicode ? It would be especially useful for scientific programming (we could use the greek letters), it could also be nice to use emojis for some variables. I don't see any bad consequences (apart from people that would use misleading characters but that's already possible). I don't know how hard it is to do that switch but it feels like it could be nice;

You can! Just make sure you're using Python 3.x. ChrisA

Adrien Ricocotam

6:43 a.m.

Oh ok! I tried with some Unicode characters (🔥) but it didn't work. So it's only a subset, as described in the PEPs? What about extending it?

On Mon, 15 Jul 2019 at 13:41, Chris Angelico wrote:

...

On Mon, Jul 15, 2019 at 9:37 PM Adrien Ricocotam wrote:

...
Hi all, What would you think if we could write our code using unicode ? It would be especially useful for scientific programming (we could use the greek letters), it could also be nice to use emojis for some variables. I don't see any bad consequences (apart from people that would use misleading characters but that's already possible). I don't know how hard it is to do that switch but it feels like it could be nice;

...
You can! Just make sure you're using Python 3.x.

ChrisA _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/Z5DDNC... Code of Conduct: http://python.org/psf/codeofconduct/

Eric V. Smith

6:52 a.m.

On 7/15/2019 7:43 AM, Adrien Ricocotam wrote:

...

Oh ok ! I tried with some unicodes (🔥) but it didn't work. So it's only a subset as described in PEPs ?

Correct.

...

What about extending it ?

The PEP has a rationale about why it works like it does. If you want to extend it, you should be prepared to address the issues in the PEP. Your proposal would need to become a PEP itself. Eric


Andrew Barnert

7:19 a.m.

On Jul 15, 2019, at 04:43, Adrien Ricocotam wrote:

...

Oh ok ! I tried with some unicodes (🔥) but it didn't work. So it's only a subset as described in PEPs ? What about extending it ?

I’m pretty sure that the docs explain that the subset of characters that Python allows in identifiers is exactly the one Unicode recommends that languages allow in identifiers (except possibly in the lowest 127 characters, where any character allowed in Python 2.x is still allowed in 3.x, even if Unicode says otherwise). This makes Python compatible with a whole lot of other languages, and language-agnostic tools and protocols that have similar notions of “identifier”. And it means Python can leave all the bikeshedding arguments to the Unicode committee instead of having to hash out the same arguments here. And it means Python automatically stays in sync with Unicode as they add new identifier characters just by upgrading to the newer version of Unicode, instead of having to go over the whole set of new characters each time to decide which ones should be identifiers.

Rhodri James

7:21 a.m.

On 15/07/2019 12:34, Adrien Ricocotam wrote:

...

it could also be nice to use emojis for some variables

For values of "nice" I personally find horrifying :-) Seriously though, the PEP defines valid characters for names by their unicode categories (plus a few special cases for backward compatibility). You'll have a stronger argument if you can show extra categories that should be allowed. (I'm using the term "character" loosely, don't all start!) -- Rhodri James *-* Kynesim Ltd
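To see the category-based rule in action, here is a small sketch using the standard `unicodedata` module. The helper name `describe` and the sample characters are chosen for illustration only; the exact category list lives in the PEP and the language reference.

```python
import unicodedata


def describe(ch):
    """Return (general category, whether ch alone is a valid identifier)."""
    return unicodedata.category(ch), ch.isidentifier()


# A letter, a Greek letter, the underscore, an emoji and a digit.
for ch in ["x", "α", "_", "\U0001F525", "5"]:
    category, ok = describe(ch)
    print(f"U+{ord(ch):04X} category={category} identifier={ok}")
```

Letters come out as identifier characters, the emoji falls in category `So` (Symbol, other) and is rejected, and a digit is rejected here only because it cannot start an identifier.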

Steven D'Aprano

7:54 a.m.

On Mon, Jul 15, 2019 at 01:34:02PM +0200, Adrien Ricocotam wrote:

...

Hi all, What would you think if we could write our code using unicode ? It would be especially useful for scientific programming (we could use the greek letters),

We've been able to do that since about 2007. https://www.python.org/dev/peps/pep-3131/ In the future, before making suggestions for new features, you should do some research into what is already possible, and whether it has already been suggested before: https://duckduckgo.com/?q=python+unicode+identifiers

...

it could also be nice to use emojis for some variables.

I doubt that. Variable names should be meaningful, not "smiley face" or "eggplant".

...

I don't see any bad consequences

(1) For many people, it is very difficult to type non-ASCII identifiers in their editors.

(2) For many people, support for many non-ASCII identifiers is poor. They will see a series of boxes, something like this:

     = .(, )

(3) Unicode allows us to play games like this:

    py> А = 1
    py> print(А)
    1
    py> A = 2
    py> print(A)
    2
    py> Α = 3
    py> print(Α)
    3
    py> Α - 1 == А + 1
    True

Do you see what I did there?

(4) Not only are confusables, well, confusing, but they can be used for phishing and other attacks.

http://unicode.org/reports/tr36/tr36-8.html

Of course there are ASCII confusables too, such as O 0 and I l (depending on the font you use) but Unicode adds hundreds of confusables.

-- Steven
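For readers whose font renders the three names in point (3) identically, the following sketch spells them out. It assumes the session uses Cyrillic capital A, Latin capital A and Greek capital Alpha, which is consistent with the final comparison; the characters are written as escapes so they stay distinguishable.

```python
import unicodedata

# Assumed to be the three look-alike capitals from the session above.
cyrillic_a = "\u0410"
latin_a = "A"
greek_alpha = "\u0391"
lookalikes = (cyrillic_a, latin_a, greek_alpha)

for ch in lookalikes:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Python normalizes identifiers with NFKC; that does not fold these together.
normalized = {unicodedata.normalize("NFKC", ch) for ch in lookalikes}
print(len(normalized))

# Bind each one as a separate variable, as in the interactive session.
ns = {}
exec(f"{cyrillic_a} = 1\n{latin_a} = 2\n{greek_alpha} = 3", ns)
print(ns[greek_alpha] - 1 == ns[cyrillic_a] + 1)
```

Printing the code point and Unicode name is a cheap way to audit a suspicious identifier when the glyphs give nothing away.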

Adrien Ricocotam

8:10 a.m.

...

In the future, before making suggestions for new features, you should do some research into what is already possible, and whether it has already been suggested before

I did some, but couldn't find anything apart from using Unicode in strings. I didn't mention it in this mail, but I did in my first attempt, which I sent to python-ideas-owners by mistake. I didn't know the term "identifier"; thanks for this.

...

I doubt that. Variable names should be meaningful, not "smiley face" or "eggplant".

We could argue that having a happy smiley or a sad one is more meaningful in if statements. I don't agree with this but it doesn't feel bad to me either. For emojis, it just makes the code more colorful and a bit more friendly in some cases. And I feel like having fun and beautiful-looking code is a sufficient argument by itself. But I get why you don't agree, and that's why I submitted the idea :)

...

For many people, support for many non-ASCII identifiers is poor. They will see a series of boxes, something like this

If we add this in Python 3.9+, IMO people would be up to date and would use a proper editor. I know the community is conservative, but moving the lines is possible too.

...

Unicode allows us to play games like this

We can already do this (https://github.com/satwikkansal/wtfpython#-skipping-lines), so it's not a problem to me. It is a problem but not related to Unicode.

Thanks for the feedback

Dan Sommers

8:19 a.m.

On 7/15/19 8:54 AM, Steven D'Aprano wrote:

...

 = .(, )

I call foul. At least tentatively. For the moment. http://www.unicode.org/reports/tr31/ and http://www.unicode.org/reports/tr39/ specifically exclude private use characters, like U+E24C, from identifiers. And that said, while I am in favor of using unicode *as appropriate*, I agree that some of the drawbacks outweigh some of the benefits. I can enter all code points with my keyboard, but until there's better display support and fonts designed to disambiguate the confusables, I'll use non-ASCII identifiers very carefully. As has happened before, on this list, I am happy to learn otherwise. Dan
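Dan's point about private-use characters can be checked directly. This is a small sketch using only the standard library; U+E24C is the code point named above, and `"<demo>"` is just a placeholder filename for `compile()`.

```python
import unicodedata

pua = "\ue24c"  # U+E24C, a private-use code point

# Private-use characters have general category "Co".
print(unicodedata.category(pua))
print(pua.isidentifier())

# The compiler rejects it as a name as well.
try:
    compile(pua + " = 1", "<demo>", "exec")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)
```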

Paul Moore

8:41 a.m.

On Mon, 15 Jul 2019 at 14:33, Dan Sommers <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:

...

On 7/15/19 8:54 AM, Steven D'Aprano wrote:

...
 = .(, )

I call foul. At least tentatively. For the moment.

That was a demo (he used private area characters to ensure getting the square box substitute character). The point is that someone with the wrong font installed, or a limited terminal app, can get this sort of output with entirely legal characters - and anyway the comment was made to explain why *extending* the list of allowed characters was bad (so what's legal right now is not relevant).

On Mon, 15 Jul 2019 at 14:13, Adrien Ricocotam wrote:

...

We can already do this (https://github.com/satwikkansal/wtfpython#-skipping-lines) so it's not a problem to me. It is a problem but not related to unicode.

That's *exactly* the issue of confusable characters, which is a Unicode issue. So I don't see how you can say it's "not related to Unicode". It's not directly related to *changing* which Unicode characters are allowed in identifiers - that much is true (at least partially, it's quite possible that changing the list would result in having more confusables, so increasing the risk) - but that's not what you claimed. Paul

Joao S. O. Bueno

9:22 a.m.

Adrien - please take note that since you already wrote about "everybody could update their environment and editors" to support Unicode, things like what you want (emojis in identifiers) can be supported at the programming editor level (and in plug-ins and extensions for those) - without preventing anyone else from working on your codebase.

You can just work on an extension for your favorite editor that would transform certain escape sequences into proper emojis. If these escapes are themselves valid identifiers, there is no stopping you and whatever enthusiast community you can raise from having fun with the looks of "pyemojicode", and that would still allow people outside that community to interoperate with your code, and all of the tools that use the static source would still work.

So, all you need is an extension to replace, at display time, things like

EMO_fire_ -> 🔥
EMO_heart -> 🖤

And so on. With a browser extension, or a site that acts as a proxy to code hosting like github/bitbucket, enthusiasts could even see these characters in internet listings. (The escaping sequence could be less intrusive as well, your call - and it also would help getting those symbols input into the code to start with)
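The display-time replacement Joao describes could be sketched roughly as follows. This is only an illustration, not an existing tool: the `EMOJI` table, the `render` function and the exact `EMO_<name>` spelling are assumptions made for the example, and a real editor plug-in would work on tokens rather than raw text.

```python
import re

# Hypothetical table mapping escape names to emoji (illustrative only).
EMOJI = {"fire": "\U0001F525", "heart": "\U0001F5A4"}

# Match whole identifiers of the form EMO_<name>.
_TOKEN = re.compile(r"\bEMO_([a-z]+)\b")


def render(source: str) -> str:
    """Return source with known EMO_<name> identifiers shown as emoji."""
    def swap(match):
        # Unknown names are left untouched.
        return EMOJI.get(match.group(1), match.group(0))
    return _TOKEN.sub(swap, source)


code = "EMO_fire = 3\nprint(EMO_fire + EMO_unknown)"
print(render(code))
```

The file on disk keeps the plain ASCII names, which are ordinary valid identifiers, so every other tool continues to work. Note that this naive version would also rewrite matches inside strings and comments.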

David Mertz

10:30 a.m.

It's easy, just use vim! (with conceal plugin). I haven't changed anything other than keywords and built-ins, but the plugin is happy to replace any other sequence or pattern.

-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
