2018-05-18 15:37 GMT+02:00 Steven D'Aprano <steve@pearwood.info>:

> Earlier you described this suggestion as "a silly joke".
>
> https://mail.python.org/pipermail/python-ideas/2018-May/050861.html

The joke proposal was to write all keywords in Python using the bold font variation, as a reaction to a similar joke proposal to precede all keywords in Python with \.

In contrast, this isn't even a proposal; it is merely a description of an existing feature.

Practically speaking, suppose "spam" becomes a keyword in 3.8, and I have a module which I want to make compatible with 3.8 while preserving its API for pre-3.8 versions. Then I will first update my module to use some alternative spelling, spam_, throughout, and then, in a single place, write:

𝐬𝐩𝐚𝐦 = spam_  # exploit NFKC normalization to set identifier "spam" for backward compatibility

Even if this single line shows up as mojibake in somebody's editor, it shouldn't inconvenience them much.
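A minimal sketch of the trick above, assuming CPython 3.x. Since "spam" is not actually a keyword today, this only shows that the bold spelling binds the plain name. The bold letters are written as escapes (MATHEMATICAL BOLD SMALL S, P, A, M) to avoid depending on editor support; the module source and the "eggs" return value are made up for illustration.

```python
import unicodedata

# 𝐬𝐩𝐚𝐦 spelled with code point escapes: U+1D42C U+1D429 U+1D41A U+1D426
BOLD_SPAM = "\U0001D42C\U0001D429\U0001D41A\U0001D426"

# NFKC folds the mathematical bold letters back to plain ASCII.
assert unicodedata.normalize("NFKC", BOLD_SPAM) == "spam"

# Hypothetical module source after the rename: the implementation is
# spelled spam_, and a single line re-exports it under the old public name.
source = (
    "def spam_():\n"
    "    return 'eggs'\n"
    f"{BOLD_SPAM} = spam_\n"
)
namespace = {}
exec(source, namespace)
print(namespace["spam"]())  # eggs
```

The namespace ends up with a key "spam" (the normalized name), not a key spelled in bold.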

> I think you were right then.
>
>> I am merely defending the status quo.
>> I demonstrate how the intended behavior can be achieved using features
>> available in current Python versions.
>
> Aside from the fact that font, editor and keyboard support for such
> non-BMP Unicode characters is very spotty, it isn't the intended
> behaviour.

I am not sure what leads you to conclude that.

There seem to be three design possibilities here:

1. 𝐢𝐟 is an alternative spelling for the keyword if
2. 𝐢𝐟 is an identifier
3. 𝐢𝐟 is an error

I am pretty sure option 1 (non-ASCII spelling of keywords) was not intended; the docs say about keywords: "They must be spelled exactly as written here:"

So it is either 2 or 3. Option 3 would only make sense if we concluded that it is a bad idea to have an identifier with the same name as a keyword, whereas this whole thread so far has been about introducing such a feature.

So that leaves 2, which happens to be the implemented behavior.
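That the implemented behavior is option 2 can be checked with a short sketch, assuming CPython 3.x. BOLD_IF is an illustrative name for 𝐢𝐟 written as code point escapes. The keyword table only matches the exact ASCII spelling, while the parser accepts the bold spelling as a name and NFKC-normalizes it to "if".

```python
import ast
import keyword

BOLD_IF = "\U0001D422\U0001D41F"  # MATHEMATICAL BOLD SMALL I + F, i.e. 𝐢𝐟

# The keyword table only knows the exact ASCII spelling.
print(keyword.iskeyword("if"), keyword.iskeyword(BOLD_IF))  # True False

# Plain "if" cannot be used as a name...
try:
    compile("if = 1", "<demo>", "exec")
except SyntaxError:
    print("'if = 1' is rejected")

# ...but the bold spelling is tokenized as an ordinary name, which the
# parser then normalizes to the identifier "if".
tree = ast.parse(f"{BOLD_IF} = 1")
target = tree.body[0].targets[0]
print(type(target).__name__, repr(target.id))  # Name 'if'

namespace = {}
exec(f"{BOLD_IF} = 1", namespace)
print(namespace["if"])  # 1
```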

As an aside, a general observation about PEP-3131 and Unicode identifiers in Python: from the PEP it becomes clear that there have been several proposals to make them more restricted (e.g. requiring source code to already be in NFKC normal form, which would make 𝐢𝐟 illegal, or disallowing confusables).
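For illustration, the stricter "source must already be NFKC" rule could be approximated by a checker like the one below. This is a sketch only: non_nfkc_names is a hypothetical helper, not something CPython or PEP-3131 provides, and it relies on the tokenize module reporting NAME tokens as spelled in the source.

```python
from __future__ import annotations

import io
import tokenize
import unicodedata


def non_nfkc_names(source: str) -> list[str]:
    """Report NAME tokens whose spelling in the source is not already NFKC.

    Sketch of a rejected, stricter rule; CPython does not enforce this.
    """
    flagged = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type != tokenize.NAME:
            continue
        if unicodedata.normalize("NFKC", tok.string) != tok.string:
            flagged.append(tok.string)
    return flagged


BOLD_IF = "\U0001D422\U0001D41F"  # 𝐢𝐟
flagged = non_nfkc_names(f"{BOLD_IF} = 1\nx = 2\n")
print([unicodedata.normalize("NFKC", name) for name in flagged])  # ['if']
```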

Ultimately these proposals were rejected, and the result is that we have a rather liberal definition of Unicode identifiers in Python. I feel that 𝐢𝐟 being a valid identifier fits into that pattern, just as various confusable spellings of if would be legal identifiers. In theory this could lead to all kinds of sneaky attacks where code appears to do one thing but does another, but in practice it just doesn't seem to be a big issue.
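A small sketch of the confusables point, assuming CPython 3.x. The Cyrillic letter U+0456 is an example chosen here, not one taken from the PEP: unlike mathematical bold, NFKC leaves it unchanged, so "іf" spelled with it is a legal identifier that is neither the keyword nor the ASCII name "if".

```python
import keyword
import unicodedata

# CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I followed by Latin "f"
cyrillic_if = "\u0456f"

# NFKC does not map the Cyrillic letter to Latin "i".
print(unicodedata.normalize("NFKC", cyrillic_if) == "if")  # False
print(keyword.iskeyword(cyrillic_if), cyrillic_if.isidentifier())  # False True

namespace = {}
exec(f"{cyrillic_if} = 'looks like a keyword'", namespace)
print(namespace[cyrillic_if])  # looks like a keyword
print("if" in namespace)  # False
```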

> As you point out, the intended behaviour is that obj.𝐢𝐟 and
> obj.if ought to be identical. Since the latter is a syntax error, so
> should be the former.

NFKC normalization is restricted to identifiers.
Keywords "must be spelled exactly as written here."
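The obj.𝐢𝐟 case can be checked the same way. A minimal sketch, assuming CPython 3.x; types.SimpleNamespace is just a convenient attribute container and BOLD_IF is an illustrative name. Attribute names are identifiers, so the bold spelling is normalized to the attribute "if", while the ASCII spelling is rejected as a keyword before any normalization takes place.

```python
import types

BOLD_IF = "\U0001D422\U0001D41F"  # 𝐢𝐟

obj = types.SimpleNamespace()
scope = {"obj": obj}

# The bold spelling is accepted and normalized to the attribute name "if".
exec(f"obj.{BOLD_IF} = 42", scope)
print(getattr(obj, "if"))  # 42

# The ASCII spelling is a keyword, so the parser rejects it outright.
try:
    compile("obj.if = 42", "<demo>", "exec")
except SyntaxError:
    print("'obj.if = 42' is rejected")
```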

>> It is guaranteed to work by PEP-3131:
>> https://www.python.org/dev/peps/pep-3131
>>
>> "All identifiers are converted into the normal form NFKC while parsing;
>> comparison of identifiers is based on NFKC."
>>
>> NFKC normalization means spam must be considered the same identifier as
>> 𝐬𝐩𝐚𝐦.
>
> It's not the NFKC normalization that I'm questioning. It's the fact that
> it is done too late to catch the use of a keyword.

See above.

Stephan

> --
> Steve
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/