[Python-Dev] Allowing non-ASCII identifiers

"Martin v. Löwis" martin at v.loewis.de
Mon Feb 9 17:03:11 EST 2004


François Pinard wrote:
>>1. At run-time, identifiers are represented as Unicode objects unless
>>they are pure ASCII.  IOW, they are converted from the source encoding
>>to Unicode objects in the process of parsing.
> 
> 
> This is already the case, isn't it?

Currently, all identifiers are byte strings, at run-time, representing
ASCII characters. IOW, you currently won't observe Unicode strings
as identifiers.
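
For contrast, here is a minimal sketch of how point 1 would look from the
user's side. It is written in Python 3 syntax, which postdates this message
and is an assumption here, not the behaviour of the interpreter under
discussion. The variable names are illustrative only.

```python
# Python 3 sketch: a non-ASCII identifier shows up in a namespace
# as an ordinary Unicode string (str), not as a byte string.
NAME = "\u00e9l\u00e8ve"  # the identifier "élève", spelled with escapes

namespace = {}
exec(NAME + " = 3", namespace)  # same as running:  élève = 3

# Skip the "__builtins__" entry that exec() adds to the namespace.
found = [key for key in namespace if not key.startswith("__")]
identifier = found[0]
identifier_type = type(identifier)
value = namespace[identifier]
```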

>>2. As a consequence of 1), all places where identifiers appear need to
>>support Unicode objects (e.g. __dict__, __getattr__, etc)
> 
> 
> I do not much know the internals, yet I suspect one more thing to
> consider is whether Unicode strings looking like non-ASCII identifiers
> should be interned or not, the same as currently done for ASCII.

Indeed; I had not thought about this.
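
What interning would mean for such names can be sketched as follows. This
uses sys.intern() from Python 3, which is an assumption relative to the
interpreter discussed here; it only illustrates the idea that equal
identifier strings can be made to share one object.

```python
import sys

NAME = "\u00e9l\u00e8ve"  # the identifier "élève", spelled with escapes

# Build two equal strings at run time, so they start out as separate objects
# rather than one shared constant.
first = "".join([NAME[:2], NAME[2:]])
second = "".join([NAME[:3], NAME[3:]])

# Interning maps equal strings to a single canonical object, so later
# comparisons can be done by identity instead of character by character.
canonical_first = sys.intern(first)
canonical_second = sys.intern(second)
shared = canonical_first is canonical_second
```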

> # -*- coding: Latin-1 -*-
> élève = 3
> print élève
[...]
> So, the Python compiler is sensitive to the active locale.

Yes, that's a bug. It will use byte strings as identifiers
(without running your example, I'd expect dir() to show
that they are UTF-8).
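
To make the difference concrete, the following sketch shows the two byte
strings in question for the name in the example. It is written in Python 3
syntax (an assumption) and only demonstrates the encodings; it does not
reproduce the compiler behaviour itself.

```python
NAME = "\u00e9l\u00e8ve"  # the identifier "élève", spelled with escapes

# One byte per character, as the bytes appear in the Latin-1 source file.
as_latin1 = NAME.encode("latin-1")

# Two bytes for each accented character, as a UTF-8 byte-string identifier.
as_utf8 = NAME.encode("utf-8")

lengths = (len(NAME), len(as_latin1), len(as_utf8))
```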

> This is kind of a happy bug!  May we count on it being supported in the
> interim? :-) :-)

I would think so: this bug has been present for quite some time,
and nobody complained :-)

Martin




