[Python-3000] Unicode identifiers (Was: sets in P3K?)

Sun Apr 30 20:39:35 CEST 2006

Guido van Rossum wrote:
> But a file with "löwis=1" in it causes a syntax error (even if an
> encoding is specified).

That's because it gets converted to UTF-8 first, and then not all of the
UTF-8 bytes count as Latin-1 letters.
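A minimal Python 3 sketch of that effect. It assumes, as an approximation, that a C isalnum under a Latin-1 locale classifies each byte the same way Python's str.isalnum classifies the corresponding Latin-1 character; the helper name utf8_bytes_look_alnum is hypothetical.

```python
def utf8_bytes_look_alnum(name: str, locale_encoding: str = "latin-1") -> bool:
    """Approximate a byte-wise isalnum test applied after UTF-8 conversion."""
    # The tokenizer sees the identifier only after conversion to UTF-8.
    raw = name.encode("utf-8")
    for b in raw:
        # Judge each byte on its own by decoding it as one character of the
        # (single-byte) locale encoding; undefined bytes become U+FFFD.
        ch = bytes([b]).decode(locale_encoding, errors="replace")
        if not (ch.isalnum() or ch == "_"):
            return False
    return True


print("löwis".encode("utf-8"))                 # b'l\xc3\xb6wis'
print(bytes([0xC3, 0xB6]).decode("latin-1"))   # Ã¶
print(utf8_bytes_look_alnum("lowis"))          # True
print(utf8_bytes_look_alnum("löwis"))          # False
```

The first byte of "ö" (0xC3, "Ã") is a Latin-1 letter, but the second (0xB6) is the pilcrow "¶", so the byte-wise check rejects the name.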

> I believe this is a quirk of interactive mode only. Certainly the
> language spec doesn't intend to allow this.

Only insofar as interactive mode doesn't do the to-UTF-8 conversion.
UTF-8, by design, has very little overlap with any other encoding, so it
is unlikely that every byte of some character's UTF-8 form would
satisfy isalnum in some other encoding. If you are curious, I'll try to construct
an example where (certain) non-ASCII characters can be used in
source code if the locale is set to the "right" value.
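A rough way to search for such examples, sketched in Python 3. It assumes that a single-byte locale's isalnum can be approximated by str.isalnum on the decoded character, and it only scans characters with two-byte UTF-8 forms (below U+0800); the function name accepted_non_ascii is hypothetical.

```python
from typing import List


def accepted_non_ascii(locale_encoding: str, limit: int = 0x800) -> List[str]:
    """Non-ASCII characters below `limit` whose UTF-8 bytes would all pass
    a byte-wise isalnum check under `locale_encoding` (approximated)."""
    found = []
    for cp in range(0x80, limit):
        ch = chr(cp)
        # A single-byte codec yields exactly one character per byte, so
        # isalnum() on the whole string checks every byte individually.
        as_locale = ch.encode("utf-8").decode(locale_encoding, errors="replace")
        if as_locale.isalnum():
            found.append(ch)
    return found


for enc in ("latin-1", "cp1251", "koi8-r"):
    hits = accepted_non_ascii(enc)
    print(enc, len(hits), "".join(hits[:10]))
```

How many hits appear depends heavily on the encoding. Under cp1251, for instance, "À" (UTF-8 C3 80) comes out as "ГЂ", two Cyrillic letters, so it would pass.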

So my point is that the tokenizer shouldn't use isalnum to find
out what characters are valid in an identifier.
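One way to follow that suggestion, sketched in Python 3: decode the source first, then classify characters by their Unicode properties instead of by bytes. str.isidentifier is the Unicode-aware check Python 3 eventually adopted (PEP 3131), after this discussion; the helper name is_identifier is hypothetical.

```python
def is_identifier(source: bytes, encoding: str) -> bool:
    """Decide identifier validity on decoded characters, not on raw bytes."""
    text = source.decode(encoding)
    # str.isidentifier uses Unicode-based rules, so the answer does not
    # depend on the locale or on which byte encoding the source used.
    return text.isidentifier()


name = "löwis"
for enc in ("latin-1", "utf-8", "cp1252"):
    print(enc, is_identifier(name.encode(enc), enc))
```

The same name is accepted regardless of the source encoding, which is exactly what a byte-wise isalnum test cannot guarantee.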

> I still think it's premature. In any case, it doesn't strike me as
> something that needs to be synchronized with Py3k -- it could be
> introduced earlier or later since it introduces no backwards
> compatibility. Python can respond much more agile here than most other
> languages.

Ok. I was only worried about your change in PEP 3099:
"Python won't use Unicode characters for anything except string literals
or comments."
If that is only meant to say "this won't be introduced in Python 3",
I'm fine with it.

Regards,
Martin