[Python-3000] Unicode strings, identifiers, and import

Mon May 14 18:43:22 CEST 2007

On 5/14/07, Jason Orendorff <jason.orendorff at gmail.com> wrote:
> On 5/14/07, Guido van Rossum <guido at python.org> wrote:
> > Isn't normalization also going to be an issue with using non-ASCII in
> > general? Does it mean that Python will have to use a normalization
> > before comparing identifiers as equal? That's terrible, as it will
> > vastly increase the amount of work needed to hash a string, too.
>
> PEP 3131 addresses this.  The tokenizer would normalize identifier
> tokens to NFC.  Because this happens so early, the rest of Python
> would be unaffected.

Would the tokenizer also normalize all string literals? Otherwise you
could still get surprises with things like x.foo vs.
getattr(x, "foo"), if the identifier foo were normalized but the
string "foo" were not.
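
To make the concern concrete, here is a minimal sketch of the mismatch. It
assumes the tokenizer normalizes identifiers to NFC and leaves string
literals alone; the class Obj and the name "café" are made up for
illustration, and setattr() with an NFC string merely stands in for what
such a tokenizer would store.

```python
import unicodedata

class Obj:
    """Hypothetical placeholder object, used only for this illustration."""
    pass

# The same visible name in two Unicode forms.
composed = unicodedata.normalize("NFC", "cafe\u0301")   # precomposed U+00E9
decomposed = unicodedata.normalize("NFD", composed)     # "e" + combining U+0301

x = Obj()
# Stand-in for "x.café = 1" after an (assumed) NFC-normalizing tokenizer:
# the attribute is stored under the composed spelling.
setattr(x, composed, 1)

# The two spellings look alike but are different strings.
same_text = composed == decomposed

# A string that was not normalized misses the attribute, because
# attribute lookup by string compares the raw code points.
found_raw = hasattr(x, decomposed)

# Normalizing the string by hand before the lookup finds it again.
found_norm = hasattr(x, unicodedata.normalize("NFC", decomposed))

print(same_text, found_raw, found_norm)
```

So unless string literals (or getattr itself) get the same treatment, the
caller has to normalize the string before looking the attribute up.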

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)