[Python-3000] PEP: Supporting Non-ASCII Identifiers

"Martin v. Löwis" martin at v.loewis.de
Mon Jun 4 07:26:29 CEST 2007


Stephen J. Turnbull wrote:
>  > > Sure - but how can Python tell whether a non-normalized string was
>  > > intentionally put into the source, or as a side effect of the editor
>  > > modifying it?
>  > 
>  > It can't, but does it really need to? It could always assume the latter.
> 
> No, it can't.  One might want to write Python code that implements
> normalization algorithms, for example, and there will be "binary
> strings".  Only in the context of Unicode text are you allowed to do
> those things.

Of course, such an algorithm really should \u-escape the relevant
characters in the source, so that editors can't mess them up.
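To illustrate the point: a sketch of what such escaping buys you. The
identifiers here are made up for the example; the assumption is only
Python 3 string semantics.

```python
# A decomposed "é" (e followed by U+0301 COMBINING ACUTE ACCENT) written
# entirely with \u escapes: an editor that re-normalizes the file to NFC
# cannot silently collapse it, because the source is pure ASCII.
decomposed = "e\u0301"    # two code points: U+0065, U+0301
precomposed = "\u00e9"    # one code point:  U+00E9

# The two spellings render identically but are distinct strings,
# which is exactly what a normalization algorithm needs to preserve.
print(decomposed == precomposed)          # False
print(len(decomposed), len(precomposed))  # 2 1
```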

>  > Now if these are written by two different people using different
>  > editors, one might be normalized in a different way than the other,
>  > and the code would look all right but mysteriously fail to work.
> 
> It seems to me that once we have a proper separation between bytes
> objects and unicode objects, that the latter should always be compared
> internally to the dictionary using the kinds of techniques described
> in UTS#10 and UTR#30.  External normalization is not the right way to
> handle this issue.

By default, comparison and dictionary lookup won't do normalization,
as that is too expensive and too infrequently needed.
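A small sketch of the behavior being described, assuming Python 3 and the
standard unicodedata module: canonically equivalent spellings compare
unequal and act as distinct dictionary keys unless you normalize them
yourself.

```python
import unicodedata

nfc = "\u00e9tat"    # "état", precomposed (NFC form)
nfd = "e\u0301tat"   # "état", decomposed (NFD form)

# Comparison and dict lookup work on code points, not on canonical
# equivalence, so the two spellings are different keys.
d = {nfc: 1}
print(nfd in d)      # False

# Callers who need canonical equivalence normalize explicitly.
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```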

In any case, this has nothing to do with PEP 3131.

Regards,
Martin


