Multibyte Character Surport for Python

Martin v. Loewis martin at v.loewis.de
Thu May 9 04:22:11 EDT 2002


"Stephen J. Turnbull" <stephen at xemacs.org> writes:

>     Martin> So in the end, the only acceptable strategy would be to
>     Martin> allow identifiers that contain letters (or letterlike
>     Martin> symbols) in arbitrary languages. For Python, that would
>     Martin> mean that attributes must be Unicode objects, which could
>     Martin> cause code breakage.
> 
> This would actually be rather simple if you just declare that Python
> programs as submitted to the (internal) parser must be in UTF-8, and
> ensure that PEP 263 codecs do this in a way transparent to the user.

Indeed, parsing it would not be an issue (although it *would* be an
issue to define the set of acceptable identifier characters).

> I see no reason why this would cause code breakage, although I
> haven't tried it yet.

It would break introspective tools who suddenly find Unicode objects
in attribute dictionaries.

Regards,
Martin




More information about the Python-list mailing list