Multibyte Character Support for Python

Huaiyu Zhu huaiyu at gauss.almadan.ibm.com
Thu May 9 16:23:30 EDT 2002


Stephen J. Turnbull <stephen at xemacs.org> wrote:
>>>>>> "Martin" == Martin v Loewis <martin at v.loewis.de> writes:
>
>    Martin> It would break introspective tools who suddenly find
>    Martin> Unicode objects in attribute dictionaries.
>
>What Unicode objects?  They find ordinary strings that are mandated to
>be encoded in UTF-8.  The tools only need to be 8-bit clean, and not
>do anything that involves the assumption that #characters == #octets.
>And _that_ only affects people using non-ASCII identifiers, which
>might be OK since it is an extension.

Out of curiosity: if a character occupies two bytes, what would len() report?  If
s is a Unicode string with wide characters, would list(s) consist of
characters or bytes?  Would the answer differ under the current situation,
under PEP 263, or under Stephen's proposal?  Would it change depending on
how the Unicode string is encoded?

A list of such simple questions and answers for various proposals would help
many more people to understand the relevant PEPs.
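
To make the questions concrete, here is a minimal sketch of the distinction
between counting characters and counting octets.  It assumes a present-day
Python 3 interpreter, where str holds code points and bytes holds octets; it
does not model the interpreter under discussion, PEP 263, or Stephen's
proposal.  The sample text and variable names are made up for illustration.

```python
# Assumption: Python 3 semantics (str = code points, bytes = octets).
s = "\u65e5\u672c"  # hypothetical sample: two non-ASCII characters

# The same text in two different encodings.
utf8 = s.encode("utf-8")
utf16 = s.encode("utf-16-le")

char_count = len(s)        # counts characters (code points)
utf8_octets = len(utf8)    # counts octets in the UTF-8 form
utf16_octets = len(utf16)  # counts octets in the UTF-16 form

chars = list(s)      # list of one-character strings
octets = list(utf8)  # list of integers, one per octet

print(char_count, utf8_octets, utf16_octets)
print(chars)
print(octets)
```

In this sketch the character count stays the same whatever the encoding,
while the octet count depends on the encoding chosen, which is the
"#characters == #octets" assumption Stephen warns tools against.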

Huaiyu


