Multibyte Character Support for Python

Huaiyu Zhu huaiyu at gauss.almadan.ibm.com
Fri May 10 21:45:04 EDT 2002


Martin v. Loewis <martin at v.loewis.de> wrote:
>
>For the Unicode type, nothing would change - Stephen did not propose
>to change the Unicode type.
>
>Instead, he proposed that non-ASCII identifiers are represented using
>UTF-8 encoded byte strings (instead of being represented as Unicode
>objects); in that case, and for those identifiers, len() would return
>the number of UTF-8 bytes.

But would that be different from the number of characters?  
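
To make the question concrete, here is a small sketch of the two counts being compared. It is only an illustration written for a current Python 3 interpreter, not the behavior of any of the proposals; the identifier name used is a made-up example.

```python
# Illustration only: compare the character count and the UTF-8 byte count
# of a hypothetical non-ASCII identifier name.
name = "caf\u00e9"                # four characters; the last one is non-ASCII
as_utf8 = name.encode("utf-8")    # the same name as a UTF-8 encoded byte string

char_count = len(name)            # counts characters (code points)
byte_count = len(as_utf8)         # counts UTF-8 bytes

# For a pure-ASCII name the two counts coincide.
ascii_name = "cafe"
ascii_same = len(ascii_name) == len(ascii_name.encode("utf-8"))

print(char_count, byte_count, ascii_same)
```

So the two numbers differ only when the name contains non-ASCII characters, each of which takes more than one byte in UTF-8.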

My confusion comes from his assertion that Python itself does not need to
care whether an identifier is a raw string or Unicode.  Is there any need
for the interpreter to split an identifier into a sequence of characters?
If the answer is no, then I guess my question is moot.

>
>> A list of such simple questions and answers for various proposals
>> would help many more people to understand the relevant PEPs.
>
>I recommend you familiarize yourself with the Unicode support first
>that was introduced in Python 2.0.

My question was about what would be the case under the proposals.  But I
guess I'm way out of my domain here.

Huaiyu


