PEP 263 comments
Martin v. Loewis
martin at v.loewis.de
Fri Mar 1 15:34:02 EST 2002
huaiyu at gauss.almadan.ibm.com (Huaiyu Zhu) writes:
> I've been following this discussion with quite some interest, but I do not
> have the background to delimit the scope of various concepts. Is there a
> gentle introduction to a unicode-newbie?
There are a number of introductions to Unicode; you may want to search
www.unicode.org, e.g.
http://www.unicode.org/unicode/standard/WhatIsUnicode.html
> >IMO, the Python source code parser should never see any text data[1]
> >that is not UTF-8 encoded.
>
> Presumably this discussion only concerns unicode strings - I don't think
> we want to lose the ability to read in arbitrary binary data as a raw string.
First and foremost, the discussion is only about source code. A byte
string should certainly be able to store arbitrary bytes. Under
Stephen's proposal, it would indeed not be possible anymore to put
arbitrary binary data into source code.
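
To make the distinction concrete, here is a minimal sketch in present-day Python 3 syntax (an assumption on my part, since the thread predates it; the helper name is_valid_utf8 is made up for illustration). It shows that a byte string can hold every byte value, while the same bytes generally do not form valid UTF-8 text, which is why raw binary data could not appear literally in a UTF-8-only source file:

```python
# A byte string can hold every possible byte value.
blob = bytes(range(256))


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: report whether `data` decodes as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(b"hello"))  # plain ASCII is also valid UTF-8
print(is_valid_utf8(blob))      # arbitrary bytes usually are not

# Escape sequences keep the source file itself ASCII-only while
# still producing arbitrary byte values at run time.
marker = b"\xff\xfe"
print(list(marker))
```

Binary data written with escape sequences stays representable either way; only literal non-UTF-8 bytes in the file are affected.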
> >[1] Ie, Python language or character text. It might be convenient to
> >have an octet-string primitive data type, in which you could put
> >EUC-encoded Japanese or Java byte codes.
>
> What's the difference between this and a raw string (a byte sequence) that
> you can translate into any other encoding?
Arbitrary binary data does not have a character set. If the data is
character data, it should be stored as a character string (which, in
Python, is a Unicode string).
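
As a rough illustration of that split, again sketched in Python 3 terms (an assumption, where str is the Unicode string type and bytes is the octet string), EUC-encoded Japanese is just bytes until a codec is named, and only the decoded result is character data:

```python
# Character data: a Unicode string (three kanji, written as escapes
# so this example itself stays ASCII-only).
text = "\u65e5\u672c\u8a9e"

# Encoding produces an octet string with no character set attached.
raw = text.encode("euc_jp")
print(type(raw).__name__, len(raw))

# Only by naming the encoding do the bytes become characters again.
decoded = raw.decode("euc_jp")
print(decoded == text)

# Guessing the wrong encoding either fails or yields different text.
try:
    guessed = raw.decode("utf-8")
except UnicodeDecodeError:
    guessed = None
print(guessed == text)
```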
Regards,
Martin