[Python-Dev] PEP 263 considered faulty (for some Japanese)
Martin v. Loewis
martin@v.loewis.de
12 Mar 2002 20:57:13 +0100
SUZUKI Hisao <suzuki611@oki.com> writes:
> What we handle in Unicode with Python is often a document file
> in UTF-16. The default encoding is mainly applied to data from
> the document.
You should not use the default encoding for reading files. Instead,
you should use codecs.open or some such to read in UTF-16 data.
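A minimal sketch of that approach (the file path and contents are
invented for illustration): write a UTF-16 document and read it back
through codecs.open, so the decoding is explicit rather than left to
the default encoding.

```python
import codecs
import os
import tempfile

# Hypothetical document path, just for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "doc.txt")

# Writing through codecs.open encodes transparently (the "utf-16"
# codec also emits a BOM).
with codecs.open(path, "w", encoding="utf-16") as f:
    f.write(u"abc")

# Reading through codecs.open decodes transparently, with no
# reliance on the interpreter-wide default encoding.
with codecs.open(path, "r", encoding="utf-16") as f:
    text = f.read()

print(text)  # -> abc
```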
> Yes, I mean such things. Please note that u'<whatever-in-ascii>' is
> interpreted just literally and we cannot put Japanese characters in
> string literals legally for now anyway.
One of the primary rationales of the PEP is that you will be able to
put arbitrary Japanese characters into u'<whatever-in-euc-jp>', and
have it work correctly.
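To make that concrete, here is a sketch of what the PEP enables
(assuming the euc-jp codec is available; compile() honors a coding
declaration when handed the raw bytes of a source file):

```python
# Hypothetical two-line module: a PEP 263 coding declaration followed
# by a Unicode literal containing a Japanese character (hiragana A,
# U+3042), stored here in its euc-jp byte form.
source = u"# -*- coding: euc-jp -*-\ns = u'\u3042'\n".encode("euc-jp")

# The compiler reads the cookie and decodes the literal correctly.
code = compile(source, "<example>", "exec")
namespace = {}
exec(code, namespace)
print(repr(namespace["s"]))
```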
> >>> unicode("\x00a\x00b\x00c")
> u'abc'
You should use
unicode("\x00a\x00b\x00c", "utf-16-be")
instead. (The byte string carries no BOM, so the big-endian variant
of the codec has to be named explicitly.)
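The effect can be checked directly (shown here with the equivalent
bytes.decode spelling):

```python
# Six bytes of BOM-less big-endian UTF-16.
data = b"\x00a\x00b\x00c"

# Naming the byte order explicitly recovers the intended text.
print(data.decode("utf-16-be"))  # -> abc
```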
Regards,
Martin