[Python-Dev] PEP 263 considered faulty (for some Japanese)

Martin v. Loewis martin@v.loewis.de
12 Mar 2002 20:57:13 +0100


SUZUKI Hisao <suzuki611@oki.com> writes:

> What we handle in Unicode with Python is often a document file
> in UTF-16.  The default encoding is mainly applied to data from
> the document.  

You should not use the default encoding for reading files. Instead,
you should use codecs.open or some such to read in UTF-16 data.
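In today's Python, Martin's suggestion can be sketched like this (the file path is made up for illustration):

```python
import codecs
import os
import tempfile

# Create a sample UTF-16 file to read back (path is hypothetical).
path = os.path.join(tempfile.mkdtemp(), "doc.txt")
with open(path, "wb") as f:
    f.write("abc".encode("utf-16"))  # writes a BOM plus the encoded text

# codecs.open decodes transparently with the named codec,
# instead of relying on the interpreter's default encoding.
with codecs.open(path, encoding="utf-16") as f:
    text = f.read()

print(text)  # abc
```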

> Yes, I mean such things.  Please note that u'<whatever-in-ascii>' is
> interpreted just literally and we cannot put Japanese characters in
> string literals legally for now anyway.

One of the primary rationales of the PEP is that you will be able to
put arbitrary Japanese characters into u'<whatever-in-euc-jp>', and
have it work correctly.
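With a PEP 263 coding declaration at the top of a module, the interpreter decodes the whole source file, string literals included, using the named codec. A minimal sketch of the mechanism, feeding EUC-JP bytes to compile() (the bytes b"\xc6\xfc" are the EUC-JP encoding of one Japanese character, and the variable name s is just for illustration):

```python
# Source text as raw EUC-JP bytes; the first line is the
# PEP 263 coding cookie that tells the parser how to decode it.
src = b"# -*- coding: euc-jp -*-\ns = u'\xc6\xfc'\n"

ns = {}
exec(compile(src, "<euc-jp demo>", "exec"), ns)

# The literal was decoded with the declared codec, so it is a
# single character that round-trips back to the same bytes:
assert len(ns["s"]) == 1
assert ns["s"].encode("euc-jp") == b"\xc6\xfc"
```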

> >>> unicode("\x00a\x00b\x00c")
> u'abc'

You should use

  unicode("\x00a\x00b\x00c", "utf-16")

instead.
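Those bytes are big-endian UTF-16 with no BOM, and without a BOM the plain "utf-16" codec has to guess the byte order, so it is safer to name it explicitly. In modern Python the same decode looks like:

```python
data = b"\x00a\x00b\x00c"  # big-endian UTF-16 for "abc", no BOM

# Spell out the byte order rather than letting the codec guess:
text = data.decode("utf-16-be")
print(text)  # abc
```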

Regards,
Martin