Multibyte Character Support for Python
Martin v. Löwis
loewis at informatik.hu-berlin.de
Sat May 11 03:57:21 EDT 2002
"Stephen J. Turnbull" <stephen at xemacs.org> writes:
> (2) Code that uses identifiers in eval constructs would need to do
> some horrible thing like
>
> exec "print x + y".decode('iso-8859-1').encode('utf-8')
With PEP 263 implemented, the source encoding of identifiers and the
run-time encoding are two different issues. The source does not need
to be in UTF-8.
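As a minimal sketch of that separation, assuming a current Python 3 interpreter rather than the Python 2.2 of this thread (the variable and function names are made up for illustration): the same program text is stored in two different source encodings, each with its own PEP 263 declaration, and both yield the same run-time string.

```python
# Illustrative sketch (Python 3); all names here are hypothetical.
# The same program text, parameterised by its declared source encoding.
program = "# -*- coding: {enc} -*-\ngreeting = 'Grüße'\n"

latin1_source = program.format(enc="iso-8859-1").encode("iso-8859-1")
utf8_source = program.format(enc="utf-8").encode("utf-8")


def run(source_bytes):
    """Execute byte-string source and return the value bound to `greeting`."""
    namespace = {}
    exec(source_bytes, namespace)
    return namespace["greeting"]


# The bytes differ, but the run-time string object is the same text.
assert latin1_source != utf8_source
assert run(latin1_source) == run(utf8_source) == "Grüße"
```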
> Note that in this all-ASCII example it's redundant, but would work.
> Also the PEP 263 mechanism could be extended to give the program an
> "execution locale" and automatically do that conversion. (Horrible,
> but in the spirit of that PEP.)
Actually, the PEP requires a proper encoding declaration whenever a
byte string is exec'ed. The easiest one would be the UTF-8 signature,
but I'd recommend exec'ing Unicode objects in the first place.
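Both options can be sketched as follows, again assuming Python 3, where exec is a function and byte-string source without any declaration is treated as UTF-8 anyway, so the signature is shown only for illustration. The names are hypothetical.

```python
import codecs

# Illustrative sketch (Python 3); all names here are hypothetical.
text = "label = 'Löwis'\n"

# Option 1: a byte string that carries the UTF-8 signature (BOM).
signed_bytes = codecs.BOM_UTF8 + text.encode("utf-8")
ns_bytes = {}
exec(signed_bytes, ns_bytes)

# Option 2: exec the Unicode object directly; no declaration is involved.
ns_text = {}
exec(text, ns_text)

assert ns_bytes["label"] == ns_text["label"] == "Löwis"
```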
> Obviously I prefer the latter interpretation. I suggest that projects
> that require reliable operation of introspective tools hire someone
> like the martellibot to do coding standard enforcement<wink>. But the
> "broken" interpretation is also reasonable, and I assume that is the
> one that MvL holds.
This is not an artificial objection: people already complained that
pydoc breaks when confronted with a Unicode doc string. I expect that
even dir() might stop "working", since its result would contain
Unicode objects which then cannot be printed at the interactive
console.
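The kind of failure meant here can be illustrated with a rough sketch in Python 3, assuming an ASCII-only console encoding. The helper `printable` is a hypothetical workaround for this example, not something pydoc or dir() provides.

```python
# Illustrative sketch (Python 3); `printable` is a hypothetical helper.
def printable(text, encoding="ascii"):
    """Return `text` with characters the given encoding cannot represent
    replaced by backslash escapes, so that it can always be written out."""
    return text.encode(encoding, "backslashreplace").decode(encoding)


name = "na\u00efve_attr"  # an identifier-like string with one non-ASCII letter

# Encoding strictly for an ASCII console fails ...
try:
    name.encode("ascii")
    failed = False
except UnicodeEncodeError:
    failed = True

# ... while the escaped form is plain ASCII.
escaped = printable(name)
```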
> The basic fact is that Unicode support for strings is already decided.
> I disagree with some implementation decisions (eg, the idea of
> prepending ZERO-WIDTH NO-BREAK SPACE to strings intended to be
> exported in UTF-16 encoding is just insane IMO
That's how UTF-16 is specified. If you don't want the BOM, use
UTF-16LE or UTF-16BE.
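A short sketch of the difference, using the standard codec names as spelled in Python 3 ("utf-16", "utf-16-le", "utf-16-be"); the variable names are only for illustration:

```python
import codecs

text = "abc"

with_bom = text.encode("utf-16")     # BOM first, then native byte order
little = text.encode("utf-16-le")    # explicit byte order, no BOM
big = text.encode("utf-16-be")       # explicit byte order, no BOM

# The plain "utf-16" codec prepends a byte order mark.
assert with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
assert len(with_bom) == len(little) + 2

# The endian-specific codecs emit only the code units.
assert little == b"a\x00b\x00c\x00"
assert big == b"\x00a\x00b\x00c"

# Decoding with "utf-16" consumes the BOM again.
assert with_bom.decode("utf-16") == text
```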
Regards,
Martin