Re: [Python-Dev] Unicode literals in Python 2.7

30 Apr 2015

Adam Bartoš writes:
...
I am in Windows and my terminal isn't utf-8 at the beginning, but I
install custom sys.std* objects at runtime and I also install
custom readline hook,
IIRC, on the Linux console and in an uxterm, setting
PYTHONIOENCODING=utf-8 in the environment does what you want.  (I can't
test at the moment: I'm on a Mac, and Terminal.app somehow fails to
pass the right thing to Python from the input methods I have available
-- I get an empty string.  I also don't seem to have an uxterm, only an
xterm.)  This has to be set at interpreter startup.  Once the
interpreter has decided its I/O encoding, you can't change it; you can
only override it by intercepting the console input and decoding it
yourself.
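
To make "decoding it yourself" concrete, here is a minimal sketch.  It
assumes you already know the console delivers UTF-8 bytes; the helper
name wrap_as_text and the in-memory stand-in for the console are
hypothetical, not anything from the original setup.

```python
import io


def wrap_as_text(byte_stream, encoding="utf-8"):
    """Return a text view of a binary stream, decoding with `encoding`.

    Hypothetical helper: `encoding` must be whatever the console is
    really sending, which Python cannot discover on its own.
    """
    return io.TextIOWrapper(byte_stream, encoding=encoding)


# Stand-in for a console that sends the UTF-8 bytes of U+03B1 plus newline.
fake_console = io.BytesIO(b"\xce\xb1\n")

# Reading through the wrapper yields decoded text, not raw bytes.
line = wrap_as_text(fake_console).readline()
print(repr(line))
```

A real replacement for sys.stdin would wrap the actual byte source
instead of a BytesIO, but the decoding step is the same.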

Regarding your environment, the repeated use of "custom" is a red
flag.  Unless you bundle your whole environment with the code you
distribute, Python can know nothing about that.  In general, Python
doesn't know what encoding it is receiving text in.

If you *do* know, you can set PyCF_SOURCE_IS_UTF8.  So if you know
that all of your users will have your custom stdio and readline hooks
installed (AFAICS, they can't use IDLE or IPython!), then you can
bundle Python built with the flag set, or perhaps you can do the
decoding in your custom stdio module.

Note that even if you have a UTF-8 input source, some users are likely
to be surprised because IIRC Python doesn't canonicalize in its
codecs; that is left for higher-level libraries.  Linux UTF-8 is
usually NFC normalized, while Mac UTF-8 is NFD normalized.
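
As a small illustration of why that surprises people, the sketch below
compares a composed and a decomposed spelling of the same character.
The sample strings are my own; unicodedata.normalize is the standard
library call for this.

```python
import unicodedata

composed = u"\u00e9"     # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = u"e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT

# The two spellings render the same but compare unequal as-is.
same_raw = composed == decomposed

# After normalizing both to NFC they compare equal.
same_nfc = (unicodedata.normalize("NFC", composed)
            == unicodedata.normalize("NFC", decomposed))

print(same_raw)
print(same_nfc)
```

So if input from different platforms has to compare equal, something
above the codec layer has to normalize it.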
...
...
...
u'\xce\xb1'
Note that that is perfectly legal Unicode.
...
...
...
...
On 29 Apr 2015 10:36, "Adam Bartoš" wrote:
...
Why I'm talking about PyCF_SOURCE_IS_UTF8? eval(u"u'\u03b1'") ->
u'\u03b1' but eval(u"u'\u03b1'".encode('utf-8')) -> u'\xce\xb1'.
Just to be clear, you accept those results as correct, right?
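
For reference, a sketch of where the two-character result comes from.
It does not call eval; it only shows that reading the UTF-8 bytes of
U+03B1 one byte per code point (Latin-1 here, as an assumption about
what effectively happens) reproduces the same value.

```python
alpha = u"\u03b1"                   # GREEK SMALL LETTER ALPHA
utf8_bytes = alpha.encode("utf-8")  # two bytes: 0xCE 0xB1

# One byte per code point: two characters, U+00CE and U+00B1.
misread = utf8_bytes.decode("latin-1")

# Decoded as UTF-8, the same bytes give back the single original character.
reread = utf8_bytes.decode("utf-8")

print(repr(misread))
print(repr(reread))
```

Both results are valid Unicode strings; they just aren't the same
string.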

Stephen J. Turnbull