Adam Bartoš writes:
I am on Windows and my terminal isn't UTF-8 at startup, but I install custom sys.std* objects at runtime and I also install a custom readline hook,
IIRC, on the Linux console and in an uxterm, PYTHONIOENCODING=utf-8 in the environment does what you want. (Can't test at the moment, I'm on a Mac and Terminal.app somehow fails to pass the right thing to Python from the input methods I have available -- I get an empty string, while I don't seem to have an uxterm, only an xterm.)

This has to be set at interpreter startup; once the interpreter has decided its IO encoding, you can't change it, you can only override it by intercepting the console input and decoding it yourself.

Regarding your environment, the repeated use of "custom" is a red flag. Unless you bundle your whole environment with the code you distribute, Python can know nothing about that. In general, Python doesn't know what encoding it is receiving text in. If you *do* know, you can set PyCF_SOURCE_IS_UTF8. So if you know that all of your users will have your custom stdio and readline hooks installed (AFAICS, they can't use IDLE or IPython!), then you can bundle Python built with the flag set, or perhaps you can do the decoding in your custom stdio module.

Note that even if you have a UTF-8 input source, some users are likely to be surprised because IIRC Python doesn't canonicalize in its codecs; that is left for higher-level libraries. Linux UTF-8 is usually NFC normalized, while Mac UTF-8 is NFD normalized.
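To make the last two points concrete, here is a minimal sketch (Python 3 syntax) of decoding console bytes yourself and then normalizing, since the codec alone does not canonicalize. The helper name decode_console_bytes and the choice of NFC as the target form are assumptions for illustration, not anything from Adam's code or from the interpreter.

```python
import unicodedata


def decode_console_bytes(raw, encoding="utf-8", form="NFC"):
    """Decode raw console bytes, then normalize (hypothetical helper)."""
    text = raw.decode(encoding)
    return unicodedata.normalize(form, text)


# 'e with acute' as one precomposed code point (NFC style) and as
# 'e' followed by a combining acute accent (NFD style).
nfc_bytes = "\u00e9".encode("utf-8")
nfd_bytes = "e\u0301".encode("utf-8")

# The codec alone keeps the two spellings distinct...
assert nfc_bytes.decode("utf-8") != nfd_bytes.decode("utf-8")

# ...while normalizing after decoding makes them compare equal.
assert decode_console_bytes(nfc_bytes) == decode_console_bytes(nfd_bytes)
```

A custom stdio wrapper could apply something like this to each line it reads, so that the rest of the program sees one spelling regardless of which platform produced the input.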
u'\xce\xb1'
Note that that is perfectly legal Unicode.
On 29 Apr 2015 10:36, "Adam Bartoš" <drekin@gmail.com> wrote:
Why am I talking about PyCF_SOURCE_IS_UTF8? eval(u"u'\u03b1'") -> u'\u03b1' but eval(u"u'\u03b1'".encode('utf-8')) -> u'\xce\xb1'.
Just to be clear, you accept those results as correct, right?
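For anyone following along, the second result is consistent with the UTF-8 bytes of the literal being read one byte per character, as Latin-1 would do. The sketch below reproduces that effect with explicit decoding in Python 3 syntax rather than through eval, so it illustrates the mechanism only and is not a statement about what the Python 2 compiler does internally.

```python
alpha = "\u03b1"                      # GREEK SMALL LETTER ALPHA
utf8_bytes = alpha.encode("utf-8")    # two bytes: 0xCE 0xB1

# Decoding with the right codec recovers the single code point.
as_utf8 = utf8_bytes.decode("utf-8")

# Decoding the same bytes as Latin-1 maps each byte to its own code
# point, giving two characters that are each valid Unicode.
as_latin1 = utf8_bytes.decode("latin-1")

print(ascii(as_utf8))     # '\u03b1'
print(ascii(as_latin1))   # '\xce\xb1'
```

Both strings are legal Unicode; only the first is what the person typing the literal meant.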